githubEdit

Distributed Cache (Redis)

Difficulty: Hard Topics: Consistent Hashing, Replication, Failover, Cache Invalidation Time: 60 minutes Companies: Google, Meta, Amazon (ElastiCache), Redis Labs

Problem Statement

Design a distributed, in-memory cache like Redis Cluster that:

  • Scales horizontally (add nodes to increase capacity)

  • Provides high availability (automatic failover)

  • Supports common cache operations (GET, SET, DELETE)

  • Handles node failures gracefully

  • Maintains data consistency across replicas

Requirements

Functional

  • SET(key, value, TTL): Store key-value with optional expiration

  • GET(key): Retrieve value

  • DELETE(key): Remove key

  • Automatic data distribution across nodes

  • Replication for availability

Non-Functional

  • Latency: P99 < 5ms

  • Throughput: 1M ops/sec per node

  • Availability: 99.99% uptime

  • Scalability: Add/remove nodes without downtime

  • Durability: Optional (in-memory by default, can persist to disk)

Scale Estimation

Core Concepts

1. Consistent Hashing

Problem: Simple hash % N breaks when N changes

Consistent Hashing Solution:

Virtual Nodes (for balance):


2. Redis Cluster Hash Slots

Redis Approach: 16,384 hash slots

Adding Node D:


3. Replication

Replication Flow:


Architecture


Data Model


API Design

Smart Client (Cluster-Aware)

Redirection (MOVED)

Resharding (ASK)


Scaling Operations

Adding a Node

Zero Downtime: Resharding happens while cluster serves requests (ASK redirects)

Removing a Node


Replication & Failover

Automatic Failover

Quorum: Majority of masters must agree (3 masters → need 2 votes)


Consistency Trade-offs

Eventual Consistency (Async Replication)

Solution: WAIT command


Cache Eviction

Eviction Policies

LRU Implementation (Approximate):


Failure Scenarios

Master Failure

Network Partition (Split Brain)

All Replicas Fail


Monitoring


Interview Tips

Q: "Why consistent hashing over simple hash?"

  • A: "Adding/removing nodes with simple hash requires rehashing ALL keys. Consistent hashing only moves ~1/N keys."

Q: "How does Redis Cluster handle failures?"

  • A: "Automatic replica promotion via voting. Majority of masters must agree. Failover takes 1-2 seconds."

Q: "What if you can't tolerate data loss?"

  • A: "Use WAIT command for synchronous replication (slower), or persistence (RDB/AOF snapshots)."

Q: "Redis vs Memcached?"

Feature
Redis
Memcached

Data structures

Rich (lists, sets, hashes)

Key-value only

Persistence

Yes (RDB/AOF)

No

Replication

Yes

No

Clustering

Built-in

External (libmemcached)

Use case

Cache + data store

Pure cache

Trade-offs:

  • Availability vs Consistency: Redis prioritizes availability (asynchronous replication)

  • Memory vs Durability: In-memory (fast) but must persist to disk for durability (slower)

  • Simplicity vs Features: Memcached simpler, Redis more powerful

Last updated