Caching Layer
In-memory store (e.g. Redis, Memcached) used to serve hot data with low latency and reduce load on the primary store.
1. Concept Overview
A caching layer sits between the application and the primary data store (e.g. database). It holds a subset of data in fast storage (RAM) so that repeated reads are served without hitting the DB.
Why it exists: Databases are slower and more expensive per operation than memory. Caching hot data reduces latency and DB load, enabling higher throughput and better user experience.
2. Core Principles
Patterns
Cache-aside: App checks cache; on miss, loads from DB and populates cache. App owns logic.
Read-through: Cache layer loads from DB on miss; app only talks to cache.
Write-through: Writes go to DB and cache together; cache always consistent with DB.
Write-behind: Writes go to cache first; DB updated asynchronously (higher performance, risk of loss).
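The cache-aside and write-through flows above can be sketched in a few lines. This is a minimal in-process sketch: plain dicts stand in for Redis (cache) and the primary database (db), and the key/function names are illustrative, not a real client API.

```python
# Stand-ins: a dict for the cache (e.g. Redis) and one for the primary store.
cache = {}
db = {"user:1": {"name": "Ada"}}

def get_user(key):
    # Cache-aside read: check the cache first.
    if key in cache:
        return cache[key]          # hit: served from memory, no DB round trip
    # Miss: load from the primary store...
    value = db.get(key)
    # ...and populate the cache for subsequent reads.
    if value is not None:
        cache[key] = value
    return value

def update_user(key, value):
    # Write-through: update DB and cache together so reads stay consistent.
    db[key] = value
    cache[key] = value
```

The app owns all of this logic (cache-aside); with read-through, the same miss-then-load step moves into the cache layer itself.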
Eviction
LRU (Least Recently Used): Evict least recently accessed (common default).
LFU (Least Frequently Used): Evict least frequently accessed.
TTL (Time To Live): Expire entries after a fixed time; good for time-sensitive data.
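LRU, the common default above, can be sketched with an ordered map: accesses move a key to the "most recent" end, and on overflow the entry at the "least recent" end is dropped. A minimal sketch (the class and capacity are illustrative):

```python
from collections import OrderedDict

class LRUCache:
    """Least-recently-used eviction: on overflow, drop the entry
    that was accessed longest ago."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()      # insertion order doubles as recency order

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)     # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry
```

For example, with capacity 2, inserting a and b, reading a, then inserting c evicts b, because b is now the least recently used.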
Architecture: App reads go through the cache (hit → return from memory; miss → load from the DB and backfill); writes go to the primary store and the cache according to the chosen pattern above.
3. Real-World Usage
Redis: Rich structures (strings, hashes, sets, sorted sets); persistence; replication; used for cache, session, rate limit, leaderboards.
Memcached: Simple key-value; multi-threaded; often used for pure cache.
ElastiCache, Azure Cache: Managed Redis/Memcached.
4. Trade-offs
Cache-aside: Pros — app controls the logic; a cache failure degrades gracefully to DB reads. Cons — stale data possible; cache stampede on a hot miss.
Write-through: Pros — reads are consistent with the DB. Cons — higher write latency; cache pollution (caching data that is never read).
Write-behind: Pros — very fast writes. Cons — data loss if the cache dies before the DB write completes.
In-memory: Pros — very low latency. Cons — cost; limited capacity; volatile unless persisted.
When to use: Read-heavy, latency-sensitive workloads that can tolerate some staleness or invalidate on write. When not to use: write-heavy workloads needing strong consistency, or data without access locality (low hit rate makes the cache pure overhead).
5. Failure Scenarios
Cache down: Fall back to the DB and accept higher latency; optionally serve stale data from a cache replica.
Stampede (many concurrent requests on the same miss): Single-flight lock so only one request reloads the key; stagger TTLs; prewarm hot keys.
Stale data: TTL; invalidate on write (delete or update the cached entry); include a version in the key.
Memory full: Eviction policy (LRU/LFU); scale cache size or shard.
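The single-flight idea above (only one caller rebuilds a missing key while concurrent callers wait) can be sketched with a per-key lock. This is a simplified in-process sketch; the lock table here grows unbounded and `load_from_db` is a hypothetical loader, not a real API.

```python
import threading

cache = {}
locks = {}                       # one lock per key (illustrative; not bounded)
locks_guard = threading.Lock()   # protects the lock table itself

def get_or_load(key, load_from_db):
    value = cache.get(key)
    if value is not None:
        return value             # hit: no lock needed
    # Miss: take a per-key lock so concurrent misses on the same key
    # result in a single DB load instead of a stampede.
    with locks_guard:
        lock = locks.setdefault(key, threading.Lock())
    with lock:
        value = cache.get(key)   # re-check: another caller may have filled it
        if value is None:
            value = load_from_db(key)
            cache[key] = value
    return value
```

The double-check inside the lock is the key detail: waiters that lose the race find the value already cached and skip the DB entirely.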
6. Performance Considerations
Latency: Sub-millisecond for cache hit; avoid heavy serialization or large values.
Throughput: In-memory cache can handle hundreds of thousands of ops/s per node.
Hit rate: Design keys and TTL so hot data stays in cache; monitor hit ratio.
7. Implementation Patterns
Single cache: One Redis/Memcached instance; simple; single point of failure.
Replicated cache: Primary + replicas; read from replica for scaling; failover to replica.
Distributed cache: Sharded (e.g. Redis Cluster); consistent hashing; see hld-problems/hard/distributed-cache.md.
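The consistent hashing mentioned above maps keys onto a ring of shards so that adding or removing a node moves only a small fraction of keys (unlike modulo hashing, which reshuffles almost everything). A minimal sketch, assuming virtual nodes for balance; the class and parameters are illustrative, not the Redis Cluster algorithm (which uses fixed hash slots):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps keys to cache shards via a hash ring with virtual nodes."""
    def __init__(self, nodes, vnodes=100):
        self.ring = []  # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                h = self._hash(f"{node}#{i}")
                bisect.insort(self.ring, (h, node))

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        # Walk clockwise to the first virtual node at or after the key's hash.
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]
```

Virtual nodes (many ring positions per physical shard) smooth out the distribution; without them a few shards can end up owning most of the keyspace.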
Quick Revision
Purpose: Low latency and reduced DB load by keeping hot data in memory.
Cache-aside: App checks cache, loads DB on miss, fills cache. Write-through: Write DB + cache.
Eviction: LRU common; TTL for freshness.
Failure: Cache down → DB fallback; stampede → single-flight lock or staggered TTLs.
Interview: “We use Redis as a cache-aside layer with a 1-hour TTL for user profiles; on miss we hit the DB and backfill. We invalidate on update. If Redis is down we fall back to the DB and accept higher latency.”
For full caching strategies, invalidation, and CDN, see core-concepts/caching-cdn.md.