Scaling Strategies
How to scale systems: horizontal vs vertical, database scaling, replication, partitioning, caching, queues, and async processing.
1. Concept Overview
Scaling is increasing a system’s capacity to handle more load (traffic, data, or both). Strategies differ by what you scale (compute, storage, reads, writes) and how (vertical vs horizontal, replication vs sharding, sync vs async).
Why it matters: At SDE-3 level you are expected to choose and justify scaling strategies and explain trade-offs (cost, complexity, consistency, operations).
2. Horizontal vs Vertical Scaling
Vertical scaling (scale up)
What: Add more CPU, RAM, or disk to the same machine.
Pros: Simple; no distributed systems; no application changes.
Cons: Hard limits (max instance size); single point of failure; often more expensive per unit capacity at high end.
When: Early stage; quick fix; or components that are hard to distribute (e.g. legacy DB).
Horizontal scaling (scale out)
What: Add more machines (nodes); distribute load and data across them.
Pros: No single ceiling; can use cheaper hardware; fault tolerance (multiple nodes).
Cons: Complexity (distribution, consistency, coordination); operational overhead.
When: Growth beyond one machine; need for HA and elasticity.
Comparison
| Dimension | Vertical | Horizontal |
|---|---|---|
| Complexity | Low | High |
| Limit | Single machine max | Theoretically unlimited |
| Failure | Single point of failure | N-1 resilience |
| Cost | Often nonlinear at high end | Linear with nodes |
| Application | Usually no change | May need stateless design, sharding, etc. |
3. Database Scaling
Read scaling
Read replicas: Primary takes writes; changes replicate to replicas (async or sync); reads go to replicas.
Caching: Cache hot data in front of DB (Redis, Memcached); reduces DB read load.
CDN: For static or cacheable content; offload entirely from DB.
Trade-off: Replicas can lag; cache can be stale. Use “read-your-writes” (route user’s reads to primary or same replica) when needed.
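The read-your-writes routing above can be sketched as a small router that sends a user's reads to the primary for a short window after that user writes, and to a random replica otherwise. This is a minimal sketch: the class, the `stickiness_s` window, and the string node names are illustrative, not a real driver API.

```python
import random
import time

class ReadRouter:
    """Route reads to replicas, but pin a user's reads to the primary
    shortly after that user writes ("read-your-writes")."""

    def __init__(self, primary, replicas, stickiness_s=5.0):
        self.primary = primary
        self.replicas = replicas
        self.stickiness_s = stickiness_s  # assumed to exceed typical replication lag
        self._last_write = {}             # user_id -> time of last write

    def record_write(self, user_id):
        self._last_write[user_id] = time.monotonic()

    def pick(self, user_id):
        last = self._last_write.get(user_id)
        if last is not None and time.monotonic() - last < self.stickiness_s:
            return self.primary               # replica may still lag this write
        return random.choice(self.replicas)   # otherwise spread read load

router = ReadRouter(primary="db-primary", replicas=["db-r1", "db-r2"])
assert router.pick("u1") in ("db-r1", "db-r2")
router.record_write("u1")
assert router.pick("u1") == "db-primary"
```

A production version would track replica lag directly (or use session tokens) instead of a fixed time window.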
Write scaling
Sharding: Partition data by key across multiple DB instances; each shard takes a fraction of writes. See building-blocks/sharding.md.
Async writes: Accept write in API, persist to queue, workers write to DB (write-behind); increases write throughput and smooths spikes.
Batching: Group many small writes into fewer large writes (e.g. time or count threshold).
Trade-off: Sharding adds complexity (routing, resharding, cross-shard queries); async writes add eventual consistency and operational complexity.
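Batching by count-or-time threshold, as described above, can be sketched as a small buffer that flushes when either limit is hit. The class name, thresholds, and `flush_fn` callback are illustrative assumptions.

```python
import time

class WriteBatcher:
    """Buffer small writes and flush them as one batch when either a
    count threshold or a time threshold is reached."""

    def __init__(self, flush_fn, max_items=100, max_wait_s=0.5):
        self.flush_fn = flush_fn       # e.g. one multi-row INSERT instead of N
        self.max_items = max_items
        self.max_wait_s = max_wait_s
        self._buf = []
        self._first_at = None

    def add(self, item):
        if self._first_at is None:
            self._first_at = time.monotonic()
        self._buf.append(item)
        if (len(self._buf) >= self.max_items
                or time.monotonic() - self._first_at >= self.max_wait_s):
            self.flush()

    def flush(self):
        if self._buf:
            self.flush_fn(self._buf)
            self._buf, self._first_at = [], None

batches = []
b = WriteBatcher(batches.append, max_items=3)
for i in range(7):
    b.add(i)
b.flush()                  # drain the partial final batch
assert batches == [[0, 1, 2], [3, 4, 5], [6]]
```

A real implementation would also flush on a timer (so a lone write is not stuck waiting for the next `add`) and handle partial batch failures.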
Storage scaling
Sharding: More shards → more total storage.
Archival: Move old data to cold storage (e.g. S3, Glacier); keep hot data in primary DB.
Compression and encoding: Reduce size per row; more rows per node.
4. Replication Strategies
Leader–follower: One primary, N replicas; simple; read scaling and HA. See building-blocks/replication.md.
Multi-leader: Multiple primaries (e.g. per region); conflict resolution required.
Leaderless (quorum): W + R > N for consistency; tunable W, R for latency vs durability.
When: Replication for HA and read scaling; multi-leader only when you need writes in multiple regions and can handle conflicts.
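The quorum condition W + R > N is just arithmetic: if write and read quorums overlap, every read intersects at least one replica that saw the latest write. A one-line check:

```python
def quorum_ok(n, w, r):
    """Leaderless replication: overlapping quorums (W + R > N) guarantee
    a read contacts at least one replica holding the latest write."""
    return w + r > n

# N = 3, common tunings:
assert quorum_ok(3, 2, 2)        # balanced write/read latency
assert quorum_ok(3, 3, 1)        # fast reads, slower durable writes
assert not quorum_ok(3, 1, 1)    # quorums may miss each other: stale reads
```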
5. Partitioning (Sharding) Strategies
Hash-based: shard = hash(key) % N; even distribution; resharding costly (use consistent hashing to reduce moves).
Range-based: Ranges of the key (e.g. A–M, N–Z); good for range queries; risk of hotspots.
Directory-based: Lookup table key → shard; flexible but lookup can be bottleneck.
When: Write or storage exceeds single node; design access patterns around shard key to avoid cross-shard queries.
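Hash-based and range-based routing can each be sketched in a few lines. This is illustrative only: the hash function choice, the two-shard range map, and the rule of routing on the first character are assumptions, not a prescribed design.

```python
import bisect
import hashlib

def shard_mod(key, n):
    """Hash-based: stable hash of the key, modulo shard count."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n

# Range-based: (upper bound, shard) pairs, sorted by bound.
RANGES = [("m", "shard-A-M"), ("z", "shard-N-Z")]

def shard_range(key):
    """Route to the first range whose upper bound covers the key."""
    bounds = [b for b, _ in RANGES]
    i = bisect.bisect_left(bounds, key[0].lower())
    return RANGES[i][1]

assert 0 <= shard_mod("user:42", 4) < 4
assert shard_range("alice") == "shard-A-M"
assert shard_range("nina") == "shard-N-Z"
```

Note the resharding trade-off in the list above: with `% N`, changing N remaps most keys, which is why consistent hashing (moving only ~1/N of keys per node change) is preferred when shard counts evolve.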
6. Caching Strategies
Cache-aside: App loads DB on miss and fills cache; good for read-heavy, variable access.
Write-through: Write DB + cache; consistent but higher write cost.
Write-behind: Write cache, async to DB; high write throughput; risk of loss.
TTL and invalidation: Balance freshness vs hit rate; invalidate on write for critical data.
When: Read-heavy; latency-sensitive; can tolerate staleness or explicit invalidation. See core-concepts/caching-cdn.md and building-blocks/caching-layer.md.
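Cache-aside, the most common pattern above, can be sketched with plain dicts standing in for Redis and the database. The invalidate-on-write behavior shown is one common variant (an assumption, not the only option; write-through would update the cache instead).

```python
class CacheAside:
    """Cache-aside: check cache first, load from DB on miss, fill cache.
    Writes go to the DB and invalidate the cached entry."""

    def __init__(self, db):
        self.db = db       # stand-in for a real database
        self.cache = {}    # stand-in for Redis/Memcached

    def get(self, key):
        if key in self.cache:
            return self.cache[key]       # cache hit
        value = self.db.get(key)         # miss: read through to the DB
        if value is not None:
            self.cache[key] = value      # populate for later readers
        return value

    def put(self, key, value):
        self.db[key] = value
        self.cache.pop(key, None)        # invalidate rather than update in place

store = CacheAside(db={"user:1": "alice"})
assert store.get("user:1") == "alice"      # first read misses, then caches
assert "user:1" in store.cache
store.put("user:1", "alicia")
assert "user:1" not in store.cache         # invalidated on write
assert store.get("user:1") == "alicia"
```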
7. Queue-Based Architectures
Decouple producers and consumers: API responds quickly; workers process asynchronously (emails, notifications, analytics).
Load leveling: Spike in requests → queue absorbs; workers drain at steady rate.
Backpressure: When queue or downstream is overloaded, slow or reject producers (e.g. 503, or backpressure in streams).
When: Async processing acceptable; need to absorb spikes or integrate with external systems. See building-blocks/message-brokers.md.
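Load leveling and backpressure together can be sketched with a bounded queue: bursts are absorbed up to the queue's capacity, and beyond that the producer is rejected (mapping to a 503 at an API edge). Class and method names are illustrative.

```python
import queue

class LeveledIngress:
    """Bounded queue absorbs bursts; when full, reject the producer
    instead of letting downstream fall over."""

    def __init__(self, maxsize=3):
        self.q = queue.Queue(maxsize=maxsize)

    def submit(self, job):
        try:
            self.q.put_nowait(job)
            return "accepted"
        except queue.Full:
            return "rejected"       # backpressure: shed load explicitly

    def drain_one(self):
        return self.q.get_nowait()  # a worker consumes at its own steady pace

ingress = LeveledIngress(maxsize=2)
assert ingress.submit("a") == "accepted"
assert ingress.submit("b") == "accepted"
assert ingress.submit("c") == "rejected"   # queue full: spike exceeds capacity
assert ingress.drain_one() == "a"          # worker drains one job
assert ingress.submit("c") == "accepted"   # capacity freed, producer proceeds
```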
8. Asynchronous Processing
Async I/O: Non-blocking calls; one thread can handle many requests (e.g. Node.js, async/await).
Async workflows: Request returns immediately; long-running work in queue + workers; notify when done (webhook, polling, or SSE).
Event-driven: Services emit events; other services react; eventual consistency.
Trade-off: Simplicity and strong consistency (sync) vs scalability and resilience (async). Async requires idempotency and retry handling. See advanced-topics/distributed-concepts.md (idempotency, retry, backpressure).
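The idempotency requirement above can be sketched as a worker that records processed message IDs, so an at-least-once redelivery has no extra effect. The in-memory set is an illustrative stand-in; production systems keep this state in a durable store.

```python
class IdempotentWorker:
    """At-least-once delivery means a message may arrive twice; record
    processed message IDs so retries are safe no-ops."""

    def __init__(self):
        self.seen = set()     # in production: a durable dedup store
        self.results = []

    def handle(self, msg_id, payload):
        if msg_id in self.seen:
            return "duplicate-skipped"
        self.results.append(payload)   # the actual side effect
        self.seen.add(msg_id)
        return "processed"

w = IdempotentWorker()
assert w.handle("m1", "send-email") == "processed"
assert w.handle("m1", "send-email") == "duplicate-skipped"  # broker redelivery
assert w.results == ["send-email"]                          # side effect ran once
```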
9. Decision Summary
| Need | Strategy | Trade-off |
|---|---|---|
| More read capacity | Read replicas, cache | Staleness, replication lag |
| More write capacity | Sharding, async writes | Complexity, eventual consistency |
| More storage | Sharding, archival | Resharding cost, query limits |
| High availability | Replication, multi-AZ | Consistency vs availability (CAP) |
| Lower latency | Cache, CDN, closer regions | Staleness, cost |
| Absorb spikes | Queues, async | Operational complexity, eventual consistency |
Quick Revision
Vertical: Bigger machine; simple, limited. Horizontal: More machines; scalable, more complex.
Reads: Replicas + cache. Writes: Sharding + optional async.
Replication: Leader–follower common; sync vs async trade-off.
Sharding: Hash/range/directory; design around shard key.
Caching: Cache-aside common; TTL and invalidation for freshness.
Queues: Decouple and level load; design for at-least-once and idempotency.
Interview: “We scale reads with read replicas and Redis cache; we scale writes by sharding by user_id when we outgrow one DB. We use queues for notifications and analytics so the API stays fast and we absorb traffic spikes.”