Load Balancers
Distribute incoming traffic across multiple servers to improve availability, throughput, and latency.
1. Concept Overview
A load balancer (LB) sits in front of a pool of servers and directs each request to one of them. It provides:
High availability: If one server fails, traffic goes to others.
Scalability: Add more servers to handle more load.
Performance: Spread load so no single server is overwhelmed.
Why it exists: A single server is a single point of failure and a capacity ceiling. Load balancers enable horizontal scaling and fault tolerance.
2. Core Principles
L4 vs L7
L4 (Transport): sees only IP and port; routes on IP + port (e.g. the TCP connection). Simple and fast, but no application awareness.
L7 (Application): sees the full HTTP request (URL, headers, cookies, body). Enables path-based routing, SSL termination, and sticky sessions.
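To make the L4/L7 distinction concrete, here is a minimal sketch of L7-style path-based routing in Python. The pool names and path prefixes are invented for illustration; an L4 balancer could not do this, since it never parses the request.

```python
# L7 routing sketch: pick a backend pool by inspecting the HTTP path.
ROUTES = {
    "/api/": ["api-1:8080", "api-2:8080"],   # hypothetical backend pools
    "/static/": ["static-1:8080"],
}
DEFAULT_POOL = ["web-1:8080", "web-2:8080"]

def route(path: str) -> list:
    """Return the backend pool whose prefix matches the request path."""
    for prefix, pool in ROUTES.items():
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL
```

An actual L7 balancer would then apply one of the algorithms below to pick a single server from the chosen pool.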
Architecture (simplified):

              ┌─────────────────┐
Clients ─────▶│  Load Balancer  │─────▶ Server 1
              │   (L4 or L7)    │─────▶ Server 2
              └─────────────────┘─────▶ Server 3

Common algorithms
Round Robin: rotate through servers in order. Simple and even in steady state, but ignores load and latency.
Least Connections: send to the server with the fewest active connections. Adapts to slow or long requests, at the cost of slightly more state.
Weighted Round Robin: round robin with weights (e.g. 2:1). Handles heterogeneous capacity, but the weights are static.
Consistent Hash: the same client/key always maps to the same server. Good for cache affinity and session stickiness; the cost is rebalancing when nodes change.
IP Hash: hash(client IP) % N. Sticky by IP, but uneven if client IPs are skewed.
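The first two algorithms can be sketched in a few lines of Python. This is an illustrative sketch, not any particular LB's implementation; note that least connections needs the extra per-server state mentioned above:

```python
import itertools

class RoundRobin:
    """Rotate through servers in a fixed order."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

class LeastConnections:
    """Pick the server with the fewest active connections.
    The caller must call release() when its request finishes."""
    def __init__(self, servers):
        self._active = {s: 0 for s in servers}

    def pick(self):
        server = min(self._active, key=self._active.get)
        self._active[server] += 1
        return server

    def release(self, server):
        self._active[server] -= 1
```

Round robin needs no per-request bookkeeping; least connections pays for its adaptivity by tracking every in-flight connection.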
Health checks
Active: LB periodically sends HTTP/TCP checks to each server.
Passive: LB infers health from request success/failure.
Unhealthy servers are removed from the pool until they pass checks again.
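A sketch of this logic with consecutive-failure and consecutive-success thresholds, which is also how flapping is avoided. The probe is injected so the example stays self-contained; none of the names come from a real load balancer:

```python
class HealthChecker:
    """Mark a backend unhealthy after `fail_after` consecutive failed
    probes, and healthy again after `rise_after` consecutive successes."""
    def __init__(self, backends, probe, fail_after=3, rise_after=2):
        self._probe = probe            # callable(backend) -> bool
        self._fail_after = fail_after
        self._rise_after = rise_after
        self._state = {b: {"healthy": True, "fails": 0, "oks": 0}
                       for b in backends}

    def check_all(self):
        """Run one active check round (a real LB runs this on a timer)."""
        for backend, s in self._state.items():
            if self._probe(backend):
                s["oks"] += 1
                s["fails"] = 0
                if not s["healthy"] and s["oks"] >= self._rise_after:
                    s["healthy"] = True
            else:
                s["fails"] += 1
                s["oks"] = 0
                if s["healthy"] and s["fails"] >= self._fail_after:
                    s["healthy"] = False

    def healthy_pool(self):
        return [b for b, s in self._state.items() if s["healthy"]]
```

Requiring several consecutive successes before readmission is what keeps a marginal server from flapping in and out of the pool.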
3. Real-World Usage
AWS ALB/NLB: ALB (L7) for HTTP/HTTPS; NLB (L4) for TCP/UDP, low latency.
GCP Load Balancing: Global HTTP(S) vs regional TCP/UDP.
Nginx / HAProxy: On-prem or in-cluster L7/L4 load balancing.
Kubernetes: Service + kube-proxy (L4) or Ingress (L7).
4. Trade-offs
L4: fast, low CPU, no decryption; but no URL/header routing and no content-based logic.
L7: path-based routing, SSL termination, caching; but higher latency and CPU.
Round Robin: simple; but can send traffic to overloaded or slow nodes.
Least Connections: better for variable request duration; but needs connection tracking.
Consistent Hash: sticky sessions and cache affinity; but requires rebalancing on node add/remove.
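The consistent-hash trade-off can be seen directly: when a node joins, only the keys that now hash to it move, instead of nearly everything as with hash(key) % N. A minimal ring with virtual nodes, sketched for illustration:

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Consistent hash ring; virtual nodes smooth out the balance."""
    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes
        self._keys = []   # sorted vnode hashes
        self._map = {}    # vnode hash -> real node
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self._vnodes):
            h = _hash(f"{node}#{i}")
            bisect.insort(self._keys, h)
            self._map[h] = node

    def get(self, key):
        # First vnode clockwise from the key's hash, wrapping around.
        i = bisect.bisect(self._keys, _hash(key)) % len(self._keys)
        return self._map[self._keys[i]]
```

Going from 3 to 4 nodes with a naive hash(key) % N remaps roughly 3 out of 4 keys; the ring moves only about 1 in 4, which is why caches and sticky sessions prefer it.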
When to use: multiple app instances and a need for HA or horizontal scaling. When not: a single instance (nothing to balance), or when a different component (e.g. an API gateway) already does routing and you only need internal L4.
5. Failure Scenarios
LB itself fails: run an active-passive or active-active LB pair, with DNS or anycast failover.
All backends down: return 503; add a circuit breaker at the caller.
One backend slow: use least-connections or timeouts; remove it from the pool on repeated failure.
Health check wrong: tune the check interval and thresholds to avoid flapping.
6. Performance Considerations
Latency: L4 adds minimal latency; L7 adds more (parsing, SSL). Use L4 when you don’t need L7 features.
Throughput: LB can be bottleneck (CPU for SSL, connection table size). Scale vertically or use multiple LBs (e.g. anycast).
Connection limits: Max concurrent connections per backend and global; tune timeouts and pool size.
7. Implementation Patterns
Single LB: Simple, but the LB is a single point of failure (SPOF). Use for dev or low-criticality systems.
Active-Passive: Standby LB takes over on failure (VIP or DNS).
Active-Active: Multiple LBs share traffic (e.g. DNS round-robin or anycast). Requires stateless backends or a shared session store.
Quick Revision
L4: IP+port; fast, no app logic. L7: URL/headers; routing, SSL, stickiness.
Algorithms: Round robin (simple), least connections (variable duration), consistent hash (sticky/cache).
Health checks: Remove unhealthy backends; tune to avoid flapping.
Failure: LB HA (active-passive or active-active); backend failures handled by pool and timeouts.
Interview: “We use an L7 LB for path-based routing and SSL termination; least-connections so long requests don’t overload a single server.”