Rate Limiting
Limit the number of requests a user, API key, or IP can make in a time window to protect availability and fairness.
1. Concept Overview
Rate limiting enforces a maximum request rate (e.g. 100 req/min per user). It prevents abuse, protects backends from overload, and helps enforce quotas and cost control.
Why it exists: Without limits, a few clients can exhaust capacity or cause cascading failure; rate limiting keeps the system stable and fair.
2. Core Principles
Algorithms

| Algorithm | How it works | Pros | Cons |
| --- | --- | --- | --- |
| Token bucket | Refill tokens at a fixed rate; each request consumes one (or N); reject when empty | Allows bursts up to bucket size | Slightly more state |
| Leaky bucket | Requests enter a "bucket" drained at a fixed rate; reject when full | Smooths traffic | Bursts limited by capacity |
| Fixed window | Count requests in the current window (e.g. minute); reset at the window boundary | Simple | Boundary spike (e.g. 100 at 0:59 + 100 at 1:00) |
| Sliding window | Count requests in the last N seconds (window slides with time); more accurate | No boundary spike | More state, or an approximation |
| Sliding log | Store a timestamp per request; count those in the window | Accurate | Memory and cost at high QPS |
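The token bucket row above can be sketched in a few lines. This is a minimal single-node illustration, not a production implementation; the class name and parameters are chosen for this example.

```python
import time

class TokenBucket:
    """Token bucket: refill at a fixed rate, allow bursts up to capacity."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity        # max tokens (burst size)
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)   # start full
        self.last = time.monotonic()

    def allow(self, cost: int = 1) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# A bucket of 5 tokens refilling at 1 token/s allows a burst of 5
# back-to-back requests, then roughly 1 request per second.
bucket = TokenBucket(capacity=5, refill_rate=1.0)
results = [bucket.allow() for _ in range(6)]  # first 5 allowed, 6th rejected
```

Note the state per key is just two numbers (token count and last-refill time), which is the "slightly more state" trade-off from the table.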
Where to enforce
API gateway: Central place; one policy for all services.
Per service: Fine-grained but duplicated logic.
At client: Optional “client-side” limit to avoid 429s; server still enforces.
Architecture (centralized): [diagram] clients → API gateway (rate-limit check) → backend services, with counters in a shared store such as Redis.
3. Real-World Usage
API Gateway: Kong, AWS API Gateway (usage plans), Apigee.
Redis: Store counters with TTL; increment and check in Lua for atomicity.
Nginx: `limit_req_zone` (leaky-bucket request limiting); `limit_conn_zone` (concurrent-connection limits).
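The Redis pattern above (counter + TTL, incremented and checked atomically) can be illustrated with an in-memory stand-in. In production the increment and expiry run as one Lua script on Redis; here the same fixed-window logic is simulated with a dict, so the class and window arithmetic are illustrative assumptions.

```python
import time

class FixedWindowLimiter:
    """In-memory stand-in for the Redis INCR + EXPIRE fixed-window pattern."""

    def __init__(self, limit: int, window_s: int):
        self.limit = limit
        self.window_s = window_s
        self.counters = {}  # key -> (window_id, count), like a Redis key with TTL

    def allow(self, key: str) -> bool:
        window = int(time.time()) // self.window_s   # current window id
        start, count = self.counters.get(key, (window, 0))
        if start != window:                          # "TTL expired": new window
            start, count = window, 0
        if count >= self.limit:
            return False                             # over limit: reject (429)
        self.counters[key] = (start, count + 1)      # INCR
        return True

limiter = FixedWindowLimiter(limit=3, window_s=60)
results = [limiter.allow("user:42") for _ in range(4)]  # 3 allowed, then rejected
```

With real Redis, doing the read-check-increment as a single Lua script (or `INCR` followed by `EXPIRE NX`) avoids the race where two gateway nodes both see the counter below the limit.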
4. Trade-offs
| Approach | Pros | Cons |
| --- | --- | --- |
| Token bucket | Allows bursts | More logic than fixed window |
| Fixed window | Simple | Boundary double-count |
| Sliding window | Fair; no boundary spike | More state |
| Centralized (gateway) | Single policy for all services | Gateway must hold state or call Redis |
| Distributed | Works across many gateway nodes | Needs a shared store (Redis) and a consistent key (user/ID) |
When to use: any public or partner API, and internal APIs that need fairness or cost control. When not to: fully trusted internal services with no abuse risk (limiting is optional there).
5. Failure Scenarios
| Scenario | Mitigation |
| --- | --- |
| Redis (counter store) down | Fail open (allow) or fail closed (reject); prefer fail open with an alert |
| Clock skew (distributed) | Use server time for windows, or logical time from Redis |
| Bypass (spoofed key) | Rate limit by authenticated user/ID, not just IP; validate auth first |
| Thundering herd after limit lifts | Sliding window or gradual refill to avoid a spike at the boundary |
6. Performance Considerations
Latency: In-memory or single Redis call; keep under 1 ms.
Throughput: Counters must handle high QPS; Redis or local cache with periodic sync.
Storage: Key = user/id + window; TTL to avoid unbounded growth.
7. Implementation Patterns
Per user: key = `user_id` or `api_key`; fair per customer.
Per IP: key = `ip`; simple, but shared IPs (NAT) affect many users.
Hybrid: stricter per-IP limits for unauthenticated traffic; per-user limits for authenticated.
Tiers: different limits for free vs paid (e.g. 100 vs 10,000 req/min).
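The hybrid-plus-tiers pattern reduces to choosing a counter key and a quota per request. A sketch with illustrative tier names and numbers (taken from the free/paid example above; the function and constants are assumptions for this example):

```python
# Illustrative quotas: per-user limits by tier, stricter anonymous per-IP limit.
TIER_LIMITS = {"free": 100, "paid": 10_000}  # req/min per user
ANON_IP_LIMIT = 20                           # req/min per IP

def limit_key_and_quota(user_id, tier, ip):
    """Authenticated traffic is keyed (and priced) per user; anonymous per IP."""
    if user_id is not None:
        return f"user:{user_id}", TIER_LIMITS[tier]
    return f"ip:{ip}", ANON_IP_LIMIT

key1 = limit_key_and_quota("42", "paid", "203.0.113.7")   # ("user:42", 10000)
key2 = limit_key_and_quota(None, None, "203.0.113.7")     # ("ip:203.0.113.7", 20)
```

The returned key is what gets incremented in the counter store, so keeping it consistent across gateway nodes is what makes the distributed case work.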
Quick Revision
Token bucket: Burst-friendly. Fixed window: Simple; boundary spike. Sliding window: Fair.
Where: API gateway + Redis (or in-memory for single node).
Key: Prefer user/id over IP when authenticated.
Failure: Fail open vs fail closed; Redis HA so counters are available.
Interview: “We rate limit at the API gateway using a sliding window in Redis keyed by user ID so paid and free tiers get different limits; if Redis is down we fail open and alert so we don’t block all traffic.”