#21 API rate limiter system

Here’s a complete, time-boxed, 1-hour interview-ready answer for designing an API Rate Limiter system. It follows a standard system design interview structure: functional & non-functional requirements, APIs/data model, architecture, deep dive, and trade-offs.


0 – 5 min — Problem recap, scope & assumptions

Goal: Design a system that limits the number of API requests a client (user, service, or app) can make in a given time window. The system should prevent abuse, ensure fair usage, and maintain high availability.

Scope for interview:

  • Rate limiting per API key / user / client IP.

  • Support distributed microservices and multiple data centers.

  • Configurable limits (per second, minute, hour).

  • Support burst traffic (short spikes) gracefully.

  • Handle high request throughput with low latency.

Assumptions:

  • Peak requests: 1M requests/sec.

  • Time windows configurable per client.

  • Use cases: public APIs, internal microservices, SaaS platforms.

  • Clients are authenticated using API keys.

  • System should be horizontally scalable.


5 – 15 min — Functional & Non-Functional Requirements

Functional Requirements

Must

  1. Limit enforcement: Restrict API usage per user/client key.

  2. Configurable rules: Support rate limits per API endpoint, client type, or subscription plan.

  3. Distributed support: Works across multiple servers/microservices.

  4. Real-time tracking: Track request counts in near real-time.

  5. Response headers: Return headers indicating remaining quota and reset time.

  6. Burst handling: Optionally allow short bursts beyond the steady rate.

Should

  • Support multiple algorithms: fixed window, sliding window, token bucket, leaky bucket.

  • Alerting when limits are exceeded or abused.

  • Integration with API gateways or service mesh.

Nice-to-have

  • Tiered limits based on subscription.

  • Analytics on usage patterns.

  • Auto-throttling or dynamic rate limits based on system load.


Non-Functional Requirements

  • Latency: Limit checks should be <1 ms to avoid API slowdowns.

  • Availability: 99.99% uptime; on failure the limiter should fail open (allow traffic) or fall back to a safe default.

  • Scalability: Handle millions of requests/sec; horizontally scalable.

  • Consistency: Strong consistency per client is preferable, but eventual consistency is acceptable in a distributed deployment.

  • Durability: Persist usage counters for audits or billing.

  • Fault tolerance: Continue to enforce limits even during node failures.

  • Monitoring: Track request counts, exceeded limits, node health.


15 – 25 min — API / Data Model

APIs
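The concrete API surface isn't spelled out above, so here is a minimal sketch of what the Rate Limiter service might expose. All names (`check_limit`, `set_rule`, the `/v1/...` paths) are illustrative assumptions, not from the source.

```python
from dataclasses import dataclass

@dataclass
class CheckResult:
    allowed: bool    # whether the request may proceed
    remaining: int   # quota left in the current window
    reset_at: int    # Unix timestamp when the window resets

def check_limit(api_key: str, endpoint: str, cost: int = 1) -> CheckResult:
    """POST /v1/check (hypothetical): called by the gateway before forwarding each request."""
    raise NotImplementedError

def set_rule(client_id: str, limit: int, window_seconds: int) -> None:
    """PUT /v1/rules/{client_id} (hypothetical): update a client's limit without downtime."""
    raise NotImplementedError
```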

Data Models

Client Rate Limit
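The fields aren't enumerated in the source; one plausible shape, assuming per-client, per-endpoint configurable windows and subscription tiers:

```python
from dataclasses import dataclass

@dataclass
class ClientRateLimit:
    client_id: str        # API key or client identifier
    endpoint: str         # specific path, or "*" for a client-wide default
    limit: int            # max requests per window
    window_seconds: int   # e.g. 1, 60, or 3600
    burst: int = 0        # extra requests tolerated during short spikes
    tier: str = "free"    # subscription plan that drives the limit
```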

Usage Counter
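Likewise a sketch; the key point is that a counter is scoped to a (client, endpoint, window) triple:

```python
from dataclasses import dataclass

@dataclass
class UsageCounter:
    client_id: str
    endpoint: str
    window_start: int   # Unix timestamp marking the current window
    count: int          # requests observed in this window
```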


25 – 40 min — High-level architecture & data flow

Components

  1. API Gateway / Proxy: Receives requests, calls Rate Limiter before forwarding to backend.

  2. Rate Limiter Service: Checks if request exceeds limits. Implements algorithms (fixed window, sliding window, token bucket).

  3. In-Memory Store: Redis / Memcached to store counters for low-latency access.

  4. Persistent Storage: Optional DB to store counters for auditing or billing.

  5. Monitoring & Alerting: Track request trends, exceeded limits, system health.

Data Flow

  1. Client sends request → API Gateway.

  2. Gateway calls Rate Limiter with API key and endpoint.

  3. Rate Limiter checks usage in Redis / memory → allows or rejects request.

  4. If rejected → return 429 Too Many Requests with remaining quota/reset info (illustrated in the sketch after this list).

  5. Periodically sync counters to persistent DB for audits.
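A sketch of steps 2 through 4 as gateway-side logic. The `X-RateLimit-*` header names are a common convention rather than a mandate from the source, and the in-process `counters` dict stands in for Redis; both are assumptions for illustration.

```python
import time

RATE_LIMIT = 100       # assumed: requests allowed per window
WINDOW_SECONDS = 60

def handle_request(api_key: str, endpoint: str, counters: dict) -> tuple[int, dict]:
    """Check the counter, attach quota headers, then allow (200) or reject (429)."""
    now = int(time.time())
    window_start = now - now % WINDOW_SECONDS
    key = (api_key, endpoint, window_start)
    counters[key] = counters.get(key, 0) + 1

    remaining = max(0, RATE_LIMIT - counters[key])
    headers = {
        "X-RateLimit-Limit": str(RATE_LIMIT),
        "X-RateLimit-Remaining": str(remaining),
        "X-RateLimit-Reset": str(window_start + WINDOW_SECONDS),
    }
    if counters[key] > RATE_LIMIT:
        headers["Retry-After"] = str(window_start + WINDOW_SECONDS - now)
        return 429, headers   # Too Many Requests
    return 200, headers       # forward to the backend service
```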


40 – 50 min — Deep dive — algorithms, scaling, reliability

Rate Limiting Algorithms

  1. Fixed Window: Simple counter per window; cheap, but can let through up to 2× the limit around window boundaries.

  2. Sliding Window: Smooths counting across window boundaries; more accurate than a fixed window.

  3. Token Bucket: Allows bursts up to bucket capacity; tokens refill at a fixed rate (sketched after this list).

  4. Leaky Bucket: Smooths outgoing traffic; requests are processed at a constant rate.
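A minimal single-node token bucket, sketched to show the refill-then-spend logic (class and parameter names are illustrative):

```python
import time

class TokenBucket:
    """Allows bursts up to `capacity`; refills at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

For example, `TokenBucket(rate=10, capacity=20)` absorbs a 20-request burst and then sustains 10 requests/sec; a distributed version would keep `tokens` and `last` in Redis behind an atomic script.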

Scaling & Distribution

  • Sharding: Redis clusters partitioned by API key or client ID.

  • Replication: Multi-node Redis cluster for fault tolerance.

  • Local caches: Each gateway node keeps local counters for ultra-low latency and periodically syncs them to the central store.

  • Consistency: Use atomic increment operations (Redis INCR) to prevent race conditions.
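A common fixed-window pattern built on redis-py's atomic `INCR` (a sketch; in production the INCR+EXPIRE pair is often wrapped in a Lua script so the TTL can never be lost between the two calls):

```python
import time
import redis  # assumes the redis-py client

r = redis.Redis()

def allow(client_id: str, limit: int, window_seconds: int) -> bool:
    """Fixed-window check; INCR is atomic, so concurrent gateways cannot race."""
    window = int(time.time()) // window_seconds
    key = f"rl:{client_id}:{window}"
    count = r.incr(key)                # atomic increment across all gateway nodes
    if count == 1:
        r.expire(key, window_seconds)  # first writer sets the window's TTL
    return count <= limit
```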

Fault Tolerance

  • If Redis is unavailable → optionally allow requests (fail open) or deny all (fail closed); see the sketch after this list.

  • Use replication and persistence (RDB/AOF) in Redis for durability.
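The fail-open/fail-closed choice can be isolated in a small wrapper; `FAIL_OPEN` is a policy flag assumed for illustration:

```python
import logging
import redis

FAIL_OPEN = True  # assumed policy: allow traffic when the limiter backend is down

def allow_with_fallback(check_fn) -> bool:
    """Run a limit check, degrading per policy instead of erroring when Redis is down."""
    try:
        return check_fn()
    except redis.RedisError:
        logging.warning("rate limiter backend unavailable; failing %s",
                        "open" if FAIL_OPEN else "closed")
        return FAIL_OPEN
```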


50 – 55 min — Back-of-the-envelope calculations

Assumptions

  • 1M requests/sec peak, 10M clients.

  • Each usage counter: 64 bytes → 10M clients → 640 MB (fits in memory).

  • Latency: a Redis INCR round trip is typically well under 1 ms within a data center; total request processing <2 ms including network.

  • Scaling: Horizontally add Redis nodes and gateway instances for linear scaling.

Storage

  • Persist counters daily for audits → 640 MB/day → ~0.23 TB/year (very manageable; see the arithmetic below).
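Checking the arithmetic:

```python
clients = 10_000_000
bytes_per_counter = 64

daily = clients * bytes_per_counter   # 640,000,000 bytes ≈ 640 MB
yearly = daily * 365                  # ≈ 233.6 GB ≈ 0.23 TB
print(f"{daily / 1e6:.0f} MB/day, {yearly / 1e12:.2f} TB/year")
```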


55 – 58 min — Monitoring & ops

Monitoring

  • Requests per second, rejected requests (429 rate).

  • Latency of limit checks.

  • Node health and memory usage.

Operational concerns

  • Update rate limits dynamically without downtime.

  • Handle Redis failover gracefully.

  • Alert on unusual spikes or limit violations.


58 – 60 min — Trade-offs, evolution & summary

Trade-offs

  • Memory vs Accuracy: Approximate counting structures reduce memory but lose precision (a count-min sketch fits here; Redis HyperLogLog estimates distinct items, not request counts).

  • Centralized vs Local counters: Local caches reduce latency but require periodic sync.

  • Fixed vs Sliding Window: Fixed window is simpler; sliding window smoother but more complex.

Evolution

  1. MVP: Fixed window per API key in Redis; fail open on Redis failure.

  2. Phase 2: Sliding window or token bucket; distributed across multiple nodes.

  3. Phase 3: Dynamic tiered limits, per endpoint limits, analytics dashboards, adaptive throttling.

Summary

  • Build a highly available, low-latency API Rate Limiter:

    • API Gateway calls Rate Limiter.

    • Redis or in-memory store maintains counters.

    • Algorithms: fixed window, sliding window, token bucket.

    • Horizontally scalable and fault-tolerant.

    • Tracks usage, prevents abuse, provides quota info, and supports dynamic updates.

