#10 Real-time gaming leaderboard
Below is a complete, time-boxed, interview-ready (1 hour) answer for designing a Real-time Gaming Leaderboard (think global/top-N leaderboards, per-region leaderboards, friend/room leaderboards, seasonal ladders). It follows your pattern: clarify → FR/NFR → APIs & data model → high-level design → deep dives (ranking, sharding, consistency, anti-cheat) → capacity math → ops/security → trade-offs & wrap-up. Use this as a script you can speak during an interview; the minute markers show what to say and when.
0 – 5 min — Problem recap, scope & assumptions
Quickly restate problem and align scope.
Goal: Build a real-time leaderboard system that ingests player score updates and presents sorted leaderboards with low latency. Must support global leaderboards, per-region and per-game-mode leaderboards, friend/party leaderboards, top-K queries, rank queries for a single player, pagination, seasonal resets, and anti-cheat measures.
Key SLA / example assumptions (adjustable):
50M monthly active players (MAU), 5M daily active (DAU).
Peak score-update rate: 200k updates/sec.
Read traffic (leaderboard pages, rank lookups): 500k QPS peak (reads >> writes).
Leaderboard query latency target: P95 < 50 ms.
Freshness: updates visible in leaderboards within 1–2 s for “near-real-time” modes; configurable for slower seasonal processing.
Retention: keep raw score events 30 days for audit + anti-cheat.
Clarify whether ranking ties are broken deterministically (e.g., by timestamp); assume a deterministic tie-breaker (earlier timestamp wins) unless told otherwise.
5 – 15 min — Functional & Non-Functional Requirements
Functional Requirements (Must / Should / Nice)
Must
Submit score: client/servers submit updates {player_id, leaderboard_id, score, metadata, ts}.
Get top K: GET /leaderboard/{id}/top?k=100 returns top-K entries with ranks and metadata.
Get player rank: GET /leaderboard/{id}/player/{player_id} → rank, score, neighbors.
Pagination: page through leaderboard results.
Friend / party leaderboards: filtered leaderboards for a player’s social graph or party.
Time windows & seasons: support rolling/seasonal leaderboards with reset/decay policies.
Atomic updates & tie-breaking: consistent rank semantics on score updates.
Anti-cheat & validation: flag suspicious updates; allow rollbacks.
Audit & history: keep event log for reconciliation & disputes.
Notifications: webhooks/push when user crosses thresholds (top-100, new personal best).
Should
Support multiple scoring types (max score, cumulative points, ELO).
Support leaderboard merges and derived leaderboards (e.g., per-region aggregated into global).
Provide admin tools to adjust ranks, correct fraud, and run seasonal resets.
Nice-to-have
Real-time streaming for client UIs (WebSocket/SSE) to push rank changes; ML-based anomaly detection.
Non-Functional Requirements
Latency: leaderboard queries P95 < 50 ms; writes visible within 1–2 s (configurable).
Throughput: handle 200k writes/sec and 500k reads/sec peaks, scale horizontally.
Availability: 99.95% read availability; writes highly available with eventual consistency across regions.
Durability: persistent event log and snapshots for recovery.
Scalability: support millions of leaderboards across many games/modes and millions of players.
Consistency: strong ranking consistency within a partition (leaderboard or shard), eventual across replicated shards.
Security: auth for updates, anti-spoofing, rate limiting, audit logs.
Cost: optimize memory for hot leaderboards; tier cold data.
Observability: metrics for update latency, staleness, fraud flags, read/write QPS.
Acceptance criteria examples: top-K P95 < 50 ms; update-to-view ≤ 2 s for near-real-time leaderboards; false-positive fraud flags < 1%.
15 – 25 min — APIs, data model & acceptance schema
External APIs (simple)
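A minimal sketch of what the external surface could look like, assuming a REST style; FastAPI is used purely for illustration, and the paths, payload fields, and stubbed responses are assumptions rather than a fixed contract:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ScoreSubmission(BaseModel):
    player_id: str
    score: float
    ts: int          # event timestamp in ms
    meta: dict = {}

@app.post("/leaderboard/{leaderboard_id}/score")
def submit_score(leaderboard_id: str, body: ScoreSubmission):
    # Validate/authenticate, then publish a ScoreEvent to the event bus (async path).
    return {"accepted": True}

@app.get("/leaderboard/{leaderboard_id}/top")
def get_top(leaderboard_id: str, k: int = 100):
    # Served from the materialized sorted set / top-K cache.
    return {"entries": []}   # [{player_id, score, rank}, ...]

@app.get("/leaderboard/{leaderboard_id}/player/{player_id}")
def get_player_rank(leaderboard_id: str, player_id: str):
    # Rank lookup plus a window of neighbors around the player.
    return {"player_id": player_id, "rank": None, "score": None, "neighbors": []}
```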
Data model
ScoreEvent: {event_id, player_id, leaderboard_id, score, ts, meta} stored in append-only event log (Kafka).
LeaderboardEntry (materialized): {leaderboard_id, player_id, score, tie_breaker_ts, rank, extra_meta} — stored in fast serving store (in-memory sorted structure + persisted shard).
Indexes: player → position mapping for fast rank lookup, and sorted index for ranges.
Acceptance semantics: for “max” leaderboards keep max(score) per player; for cumulative leaderboards, increment the stored value. Tie-breaker: lower tie_breaker_ts (earlier event) wins, configurable per leaderboard.
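A small sketch of these acceptance semantics; the function and field names are illustrative, not an established API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entry:
    score: float
    tie_breaker_ts: int   # ms since epoch; lower (earlier) wins ties

def apply_event(entry: Optional[Entry], new_score: float, ts: int, mode: str) -> Entry:
    """Fold a ScoreEvent into the stored LeaderboardEntry."""
    if entry is None:
        return Entry(new_score, ts)
    if mode == "max":
        # Keep the best score; only refresh the tie-breaker on improvement.
        return Entry(new_score, ts) if new_score > entry.score else entry
    if mode == "cumulative":
        return Entry(entry.score + new_score, ts)
    raise ValueError(f"unknown scoring mode: {mode}")

def sort_key(player_id: str, e: Entry):
    # Higher score first; earlier tie_breaker_ts breaks ties; player_id is the final tiebreak.
    return (-e.score, e.tie_breaker_ts, player_id)
```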
25 – 40 min — High-level architecture & data flow
Components
API Gateway / Ingest Frontend
Validate request (auth, signature), rate-limit, quick sanity checks (score ranges). Reject obviously invalid updates.
Write canonical ScoreEvent to Event Bus with partitioning key = leaderboard_id (ensures ordering per leaderboard).
Event Bus (Kafka/Pulsar)
Durable, partitioned stream. Partition by hash(leaderboard_id) % N so all of a leaderboard’s events land on the same partition, which ensures ordered processing per leaderboard.
Short retention (hours-days) for replay; long-term archive to S3 for audit.
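A producer sketch showing the key-based partitioning idea; kafka-python, the topic name, and the serialization are assumptions:

```python
import json
from kafka import KafkaProducer  # kafka-python, assumed client library

producer = KafkaProducer(
    bootstrap_servers=["kafka-1:9092"],
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",  # durable ingest before acknowledging the client
)

def publish_score_event(event: dict) -> None:
    # Keying by leaderboard_id means the default partitioner sends all events
    # for one leaderboard to the same partition, preserving per-leaderboard order.
    producer.send("score-events", key=event["leaderboard_id"], value=event)

publish_score_event({
    "event_id": "e-123", "player_id": "p-42",
    "leaderboard_id": "lb-global-s3", "score": 9812, "ts": 1_700_000_000_000,
})
```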
Stream Processor (Flink/Beam)
Consumes events per partition; applies dedupe, anti-cheat heuristics, attribute scoring policy (max vs cumulative), computes delta, and writes updates to Serving Layer.
Maintains state (player’s current score) in windowed state store for correctness and de-duplication.
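A simplified single-partition consumer loop illustrating dedupe plus a “max” scoring policy; a real deployment would use Flink/Beam keyed state and checkpoints, so treat this as a sketch under those assumptions:

```python
import json
from kafka import KafkaConsumer  # kafka-python, assumed

consumer = KafkaConsumer(
    "score-events",
    bootstrap_servers=["kafka-1:9092"],
    group_id="leaderboard-processor",
    enable_auto_commit=False,  # commit only after a durable serving-layer write
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

seen_events = set()    # keyed state; Flink would checkpoint this, not keep it in memory
current_score = {}     # (leaderboard_id, player_id) -> score

for msg in consumer:
    ev = msg.value
    if ev["event_id"] in seen_events:
        continue                        # drop duplicates on replay
    seen_events.add(ev["event_id"])
    key = (ev["leaderboard_id"], ev["player_id"])
    old = current_score.get(key)
    new = ev["score"] if old is None else max(old, ev["score"])   # "max" policy
    if new != old:
        current_score[key] = new
        # write_to_serving_layer(key, new)  # e.g. Redis ZADD, shown later
    consumer.commit()                   # offset commit after the durable update
```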
Serving Layer (materialized leaderboards)
In-memory sorted structures sharded by leaderboard (or leaderboard shard). Options: Redis Sorted Sets, custom in-memory service backed by persistent storage (RocksDB + memory indices), or a combination.
Each shard maintains a score → player sorted index and a player → score map for O(log N) rank updates and O(log N) rank lookups.
Persist periodic snapshots and write-ahead logs for recovery; also sync to cold store for durability.
Read APIs
Serve reads directly from Serving Layer; use read replicas for heavy read traffic.
Use caches/CDN for global top-K pages (top 100) served frequently.
Anti-cheat / Fraud Detection
Stream processor applies rules: impossible jumps, geo/ip anomalies, rate anomalies, emulator signatures; suspicious events go to quarantine stream for manual/automated review.
Admin workflows to rollback fraudulent updates.
Push / Notification Service
Optional: pushes to subscribers via WebSocket or SSE for live UI updates.
Batch / Recompute Jobs
Periodic full recompute from event archives for replay after fixes or to reconcile anomalies (daily/weekly jobs).
40 – 50 min — Core algorithms & deep dives
Ranking & Data structures
Sorted set per shard: an order-statistics structure such as a skip list or balanced tree (B-tree) that supports rank queries; Redis Sorted Set (ZSET) is a good off-the-shelf option.
Operations
updateScore(player, newScore): check existing score; if update needed, update sorted set (O(log N)) and update player->score map. Then compute new rank (rank = zrank) and optionally notify player/peers.
getTopK(k): range query on sorted set tail/head O(log N + k).
getRank(player): O(log N) via index.
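A sketch of the three operations against a Redis sorted set (redis-py assumed; the GT flag needs Redis ≥ 6.2, and the key names are illustrative):

```python
import redis

r = redis.Redis(host="localhost", port=6379)

def update_score(leaderboard_id, player_id, score):
    key = f"lb:{leaderboard_id}"
    # "max" semantics: GT updates only when the new score is greater.
    # For cumulative boards use r.zincrby(key, score, player_id) instead.
    r.zadd(key, {player_id: score}, gt=True)
    rank = r.zrevrank(key, player_id)      # 0-based, highest score first
    return None if rank is None else rank + 1

def get_top_k(leaderboard_id, k=100):
    key = f"lb:{leaderboard_id}"
    entries = r.zrevrange(key, 0, k - 1, withscores=True)
    return [
        {"rank": i + 1, "player_id": m.decode(), "score": s}
        for i, (m, s) in enumerate(entries)
    ]

def get_rank(leaderboard_id, player_id):
    key = f"lb:{leaderboard_id}"
    rank = r.zrevrank(key, player_id)
    if rank is None:
        return None
    return {"rank": rank + 1, "score": r.zscore(key, player_id)}
```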
Sharding
Each leaderboard is assigned to a shard; heavy leaderboards (millions entries) can be range-sharded by score ranges or by hash of player_id with a merge-step for global top-K.
Global top-K across many shards: maintain per-shard top-K and merge them (k-way merge) for a final top-K—cheap since k is small (e.g., 100). Keep cached global top-K refreshed frequently.
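A sketch of the k-way merge of per-shard top-K lists; the shard data here is illustrative:

```python
import heapq
import itertools

def merge_global_top_k(per_shard_top_k, k=100):
    # per_shard_top_k: one list per shard of (player_id, score) pairs,
    # each already sorted by descending score (the shard's own top-K).
    merged = heapq.merge(*per_shard_top_k, key=lambda e: -e[1])
    return list(itertools.islice(merged, k))

# Example: three shards, global top-3.
shards = [
    [("a", 990), ("b", 700)],
    [("c", 950), ("d", 400)],
    [("e", 980)],
]
print(merge_global_top_k(shards, k=3))  # [('a', 990), ('e', 980), ('c', 950)]
```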
Update semantics & concurrency
Per-leaderboard partitioning ensures ordered processing of updates for a single leaderboard.
Exactly-once / idempotency: dedupe via event_id or (player_id, ts) and stream-processing checkpointing; ensure processor commits offsets only after durable update to Serving Layer or use transactional write (two-phase commit) patterns. Flink + Kafka EOS can help.
Atomicity: update to Serving Layer must atomically update sorted set + inverted index + persist WAL. Use transactions or carefully ordered writes with recovery.
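One way to sketch an idempotent write path, assuming the serving store is Redis and dedupe markers are TTL’d keys; Flink + Kafka EOS would replace most of this in a production pipeline:

```python
import redis

r = redis.Redis()

def apply_update_idempotently(event: dict) -> bool:
    """Return True if the event was applied, False if it was a duplicate."""
    # SET NX acts as a cheap idempotency marker; expire it after the replay window.
    if not r.set(f"seen:{event['event_id']}", 1, nx=True, ex=24 * 3600):
        return False
    key = f"lb:{event['leaderboard_id']}"
    # A pipeline keeps the sorted-set update and the player->score map update
    # together; a WAL append / snapshot would back this for crash recovery.
    pipe = r.pipeline(transaction=True)
    pipe.zadd(key, {event["player_id"]: event["score"]}, gt=True)
    pipe.hset(f"{key}:scores", event["player_id"], event["score"])
    pipe.execute()
    return True
```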
Friend leaderboards & filters
For friend leaderboards, maintain per-player friend lists in a cache; when querying, fetch the friends’ scores (multi-get) and sort client-side, or precompute friend leaderboards for frequent players. For large friend lists, the stream processor can maintain a small per-player priority queue of top friends.
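A sketch of the friend-leaderboard query path under the assumptions above (friend lists cached in a Redis set, scores multi-fetched from the serving shard; ZMSCORE needs Redis ≥ 6.2):

```python
import redis

r = redis.Redis()

def friend_leaderboard(leaderboard_id, player_id, limit=50):
    key = f"lb:{leaderboard_id}"
    friends = [f.decode() for f in r.smembers(f"friends:{player_id}")]
    members = friends + [player_id]
    # One round trip fetches every friend's score from the sorted set.
    scores = r.zmscore(key, members)
    rows = [(m, s) for m, s in zip(members, scores) if s is not None]
    rows.sort(key=lambda x: -x[1])
    return [
        {"rank": i + 1, "player_id": m, "score": s}
        for i, (m, s) in enumerate(rows[:limit])
    ]
```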
Seasonal resets, TTLs & decay
Seasonal leaderboards: on season end, freeze leaderboard snapshot, export final stats, reset per-season storage, and optionally seed new season with initial weights.
Score decay (ELO/time decay): stream processor applies decay transformations (periodic job) to stored scores.
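A sketch of a periodic multiplicative decay pass over one leaderboard; the factor, batching, and key layout are assumptions:

```python
import redis

r = redis.Redis()

def apply_decay(leaderboard_id, factor=0.95, batch=1000):
    """Multiply every stored score by `factor` (e.g., a weekly time-decay job)."""
    key = f"lb:{leaderboard_id}"
    cursor = 0
    while True:
        cursor, entries = r.zscan(key, cursor=cursor, count=batch)
        if entries:
            r.zadd(key, {member: score * factor for member, score in entries})
        if cursor == 0:
            break
    # Note: ZSCAN may revisit members whose scores change mid-scan; a production
    # job would decay against a snapshot or track already-decayed members.
```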
Anti-cheat & rollback
Flagging: rules in stream processor produce flaggedEvent stream. Admin tools to approve/reject flags.
Rollback: to revert a fraudulent update, either subtract delta (if stored) or replay events from archive excluding flagged events and recompute leaderboard (cold recompute). Keep write-ahead logs to support partial rewinds.
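A toy example of one stream-side flagging rule (impossible score jump); the threshold and fields are illustrative:

```python
def is_impossible_jump(prev_score, prev_ts_ms, event, max_delta_per_min=5000.0):
    """Return True if the event's score delta exceeds a per-minute ceiling."""
    if prev_score is None:
        return False           # first event for this player: nothing to compare
    elapsed_min = max((event["ts"] - prev_ts_ms) / 60_000, 1.0)
    return (event["score"] - prev_score) > max_delta_per_min * elapsed_min

# Flagged events go to the quarantine stream instead of the serving layer.
```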
50 – 55 min — Back-of-the-envelope capacity & sizing
Use the interviewer’s numbers if provided; here’s a sample sizing pass (a quick calculation sketch follows at the end of this section).
Assumptions
Writes: 200k updates/sec peak.
Reads: 500k QPS peak (mostly for top-K pages & rank lookups).
Avg leaderboard size: many are small (100s), a few are large (millions). Hot leaderboards: 10k leaderboards with heavy traffic.
Event bus (Kafka)
Each event ~300 bytes → 200k × 300 B = 60 MB/s ≈ 480 Mbps ingest. With replication factor 3 → ~1.44 Gbps sustained. Plan 2000+ partitions for parallelism.
Stream processors
Scale workers to process 200k eps with stateful keyed parallelism; number depends on per-partition throughput—maybe dozens to a few hundred Flink task managers depending on CPU/IO.
Serving layer
Memory for hot leaderboards:
Suppose top 10k hot leaderboards average 1M players each (worst-case): storing each entry (player_id + score + metadata) ~ 32 bytes (optimistic) → 1M × 32 B = 32 MB per leaderboard → 10k × 32 MB = 320 GB RAM.
Realistic: mix of sizes; use Redis cluster with ~500 GB–1 TB RAM for hot sets, and persist rest to disk-backed stores.
Read replicas: scale to serve 500k QPS via many read replicas and CDN for cached top-K.
Persistence
Raw event archival: 200k eps × 300 B = 60 MB/s → per day ≈ 5.2 TB/day → 30-day = ~156 TB in S3 (choose compression). Plan cost & lifecycle.
Networking & IO
Event-bus brokers and serving nodes need high network bandwidth (10–100 Gbps) per rack; place partition leaders and serve reads close to clients to keep latency low.
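A quick sanity check of the arithmetic above; the event size, rates, replication factor, and hot-leaderboard mix are the same assumptions as in the list:

```python
WRITES_PER_SEC = 200_000
EVENT_BYTES = 300
REPLICATION = 3
RETENTION_DAYS = 30

ingest_mb_s = WRITES_PER_SEC * EVENT_BYTES / 1e6          # 60 MB/s
ingest_gbps = ingest_mb_s * 8 / 1000                      # 0.48 Gbps
replicated_gbps = ingest_gbps * REPLICATION               # 1.44 Gbps

archive_tb_day = ingest_mb_s * 86_400 / 1e6               # ~5.2 TB/day
archive_tb_total = archive_tb_day * RETENTION_DAYS        # ~156 TB over 30 days

hot_boards, players, entry_bytes = 10_000, 1_000_000, 32
hot_ram_gb = hot_boards * players * entry_bytes / 1e9     # 320 GB worst case

print(f"ingest: {ingest_mb_s:.0f} MB/s ({ingest_gbps * 1000:.0f} Mbps), "
      f"replicated: {replicated_gbps:.2f} Gbps")
print(f"archive: {archive_tb_day:.1f} TB/day, {archive_tb_total:.0f} TB total")
print(f"hot leaderboard RAM (worst case): {hot_ram_gb:.0f} GB")
```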
55 – 58 min — Observability, ops, security & testing
Monitoring
Metrics: writes/s, reads/s, ingestion lag (Kafka consumer lag), stream-processing latency, serving-layer P95/P99 latency, cache hit ratio, leaderboard hotness, fraud flag rate, snapshot/restore latency.
Alerts
High processing lag, serving errors, spike in flagged events, large divergence between cached top-K and merged top-K.
Testing
Load tests with synthetic updates & reads, chaos testing (kill stream processors, replays), correctness tests with known sequences, anti-cheat simulation.
Security
Auth & signature on updates to avoid spoofing (JWT + HMAC), rate-limiting to prevent spammy updates, ACLs for admin ops. Encrypt event archives at rest and in transit.
Operational playbooks
Recompute a leaderboard from the archive, emergency rollback, adding shards to scale out, tuning top-K cache durations.
58 – 60 min — Trade-offs, evolution & summary (wrap-up)
Key trade-offs
Memory (in-memory sorted sets) vs Cost: keeping entire large leaderboards in RAM gives fastest queries but is expensive. Hybrid: keep top-K in RAM, cold parts on disk, and use k-way merge for global rankings.
Strong ordering & low latency vs throughput: per-leaderboard partitioning ensures ordering but can create hotspots if one leaderboard is huge; mitigate with sub-sharding and top-K caches.
Real-time accuracy vs cost: real-time exact ranks are expensive—provide near-real-time (1–2s) and offer eventual strong correctness via periodic reconciliation.
Anti-cheat complexity vs UX: aggressive blocking reduces fraud but risks false positives—provide quarantine and human review.
Evolution path
MVP: API ingest → Kafka → single-stream processor → Redis ZSET per leaderboard + read APIs + top-K cache.
Add: sharding for very large leaderboards, stream-based anti-cheat, friend leaderboard optimization, WebSocket push.
Add: cross-region replication, global top-K merging with periodic refresh, ML-based fraud detection, admin tools for rollbacks.
One-line summary
Partition events by leaderboard, process them in a stateful stream to compute and validate score deltas, materialize sorted leaderboards in a sharded, memory-optimized serving layer (top-K in RAM + cold storage for tails), and provide low-latency read APIs with caching and admin tooling for fraud detection and reconciliation — balancing latency, correctness and cost.