Interview Framework
How to approach a system design interview — A step-by-step framework for SDE-3 / Senior Software Engineer interviews
Overview
System design interviews evaluate your ability to clarify ambiguity, reason about scale, make trade-offs, and design for failure. This framework gives you a repeatable structure so you spend time on high-value discussion instead of figuring out what to do next.
Typical duration: 45–60 minutes.
Your goals: Show structured thinking, ask the right questions, drive the conversation, and demonstrate senior-level judgment (trade-offs, cost, operations, resilience).
The Seven Phases
1. Clarify requirements
5–10 min
Functional + non-functional; scope and constraints
2. Estimate scale
5 min
QPS, storage, bandwidth; back-of-envelope
3. High-level architecture
10–15 min
Components, data flow, boundaries
4. Identify bottlenecks
5 min
Where the system will break or slow down
5. Discuss trade-offs
5 min
Consistency vs availability, latency vs cost, etc.
6. Deep dive into components
15–20 min
1–2 components in detail (APIs, data model, scaling)
7. Scaling and failure handling
5–10 min
Horizontal scaling, failover, degradation
Phase 1: Clarify Requirements
Why it matters: Jumping into boxes and arrows before understanding the problem is the most common mistake. Senior engineers align on scope first.
Functional requirements
Core features: What are the 3–5 must-have features?
Users: Who uses the system (B2C, B2B, internal)?
Critical user journeys: e.g. “User shortens URL → later opens short URL → gets redirected.”
Out of scope (for now): Explicitly deprioritize (e.g. “No custom aliases in v1”).
Questions to ask:
“What’s in scope for this discussion—MVP or full product?”
“Who are the main users and what’s the most important flow?”
“Are there any features we should explicitly leave out?”
Non-functional requirements
Use a simple checklist so you don’t forget dimensions:
Performance
Latency (e.g. P99 < 200 ms), throughput (QPS), tail latency
Availability
Uptime target (e.g. 99.9%), planned maintenance, multi-region
Scalability
Growth (users, data, traffic), peak vs average (e.g. 3×)
Consistency
Strong vs eventual; read-after-write requirements
Durability
Can we lose data? RPO/RTO if applicable
Security
Auth, PII, compliance (GDPR, etc.)
Cost
Any rough budget or “optimize for cost” constraint?
Example (URL shortener):
“Redirect latency: P99 < 100 ms.”
“Availability: 99.99%.”
“We can accept eventual consistency for redirects; strong consistency for create.”
“URLs must not be lost once created.”
Output of this phase
Short list of must-have vs nice-to-have features.
Clear non-functional targets (latency, availability, scale, consistency).
Agreement on scope so you don’t over- or under-design.
Phase 2: Estimate Scale
Why it matters: Scale drives technology choices (single DB vs sharding, cache vs no cache, sync vs async). Show you think in numbers.
What to estimate
Traffic
DAU/MAU or requests per day/month.
Reads vs writes ratio (e.g. 100:1 for URL shortener).
Peak QPS (e.g. 3× average).
Storage
Record size and retention (e.g. 5 years).
Total size and growth rate.
Replication factor (e.g. 3×).
Bandwidth
In/out per request and total (optional for first pass).
How to present
State assumptions clearly: “Assume 100M new URLs per month, 100:1 read:write.”
Do simple math on the whiteboard:
Writes: 100M / (30 × 86,400) ≈ 40/s → peak ~120/s.
Reads: 40 × 100 = 4,000/s → peak ~12,000/s.
Round to one significant figure for discussion: “~100 writes/s, ~10K reads/s.”
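The arithmetic above can be scripted as a quick sanity check. The traffic numbers (100M new URLs/month, 100:1 read:write, 3× peak) are the assumptions stated above; the ~500-byte record size and 3× replication factor are illustrative guesses for the storage estimate.

```python
# Back-of-envelope estimate for the URL shortener example.
# Assumptions: 100M new URLs/month, 100:1 read:write, 3x peak factor
# (from the text); ~500 B/record and 3x replication (illustrative).
URLS_PER_MONTH = 100_000_000
SECONDS_PER_MONTH = 30 * 86_400
READ_WRITE_RATIO = 100
PEAK_FACTOR = 3

write_qps = URLS_PER_MONTH / SECONDS_PER_MONTH   # ~39/s
read_qps = write_qps * READ_WRITE_RATIO          # ~3,900/s
peak_writes = write_qps * PEAK_FACTOR            # ~120/s
peak_reads = read_qps * PEAK_FACTOR              # ~12,000/s

# Storage over 5 years of retention.
RECORD_BYTES = 500
REPLICATION = 3
records_5y = URLS_PER_MONTH * 12 * 5
storage_tb = records_5y * RECORD_BYTES * REPLICATION / 1e12

print(f"writes ~{write_qps:.0f}/s (peak ~{peak_writes:.0f}/s)")
print(f"reads  ~{read_qps:.0f}/s (peak ~{peak_reads:.0f}/s)")
print(f"storage over 5y: ~{storage_tb:.0f} TB")
```

In the interview you would do this on the whiteboard, but the structure is the same: state assumptions, derive rates, round aggressively.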
Output
Read QPS and write QPS (or equivalent).
Total storage and storage per node if relevant.
These numbers justify caching, sharding, and replication later.
Phase 3: High-Level Architecture
Why it matters: This is the “picture” the rest of the interview builds on. Keep it simple first; add detail in deep dives.
Components to consider
Clients (web, mobile, API consumers).
Edge / CDN (static assets, sometimes redirects).
Load balancer(s).
API / application servers (stateless).
Caches (e.g. Redis).
Databases (primary + replicas, or sharded).
Message queues (async jobs, events).
External services (payments, notifications).
How to draw
Draw client → LB → app servers → cache → DB as a first cut.
Add queues and workers if you have async or background work.
Label read vs write path if they differ.
Add replicas or shards only after you’ve stated the need (e.g. “We’ll need read replicas for 10K reads/s”).
Data flow
Describe in one sentence: “User hits short URL → LB → app server → cache; on miss, DB → cache → redirect.”
Mention sync vs async: “Create short URL is synchronous; click analytics are sent to a queue and processed asynchronously.”
Output
A single diagram with 5–10 boxes and clear flow.
A one-paragraph narrative: “Traffic hits the LB, then stateless API servers. We cache hot URLs; on miss we hit the DB and backfill cache. Writes go to the primary; we’ll add read replicas for scale.”
Phase 4: Identify Bottlenecks
Why it matters: Shows you think about limits, not just “happy path.” Senior engineers anticipate failure modes.
Typical bottlenecks
Single DB
Writes and reads saturate one node
Sharding, read replicas
Cache
Stampede on miss, or cache down
Cache-aside + TTL, fallback to DB; consider single-flight (request coalescing)
API servers
CPU or memory under spike
Horizontal scaling, rate limiting
Message queue
Consumer lag, queue depth
More consumers, backpressure, DLQ
External API
Latency or rate limits
Timeouts, circuit breaker, cache, queue
How to present
“The main bottlenecks I see: (1) DB write capacity if we grow beyond one node, (2) cache stampede on a viral link, (3) DB as single point of failure.”
Then briefly: “I’d address (1) with sharding, (2) with TTL and maybe request coalescing, (3) with failover and replicas.”
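The request-coalescing ("single-flight") mitigation for a cache stampede can be sketched in a few lines. This is a minimal in-process sketch, assuming a hypothetical `load_from_db` callable; a production version would also need TTLs, eviction, and error handling.

```python
import threading

# Single-flight sketch: concurrent cache misses for the same key collapse
# into one backing-store load instead of a stampede.
cache = {}
locks = {}
locks_guard = threading.Lock()

def get(key, load_from_db):
    if key in cache:                      # fast path: cache hit
        return cache[key]
    with locks_guard:                     # one lock object per key
        lock = locks.setdefault(key, threading.Lock())
    with lock:                            # only one caller loads per key
        if key not in cache:              # re-check: another caller may have filled it
            cache[key] = load_from_db(key)
        return cache[key]
```

The key idea is the double-check under the per-key lock: of N concurrent misses for a viral link, one hits the DB and the rest wait and reuse its result.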
Output
2–4 concrete bottlenecks and one-line mitigations. You’ll detail them in Phases 6–7.
Phase 5: Discuss Trade-offs
Why it matters: SDE-3 is expected to articulate why a design is chosen, not just what it is.
Common trade-offs
Consistency vs availability: CP vs AP; strong vs eventual.
Latency vs consistency: Synchronous replication vs async.
Cost vs performance: More cache vs more DB; more replicas vs lower durability.
Complexity vs flexibility: Monolith vs microservices; single DB vs polyglot persistence.
Operational complexity: Self-managed vs managed services (e.g. RDS vs self-hosted Postgres).
How to present
“For redirects we’ll use eventual consistency and cache heavily—low latency and high availability matter more than perfect freshness. For creating a short URL we’ll want strong consistency so we don’t hand out duplicates.”
“We could use 2PC for cross-service transactions, but that’s blocking and complex; I’d prefer a saga with compensating actions and idempotent steps.”
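The saga alternative to 2PC can be sketched as an ordered list of (action, compensation) pairs: run the actions in order, and on failure run the compensations of the completed steps in reverse. The step names in the usage below are hypothetical.

```python
# Minimal saga sketch: each step pairs an action with a compensating
# action. On failure, completed steps are compensated in reverse order.
def run_saga(steps):
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):
            compensate()   # compensations must be idempotent and safe to retry
        return False
    return True
```

This is why the text stresses idempotent steps: compensations (and retried actions) may run more than once, so they must tolerate repetition.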
Output
2–3 explicit trade-offs with a clear “we choose X because Y” statement.
Phase 6: Deep Dive into Components
Why it matters: Interviewers want to see depth in at least one or two areas: API design, data model, or a specific component (cache, queue, DB).
What to prepare
API design
REST or RPC; idempotency for writes (e.g. idempotency key).
Key endpoints: create short URL, redirect, optional analytics.
Status codes and errors (rate limit, not found, conflict).
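Idempotency for the create endpoint can be sketched by keying writes on a client-supplied idempotency key. The in-memory stores and the handler shape below are illustrative, not any specific framework's API.

```python
import secrets

# Illustrative idempotent "create short URL" handler: retrying the same
# request with the same idempotency key replays the original result
# instead of creating a duplicate short code.
idempotency_store = {}   # idempotency_key -> short_code
url_table = {}           # short_code -> long_url

def create_short_url(long_url, idempotency_key):
    if idempotency_key in idempotency_store:       # retry: replay old result
        return idempotency_store[idempotency_key], False
    short_code = secrets.token_urlsafe(6)          # stand-in for real code generation
    url_table[short_code] = long_url
    idempotency_store[idempotency_key] = short_code
    return short_code, True                        # True -> newly created (201 vs 200)
```

In the API this maps to a header like `Idempotency-Key` on POST, with 201 for a new resource and 200 for a replayed one.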
Data model
Main entities and relationships.
Schema (tables or documents): e.g. short_code (PK), long_url, user_id, created_at.
Indexes: by short_code (lookup), by user_id (list a user’s URLs).
Sharding key if you shard (e.g. short_code).
One or two components in detail
Cache: Strategy (cache-aside), TTL, eviction (LRU), key format, stampede mitigation.
Database: Sharding strategy (hash vs range), replication (sync vs async), failover.
Queue: Use case (analytics, notifications), at-least-once vs exactly-once, consumer scaling.
How to run the deep dive
“I’ll go deeper on the cache and the database.”
For cache: “We use cache-aside. Key is url:{short_code}. TTL 24 hours. On miss we load from DB and backfill. We use LRU when memory is full.”
For DB: “We shard by hash(short_code) % N so lookups are single-shard. We use read replicas for read scaling; writes go to primary.”
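The cache-aside read path and hash-based shard routing described in this deep dive can be sketched together. The shard count, in-memory "shards", and cache dict are illustrative stand-ins for real database shards and Redis.

```python
# Cache-aside lookup plus hash(short_code) % N shard routing.
# dicts stand in for N DB shards and for Redis. Note: a real system
# uses a stable hash (e.g. CRC32/MD5), not Python's salted hash().
N_SHARDS = 4
shards = [dict() for _ in range(N_SHARDS)]   # short_code -> long_url per shard
cache = {}                                   # key format: "url:{short_code}"

def shard_for(short_code):
    return shards[hash(short_code) % N_SHARDS]   # single-shard lookup

def resolve(short_code):
    key = f"url:{short_code}"
    if key in cache:                             # cache hit: redirect immediately
        return cache[key]
    long_url = shard_for(short_code).get(short_code)   # miss: ask the owning shard
    if long_url is not None:
        cache[key] = long_url                    # backfill; real cache sets a 24h TTL
    return long_url
```

The point to make aloud is that the sharding key equals the lookup key, so every redirect touches exactly one shard.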
Output
Concrete API (method, path, body, idempotency).
Schema + indexes + sharding key.
Clear behavior of 1–2 components (cache, DB, or queue).
Phase 7: Scaling and Failure Handling
Why it matters: Production systems scale and fail; you need to show you think about both.
Scaling
Horizontal: More app servers behind LB; more DB shards; more cache nodes; more consumers.
Vertical: Bigger DB/cache instances when it’s simpler (e.g. early stage).
Read scaling: Read replicas, cache, CDN.
Write scaling: Sharding, async writes (queue), batching.
State clearly: “We scale reads with replicas and cache; we scale writes by sharding when we outgrow one DB.”
Failure handling
DB primary down: Fail over by promoting a replica to primary (this requires replication to already be in place).
Cache down: Fall back to DB; accept higher latency; optional stale cache from another region.
App server down: Stateless; LB stops sending traffic; no session state lost.
Queue backlog: Scale consumers; backpressure; DLQ for poison messages; alert on lag.
Degradation
“If the recommendation service is down, we show a default list instead of failing the page.”
“If DB is slow, we might serve stale from cache and surface a short delay message.”
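The degradation examples above boil down to a fallback wrapper around a risky dependency call. The service call and default list here are hypothetical; in production you would catch specific errors/timeouts and emit a metric rather than swallowing everything.

```python
# Graceful-degradation sketch: if the dependency fails, serve a safe
# default instead of failing the whole page.
DEFAULT_RECOMMENDATIONS = ["popular-1", "popular-2", "popular-3"]  # hypothetical

def with_fallback(call, default):
    try:
        return call()
    except Exception:   # production: catch timeouts/specific errors, log + metric
        return default
```

The same shape applies to the stale-cache case: the "default" is simply the last known good value instead of a static list.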
Output
Scaling strategy in one or two sentences (read vs write, when to shard).
2–3 failure scenarios with clear mitigations (failover, fallback, degradation).
Quick Reference: What to Say When
Unclear scope
“For this exercise, should we focus on MVP or include analytics/custom aliases?”
Unclear scale
“What order of magnitude are we designing for—e.g. 1K vs 100K vs 1M DAU?”
After drawing HLD
“The main bottlenecks I see are … I’d address them by …”
Choosing technology
“I’d use X because … The trade-off is …”
Asked “what if X fails?”
“We’d … (failover / fallback / degrade). The impact would be …”
Running out of time
“If we had more time, I’d go deeper on … and add …”
Common Mistakes to Avoid
Designing before clarifying — Always do requirements and scale first.
Over-complicating the first diagram — Start with 5–7 boxes; add detail when asked.
Ignoring failure — Always mention at least one failure mode and mitigation.
No trade-offs — Explicitly state at least one “we choose A over B because …”.
No numbers — Do at least rough QPS and storage so your choices are justified.
Monologue — Ask 2–3 clarifying questions and confirm scope before drawing.
Quick Revision (Interview Day)
Clarify: Functional + non-functional; scope; 2–3 questions.
Estimate: QPS (read/write), storage, state assumptions.
HLD: Client → LB → app → cache → DB (+ queues if needed); one-sentence flow.
Bottlenecks: 2–4 with one-line mitigations.
Trade-offs: 2–3 “we choose X because Y.”
Deep dive: API + schema + 1–2 components (cache, DB, or queue).
Scaling: Horizontal, read vs write, when to shard.
Failure: Failover, fallback, graceful degradation.
Use this framework to drive the interview and to demonstrate structured, senior-level thinking.