Intermediate
Internet Download Manager
An end-to-end Low Level Design (LLD) for an Internet Download Manager (IDM).
This problem tests your understanding of Concurrency (Multi-threading), Network Protocols (HTTP Range), and File I/O.
1. Requirements Clarification
Core Functionality: Download files from a URL.
Acceleration: Split the file into multiple parts (segments) and download them in parallel.
Resumability: Ability to pause and resume downloads (persisting state).
Assembly: Merge segments into the final file upon completion.
Status Tracking: Pending, Downloading, Paused, Completed, Failed.
2. Core Concept: HTTP Range Headers
The magic of IDM lies in the HTTP protocol.
Head Request: We first send a
HEADrequest to check the file size (Content-Length) and if the server supports parallel downloads (Accept-Ranges: bytes).Range Request: Workers send
GETrequests with specific byte ranges.Worker 1:
Range: bytes=0-1000Worker 2:
Range: bytes=1001-2000
3. Core Entities
DownloadTask: Represents the file being downloaded. Holds metadata (URL, file size, status).
Segment (Chunk): Represents a specific byte range (Start, End) and its download status.
Worker (Thread): A separate thread responsible for downloading one Segment.
DownloadManager: The Controller. Starts tasks, manages the thread pool, and handles file assembly.
4. Class Diagram

5. Python Implementation
A. The Segment and Task
These classes hold the data state.
Python
B. The Worker (Thread)
This performs the actual HTTP Request with the Range header.
Python
C. The Manager (Orchestrator)
Starts the download, waits for threads, and triggers merging.
Python
6. Key Design Decisions & Interview Talking Points
1. Handling Resume (Persistence)
Question: How does the system know where to resume after a crash?
Design: We must serialise the
DownloadTaskobject (including its list ofSegmentsand theirdownloadedbytes count) to a file (e.g.,.jsonor.idmfile) on disk.On Startup: The Manager reads this file, reconstructs the
Segmentobjects, and theChunkDownloadercalculates the new Range header:Range: bytes=(start + downloaded)-end.
2. File I/O Optimization
Design: I used separate temporary files (
.part0,.part1) for each thread.Why? This avoids the complexity of Locking. If all threads wrote to the same big file using
seek(), we would need a mutex to prevent race conditions, which slows down throughput. Merging at the end is faster than locking on every write.
3. Concurrency Model
Logic: I used a Thread per Segment.
Bottleneck: Too many threads can hurt performance due to Context Switching and disk I/O saturation.
Refinement: A real IDM uses a bounded
ThreadPool(e.g., max 8 threads) regardless of how many files are in the queue.
4. Network Failure Handling
Logic: If a specific chunk fails, only that specific
Segmentis marked asFAILED. The Manager has a retry policy (Exponential Backoff) to retry only the failed chunks, not the whole file.
Would you like to explore "Design a URL Shortener" (System Design/HLD) or stick to LLD with "Design an Elevator System"?
Coupon System for Zepto
An end-to-end Low Level Design (LLD) for a Coupon & Discount System (similar to Zepto, Swiggy, or Amazon).
This problem focuses on Flexibility and the Strategy Design Pattern. You need a system that can handle "10% Off", "Flat $50", "BOGO", and "Free Shipping" without rewriting code every time a marketing team invents a new promo.
1. Requirements Clarification
Coupon Types:
Percentage: "Get 10% off up to $50".
Flat: "Get $100 off".
Condition-based: "Free delivery on orders over $500".
Constraints (Rules):
Minimum Order Value: E.g., Cart > $200.
Category Specific: Only applies to "Electronics".
User Specific: "New Users Only".
Expiry: Valid between Date A and Date B.
Behavior:
Validate if a coupon is applicable to the current cart.
Calculate the discount amount.
2. Core Entities & Patterns
Cart: The context object containing items and total value.
Coupon: The core entity. It doesn't hardcode logic; instead, it holds Strategies.
Constraint (Interface): Defines conditions (e.g.,
isValid(cart)). Uses Chain of Responsibility or a simple List validation.Reward (Interface): Defines calculation (e.g.,
calculateDiscount(cart)). Uses Strategy Pattern.
3. Class Diagram
The key takeaway is separating the "Condition" (Can I use it?) from the "Reward" (What do I get?).

4. Python Implementation
A. The Context (Cart & Item)
Simple data holders.
Python
B. The Constraints (Validation Strategies)
We define small, reusable rules.
Python
C. The Rewards (Calculation Strategies)
This handles the math. Note the logic in PercentageReward to handle maximum caps.
Python
D. The Coupon (The Composite Entity)
The Coupon itself is lightweight. It simply iterates through its constraints and then delegates calculation to its reward.
Python
E. Orchestration (Driver Code)
Python
5. Key Design Decisions & Interview Talking Points
1. Strategy Pattern vs Inheritance
Bad Approach: Creating classes like
FlatCoupon,PercentageCoupon,MinOrderFlatCoupon. This leads to a class explosion.Good Approach (This Design): Composition. A
Couponhas a list of Constraints and has a Reward strategy. This allows dynamic combinations (e.g., a "Flat Discount" valid only for "Electronics" on "Weekends").
2. Extensibility
Question: "How would you add a 'Buy One Get One' (BOGO) coupon?"
Answer: I would create a new Reward Strategy:
BogoReward.It implements
calculate(cart).Logic: Scan items, find pairs, deduct the price of the cheaper item.
No changes needed in
CouponorCartclasses.
3. Concurrency (Database Design HLD hint)
If this is a "First 100 users only" coupon, the
Constraintneeds to check a global counter. This introduces concurrency issues.Solution: The
GlobalLimitConstraintwould need to interact with a Redis atomic counter (decrement) to ensure we don't oversell the coupon.
4. Multiple Coupons
Question: Can I stack coupons?
Answer: The current design allows testing coupons individually. To support stacking, the
CouponManagerwould take a list of coupons, apply the first one, update the cart's "effective total", and apply the second one.
Would you like to explore "Design a Payment Gateway Integration" (Handling states, retries, and idempotency) next?
Android Unlock Pattern
An end-to-end Low Level Design (LLD) for the Android Unlock Pattern system.
This problem is a classic example of Graph Theory (DFS/Backtracking) applied to a UI interaction problem. It tests how you model constraints, validate user input, and handle state transitions.
1. Requirements Clarification
Grid: Standard grid (9 dots).
Input: A sequence of connected points (the "Pattern").
Constraints:
Minimum Length: Pattern must connect at least 4 dots.
Unique Nodes: A dot can be visited only once in a single pattern (unless passing through an already visited dot, though standard implementations usually strictly forbid re-visiting as a "node endpoint").
Direct Path: If connecting Dot A to Dot C passes through Dot B, Dot B must be part of the pattern (unless B was already visited).
Operations:
Set Pattern: Save a new valid pattern.
Verify Pattern: Check if input matches the saved pattern.
2. Core Entities
Point (Node): Represents a specific dot on the grid or ID .
Pattern: A strictly ordered list of Points.
GridManager: Knows the geometry. Validates if a move from is legal (e.g., is it adjacent? Does it skip a node?).
UnlockService: The controller. Stores the "password" (hashed) and manages the
SetvsVerifystate.
3. Class Diagram

4. Python Implementation
A. The Point Entity
We map the 3x3 grid to coordinates to .
Python
B. The Validator (The "Skip" Logic)
This is the hardest part.
Scenario: Moving from to skips .
Rule: This is only legal if has already been visited.
Math: The midpoint between two points and is an integer coordinate only if and are both even.
Python
C. The Lock System (Controller)
Python
5. Key Interview Points
1. The "Skip" Logic (Midpoint Formula)
Question: How do you detect if
0 -> 8skips4?Answer:
.
Sum of coords: .
Both are even, so Midpoint exists at , which is Index 4.
If Index 4 is NOT in
visited_set, the move is illegal.
Edge Case: Knight's move (like Chess).
.
Sum: . Row sum is odd. No integer midpoint.
Move is valid without any visited checks (knights jump over pieces).
2. Security (Hashing)
Never store the pattern as a plain list
[0, 1, 2, 5].Convert the sequence to a string or byte array and SHA-256 hash it.
Salt: Add a user-specific salt to prevent Rainbow Table attacks (though the search space for patterns is small, ~389,000 combinations, so salting is crucial).
3. Maximum Combinations (DFS)
If the interviewer asks "How many valid patterns exist?", this becomes a DFS / Backtracking counting problem.
Min length 4, Max length 9.
You write a recursive DFS that attempts every valid move from the current node and counts the paths.
4. Code Extensibility
Grid Size: The code calculates
row = index // 3. By changing3toN, thePointandValidatorlogic supports 4x4 or 5x5 grids instantly.
Would you like to move to "Design a Code Deployment System" (like Jenkins/deployment pipelines)?
Scalable Database Schema for Reddit-like Comments
An end-to-end Low Level Design (LLD) and Schema Design for a scalable, threaded comment system like Reddit.
This is a classic system design interview question that tests your knowledge of Hierarchical Data Modeling in relational databases.
1. Requirements Clarification
Structure: Infinite nesting (Comment A Reply B Reply C).
Scale: Millions of comments per post (potentially).
Performance:
Read-heavy: Users view comments much more than they write.
Pagination: Cannot load the entire tree at once; need "Load more replies".
Ranking: Comments are sorted by relevance (score) or time.
Operations: Post comment, Reply, Upvote/Downvote, Edit.
2. The Core Problem: Storing Trees in SQL
There are four common ways to store trees. For Reddit, we compare the two best options:
Strategy
Concept
Pros
Cons
Verdict
Adjacency List
Column parent_id
Simple, easy inserts.
Reading a full tree requires recursive queries (CTEs).
Good (Standard)
Materialized Path
Column path (e.g., "1/5/12")
Extremely fast subtree retrieval via regex/prefix.
Moving subtrees is hard (rare in Reddit).
Best for Scale
We will use the Materialized Path (Path Enumeration) pattern because it makes fetching and sorting nested threads efficient without complex recursion.
3. Database Schema
Table 1: Comments
CommentsThis is the heart of the system.
SQL
Table 2: CommentVotes
CommentVotesKeeps track of who voted on what to prevent double voting.
SQL
4. Key Operations (SQL Logic)
A. Inserting a Comment
When User B replies to User A (Comment ID 100, Path 0010/0100):
Generate Path:
Parent_Path+/+New_ID.Calculate Depth:
Parent_Depth+ 1.
Note: In a real system, you might generate the ID separately or use a UUID to construct the path before insertion.
SQL
B. Fetching the Tree (The "Reddit View")
This is where Materialized Path shines. If you sort by the path column string, the comments naturally align in a Depth-First Search (DFS) order—exactly how they appear on a UI.
SQL
Result:
0001(Root Comment)0001/0002(Reply to 1)0001/0002/0003(Reply to 2)0005(New Root Comment)
Pagination: To "Load More" from a specific point, you can simply query WHERE path > 'last_seen_path'.
5. Python LLD Implementation

Code Structure
Python
6. Optimization & Scaling (Interview Gold)
1. The "Viral Post" Problem (Hot Partition)
Issue: If one post has 1 million comments, querying
WHERE post_id = Xbecomes slow.Solution: Database Sharding. Shard comments based on
post_id. All comments for Post X live on Shard A. This keeps the tree structure intact on one server.
2. Denormalization (Vote Counts)
Issue: Calculating
SELECT COUNT(*) FROM Votes WHERE comment_id = Xevery time a user views a thread is fatal for performance.Solution: Add
upvote_countcolumn toCommentstable.Write Path: When a user votes, update the
Votestable AND increment theComments.upvote_count.High Scale: Use Redis to buffer increments. Update the SQL database in batches (e.g., every 10 seconds) to reduce write locking.
3. Caching (Redis)
Cache the Top 200 comments of hot threads in Redis.
Structure:
Redis ListorSorted Setserialized as JSON.Since the top comments don't change structure often (mostly just vote counts change), cache invalidation is manageable.
4. Why not NoSQL (MongoDB/Cassandra)?
MongoDB: Actually a good fit. You can store the comment tree as a nested document (up to 16MB limit) or use the same "Materialized Path" pattern.
Cassandra: Great for writes, but strictly hierarchical fetching (Sorting) is harder. You would need to model the partition key carefully (e.g.,
post_id) and use clustering keys for sorting, but recursive nesting depth is hard to model without application-side logic.Verdict: SQL (Postgres) is usually preferred for comments because of strong consistency (ACID) requirements for voting and structural integrity, but MongoDB is a very strong runner-up.
Next Step
Would you like to move on to "Design a Leaderboard System" (Redis Sorted Sets) or "Design a Notification Service"?
Sublime Text IDE
An end-to-end Low Level Design (LLD) for a Text Editor (like Sublime Text, Notepad++, or VS Code).
This problem is unique because it focuses heavily on Data Structures (how to store text efficiently) and the Command Pattern (how to handle Undo/Redo).
1. Requirements Clarification
Core Actions: Insert text, Delete text, Move cursor (Up/Down/Left/Right).
File Handling: Open, Save.
Undo/Redo: Infinite undo history.
Selection: Highlight and modify a range of text.
Multiple Views: Support editing multiple files (Tabs).
Search: Find strings within the file.
2. The Core Data Structure (The "Secret Sauce")
How do we store the text? A simple String is for insertions. We need better options:
Gap Buffer (Used by Emacs): A dynamic array with a "gap" of empty space at the cursor. Insertions at the cursor are . Moving the cursor far away requires shifting the gap ().
Rope (Used by Heavyweights): A binary tree where leaves are strings. Efficient for massive files.
Piece Table (Used by VS Code): Stores the original file (Read-Only) and an "Add Buffer" (Append-Only). The document is a list of pointers to these buffers.
List of Lines (Our Choice): A Linked List or Array of Strings, where each string is a line.
Why? It's easy to implement in an interview and efficient enough for most cases (vertical movement is ).
3. Core Entities & Patterns
TextBuffer: The Model. Holds the content (List of Lines).
Cursor: Tracks position (
row,col).Command (Pattern): Encapsulates an action (
InsertCommand,DeleteCommand). This is mandatory for Undo/Redo.CommandHistory: A Stack that holds executed commands.
Editor: The Controller. Binds inputs to commands.
Flyweight (Pattern): (Optional) Used for rendering characters with syntax highlighting to save memory.
4. Class Diagram

5. Python Implementation
A. The Model (Buffer & Cursor)
We use a List of Lines approach. This makes handling "Enter" (splitting a line) and "Backspace" (merging lines) the critical logic.
Python
B. The Command Pattern (Undo/Redo)
Every action must be reversible.
Python
C. The History Manager
Manages the stacks.
Python
D. The Editor (Controller)
Python
6. Key Interview Points
1. Data Structures (The "Gap Buffer")
Question: Why not use a simple long string?
Answer: Inserting into a string is (requires shifting all subsequent characters).
Optimization: In a real-world editor (Emacs), we use a Gap Buffer.
[H, E, L, L, O, _, _, _, W, O, R, L, D]The underscores represent empty space (the gap) at the cursor position. Insertion is .
If the cursor moves, we shift the gap.
2. Syntax Highlighting (Flyweight Pattern)
If we have 1 million 'A' characters, creating 1 million objects is heavy.
Flyweight: We create one
Glyph('A', Font, Color)object and reuse it, storing only the position externally.Strategy Pattern: Syntax highlighting rules (Python vs Java) are different strategies injected into the parser.
3. Large File Support (Proxy/Lazy Loading)
Question: How to open a 10GB log file?
Answer: Do not load the whole file into RAM.
Use Memory Mapping (mmap).
Load only the "viewport" (visible lines) + a small buffer above/below.
This is the Virtual Proxy pattern.
4. Search (KMP Algorithm)
Implementing
Ctrl+Frequires string matching. A naive loop is . The Knuth-Morris-Pratt (KMP) algorithm does it in , which is preferred for text editors.
5. Composite Pattern
Used if the editor supports grouping elements, like folding sections of code (blocks) or rendering rich text where a Document contains Paragraphs, which contain Sentences, which contain Characters.
Would you like to move on to "Design a Scheduler / Job Queue" (Concurrency & Heaps) or "Design a Cache (Redis)"?
Ngrok Tool
Here is a detailed Low Level Design (LLD) for an Ngrok-like Tunneling Tool.
This is a System Design / Networking heavy problem. The core challenge is exposing a service running behind a NAT/Firewall (localhost) to the public internet without port forwarding.
1. Requirements Clarification
Goal: Expose a local port (e.g.,
localhost:8080) to a public URL (e.g.,myapp.tunnel.com).Actors:
Public User: Accesses the URL via browser.
Tunnel Server (Cloud): Accepts public traffic and routes it to the correct agent.
Tunnel Client (Agent): Runs locally, connects to the server, and forwards traffic to the local app.
Key Features:
NAT Traversal: The client must initiate the connection (outbound) because inbound ports are blocked.
Multiplexing: Multiple public users should use the same single persistent TCP connection to the agent.
Protocol: HTTP/TCP.
2. High-Level Architecture
The magic of Ngrok is a Reverse Tunnel.
Agent connects to Server (on a control port, e.g., 4443). Connection is kept open.
Server listens on a public port (e.g., 80) for internet traffic.
User requests
http://abc.tunnel.com.Server looks up the connection for
abc, encapsulates the HTTP request, and sends it down the socket to Agent.Agent unwraps it, calls
localhost:8080, gets response, and sends it back up the tunnel.
3. Class Diagram & Entities

4. Python Implementation (Machine Coding)
We will use Python's asyncio because this is an I/O-bound problem requiring high concurrency.
A. The Tunnel Server (server.py)
server.py)This runs on the Cloud. It acts as a switchboard.
Python
B. The Tunnel Client (agent.py)
agent.py)This runs on your laptop. It connects Server <-> Localhost.
Python
5. Key Design Challenges (Interview Gold)
1. Handling Concurrency (Multiplexing)
Problem: In the code above, if the Agent is processing one request (waiting for localhost), the tunnel is blocked. Other users hang.
Solution (Real Ngrok): The tunnel shouldn't just send raw bytes. It should wrap data in Frames.
Frame:
[RequestID | Payload_Length | Payload]The Agent reads the frame, spawns a new async task to talk to localhost, and when done, sends a frame back with the same
RequestID.This allows asynchronous multiplexing over a single TCP socket.
2. Disconnections & Reconnecting
Problem: Wi-Fi drops.
Solution: The Agent needs a
while Trueloop with exponential backoff retry logic around theconnect()function. It should also sendPINGframes (Heartbeats) to keep the NAT mapping alive.
3. Security
Problem: Anyone can connect to your server and register "https://www.google.com/search?q=google.tunnel.com".
Solution:
Auth Tokens: Agent sends
REGISTER: {token} : {subdomain}.TLS Termination: The Public Server should handle HTTPS certificates (wildcard cert
*.tunnel.com), decrypt the traffic, and send plain HTTP (or re-encrypted) over the tunnel.
4. Protocol Support
Ngrok supports HTTP, TCP, and TLS tunnels.
For Raw TCP (e.g., exposing an SSH port), the server cannot parse headers. It simply allocates a random public port (e.g.,
tunnel.com:12345) and pipes everything from that port to the agent.
6. Execution Flow Summary
Start Local App:
python -m http.server 5000(runs a dummy site).Start Tunnel Server:
python server.py. It listens on 9090 (control) and 8080 (web).Start Agent:
python agent.py. It connects to 9090 and says "I am demo".Public Access: Browser opens
http://localhost:8080.Server sees "Host: localhost". Routes to 'demo' Agent.
Agent sends to port 5000.
Port 5000 replies.
Agent sends back to Server.
Browser shows the page.
Would you like to explore "Design a Load Balancer" or "Design a Collaborative Document Editor (Operational Transformation)" next?
Rate Limiter
An end-to-end Low Level Design (LLD) for a Rate Limiter.
This is a classic Machine Coding problem that tests your knowledge of Concurrency, Design Patterns (Strategy), and Algorithms.
1. Requirements Clarification
Goal: Restrict the number of requests a client (User/IP) can make within a time window.
Rules:
Limit: e.g., "10 requests per second".
Granularity: User-based (User ID) or Global.
Behavior: Return
True(Allowed) orFalse(Throttled/HTTP 429).Environment:
Thread-Safe: Must handle concurrent requests correctly.
In-Memory: For this problem, we assume a single server instance (though we should discuss distributed scaling).
2. Algorithm Choice
There are two primary algorithms suitable for an LLD interview. We must choose one and justify it.
Sliding Window Log: Stores the timestamp of every request. Accurate, but memory-intensive ( space per user).
Token Bucket (Recommended): A "bucket" holds tokens. Tokens refill at a constant rate. Each request consumes a token.
Pros: Memory efficient ( space—just an integer counter), allows short bursts of traffic (good UX), easy to implement.
We will implement the Token Bucket Algorithm.
3. Class Diagram
We use the Strategy Pattern to allow swapping algorithms (e.g., switching from Token Bucket to Leaky Bucket) without changing the core service logic.
Code snippet

4. Python Implementation
A. The Bucket Entity (Thread-Safe)
This class holds the state for a single user.
Crucial Concept: We use Lazy Refill. We don't have a background timer adding tokens every second. Instead, when a request comes in, we calculate: tokens_to_add = (now - last_time) * rate.
Python
B. The Strategy (Manager)
This manages the collection of buckets for different users.
Python
C. The Service (Facade) & Driver Code
Python
5. Key Design Decisions & Interview Q&A
1. Handling Concurrency
Question: Why did you use locks inside the
Bucketclass?Answer: Granularity.
If I locked the entire
TokenBucketLimiterclass (the dictionary), User A's request would block User B's request.By locking only the
Bucketinstance, User A only contends with other requests from User A. This allows high throughput for the system overall.
2. Token Bucket vs Leaky Bucket
Question: Why Token Bucket?
Answer: Token Bucket allows Bursts. In real-world web usage, a user loading a page might fire 10 simultaneous API calls. Token Bucket allows this (up to capacity). Leaky Bucket enforces a rigid constant rate (queueing or dropping burst requests), which can degrade UX for innocent bursts.
3. Distributed Rate Limiting (Scaling Up)
Question: How does this change if we have 5 servers behind a Load Balancer?
Answer: In-memory dictionaries won't work (requests are split across servers).
We move the state to a centralized store like Redis.
We use Lua Scripts in Redis to perform the "Check -> Decrement" operation atomically.
Algorithm:
redis.eval('local curr = redis.call("get", key); if curr > 0 then ...').
4. Memory Cleanup
Question:
self.bucketsgrows forever. How do you fix memory leaks?Answer: We need a cleanup strategy.
TTL (Time To Live): If using Redis, keys expire automatically.
Background Worker: A thread runs every few minutes to delete
Bucketobjects wherelast_refill_timestampis older than 1 hour.LRU Cache: Implement
self.bucketsas an LRU Cache (e.g., Pythonfunctools.lru_cacheorOrderedDict) with a max size.
Would you like to move on to "Design a Splitwise (Expense Sharing App)" or "Design a Notification System"?
Thread Pool
An end-to-end Low Level Design (LLD) for a Custom Thread Pool.
This is a fundamental concurrency problem. It tests your understanding of OS Concepts (Threads vs. Processes), Synchronization, and the Producer-Consumer Pattern.
1. Requirements Clarification
Goal: Create a container that manages a fixed number of threads to execute tasks concurrently.
Why? Creating and destroying threads is expensive (OS overhead). A pool reuses threads.
Core Features:
Fixed Size:
nthreads running in the background.Task Queue: A buffer to hold tasks when all threads are busy.
Non-blocking Submission:
submit(task)should return immediately, not wait for execution.Shutdown: Ability to stop accepting new tasks and shut down workers gracefully.
2. Core Architecture: Producer-Consumer
The Thread Pool is a classic implementation of the Producer-Consumer pattern.
Producers: The main program calling
pool.submit(task).Buffer: A thread-safe Blocking Queue.
Consumers: The "Worker Threads" that loop infinitely, pulling tasks from the queue.
3. Class Diagram
Code snippet

4. Python Implementation
We will use Python's threading and queue modules. The queue.Queue class is already thread-safe (it uses locks/semaphores internally), which simplifies the code significantly.
A. The Worker (The Consumer)
This is a wrapper around a system thread. It runs an infinite loop.
Python
B. The Thread Pool (The Manager)
This manages the lifecycle of the workers.
Python
C. Driver Code
Python
5. Key Design Decisions & Interview Talking Points
1. The "Poison Pill" (Graceful Shutdown)
Problem: How do you kill a thread that is blocked on
queue.get()? You can't force-kill a thread safely in most languages (it might leave resources/locks open).Solution: Insert a special sentinel value (like
Noneor aStopSignalobject) into the queue.Logic: When a worker pulls this value, it realizes "The Manager wants me to stop," breaks its loop, and exits cleanly. You must push exactly pills for threads.
2. Blocking Queue
Why: We use a blocking queue so that if there are no tasks, the worker threads go to sleep (Waiting State).
Benefit: This saves CPU. If we used a standard list and a
whileloop, the threads would "spin" (busy wait), consuming 100% CPU checking for new work.
3. Future / Promise (Returning Results)
The implementation above is "Fire and Forget" (void return).
Interview Question: "How would you return a value?"
Answer: The
submit()method should return aFutureobject immediately.The
Futurehas a.get()method that blocks.The
Workercalculates the result and sets it inside theFuture.Any thread waiting on
.get()is then notified (using a Condition Variable).
4. Ideal Pool Size
CPU-Bound Tasks: Size = Number of Cores (e.g., 4 or 8). Adding more threads just increases context-switching overhead without performance gain.
I/O-Bound Tasks (Web scraping, DB calls): Size can be much larger (e.g., 50 or 100). Threads spend most of their time waiting for network responses, so one core can manage many waiting threads efficiently.
5. Rejection Policy
Question: What if the queue gets full? (If
maxsizewas set).Answer: We need a Rejection Policy.
Abort: Throw an exception.
CallerRuns: The main thread (producer) runs the task itself (slowing down submission).
Discard: Throw away the oldest or newest task.
Would you like to move on to "Design a Pub-Sub System (like Kafka)" or "Design a Splitwise"?
OYO/Airbnb - Part 1: Database Modelling
An end-to-end Low Level Design (LLD) for a Hotel Booking System (like OYO or Airbnb).
This problem focuses heavily on Data Modeling, Date Handling, and most importantly, Concurrency Control (preventing double bookings).
1. Requirements Clarification
Actors:
Guest: Searches for hotels, views details, books rooms.
Host/Admin: Adds hotels, manages rooms/inventory.
System: Manages bookings, payments, and notifications.
Core Use Cases:
Search: Find hotels by City + Date Range.
Book: Reserve a specific room type for specific dates.
Cancel: Cancel a booking and free up inventory.
Critical Constraint:
No Double Bookings: Two users cannot book the same room for the same date.
Scalability: Search should be fast.
2. Core Entities & Relationships
Hotel: The physical property (Name, Address, Rating).
Room: A specific unit (Room 101) or a Category (Deluxe). For Airbnb, it's usually a specific unit. For OYO, it's often Categories. We will design for Room Instances (e.g., Room 101, 102) to make availability logic clear.
Booking: Connects a User, a Room, and a Date Range.
InventoryManager: The engine that checks if dates are free.
3. Class Diagram
Code snippet

4. Python Implementation
A. Basic Models (Hotel & Room)
The core logic for "Is this room free?" lives inside the Room class.
Python
B. The Booking Manager (The Controller)
This handles the high-level logic and the critical "Check-Then-Act" race condition.
Python
C. Driver Code
Python
5. Key Design Challenges & Interview Answers
1. Handling Concurrency (Double Booking)
Problem: User A and User B check availability. Both see "Free". Both pay. Both Book.
LLD Solution (Single Machine):
synchronizedkeyword (Java) orLock(Python) on the specificRoomobject.System Design Solution (Distributed):
Optimistic Locking: Add a
versioncolumn to the Room table. When updating, checkver = old_ver. If it fails, retry.Pessimistic Locking (Better for bookings): Use SQL
SELECT ... FOR UPDATEwhen reading the room row. This locks the row until the transaction completes.
2. Search Optimization
Problem: Iterating through all hotels to check availability (
is_available) is slow ().Solution: Invert the index. Maintain an Availability Table.
Table:
RoomAvailability (room_id, date, status).Query:
SELECT room_id FROM RoomAvailability WHERE date IN (d1, d2) AND status = 'FREE'.Even better: Use Elasticsearch or Geospatial Indexes (QuadTrees) to filter by Location first, then filter by Availability.
3. Dynamic Pricing
Add a
PricingStrategyinterface (Strategy Pattern).BasePriceStrategyDemandSurgeStrategy(if booked % > 80, price * 1.5)SeasonalStrategy(if month == Dec, price * 2)
4. Scalability (Hotels vs Rooms)
Airbnb lists unique homes (Room = Hotel essentially).
OYO lists Hotels with types.
The design above uses
Roomas a unique unit (Room 101). If OYO has 100 "Standard Rooms" identical to each other, we can model this asRoomTypeInventorywith a count (e.g.,count=100) instead of 100 row objects. This saves DB space.
Would you like to continue with "Design a Parking Lot" (classic) or "Design a Tic-Tac-Toe Game"?
Google Calendar Database Model
An end-to-end Low Level Design (LLD) for a Calendar System (like Google Calendar or Outlook).
This problem is a test of Data Modeling (how to represent time) and Algorithms (how to efficiently detect overlaps and manage recurring events).
1. Requirements Clarification
Core Entities: Users, Events, Calendars.
Event Properties: Title, Start Time, End Time, Location, Attendees.
Core Operations:
Create, Update, Delete events.
Get Events: View a day/week/month.
Conflict Detection: Check if a user is free during a specific time slot.
Advanced Features (Bonus):
Recurring Events: (e.g., "Every Monday").
Invites: Adding other users to an event.
Reminders: Notification logic.
2. Core Concept: Handling Time
Time is tricky. To ensure the system works globally:
Storage: Always store time in UTC (Epoch timestamps).
Display: Convert to the User's Local Timezone only at the UI layer.
Intervals: We need a robust class to handle time ranges and overlap logic.
3. Class Diagram
We will use a Service-Oriented Architecture. The CalendarService acts as the facade.
Code snippet

4. Python Implementation
A. The TimeSlot (Interval Logic)
This helper class encapsulates the math for checking if two events clash.
Python
B. The Event & User
Basic entities.
Python
C. The Calendar
Represents one user's schedule. Note the check_conflict method.
Python
D. The Service (Controller)
Handles the logic of inviting people (adding events to multiple calendars).
Python
E. Driver Code
Python
5. Key Interview Design Challenges
1. Recurring Events ("Every Monday")
Naive Approach: Create 52 separate event objects for the next year.
Problem: Waste of space. If user changes the title, you have to update 52 objects.
Smart Approach (RRULE): Store the event once with a Recurrence Rule (RFC 5545 standard).
Example Rule:
FREQ=WEEKLY;BYDAY=MORetrieval: When
get_events(Jan 1, Jan 31)is called, the system dynamically calculates the "Virtual Events" generated by the rule that fall in that window.
2. Timezone Management
Problem: Alice (New York) creates an event at 9 AM EST. Bob (London) views it.
Solution: Store
9 AM ESTas14:00 UTCin the database. When Bob views it, convert14:00 UTC+0 hours(London/GMT) =2 PM.
3. Efficient Search (Data Structures)
Problem:
get_eventsusing a simple list is .Solution: Use an Interval Tree or Segment Tree.
This allows finding all intervals overlapping a specific range in time (where K is the number of results).
In a relational DB (SQL), we use:
SQL
(Composite index on
start_timehelps, but interval queries are notoriously hard to optimize perfectly in B-Trees).
4. Meeting Scheduler (Find Free Slots)
Feature: "Find a time where Alice, Bob, and Charlie are all free."
Algorithm:
Fetch sorted event intervals for all 3 users.
Merge intervals to find "Busy Blocks" (like the "Merge Intervals" LeetCode problem).
Invert the "Busy Blocks" to find "Free Blocks".
Return blocks where
duration >= requested_duration.
Would you like to explore "Design an Elevator System" (State Machine) or "Design a Chat Application (WhatsApp)"?
Google Authenticator
An end-to-end Low Level Design (LLD) for a Google Authenticator system.
This is a unique Machine Coding problem because it is less about "Architecture" and almost entirely about Cryptography and the RFC 6238 (TOTP) standard. The "magic" is that the Client (App) and Server generate the same 6-digit code without talking to each other.
1. Requirements Clarification
Goal: Generate and Validate 2-Factor Authentication (2FA) codes.
Core Mechanism: TOTP (Time-based One-Time Password).
Setup: The server generates a Secret Key, shares it with the User (usually via QR Code), and both save it.
Operation:
App: Uses
Secret + Current TimeGenerates 6-digit code.Server: Uses
Stored Secret + Current TimeGenerates code Compares with User Input.
Constraints:
Code changes every 30 seconds.
Must handle slight time differences (clock drift) between phone and server.
2. The Algorithm (The "Secret Sauce")
You cannot design this without knowing the math. Google Authenticator uses HMAC-SHA1.
Time Step (): . This ensures the input only changes every 30 seconds.
Hashing: Calculate
HMAC-SHA1(Secret_Key, T). This gives a 20-byte hash.Dynamic Truncation:
Take the last byte of the hash.
Extract the last 4 bits to get an
offset(0 to 15).Take 4 bytes from the hash starting at
offset.
Modulus: Convert those 4 bytes to an integer, remove the sign bit, and do
% 1,000,000to get the 6 digits.
3. Class Diagram
Code snippet

4. Python Implementation
We need hmac, struct (for binary manipulation), and base64 (Google Auth uses Base32 encoding for secrets).
A. The Math Core (RFC 6238)
Python
B. The Service (Server-Side Logic)
This handles User management and the "Window Verification" (Clock Drift).
Python
C. Driver Code (Simulation)
Python
5. Key Design Details (Interview Gold)
1. Why Base32?
You will be asked: "Why not Base64?"
Answer: Base32 only uses
A-Zand2-7. It avoids easily confused characters (like1,I,lor0,O) which is crucial if a user has to manually type the secret key (Backup Key) instead of scanning a QR code. It is also case-insensitive.
2. Clock Drift (Synchronization)
Problem: The user's phone is in Airplane mode and the clock drifted by 1 minute.
Solution: The
verify_codeloop (seen in the code above). Standard servers check a window of[-1, 0, +1]intervals. If the user is successfully verified at window-1, the server can store this "drift" and apply it to future requests for that user to improve UX.
3. Replay Attacks
Problem: An attacker intercepts a valid code and uses it immediately.
Solution: The server should store
last_used_intervalfor the user. Ifcurrent_interval == last_used_interval, reject the request. A TOTP code should be single-use.
4. QR Code Format
The "QR Code" is simply a URI scheme:
otpauth://totp/Google:Alice?secret=JBSWY3DPEHPK3PXP&issuer=Google
Secret: The Base32 key.
Issuer: Who is asking for the code (e.g., Gmail).
5. Backup Codes
If the user loses the phone, they lose the Secret.
Design: When generating the secret, also generate 10 random 8-digit "Scratch Codes", store them in the DB (hashed), and display them to the user once. These are one-time use overrides.
Next Step
Would you like to solve "Design a URL Shortener (TinyURL)" or move to a complex game design like "Design Chess"?
Amazon Prime Video
An end-to-end Low Level Design (LLD) for a Video Streaming Service (Amazon Prime Video / Netflix).
This problem tests your ability to model Hierarchical Data (Series Season Episode), manage Access Control (Prime vs. Rent), and understand how modern video delivery works (Adaptive Bitrate Streaming).
1. Requirements Clarification
Content Types: Movies (Standalone) and TV Shows (Nested: Show Seasons Episodes).
Playback: Users select a resolution (or Auto) and stream video.
Entitlement (Monetization):
SVOD (Subscription): Free for Prime members.
TVOD (Transactional): Rent or Buy specific movies.
User Experience:
Watch History: Resume playback from where you left off.
Search: Find content by title/genre.
2. Core Concept: Adaptive Streaming (HLS/DASH)
In a real system, you do not stream one big MP4 file. You stream a Manifest File (.m3u8) which points to thousands of tiny chunks (.ts files) of video.
The LLD Implication: Your
Videoobject doesn't holdbyte[] data. It holds aMap<Resolution, URL>pointing to a CDN (Content Delivery Network).
3. Class Diagram
We use the Composite Pattern concept for the Movie/Series hierarchy and the Strategy Pattern (implicit) for Entitlement checks.
Code snippet

4. Python Implementation
A. The Data Models (Content Hierarchy)
We need a flexible structure. Movie and Episode are "Playable", but TVShow is just a container.
Python
B. User & Watch History
Python
C. The Service (Controller)
This handles the critical "Can I watch this?" logic and URL fetching.
Python
D. Driver Code
Python
5. Key Design Challenges & Interview Answers
1. Handling "Resume Watching" at Scale
Problem: If 10 million people pause every second, writing to the DB (
UPDATE history SET time=...) is too heavy.Solution: Write-Back Cache (Redis).
The frontend sends a "heartbeat" every 10 seconds with current progress.
Server updates Redis:
SET user:video:timestamp.A background worker flushes Redis to the permanent DB (Cassandra/DynamoDB) every few minutes or on session close.
2. Search (Elasticsearch)
You cannot search efficiently in a standard SQL DB with queries like "Action movies with Tom Cruise".
Use an Inverted Index (Elasticsearch/Solr).
Document:
{ "id": "mv_001", "title": "Mission Impossible", "cast": ["Tom Cruise"], "genre": "Action" }Search is fuzzy and extremely fast.
3. CDN & Geo-Location
Question: Why does the video load fast in India but the server is in the US?
Answer: The
get_stream_urlmethod doesn't return a file on the app server. It returns a URL to a CDN (like CloudFront or Akamai) edge server. The CDN replicates the video file to a server physically close to the user (e.g., Mumbai).
4. DRM (Digital Rights Management)
To prevent users from downloading the
.m3u8stream and converting it to MP4, we use Encrypted Streams (Widevine, FairPlay).The Player needs a License Key to decrypt the video chunks. The
EntitlementServiceissues this token only if the user has rights.
5. Recommendation System
This is usually a separate Microservice ("Personalization Service").
It consumes the
WatchHistoryevents via a message queue (Kafka) and updates a User-Item Matrix to generate suggestions (Collaborative Filtering).
Would you like to try "Design a Food Delivery App (Swiggy/Zomato)" or "Design a Payment Gateway" next?
Online Book Management System
An end-to-end Low Level Design (LLD) for an Online Book Management System (Library Management System).
This is a classic Object-Oriented Design (OOD) question. The main challenge is distinguishing between the Metadata (The abstract concept of the book "Harry Potter") and the Inventory (The 5 physical copies of "Harry Potter" on the shelf).
1. Requirements Clarification
Actors:
Librarian (Admin): Add/Remove books, Block members.
Member: Search, Borrow, Return, Reserve books.
System: Notifications, Fine calculation.
Core Use Cases:
Search: By Title, Author, or Category.
Check-out (Borrow): Validate eligibility (max limit, unpaid fines), decrement inventory.
Return: Update inventory, calculate overdue fines.
Constraints:
Each book can have multiple copies (Book Items).
Max 5 books per member.
Max lending duration: 10 days.
2. Core Entities & Relationships
Book (Metadata): Details like ISBN, Title, Author. Shared by all copies.
BookItem (Physical Copy): Specific instance with a unique Barcode and Status (Available, Loaned, Lost).
Account/Member: User details, list of checked-out items.
Catalog (Search Index): Efficiently finds books.
LibraryService (Facade): The main controller handling transactions.
3. Class Diagram
Code snippet

4. Python Implementation
A. The Models (Book vs BookItem)
This separation is the most important part of the design.
Python
B. The User (Account & Member)
Python
C. Search Catalog (Index)
A simple list search is . A proper design uses HashMaps for lookup.
Python
D. The System (Facade & Fine Logic)
This manages the business rules.
Python
E. Driver Code
Python
5. Key Interview Nuances
1. Fine Calculation (Strategy Pattern)
Question: What if books have different fine rates (e.g., Rare books = $10/day, Magazines = $0.50/day)?
Design: Don't hardcode
_calculate_fine. Use the Strategy Pattern.Interface
FineStrategywith methodcalculate(days).Strategies:
StandardFine,PremiumFine.The
BookItemholds a reference to its specific strategy.
2. Concurrency
Question: Two users scan the QR code of the same book item at the exact same millisecond via the mobile app.
Solution: You need locking.
DB Level:
SELECT * FROM book_item WHERE barcode = 'xyz' FOR UPDATE;(Row-level locking).App Level: Python
threading.Lock()inside theborrow_bookmethod for that specific barcode.
3. Notification System (Observer Pattern)
Question: Notify user when a reserved book becomes available.
Design: Implement Observer Pattern.
User subscribes to
BookMetadata.When
BookItemstatus changes fromLOANEDtoAVAILABLE, the System checks the Reservation Queue and notifies the next observer (User).
4. Barcode Reading
In a machine coding round, mentioning that
barcodeis the unique identifier (Primary Key) forBookItemwhileISBNis the identifier forBookis a strong signal of domain knowledge.
Would you like to explore "Design a Vending Machine" (State Pattern) or "Design a Snake & Ladder Game" next?
Lift
Here is a detailed Low Level Design (LLD) for an Elevator (Lift) System.
This is a very popular Machine Coding problem because it tests State Management, Scheduling Algorithms, and Concurrency.
1. Requirements Clarification
Setup: Building with Floors (0 to ). Elevators.
Capacity: Each elevator has a max weight/person limit.
Buttons:
Inside: Panel to select specific floors (1, 2, 5...).
Outside (Hall): Up/Down buttons on each floor.
Behavior:
The elevator should not change direction violently. It should finish requests in the current direction before switching (The SCAN Algorithm).
Example: If going
UPfrom 1 to 5, and a user at 3 pressesUP, stop at 3. If a user at 4 pressesDOWN, ignore them until the elevator reaches the top and starts coming down.
2. Core Algorithm: The SCAN (Look) Algorithm
Naive First-Come-First-Serve (FCFS) is bad for elevators (it moves like a yo-yo).
We use the SCAN Algorithm:
Move in
current_direction.Stop at all requested floors in that direction.
If no more requests in that direction, check pending requests in the opposite direction.
Reverse direction.
Data Structure: Two Priority Queues.
Up Queue (Min-Heap): Sorts stops in ascending order (1 2 5).
Down Queue (Max-Heap): Sorts stops in descending order (10 8 3).
3. Class Diagram
Code snippet

4. Python Implementation
A. Enums and Request
Python
B. The Elevator (The Worker)
This is where the complex state logic lives. We use two heaps to manage the path.
Python
C. The Controller (Dispatcher)
This decides which elevator gets the external "Hall Call".
Python
D. Driver Code
Python
5. Key Design Challenges & Interview Answers
1. Dispatch Algorithm Strategy
Question: How do you decide which elevator to send?
Approaches:
Nearest Idle: Simple but inefficient during rush hour.
Sectoring: Divide building (Elevator A serves 0-10, B serves 11-20). Good for skyscrapers.
Destination Dispatch (Modern): Users type destination floor in the lobby before entering. The system groups people going to the same floor into the same elevator. This drastically reduces stops.
2. State Pattern
Question: The
move()method has manyif-elsechecks. Can we improve this?Answer: Use the State Design Pattern.
Classes:
IdleState,MovingUpState,MovingDownState.Each state implements
move()andhandle_request()differently.Example:
MovingUpState.move()simply increments floor, whileIdleState.handle_request()transitions the state to Moving.
3. Thread Safety
Question:
add_requestcan happen whilemoveis running.Answer: Access to
up_stopsanddown_stops(the priority queues) must be synchronized.In Java:
synchronizedmethods.In Python:
with self.lock:around the heap operations.
4. Handling "Emergency Stop"
We would add a boolean
is_operational. IfFalse, the dispatcher ignores this elevator.If
FireAlarmtriggers,ElevatorSystemoverrides all queues: clearup_stops, push0(Lobby) todown_stops, and force direction DOWN.
5. Why Priority Queues?
Using a simple List/Array requires sorting every time a button is pressed ().
A Priority Queue (Heap) allows insertion in and always gives us the next closest floor in .
Would you like to move on to "Design a Snake and Ladder Game" or "Design a Parking Lot"?
Last updated