
Internet Download Manager

An end-to-end Low Level Design (LLD) for an Internet Download Manager (IDM).

This problem tests your understanding of Concurrency (Multi-threading), Network Protocols (HTTP Range), and File I/O.


1. Requirements Clarification

  • Core Functionality: Download files from a URL.

  • Acceleration: Split the file into multiple parts (segments) and download them in parallel.

  • Resumability: Ability to pause and resume downloads (persisting state).

  • Assembly: Merge segments into the final file upon completion.

  • Status Tracking: Pending, Downloading, Paused, Completed, Failed.


2. Core Concept: HTTP Range Headers

The magic of IDM lies in the HTTP protocol.

  1. Head Request: We first send a HEAD request to check the file size (Content-Length) and whether the server supports ranged requests (Accept-Ranges: bytes), which is what makes parallel downloading possible.

  2. Range Request: Workers send GET requests with specific byte ranges.

    • Worker 1: Range: bytes=0-1000

    • Worker 2: Range: bytes=1001-2000


3. Core Entities

  1. DownloadTask: Represents the file being downloaded. Holds metadata (URL, file size, status).

  2. Segment (Chunk): Represents a specific byte range (Start, End) and its download status.

  3. Worker (Thread): A separate thread responsible for downloading one Segment.

  4. DownloadManager: The Controller. Starts tasks, manages the thread pool, and handles file assembly.


4. Class Diagram


5. Python Implementation

A. The Segment and Task

These classes hold the data state.

Python
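A minimal sketch of these holders using dataclasses; the Status enum values and the .partN temp-file naming are illustrative choices, not fixed by the design:

```python
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    PENDING = "PENDING"
    DOWNLOADING = "DOWNLOADING"
    PAUSED = "PAUSED"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"

@dataclass
class Segment:
    index: int
    start: int           # first byte (inclusive)
    end: int             # last byte (inclusive)
    downloaded: int = 0  # bytes written so far (enables resume)
    status: Status = Status.PENDING

    @property
    def temp_file(self) -> str:
        return f"download.part{self.index}"

@dataclass
class DownloadTask:
    url: str
    output_file: str
    file_size: int = 0
    status: Status = Status.PENDING
    segments: list = field(default_factory=list)
```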

B. The Worker (Thread)

This performs the actual HTTP Request with the Range header.

Python
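A sketch of the worker, continuing from the classes above and assuming the third-party requests library. Note how the Range header is recomputed from start + downloaded, so a resumed segment continues where it stopped:

```python
import threading
import requests  # assumed third-party dependency

class SegmentWorker(threading.Thread):
    """Downloads one Segment via an HTTP Range request into its .part file."""

    def __init__(self, url: str, segment: Segment):
        super().__init__()
        self.url = url
        self.segment = segment

    def run(self):
        seg = self.segment
        # Resume support: skip the bytes we already have.
        start = seg.start + seg.downloaded
        headers = {"Range": f"bytes={start}-{seg.end}"}
        seg.status = Status.DOWNLOADING
        try:
            with requests.get(self.url, headers=headers, stream=True, timeout=30) as resp:
                resp.raise_for_status()  # expect 206 Partial Content
                with open(seg.temp_file, "ab") as f:  # append: keeps resumed bytes
                    for chunk in resp.iter_content(chunk_size=8192):
                        f.write(chunk)
                        seg.downloaded += len(chunk)
            seg.status = Status.COMPLETED
        except requests.RequestException:
            seg.status = Status.FAILED
```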

C. The Manager (Orchestrator)

Starts the download, waits for threads, and triggers merging.

Python
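A simplified single-shot orchestrator (no persistence or retries), following the HEAD-then-split-then-merge flow from Section 2:

```python
class DownloadManager:
    def __init__(self, num_segments: int = 4):
        self.num_segments = num_segments

    def download(self, url: str, output_file: str) -> DownloadTask:
        # 1. HEAD request: learn the size and confirm range support.
        head = requests.head(url, allow_redirects=True, timeout=30)
        size = int(head.headers.get("Content-Length", 0))
        if head.headers.get("Accept-Ranges") != "bytes" or size == 0:
            raise RuntimeError("Server does not support ranged downloads")

        task = DownloadTask(url=url, output_file=output_file, file_size=size)
        chunk = size // self.num_segments
        for i in range(self.num_segments):
            start = i * chunk
            end = size - 1 if i == self.num_segments - 1 else start + chunk - 1
            task.segments.append(Segment(index=i, start=start, end=end))

        # 2. Fan out one worker per segment and wait for all of them.
        workers = [SegmentWorker(url, seg) for seg in task.segments]
        for w in workers:
            w.start()
        for w in workers:
            w.join()

        # 3. Merge the .part files in order.
        if all(s.status == Status.COMPLETED for s in task.segments):
            with open(output_file, "wb") as out:
                for seg in task.segments:
                    with open(seg.temp_file, "rb") as part:
                        out.write(part.read())
            task.status = Status.COMPLETED
        else:
            task.status = Status.FAILED
        return task
```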


6. Key Design Decisions & Interview Talking Points

1. Handling Resume (Persistence)

  • Question: How does the system know where to resume after a crash?

  • Design: We must serialise the DownloadTask object (including its list of Segments and their downloaded bytes count) to a file (e.g., .json or .idm file) on disk.

  • On Startup: The Manager reads this file, reconstructs the Segment objects, and the ChunkDownloader calculates the new Range header: Range: bytes=(start + downloaded)-end.

2. File I/O Optimization

  • Design: I used separate temporary files (.part0, .part1) for each thread.

  • Why? This avoids the complexity of Locking. If all threads wrote to the same big file using seek(), we would need a mutex to prevent race conditions, which slows down throughput. Merging at the end is faster than locking on every write.

3. Concurrency Model

  • Logic: I used a Thread per Segment.

  • Bottleneck: Too many threads can hurt performance due to Context Switching and disk I/O saturation.

  • Refinement: A real IDM uses a bounded ThreadPool (e.g., max 8 threads) regardless of how many files are in the queue.

4. Network Failure Handling

  • Logic: If a specific chunk fails, only that specific Segment is marked as FAILED. The Manager has a retry policy (Exponential Backoff) to retry only the failed chunks, not the whole file.


Coupon System for Zepto

An end-to-end Low Level Design (LLD) for a Coupon & Discount System (similar to Zepto, Swiggy, or Amazon).

This problem focuses on Flexibility and the Strategy Design Pattern. You need a system that can handle "10% Off", "Flat $50", "BOGO", and "Free Shipping" without rewriting code every time a marketing team invents a new promo.


1. Requirements Clarification

  • Coupon Types:

    • Percentage: "Get 10% off up to $50".

    • Flat: "Get $100 off".

    • Condition-based: "Free delivery on orders over $500".

  • Constraints (Rules):

    • Minimum Order Value: E.g., Cart > $200.

    • Category Specific: Only applies to "Electronics".

    • User Specific: "New Users Only".

    • Expiry: Valid between Date A and Date B.

  • Behavior:

    • Validate if a coupon is applicable to the current cart.

    • Calculate the discount amount.


2. Core Entities & Patterns

  1. Cart: The context object containing items and total value.

  2. Coupon: The core entity. It doesn't hardcode logic; instead, it holds Strategies.

  3. Constraint (Interface): Defines conditions (e.g., isValid(cart)). Uses Chain of Responsibility or a simple List validation.

  4. Reward (Interface): Defines calculation (e.g., calculateDiscount(cart)). Uses Strategy Pattern.


3. Class Diagram

The key takeaway is separating the "Condition" (Can I use it?) from the "Reward" (What do I get?).


4. Python Implementation

A. The Context (Cart & Item)

Simple data holders.

Python

B. The Constraints (Validation Strategies)

We define small, reusable rules.

Python
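A few illustrative constraint strategies (the class names are our own):

```python
from abc import ABC, abstractmethod
from datetime import datetime

class Constraint(ABC):
    @abstractmethod
    def is_valid(self, cart: Cart) -> bool: ...

class MinOrderValueConstraint(Constraint):
    def __init__(self, min_value: float):
        self.min_value = min_value

    def is_valid(self, cart: Cart) -> bool:
        return cart.total >= self.min_value

class CategoryConstraint(Constraint):
    def __init__(self, category: str):
        self.category = category

    def is_valid(self, cart: Cart) -> bool:
        return any(i.category == self.category for i in cart.items)

class NewUserConstraint(Constraint):
    def is_valid(self, cart: Cart) -> bool:
        return cart.is_new_user

class ExpiryConstraint(Constraint):
    def __init__(self, start: datetime, end: datetime):
        self.start, self.end = start, end

    def is_valid(self, cart: Cart) -> bool:
        return self.start <= datetime.now() <= self.end
```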

C. The Rewards (Calculation Strategies)

This handles the math. Note the logic in PercentageReward to handle maximum caps.

Python
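Two reward strategies; PercentageReward shows the max-cap logic mentioned above:

```python
from abc import ABC, abstractmethod

class Reward(ABC):
    @abstractmethod
    def calculate_discount(self, cart: Cart) -> float: ...

class FlatReward(Reward):
    def __init__(self, amount: float):
        self.amount = amount

    def calculate_discount(self, cart: Cart) -> float:
        return min(self.amount, cart.total)  # never exceed the cart value

class PercentageReward(Reward):
    def __init__(self, percent: float, max_cap: float = float("inf")):
        self.percent, self.max_cap = percent, max_cap

    def calculate_discount(self, cart: Cart) -> float:
        # "10% off up to $50": the cap clamps the computed discount.
        return min(cart.total * self.percent / 100, self.max_cap)
```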

D. The Coupon (The Composite Entity)

The Coupon itself is lightweight. It simply iterates through its constraints and then delegates calculation to its reward.

Python
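A sketch of the composite coupon:

```python
class Coupon:
    def __init__(self, code: str, constraints: list, reward: Reward):
        self.code = code
        self.constraints = constraints  # list[Constraint]
        self.reward = reward            # one Reward strategy

    def is_applicable(self, cart: Cart) -> bool:
        return all(c.is_valid(cart) for c in self.constraints)

    def apply(self, cart: Cart) -> float:
        if not self.is_applicable(cart):
            return 0.0
        return self.reward.calculate_discount(cart)
```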

E. Orchestration (Driver Code)

Python
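A short illustrative driver using the classes above:

```python
if __name__ == "__main__":
    cart = Cart(user_id="u1", items=[
        Item("Headphones", 300.0, "Electronics"),
        Item("Milk", 50.0, "Grocery"),
    ])
    coupon = Coupon(
        code="ELEC10",
        constraints=[MinOrderValueConstraint(200), CategoryConstraint("Electronics")],
        reward=PercentageReward(10, max_cap=50),
    )
    print(coupon.apply(cart))  # 35.0 (10% of 350, under the $50 cap)
```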


5. Key Design Decisions & Interview Talking Points

1. Strategy Pattern vs Inheritance

  • Bad Approach: Creating classes like FlatCoupon, PercentageCoupon, MinOrderFlatCoupon. This leads to a class explosion.

  • Good Approach (This Design): Composition. A Coupon has a list of Constraints and has a Reward strategy. This allows dynamic combinations (e.g., a "Flat Discount" valid only for "Electronics" on "Weekends").

2. Extensibility

  • Question: "How would you add a 'Buy One Get One' (BOGO) coupon?"

  • Answer: I would create a new Reward Strategy: BogoReward.

    • It implements calculate(cart).

    • Logic: Scan items, find pairs, deduct the price of the cheaper item.

    • No changes needed in Coupon or Cart classes.

3. Concurrency (Database Design HLD hint)

  • If this is a "First 100 users only" coupon, the Constraint needs to check a global counter. This introduces concurrency issues.

  • Solution: The GlobalLimitConstraint would need to interact with a Redis atomic counter (decrement) to ensure we don't oversell the coupon.

4. Multiple Coupons

  • Question: Can I stack coupons?

  • Answer: The current design allows testing coupons individually. To support stacking, the CouponManager would take a list of coupons, apply the first one, update the cart's "effective total", and apply the second one.


Android Unlock Pattern

An end-to-end Low Level Design (LLD) for the Android Unlock Pattern system.

This problem is a classic example of Graph Theory (DFS/Backtracking) applied to a UI interaction problem. It tests how you model constraints, validate user input, and handle state transitions.


1. Requirements Clarification

  • Grid: Standard grid (9 dots).

  • Input: A sequence of connected points (the "Pattern").

  • Constraints:

    1. Minimum Length: Pattern must connect at least 4 dots.

    2. Unique Nodes: A dot can appear in a pattern only once. The line may pass over an already visited dot, but standard implementations strictly forbid re-using a dot as a pattern node.

    3. Direct Path: If connecting Dot A to Dot C passes through Dot B, Dot B must be part of the pattern (unless B was already visited).

  • Operations:

    • Set Pattern: Save a new valid pattern.

    • Verify Pattern: Check if input matches the saved pattern.


2. Core Entities

  1. Point (Node): Represents a specific dot on the grid, identified by (row, col) coordinates or a flat ID (0-8).

  2. Pattern: A strictly ordered list of Points.

  3. GridManager: Knows the geometry. Validates if a move from one Point to another is legal (e.g., is it adjacent? Does it skip a node?).

  4. UnlockService: The controller. Stores the "password" (hashed) and manages the Set vs Verify state.


3. Class Diagram


4. Python Implementation

A. The Point Entity

We map the 3x3 grid indices 0-8 to coordinates (0,0) through (2,2).

Python
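A minimal Point, with the grid size as a class constant so the same math works for N x N grids (see Section 5.4):

```python
class Point:
    """A dot on the grid, addressed by a flat index 0..GRID*GRID-1."""
    GRID = 3

    def __init__(self, index: int):
        self.index = index
        self.row = index // Point.GRID
        self.col = index % Point.GRID
```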

B. The Validator (The "Skip" Logic)

This is the hardest part.

  • Scenario: Moving from 0 to 8 skips 4.

  • Rule: This is only legal if 4 has already been visited.

  • Math: The midpoint between two points (r1, c1) and (r2, c2) is an integer coordinate only if r1 + r2 and c1 + c2 are both even.

Python
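A sketch of the validator implementing the midpoint rule described above:

```python
class MoveValidator:
    """Implements the 'skip' rule using the midpoint formula."""

    @staticmethod
    def is_valid_move(src: Point, dst: Point, visited: set) -> bool:
        if dst.index in visited:
            return False  # each dot may be used as a node only once
        row_sum = src.row + dst.row
        col_sum = src.col + dst.col
        # An integer midpoint exists only if both sums are even.
        if row_sum % 2 == 0 and col_sum % 2 == 0:
            mid_index = (row_sum // 2) * Point.GRID + (col_sum // 2)
            if mid_index != src.index and mid_index not in visited:
                return False  # jumping over an unvisited dot is illegal
        return True
```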

C. The Lock System (Controller)

Python
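A sketch of the controller. The pattern is stored as an unsalted SHA-256 hash here for brevity; see the Security note below for why a per-user salt matters:

```python
import hashlib

class PatternLock:
    MIN_LENGTH = 4

    def __init__(self):
        self._stored_hash = None

    def _validate(self, pattern: list) -> bool:
        if len(pattern) < PatternLock.MIN_LENGTH:
            return False
        points = [Point(i) for i in pattern]
        visited = {points[0].index}
        for src, dst in zip(points, points[1:]):
            if not MoveValidator.is_valid_move(src, dst, visited):
                return False
            visited.add(dst.index)
        return True

    def _hash(self, pattern: list) -> str:
        return hashlib.sha256("-".join(map(str, pattern)).encode()).hexdigest()

    def set_pattern(self, pattern: list) -> bool:
        if not self._validate(pattern):
            return False
        self._stored_hash = self._hash(pattern)
        return True

    def verify_pattern(self, pattern: list) -> bool:
        return self._validate(pattern) and self._hash(pattern) == self._stored_hash
```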


5. Key Interview Points

1. The "Skip" Logic (Midpoint Formula)

  • Question: How do you detect if 0 -> 8 skips 4?

  • Answer:

    • 0 = (0, 0), 8 = (2, 2).

    • Sum of coords: rows 0 + 2 = 2, columns 0 + 2 = 2.

    • Both are even, so the midpoint exists at (1, 1), which is Index 4.

    • If Index 4 is NOT in visited_set, the move is illegal.

  • Edge Case: Knight's move (like Chess).

    • 0 = (0, 0), 5 = (1, 2).

    • Sum: rows 0 + 1 = 1, columns 0 + 2 = 2. Row sum is odd. No integer midpoint.

    • Move is valid without any visited checks (knights jump over pieces).

2. Security (Hashing)

  • Never store the pattern as a plain list [0, 1, 2, 5].

  • Convert the sequence to a string or byte array and SHA-256 hash it.

  • Salt: Add a user-specific salt to prevent Rainbow Table attacks (though the search space for patterns is small, ~389,000 combinations, so salting is crucial).

3. Maximum Combinations (DFS)

  • If the interviewer asks "How many valid patterns exist?", this becomes a DFS / Backtracking counting problem.

    • Min length 4, Max length 9.

    • You write a recursive DFS that attempts every valid move from the current node and counts the paths.

4. Code Extensibility

  • Grid Size: The code calculates row = index // 3. By changing 3 to N, the Point and Validator logic supports 4x4 or 5x5 grids instantly.


Scalable Database Schema for Reddit-like Comments

An end-to-end Low Level Design (LLD) and Schema Design for a scalable, threaded comment system like Reddit.

This is a classic system design interview question that tests your knowledge of Hierarchical Data Modeling in relational databases.


1. Requirements Clarification

  • Structure: Infinite nesting (Comment A → Reply B → Reply C).

  • Scale: Millions of comments per post (potentially).

  • Performance:

    • Read-heavy: Users view comments much more than they write.

    • Pagination: Cannot load the entire tree at once; need "Load more replies".

    • Ranking: Comments are sorted by relevance (score) or time.

  • Operations: Post comment, Reply, Upvote/Downvote, Edit.


2. The Core Problem: Storing Trees in SQL

There are four common ways to store trees. For Reddit, we compare the two best options:

| Strategy | Concept | Pros | Cons | Verdict |
| --- | --- | --- | --- | --- |
| Adjacency List | Column parent_id | Simple, easy inserts. | Reading a full tree requires recursive queries (CTEs). | Good (Standard) |
| Materialized Path | Column path (e.g., "1/5/12") | Extremely fast subtree retrieval via regex/prefix. | Moving subtrees is hard (rare in Reddit). | Best for Scale |

We will use the Materialized Path (Path Enumeration) pattern because it makes fetching and sorting nested threads efficient without complex recursion.


3. Database Schema

Table 1: Comments

This is the heart of the system.

SQL
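A plausible Postgres-flavored schema (column names and sizes are illustrative):

```sql
CREATE TABLE comments (
    id            BIGSERIAL PRIMARY KEY,
    post_id       BIGINT       NOT NULL,
    author_id     BIGINT       NOT NULL,
    parent_id     BIGINT       REFERENCES comments(id),  -- NULL for root comments
    path          VARCHAR(512) NOT NULL,  -- e.g. '0010/0100'; zero-padded so
                                          -- string order == numeric order
    depth         INT          NOT NULL DEFAULT 0,
    body          TEXT         NOT NULL,
    upvote_count  INT          NOT NULL DEFAULT 0,  -- denormalized (see Section 6)
    created_at    TIMESTAMPTZ  NOT NULL DEFAULT now()
);

-- Per-post fetches and path-prefix queries both lean on this index.
CREATE INDEX idx_comments_post_path ON comments (post_id, path);
```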

Table 2: CommentVotes

Keeps track of who voted on what to prevent double voting.

SQL
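A matching votes table; the composite primary key is what prevents double voting:

```sql
CREATE TABLE comment_votes (
    comment_id  BIGINT      NOT NULL REFERENCES comments(id),
    user_id     BIGINT      NOT NULL,
    vote        SMALLINT    NOT NULL CHECK (vote IN (-1, 1)),  -- down / up
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (comment_id, user_id)  -- one vote per user per comment
);
```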


4. Key Operations (SQL Logic)

A. Inserting a Comment

When User B replies to User A (Comment ID 100, Path 0010/0100):

  1. Generate Path: Parent_Path + / + New_ID.

  2. Calculate Depth: Parent_Depth + 1.

Note: In a real system, you might generate the ID separately or use a UUID to construct the path before insertion.

SQL
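An illustrative insert, assuming the new comment's zero-padded id (0101) was generated beforehand as noted above:

```sql
-- Reply to comment 100 (path '0010/0100'): child path = parent path + '/' + new id.
INSERT INTO comments (post_id, author_id, parent_id, path, depth, body)
VALUES (42, 7, 100, '0010/0100/0101', 2, 'Nice point!');
```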

B. Fetching the Tree (The "Reddit View")

This is where Materialized Path shines. If you sort by the path column string, the comments naturally align in a Depth-First Search (DFS) order—exactly how they appear on a UI.

SQL
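The whole thread comes back in display order with a single ORDER BY (post id 42 is illustrative):

```sql
-- Fetch one page of a thread in DFS (display) order.
SELECT id, parent_id, path, depth, body, upvote_count
FROM comments
WHERE post_id = 42
ORDER BY path
LIMIT 200;
```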

Result:

  1. 0001 (Root Comment)

  2. 0001/0002 (Reply to 1)

  3. 0001/0002/0003 (Reply to 2)

  4. 0005 (New Root Comment)

Pagination: To "Load More" from a specific point, you can simply query WHERE path > 'last_seen_path'.


5. Python LLD Implementation

Code Structure

Python


6. Optimization & Scaling (Interview Gold)

1. The "Viral Post" Problem (Hot Partition)

  • Issue: If one post has 1 million comments, querying WHERE post_id = X becomes slow.

  • Solution: Database Sharding. Shard comments based on post_id. All comments for Post X live on Shard A. This keeps the tree structure intact on one server.

2. Denormalization (Vote Counts)

  • Issue: Calculating SELECT COUNT(*) FROM Votes WHERE comment_id = X every time a user views a thread is fatal for performance.

  • Solution: Add upvote_count column to Comments table.

    • Write Path: When a user votes, update the Votes table AND increment the Comments.upvote_count.

    • High Scale: Use Redis to buffer increments. Update the SQL database in batches (e.g., every 10 seconds) to reduce write locking.

3. Caching (Redis)

  • Cache the Top 200 comments of hot threads in Redis.

  • Structure: Redis List or Sorted Set serialized as JSON.

  • Since the top comments don't change structure often (mostly just vote counts change), cache invalidation is manageable.

4. Why not NoSQL (MongoDB/Cassandra)?

  • MongoDB: Actually a good fit. You can store the comment tree as a nested document (up to 16MB limit) or use the same "Materialized Path" pattern.

  • Cassandra: Great for writes, but strictly hierarchical fetching (Sorting) is harder. You would need to model the partition key carefully (e.g., post_id) and use clustering keys for sorting, but recursive nesting depth is hard to model without application-side logic.

  • Verdict: SQL (Postgres) is usually preferred for comments because of strong consistency (ACID) requirements for voting and structural integrity, but MongoDB is a very strong runner-up.


Sublime Text IDE

An end-to-end Low Level Design (LLD) for a Text Editor (like Sublime Text, Notepad++, or VS Code).

This problem is unique because it focuses heavily on Data Structures (how to store text efficiently) and the Command Pattern (how to handle Undo/Redo).


1. Requirements Clarification

  • Core Actions: Insert text, Delete text, Move cursor (Up/Down/Left/Right).

  • File Handling: Open, Save.

  • Undo/Redo: Infinite undo history.

  • Selection: Highlight and modify a range of text.

  • Multiple Views: Support editing multiple files (Tabs).

  • Search: Find strings within the file.


2. The Core Data Structure (The "Secret Sauce")

How do we store the text? A simple String is O(n) for insertions. We need better options:

  1. Gap Buffer (Used by Emacs): A dynamic array with a "gap" of empty space at the cursor. Insertions at the cursor are O(1). Moving the cursor far away requires shifting the gap (O(n)).

  2. Rope (Used by Heavyweights): A binary tree where leaves are strings. Efficient for massive files.

  3. Piece Table (Used by VS Code): Stores the original file (Read-Only) and an "Add Buffer" (Append-Only). The document is a list of pointers to these buffers.

  4. List of Lines (Our Choice): A Linked List or Array of Strings, where each string is a line.

    • Why? It's easy to implement in an interview and efficient enough for most cases (vertical movement is O(1)).


3. Core Entities & Patterns

  1. TextBuffer: The Model. Holds the content (List of Lines).

  2. Cursor: Tracks position (row, col).

  3. Command (Pattern): Encapsulates an action (InsertCommand, DeleteCommand). This is mandatory for Undo/Redo.

  4. CommandHistory: A Stack that holds executed commands.

  5. Editor: The Controller. Binds inputs to commands.

  6. Flyweight (Pattern): (Optional) Used for rendering characters with syntax highlighting to save memory.


4. Class Diagram


5. Python Implementation

A. The Model (Buffer & Cursor)

We use a List of Lines approach. This makes handling "Enter" (splitting a line) and "Backspace" (merging lines) the critical logic.

Python
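A sketch of the model; split_line and merge_lines carry the Enter/Backspace logic called out above:

```python
class Cursor:
    def __init__(self):
        self.row = 0
        self.col = 0

class TextBuffer:
    """The Model: the document as a list of lines."""

    def __init__(self, text: str = ""):
        self.lines = text.split("\n") if text else [""]

    def insert(self, row: int, col: int, text: str):
        line = self.lines[row]
        self.lines[row] = line[:col] + text + line[col:]

    def delete(self, row: int, col: int, length: int) -> str:
        line = self.lines[row]
        deleted = line[col:col + length]  # returned so undo can restore it
        self.lines[row] = line[:col] + line[col + length:]
        return deleted

    def split_line(self, row: int, col: int):
        """Handle 'Enter': split one line into two."""
        line = self.lines[row]
        self.lines[row] = line[:col]
        self.lines.insert(row + 1, line[col:])

    def merge_lines(self, row: int):
        """Handle 'Backspace' at column 0: merge with the line above."""
        self.lines[row - 1] += self.lines.pop(row)
```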

B. The Command Pattern (Undo/Redo)

Every action must be reversible.

Python
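A sketch of the Command hierarchy; DeleteCommand remembers what it removed so undo can restore it:

```python
from abc import ABC, abstractmethod

class Command(ABC):
    @abstractmethod
    def execute(self): ...
    @abstractmethod
    def undo(self): ...

class InsertCommand(Command):
    def __init__(self, buffer: TextBuffer, row: int, col: int, text: str):
        self.buffer, self.row, self.col, self.text = buffer, row, col, text

    def execute(self):
        self.buffer.insert(self.row, self.col, self.text)

    def undo(self):
        self.buffer.delete(self.row, self.col, len(self.text))

class DeleteCommand(Command):
    def __init__(self, buffer: TextBuffer, row: int, col: int, length: int):
        self.buffer, self.row, self.col, self.length = buffer, row, col, length
        self.deleted = ""  # captured on execute, replayed on undo

    def execute(self):
        self.deleted = self.buffer.delete(self.row, self.col, self.length)

    def undo(self):
        self.buffer.insert(self.row, self.col, self.deleted)
```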

C. The History Manager

Manages the stacks.

Python
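A minimal two-stack history:

```python
class CommandHistory:
    def __init__(self):
        self.undo_stack = []
        self.redo_stack = []

    def execute(self, command: Command):
        command.execute()
        self.undo_stack.append(command)
        self.redo_stack.clear()  # a new edit invalidates the redo branch

    def undo(self):
        if self.undo_stack:
            cmd = self.undo_stack.pop()
            cmd.undo()
            self.redo_stack.append(cmd)

    def redo(self):
        if self.redo_stack:
            cmd = self.redo_stack.pop()
            cmd.execute()
            self.undo_stack.append(cmd)
```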

D. The Editor (Controller)

Python


6. Key Interview Points

1. Data Structures (The "Gap Buffer")

  • Question: Why not use a simple long string?

  • Answer: Inserting into a string is O(n) (requires shifting all subsequent characters).

  • Optimization: In a real-world editor (Emacs), we use a Gap Buffer.

    • [H, E, L, L, O, _, _, _, W, O, R, L, D]

    • The underscores represent empty space (the gap) at the cursor position. Insertion is O(1).

    • If the cursor moves, we shift the gap.

2. Syntax Highlighting (Flyweight Pattern)

  • If we have 1 million 'A' characters, creating 1 million objects is heavy.

  • Flyweight: We create one Glyph('A', Font, Color) object and reuse it, storing only the position externally.

  • Strategy Pattern: Syntax highlighting rules (Python vs Java) are different strategies injected into the parser.

3. Large File Support (Proxy/Lazy Loading)

  • Question: How to open a 10GB log file?

  • Answer: Do not load the whole file into RAM.

    • Use Memory Mapping (mmap).

    • Load only the "viewport" (visible lines) + a small buffer above/below.

    • This is the Virtual Proxy pattern.

4. Search (KMP Algorithm)

  • Implementing Ctrl+F requires string matching. A naive loop is O(n·m). The Knuth-Morris-Pratt (KMP) algorithm does it in O(n + m), which is preferred for text editors.

5. Composite Pattern

  • Used if the editor supports grouping elements, like folding sections of code (blocks) or rendering rich text where a Document contains Paragraphs, which contain Sentences, which contain Characters.


Ngrok Tool

Here is a detailed Low Level Design (LLD) for an Ngrok-like Tunneling Tool.

This is a System Design / Networking heavy problem. The core challenge is exposing a service running behind a NAT/Firewall (localhost) to the public internet without port forwarding.


1. Requirements Clarification

  • Goal: Expose a local port (e.g., localhost:8080) to a public URL (e.g., myapp.tunnel.com).

  • Actors:

    1. Public User: Accesses the URL via browser.

    2. Tunnel Server (Cloud): Accepts public traffic and routes it to the correct agent.

    3. Tunnel Client (Agent): Runs locally, connects to the server, and forwards traffic to the local app.

  • Key Features:

    • NAT Traversal: The client must initiate the connection (outbound) because inbound ports are blocked.

    • Multiplexing: Multiple public users should use the same single persistent TCP connection to the agent.

    • Protocol: HTTP/TCP.


2. High-Level Architecture

The magic of Ngrok is a Reverse Tunnel.

  1. Agent connects to Server (on a control port, e.g., 4443). Connection is kept open.

  2. Server listens on a public port (e.g., 80) for internet traffic.

  3. User requests http://abc.tunnel.com.

  4. Server looks up the connection for abc, encapsulates the HTTP request, and sends it down the socket to Agent.

  5. Agent unwraps it, calls localhost:8080, gets response, and sends it back up the tunnel.


3. Class Diagram & Entities


4. Python Implementation (Machine Coding)

We will use Python's asyncio because this is an I/O-bound problem requiring high concurrency.

A. The Tunnel Server (server.py)

This runs on the Cloud. It acts as a switchboard.

Python
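A deliberately naive asyncio sketch: a single agent, no Host-header routing, and one request at a time through the tunnel (exactly the blocking flaw discussed in Section 5.1 below). Ports and the REGISTER line format are assumptions:

```python
import asyncio

AGENTS = {}                   # subdomain -> (reader, writer) of the agent link
TUNNEL_LOCK = asyncio.Lock()  # serializes the tunnel: one request at a time

async def handle_agent(reader, writer):
    # Agent registers with one line: "REGISTER <subdomain>".
    # The streams stay open because AGENTS keeps a reference to them.
    line = (await reader.readline()).decode().strip()
    _, subdomain = line.split()
    AGENTS[subdomain] = (reader, writer)
    print(f"Agent registered: {subdomain}")

async def handle_public(reader, writer):
    request = await reader.read(65536)  # naive: assume one read = one request
    if AGENTS:
        agent_reader, agent_writer = next(iter(AGENTS.values()))
        async with TUNNEL_LOCK:
            # Length-prefixed frame down the tunnel, framed response back up.
            agent_writer.write(len(request).to_bytes(4, "big") + request)
            await agent_writer.drain()
            size = int.from_bytes(await agent_reader.readexactly(4), "big")
            writer.write(await agent_reader.readexactly(size))
            await writer.drain()
    writer.close()

async def main():
    control = await asyncio.start_server(handle_agent, "0.0.0.0", 9090)
    public = await asyncio.start_server(handle_public, "0.0.0.0", 8080)
    async with control, public:
        await asyncio.gather(control.serve_forever(), public.serve_forever())

if __name__ == "__main__":
    asyncio.run(main())
```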

B. The Tunnel Client (agent.py)

This runs on your laptop. It connects Server <-> Localhost.

Python
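The matching agent sketch, using the same length-prefixed framing:

```python
import asyncio

SERVER_HOST, CONTROL_PORT = "127.0.0.1", 9090
LOCAL_PORT = 5000
SUBDOMAIN = "demo"

async def run_agent():
    reader, writer = await asyncio.open_connection(SERVER_HOST, CONTROL_PORT)
    writer.write(f"REGISTER {SUBDOMAIN}\n".encode())
    await writer.drain()
    while True:
        # Receive one framed request from the tunnel server.
        size = int.from_bytes(await reader.readexactly(4), "big")
        request = await reader.readexactly(size)
        # Forward it to the local app and collect the full response.
        local_r, local_w = await asyncio.open_connection("127.0.0.1", LOCAL_PORT)
        local_w.write(request)
        await local_w.drain()
        local_w.write_eof()               # signal end of request
        response = await local_r.read(-1) # read until the app closes
        local_w.close()
        # Send the response back up the tunnel, framed the same way.
        writer.write(len(response).to_bytes(4, "big") + response)
        await writer.drain()

if __name__ == "__main__":
    asyncio.run(run_agent())
```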


5. Key Design Challenges (Interview Gold)

1. Handling Concurrency (Multiplexing)

  • Problem: In the code above, if the Agent is processing one request (waiting for localhost), the tunnel is blocked. Other users hang.

  • Solution (Real Ngrok): The tunnel shouldn't just send raw bytes. It should wrap data in Frames.

    • Frame: [RequestID | Payload_Length | Payload]

    • The Agent reads the frame, spawns a new async task to talk to localhost, and when done, sends a frame back with the same RequestID.

    • This allows asynchronous multiplexing over a single TCP socket.

2. Disconnections & Reconnecting

  • Problem: Wi-Fi drops.

  • Solution: The Agent needs a while True loop with exponential backoff retry logic around the connect() function. It should also send PING frames (Heartbeats) to keep the NAT mapping alive.

3. Security

  • Problem: Anyone can connect to your server and register "google.tunnel.com".

  • Solution:

    • Auth Tokens: Agent sends REGISTER: {token} : {subdomain}.

    • TLS Termination: The Public Server should handle HTTPS certificates (wildcard cert *.tunnel.com), decrypt the traffic, and send plain HTTP (or re-encrypted) over the tunnel.

4. Protocol Support

  • Ngrok supports HTTP, TCP, and TLS tunnels.

  • For Raw TCP (e.g., exposing an SSH port), the server cannot parse headers. It simply allocates a random public port (e.g., tunnel.com:12345) and pipes everything from that port to the agent.


6. Execution Flow Summary

  1. Start Local App: python -m http.server 5000 (runs a dummy site).

  2. Start Tunnel Server: python server.py. It listens on 9090 (control) and 8080 (web).

  3. Start Agent: python agent.py. It connects to 9090 and says "I am demo".

  4. Public Access: Browser opens http://localhost:8080.

    • Server sees "Host: localhost". Routes to 'demo' Agent.

    • Agent sends to port 5000.

    • Port 5000 replies.

    • Agent sends back to Server.

    • Browser shows the page.


Rate Limiter

An end-to-end Low Level Design (LLD) for a Rate Limiter.

This is a classic Machine Coding problem that tests your knowledge of Concurrency, Design Patterns (Strategy), and Algorithms.


1. Requirements Clarification

  • Goal: Restrict the number of requests a client (User/IP) can make within a time window.

  • Rules:

    • Limit: e.g., "10 requests per second".

    • Granularity: User-based (User ID) or Global.

  • Behavior: Return True (Allowed) or False (Throttled/HTTP 429).

  • Environment:

    • Thread-Safe: Must handle concurrent requests correctly.

    • In-Memory: For this problem, we assume a single server instance (though we should discuss distributed scaling).


2. Algorithm Choice

There are two primary algorithms suitable for an LLD interview. We must choose one and justify it.

  1. Sliding Window Log: Stores the timestamp of every request. Accurate, but memory-intensive (O(N) space per user).

  2. Token Bucket (Recommended): A "bucket" holds tokens. Tokens refill at a constant rate. Each request consumes a token.

    • Pros: Memory efficient (O(1) space—just an integer counter), allows short bursts of traffic (good UX), easy to implement.

We will implement the Token Bucket Algorithm.


3. Class Diagram

We use the Strategy Pattern to allow swapping algorithms (e.g., switching from Token Bucket to Leaky Bucket) without changing the core service logic.

Code snippet


4. Python Implementation

A. The Bucket Entity (Thread-Safe)

This class holds the state for a single user.

Crucial Concept: We use Lazy Refill. We don't have a background timer adding tokens every second. Instead, when a request comes in, we calculate: tokens_to_add = (now - last_time) * rate.

Python
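A sketch of the bucket with lazy refill; time.monotonic() is used so wall-clock adjustments can't mint tokens:

```python
import threading
import time

class Bucket:
    """Token bucket for one user. Thread-safe via a per-bucket lock."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity        # max tokens (burst size)
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()
        self.lock = threading.Lock()

    def allow_request(self) -> bool:
        with self.lock:
            now = time.monotonic()
            # Lazy refill: top up based on elapsed time, capped at capacity.
            elapsed = now - self.last_refill
            self.tokens = min(self.capacity,
                              self.tokens + elapsed * self.refill_rate)
            self.last_refill = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False
```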

B. The Strategy (Manager)

This manages the collection of buckets for different users.

Python
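A sketch of the manager; note the two levels of locking, which Section 5.1 below explains:

```python
import threading
from abc import ABC, abstractmethod

class RateLimiter(ABC):
    @abstractmethod
    def is_allowed(self, user_id: str) -> bool: ...

class TokenBucketLimiter(RateLimiter):
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.buckets = {}                 # user_id -> Bucket
        self.map_lock = threading.Lock()  # guards the dictionary only

    def is_allowed(self, user_id: str) -> bool:
        with self.map_lock:  # short critical section: lookup/creation only
            if user_id not in self.buckets:
                self.buckets[user_id] = Bucket(self.capacity, self.refill_rate)
            bucket = self.buckets[user_id]
        return bucket.allow_request()  # contention is per-user from here on
```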

C. The Service (Facade) & Driver Code

Python


5. Key Design Decisions & Interview Q&A

1. Handling Concurrency

  • Question: Why did you use locks inside the Bucket class?

  • Answer: Granularity.

    • If I locked the entire TokenBucketLimiter class (the dictionary), User A's request would block User B's request.

    • By locking only the Bucket instance, User A only contends with other requests from User A. This allows high throughput for the system overall.

2. Token Bucket vs Leaky Bucket

  • Question: Why Token Bucket?

  • Answer: Token Bucket allows Bursts. In real-world web usage, a user loading a page might fire 10 simultaneous API calls. Token Bucket allows this (up to capacity). Leaky Bucket enforces a rigid constant rate (queueing or dropping burst requests), which can degrade UX for innocent bursts.

3. Distributed Rate Limiting (Scaling Up)

  • Question: How does this change if we have 5 servers behind a Load Balancer?

  • Answer: In-memory dictionaries won't work (requests are split across servers).

    • We move the state to a centralized store like Redis.

    • We use Lua Scripts in Redis to perform the "Check -> Decrement" operation atomically.

    • Algorithm: redis.eval('local curr = redis.call("get", key); if curr > 0 then ...').

4. Memory Cleanup

  • Question: self.buckets grows forever. How do you fix memory leaks?

  • Answer: We need a cleanup strategy.

    • TTL (Time To Live): If using Redis, keys expire automatically.

    • Background Worker: A thread runs every few minutes to delete Bucket objects where last_refill_timestamp is older than 1 hour.

    • LRU Cache: Implement self.buckets as an LRU Cache (e.g., Python functools.lru_cache or OrderedDict) with a max size.


Thread Pool

An end-to-end Low Level Design (LLD) for a Custom Thread Pool.

This is a fundamental concurrency problem. It tests your understanding of OS Concepts (Threads vs. Processes), Synchronization, and the Producer-Consumer Pattern.


1. Requirements Clarification

  • Goal: Create a container that manages a fixed number of threads to execute tasks concurrently.

  • Why? Creating and destroying threads is expensive (OS overhead). A pool reuses threads.

  • Core Features:

    • Fixed Size: n threads running in the background.

    • Task Queue: A buffer to hold tasks when all threads are busy.

    • Non-blocking Submission: submit(task) should return immediately, not wait for execution.

    • Shutdown: Ability to stop accepting new tasks and shut down workers gracefully.


2. Core Architecture: Producer-Consumer

The Thread Pool is a classic implementation of the Producer-Consumer pattern.

  • Producers: The main program calling pool.submit(task).

  • Buffer: A thread-safe Blocking Queue.

  • Consumers: The "Worker Threads" that loop infinitely, pulling tasks from the queue.


3. Class Diagram

Code snippet


4. Python Implementation

We will use Python's threading and queue modules. The queue.Queue class is already thread-safe (it uses locks/semaphores internally), which simplifies the code significantly.

A. The Worker (The Consumer)

This is a wrapper around a system thread. It runs an infinite loop.

Python
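A sketch of the consumer using the standard library's thread-safe queue.Queue; None serves as the poison pill (see Section 5.1 below):

```python
import threading
import queue

class Worker(threading.Thread):
    """Consumer: pulls tasks off the shared queue until it sees the poison pill."""

    def __init__(self, task_queue: queue.Queue):
        super().__init__(daemon=True)
        self.task_queue = task_queue

    def run(self):
        while True:
            task = self.task_queue.get()  # blocks (sleeps) while queue is empty
            if task is None:              # poison pill -> exit gracefully
                self.task_queue.task_done()
                break
            func, args = task
            try:
                func(*args)
            finally:
                self.task_queue.task_done()
```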

B. The Thread Pool (The Manager)

This manages the lifecycle of the workers.

Python
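A sketch of the pool itself:

```python
import queue

class ThreadPool:
    def __init__(self, num_threads: int):
        self.tasks = queue.Queue()
        self.workers = [Worker(self.tasks) for _ in range(num_threads)]
        self.is_shutdown = False
        for w in self.workers:
            w.start()

    def submit(self, func, *args):
        if self.is_shutdown:
            raise RuntimeError("Pool is shut down")
        self.tasks.put((func, args))  # returns immediately (non-blocking)

    def shutdown(self):
        self.is_shutdown = True
        for _ in self.workers:        # one poison pill per worker
            self.tasks.put(None)
        for w in self.workers:
            w.join()
```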

C. Driver Code

Python
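A short illustrative driver:

```python
import time

def work(task_id: int):
    print(f"Task {task_id} running")
    time.sleep(0.1)

if __name__ == "__main__":
    pool = ThreadPool(num_threads=3)
    for i in range(10):
        pool.submit(work, i)  # returns immediately
    pool.shutdown()           # pills queue behind the tasks; waits for workers
```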


5. Key Design Decisions & Interview Talking Points

1. The "Poison Pill" (Graceful Shutdown)

  • Problem: How do you kill a thread that is blocked on queue.get()? You can't force-kill a thread safely in most languages (it might leave resources/locks open).

  • Solution: Insert a special sentinel value (like None or a StopSignal object) into the queue.

  • Logic: When a worker pulls this value, it realizes "The Manager wants me to stop," breaks its loop, and exits cleanly. You must push exactly N pills for N threads.

2. Blocking Queue

  • Why: We use a blocking queue so that if there are no tasks, the worker threads go to sleep (Waiting State).

  • Benefit: This saves CPU. If we used a standard list and a while loop, the threads would "spin" (busy wait), consuming 100% CPU checking for new work.

3. Future / Promise (Returning Results)

  • The implementation above is "Fire and Forget" (void return).

  • Interview Question: "How would you return a value?"

  • Answer: The submit() method should return a Future object immediately.

    • The Future has a .get() method that blocks.

    • The Worker calculates the result and sets it inside the Future.

    • Any thread waiting on .get() is then notified (using a Condition Variable).

4. Ideal Pool Size

  • CPU-Bound Tasks: Size = Number of Cores (e.g., 4 or 8). Adding more threads just increases context-switching overhead without performance gain.

  • I/O-Bound Tasks (Web scraping, DB calls): Size can be much larger (e.g., 50 or 100). Threads spend most of their time waiting for network responses, so one core can manage many waiting threads efficiently.

5. Rejection Policy

  • Question: What if the queue gets full? (If maxsize was set).

  • Answer: We need a Rejection Policy.

    1. Abort: Throw an exception.

    2. CallerRuns: The main thread (producer) runs the task itself (slowing down submission).

    3. Discard: Throw away the oldest or newest task.


OYO/Airbnb - Part 1: Database Modelling

An end-to-end Low Level Design (LLD) for a Hotel Booking System (like OYO or Airbnb).

This problem focuses heavily on Data Modeling, Date Handling, and most importantly, Concurrency Control (preventing double bookings).


1. Requirements Clarification

  • Actors:

    • Guest: Searches for hotels, views details, books rooms.

    • Host/Admin: Adds hotels, manages rooms/inventory.

    • System: Manages bookings, payments, and notifications.

  • Core Use Cases:

    • Search: Find hotels by City + Date Range.

    • Book: Reserve a specific room type for specific dates.

    • Cancel: Cancel a booking and free up inventory.

  • Critical Constraint:

    • No Double Bookings: Two users cannot book the same room for the same date.

    • Scalability: Search should be fast.


2. Core Entities & Relationships

  1. Hotel: The physical property (Name, Address, Rating).

  2. Room: A specific unit (Room 101) or a Category (Deluxe). For Airbnb, it's usually a specific unit. For OYO, it's often Categories. We will design for Room Instances (e.g., Room 101, 102) to make availability logic clear.

  3. Booking: Connects a User, a Room, and a Date Range.

  4. InventoryManager: The engine that checks if dates are free.


3. Class Diagram

Code snippet


4. Python Implementation

A. Basic Models (Hotel & Room)

The core logic for "Is this room free?" lives inside the Room class.

Python
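A sketch of the Room, tracking booked nights as a set of dates; the per-room lock is what the concurrency fix in Section 5.1 relies on:

```python
import threading
from datetime import date, timedelta

class Room:
    def __init__(self, room_id: str, price: float):
        self.room_id = room_id
        self.price = price
        self.booked_dates = set()     # occupied nights, as `date` objects
        self.lock = threading.Lock()  # guards booked_dates

    @staticmethod
    def nights(check_in: date, check_out: date):
        out, d = [], check_in
        while d < check_out:          # the check-out day itself stays free
            out.append(d)
            d += timedelta(days=1)
        return out

    def is_available(self, check_in: date, check_out: date) -> bool:
        return self.booked_dates.isdisjoint(self.nights(check_in, check_out))
```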

B. The Booking Manager (The Controller)

This handles the high-level logic and the critical "Check-Then-Act" race condition.

Python
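A sketch of the manager; holding the room's lock across both steps is what makes Check-Then-Act atomic:

```python
import uuid

class BookingManager:
    def __init__(self):
        self.bookings = {}  # booking_id -> (user_id, room, check_in, check_out)

    def book(self, user_id: str, room: Room, check_in: date, check_out: date):
        with room.lock:  # Check-Then-Act under one lock
            if not room.is_available(check_in, check_out):
                return None  # another user won the race
            room.booked_dates.update(Room.nights(check_in, check_out))
        booking_id = str(uuid.uuid4())
        self.bookings[booking_id] = (user_id, room, check_in, check_out)
        return booking_id

    def cancel(self, booking_id: str):
        _, room, check_in, check_out = self.bookings.pop(booking_id)
        with room.lock:  # free the inventory
            room.booked_dates.difference_update(Room.nights(check_in, check_out))
```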

C. Driver Code

Python


5. Key Design Challenges & Interview Answers

1. Handling Concurrency (Double Booking)

  • Problem: User A and User B check availability. Both see "Free". Both pay. Both Book.

  • LLD Solution (Single Machine): synchronized keyword (Java) or Lock (Python) on the specific Room object.

  • System Design Solution (Distributed):

    • Optimistic Locking: Add a version column to the Room table. When updating, check ver = old_ver. If it fails, retry.

    • Pessimistic Locking (Better for bookings): Use SQL SELECT ... FOR UPDATE when reading the room row. This locks the row until the transaction completes.

2. Search Optimization

  • Problem: Iterating through all hotels to check availability (is_available) is slow ().

  • Solution: Invert the index. Maintain an Availability Table.

    • Table: RoomAvailability (room_id, date, status).

    • Query: SELECT room_id FROM RoomAvailability WHERE date IN (d1, d2) AND status = 'FREE'.

    • Even better: Use Elasticsearch or Geospatial Indexes (QuadTrees) to filter by Location first, then filter by Availability.

3. Dynamic Pricing

  • Add a PricingStrategy interface (Strategy Pattern).

    • BasePriceStrategy

    • DemandSurgeStrategy (if booked % > 80, price * 1.5)

    • SeasonalStrategy (if month == Dec, price * 2)

4. Scalability (Hotels vs Rooms)

  • Airbnb lists unique homes (Room = Hotel essentially).

  • OYO lists Hotels with types.

  • The design above uses Room as a unique unit (Room 101). If OYO has 100 "Standard Rooms" identical to each other, we can model this as RoomTypeInventory with a count (e.g., count=100) instead of 100 row objects. This saves DB space.


Google Calendar Database Model

An end-to-end Low Level Design (LLD) for a Calendar System (like Google Calendar or Outlook).

This problem is a test of Data Modeling (how to represent time) and Algorithms (how to efficiently detect overlaps and manage recurring events).


1. Requirements Clarification

  • Core Entities: Users, Events, Calendars.

  • Event Properties: Title, Start Time, End Time, Location, Attendees.

  • Core Operations:

    • Create, Update, Delete events.

    • Get Events: View a day/week/month.

    • Conflict Detection: Check if a user is free during a specific time slot.

  • Advanced Features (Bonus):

    • Recurring Events: (e.g., "Every Monday").

    • Invites: Adding other users to an event.

    • Reminders: Notification logic.


2. Core Concept: Handling Time

Time is tricky. To ensure the system works globally:

  1. Storage: Always store time in UTC (Epoch timestamps).

  2. Display: Convert to the User's Local Timezone only at the UI layer.

  3. Intervals: We need a robust class to handle time ranges and overlap logic.


3. Class Diagram

We will use a Service-Oriented Architecture. The CalendarService acts as the facade.

Code snippet


4. Python Implementation

A. The TimeSlot (Interval Logic)

This helper class encapsulates the math for checking if two events clash.

Python
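A minimal sketch:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class TimeSlot:
    start: datetime  # stored in UTC
    end: datetime

    def overlaps(self, other: "TimeSlot") -> bool:
        # Two intervals clash iff each starts before the other ends.
        return self.start < other.end and other.start < self.end
```

For example, 09:00-10:00 overlaps 09:30-10:30, but not 10:00-11:00: touching endpoints do not clash.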

B. The Event & User

Basic entities.

Python
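Minimal entities; the module-level counter is just a stand-in for a real ID generator:

```python
import itertools

_id_gen = itertools.count(1)

class Event:
    def __init__(self, title: str, slot: TimeSlot, organizer_id: str):
        self.event_id = next(_id_gen)
        self.title = title
        self.slot = slot
        self.organizer_id = organizer_id
        self.attendee_ids = [organizer_id]

class User:
    def __init__(self, user_id: str, name: str, timezone: str = "UTC"):
        self.user_id = user_id
        self.name = name
        self.timezone = timezone  # used only at the display layer
```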

C. The Calendar

Represents one user's schedule. Note the check_conflict method.

Python
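A sketch using a plain list (see Section 5.3 for why an Interval Tree is the scalable alternative):

```python
class Calendar:
    """One user's schedule."""

    def __init__(self, owner_id: str):
        self.owner_id = owner_id
        self.events = []  # list[Event]; O(N) scans, fine for a sketch

    def check_conflict(self, slot: TimeSlot) -> list:
        return [e for e in self.events if e.slot.overlaps(slot)]

    def add_event(self, event: Event) -> bool:
        self.events.append(event)
        return True
```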

D. The Service (Controller)

Handles the logic of inviting people (adding events to multiple calendars).

Python

E. Driver Code

Python


5. Key Interview Design Challenges

1. Recurring Events ("Every Monday")

  • Naive Approach: Create 52 separate event objects for the next year.

    • Problem: Waste of space. If user changes the title, you have to update 52 objects.

  • Smart Approach (RRULE): Store the event once with a Recurrence Rule (RFC 5545 standard).

    • Example Rule: FREQ=WEEKLY;BYDAY=MO

    • Retrieval: When get_events(Jan 1, Jan 31) is called, the system dynamically calculates the "Virtual Events" generated by the rule that fall in that window.

2. Timezone Management

  • Problem: Alice (New York) creates an event at 9 AM EST. Bob (London) views it.

  • Solution: Store 9 AM EST as 14:00 UTC in the database. When Bob views it, convert 14:00 UTC + 0 hours (London/GMT) = 2 PM.

3. Efficient Search (Data Structures)

  • Problem: get_events using a simple list is .

  • Solution: Use an Interval Tree or Segment Tree.

    • This allows finding all intervals overlapping a specific range in O(log N + K) time (where K is the number of results).

    • In a relational DB (SQL), we use:

      SQL
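```sql
-- One typical formulation (parameter names are placeholders):
-- all of a user's events overlapping the window [:win_start, :win_end).
SELECT *
FROM events
WHERE user_id = :user_id
  AND start_time < :win_end
  AND end_time   > :win_start
ORDER BY start_time;
```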

      (Composite index on start_time helps, but interval queries are notoriously hard to optimize perfectly in B-Trees).

4. Meeting Scheduler (Find Free Slots)

  • Feature: "Find a time where Alice, Bob, and Charlie are all free."

  • Algorithm:

    1. Fetch sorted event intervals for all 3 users.

    2. Merge intervals to find "Busy Blocks" (like the "Merge Intervals" LeetCode problem).

    3. Invert the "Busy Blocks" to find "Free Blocks".

    4. Return blocks where duration >= requested_duration.


Google Authenticator

An end-to-end Low Level Design (LLD) for a Google Authenticator system.

This is a unique Machine Coding problem because it is less about "Architecture" and almost entirely about Cryptography and the RFC 6238 (TOTP) standard. The "magic" is that the Client (App) and Server generate the same 6-digit code without talking to each other.


1. Requirements Clarification

  • Goal: Generate and Validate 2-Factor Authentication (2FA) codes.

  • Core Mechanism: TOTP (Time-based One-Time Password).

  • Setup: The server generates a Secret Key, shares it with the User (usually via QR Code), and both save it.

  • Operation:

    • App: Uses Secret + Current Time → generates 6-digit code.

    • Server: Uses Stored Secret + Current Time → generates code → compares with User Input.

  • Constraints:

    • Code changes every 30 seconds.

    • Must handle slight time differences (clock drift) between phone and server.


2. The Algorithm (The "Secret Sauce")

You cannot design this without knowing the math. Google Authenticator uses HMAC-SHA1.

  1. Time Step (T): T = floor(Current_Unix_Time / 30). This ensures the input only changes every 30 seconds.

  2. Hashing: Calculate HMAC-SHA1(Secret_Key, T). This gives a 20-byte hash.

  3. Dynamic Truncation:

    • Take the last byte of the hash.

    • Extract the last 4 bits to get an offset (0 to 15).

    • Take 4 bytes from the hash starting at offset.

  4. Modulus: Convert those 4 bytes to an integer, remove the sign bit, and do % 1,000,000 to get the 6 digits.


3. Class Diagram

Code snippet


4. Python Implementation

We need hmac, struct (for binary manipulation), and base64 (Google Auth uses Base32 encoding for secrets).

A. The Math Core (RFC 6238)

Python
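A compact implementation of the four steps above (HMAC-SHA1 per RFC 6238/4226):

```python
import base64
import hashlib
import hmac
import struct
import time

def generate_totp(secret_b32: str, for_time: float = None,
                  step: int = 30, digits: int = 6) -> str:
    """TOTP as used by Google Authenticator."""
    if for_time is None:
        for_time = time.time()
    counter = int(for_time) // step                      # 1. time step T
    key = base64.b32decode(secret_b32, casefold=True)    # secrets are Base32
    msg = struct.pack(">Q", counter)                     # T as 8-byte big-endian
    digest = hmac.new(key, msg, hashlib.sha1).digest()   # 2. 20-byte hash
    offset = digest[-1] & 0x0F                           # 3. dynamic truncation
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)      # 4. modulus
```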

B. The Service (Server-Side Logic)

This handles User management and the "Window Verification" (Clock Drift).

Python
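A sketch of the server side, including the ±1-step drift window discussed below; keeping secrets in a plain dict is a simulation-only shortcut (they should be encrypted at rest):

```python
import base64
import os
import time

class AuthenticatorService:
    WINDOW = 1  # accept codes from [-1, 0, +1] time steps (clock drift)

    def __init__(self):
        self.secrets = {}  # user_id -> Base32 secret

    def register_user(self, user_id: str) -> str:
        secret = base64.b32encode(os.urandom(10)).decode()  # 80-bit secret
        self.secrets[user_id] = secret
        return secret  # shown to the user once, e.g. as a QR code

    def verify_code(self, user_id: str, code: str) -> bool:
        secret = self.secrets.get(user_id)
        if secret is None:
            return False
        now = time.time()
        for drift in range(-self.WINDOW, self.WINDOW + 1):
            if generate_totp(secret, for_time=now + drift * 30) == code:
                return True
        return False
```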

C. Driver Code (Simulation)

Python


5. Key Design Details (Interview Gold)

1. Why Base32?

You will be asked: "Why not Base64?"

  • Answer: Base32 only uses A-Z and 2-7. It avoids easily confused characters (like 1, I, l or 0, O) which is crucial if a user has to manually type the secret key (Backup Key) instead of scanning a QR code. It is also case-insensitive.

2. Clock Drift (Synchronization)

  • Problem: The user's phone is in Airplane mode and the clock drifted by 1 minute.

  • Solution: The verify_code loop (seen in the code above). Standard servers check a window of [-1, 0, +1] intervals. If the user is successfully verified at window -1, the server can store this "drift" and apply it to future requests for that user to improve UX.

3. Replay Attacks

  • Problem: An attacker intercepts a valid code and uses it immediately.

  • Solution: The server should store last_used_interval for the user. If current_interval == last_used_interval, reject the request. A TOTP code should be single-use.

4. QR Code Format

  • The "QR Code" is simply a URI scheme:

    otpauth://totp/Google:Alice?secret=JBSWY3DPEHPK3PXP&issuer=Google

    • Secret: The Base32 key.

    • Issuer: Who is asking for the code (e.g., Gmail).

5. Backup Codes

  • If the user loses the phone, they lose the Secret.

  • Design: When generating the secret, also generate 10 random 8-digit "Scratch Codes", store them in the DB (hashed), and display them to the user once. These are one-time use overrides.


Amazon Prime Video

An end-to-end Low Level Design (LLD) for a Video Streaming Service (Amazon Prime Video / Netflix).

This problem tests your ability to model Hierarchical Data (Series → Season → Episode), manage Access Control (Prime vs. Rent), and understand how modern video delivery works (Adaptive Bitrate Streaming).


1. Requirements Clarification

  • Content Types: Movies (Standalone) and TV Shows (Nested: Show → Seasons → Episodes).

  • Playback: Users select a resolution (or Auto) and stream video.

  • Entitlement (Monetization):

    • SVOD (Subscription): Free for Prime members.

    • TVOD (Transactional): Rent or Buy specific movies.

  • User Experience:

    • Watch History: Resume playback from where you left off.

    • Search: Find content by title/genre.


2. Core Concept: Adaptive Streaming (HLS/DASH)

In a real system, you do not stream one big MP4 file. You stream a Manifest File (.m3u8) which points to thousands of tiny chunks (.ts files) of video.

  • The LLD Implication: Your Video object doesn't hold byte[] data. It holds a Map<Resolution, URL> pointing to a CDN (Content Delivery Network).


3. Class Diagram

We use the Composite Pattern concept for the Movie/Series hierarchy and the Strategy Pattern (implicit) for Entitlement checks.

Code snippet


4. Python Implementation

A. The Data Models (Content Hierarchy)

We need a flexible structure. Movie and Episode are "Playable", but TVShow is just a container.

Python
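A sketch of the hierarchy; stream_urls is the Map<Resolution, URL> from Section 2:

```python
from abc import ABC

class Playable(ABC):
    """Anything with streams: a Movie or an Episode."""
    def __init__(self, content_id: str, title: str, is_prime: bool):
        self.content_id = content_id
        self.title = title
        self.is_prime = is_prime  # SVOD: included with Prime
        self.stream_urls = {}     # resolution -> CDN manifest URL

class Movie(Playable):
    pass

class Episode(Playable):
    def __init__(self, content_id, title, is_prime, season_num, episode_num):
        super().__init__(content_id, title, is_prime)
        self.season_num = season_num
        self.episode_num = episode_num

class TVShow:
    """Container only; not directly playable."""
    def __init__(self, show_id: str, title: str):
        self.show_id = show_id
        self.title = title
        self.seasons = {}  # season_num -> list[Episode]
```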

B. User & Watch History

Python
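Minimal user state; rentals maps content ids to expiry timestamps for the TVOD check:

```python
class User:
    def __init__(self, user_id: str, has_prime: bool = False):
        self.user_id = user_id
        self.has_prime = has_prime
        self.rentals = {}        # content_id -> rental expiry datetime
        self.watch_history = {}  # content_id -> seconds watched (resume point)
```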

C. The Service (Controller)

This handles the critical "Can I watch this?" logic and URL fetching.

Python
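A sketch of the entitlement check and URL lookup, using the models above:

```python
from datetime import datetime

class EntitlementError(Exception):
    pass

class PlaybackService:
    """Answers 'Can this user watch this?' and returns a CDN URL."""

    def get_stream_url(self, user: User, content: Playable,
                       resolution: str = "720p") -> str:
        if not self._is_entitled(user, content):
            raise EntitlementError("Rent this title or join Prime to watch")
        url = content.stream_urls.get(resolution)
        if url is None:
            raise ValueError(f"No rendition for {resolution}")
        return url  # points at the CDN, not at this server (see Section 5.3)

    def _is_entitled(self, user: User, content: Playable) -> bool:
        if content.is_prime and user.has_prime:
            return True  # SVOD: covered by the subscription
        rental = user.rentals.get(content.content_id)
        return rental is not None and rental > datetime.now()  # TVOD
```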

D. Driver Code

Python


5. Key Design Challenges & Interview Answers

1. Handling "Resume Watching" at Scale

  • Problem: If 10 million people pause every second, writing to the DB (UPDATE history SET time=...) is too heavy.

  • Solution: Write-Back Cache (Redis).

    • The frontend sends a "heartbeat" every 10 seconds with current progress.

    • Server updates Redis: SET user:video:timestamp.

    • A background worker flushes Redis to the permanent DB (Cassandra/DynamoDB) every few minutes or on session close.

2. Search (Elasticsearch)

  • You cannot search efficiently in a standard SQL DB with queries like "Action movies with Tom Cruise".

  • Use an Inverted Index (Elasticsearch/Solr).

    • Document: { "id": "mv_001", "title": "Mission Impossible", "cast": ["Tom Cruise"], "genre": "Action" }

    • Search is fuzzy and extremely fast.

3. CDN & Geo-Location

  • Question: Why does the video load fast in India but the server is in the US?

  • Answer: The get_stream_url method doesn't return a file on the app server. It returns a URL to a CDN (like CloudFront or Akamai) edge server. The CDN replicates the video file to a server physically close to the user (e.g., Mumbai).

4. DRM (Digital Rights Management)

  • To prevent users from downloading the .m3u8 stream and converting it to MP4, we use Encrypted Streams (Widevine, FairPlay).

  • The Player needs a License Key to decrypt the video chunks. The EntitlementService issues this token only if the user has rights.

5. Recommendation System

  • This is usually a separate Microservice ("Personalization Service").

  • It consumes the WatchHistory events via a message queue (Kafka) and updates a User-Item Matrix to generate suggestions (Collaborative Filtering).


Online Book Management System

An end-to-end Low Level Design (LLD) for an Online Book Management System (Library Management System).

This is a classic Object-Oriented Design (OOD) question. The main challenge is distinguishing between the Metadata (The abstract concept of the book "Harry Potter") and the Inventory (The 5 physical copies of "Harry Potter" on the shelf).


1. Requirements Clarification

  • Actors:

    • Librarian (Admin): Add/Remove books, Block members.

    • Member: Search, Borrow, Return, Reserve books.

    • System: Notifications, Fine calculation.

  • Core Use Cases:

    • Search: By Title, Author, or Category.

    • Check-out (Borrow): Validate eligibility (max limit, unpaid fines), decrement inventory.

    • Return: Update inventory, calculate overdue fines.

  • Constraints:

    • Each book can have multiple copies (Book Items).

    • Max 5 books per member.

    • Max lending duration: 10 days.


2. Core Entities & Relationships

  1. Book (Metadata): Details like ISBN, Title, Author. Shared by all copies.

  2. BookItem (Physical Copy): Specific instance with a unique Barcode and Status (Available, Loaned, Lost).

  3. Account/Member: User details, list of checked-out items.

  4. Catalog (Search Index): Efficiently finds books.

  5. LibraryService (Facade): The main controller handling transactions.


3. Class Diagram

Code snippet


4. Python Implementation

A. The Models (Book vs BookItem)

This separation is the most important part of the design.

Python
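A sketch of the split:

```python
from enum import Enum

class ItemStatus(Enum):
    AVAILABLE = "AVAILABLE"
    LOANED = "LOANED"
    LOST = "LOST"

class Book:
    """Metadata: one per title, shared by all physical copies."""
    def __init__(self, isbn: str, title: str, author: str, category: str):
        self.isbn = isbn
        self.title = title
        self.author = author
        self.category = category

class BookItem:
    """A physical copy, identified by its barcode."""
    def __init__(self, barcode: str, book: Book):
        self.barcode = barcode
        self.book = book
        self.status = ItemStatus.AVAILABLE
        self.due_date = None
```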

B. The User (Account & Member)

Python
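A minimal member:

```python
class Member:
    def __init__(self, member_id: str, name: str):
        self.member_id = member_id
        self.name = name
        self.checked_out = []   # list[BookItem]
        self.unpaid_fines = 0.0
```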

C. Search Catalog (Index)

A simple list search is . A proper design uses HashMaps for lookup.

Python
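A sketch of the index; every add_book keeps three hash maps in sync:

```python
from collections import defaultdict

class Catalog:
    """Indexes books by title/author/category for O(1) average lookup."""

    def __init__(self):
        self.by_title = defaultdict(list)
        self.by_author = defaultdict(list)
        self.by_category = defaultdict(list)

    def add_book(self, book: Book):
        self.by_title[book.title.lower()].append(book)
        self.by_author[book.author.lower()].append(book)
        self.by_category[book.category.lower()].append(book)

    def search_by_title(self, title: str) -> list:
        return self.by_title.get(title.lower(), [])
```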

D. The System (Facade & Fine Logic)

This manages the business rules.

Python
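A sketch of the facade, hardcoding a flat fine rate (see the Strategy note in Section 5.1 below for the flexible version):

```python
from datetime import date, timedelta

class LibraryService:
    MAX_BOOKS = 5
    LOAN_DAYS = 10
    FINE_PER_DAY = 1.0  # illustrative flat rate

    def borrow_book(self, member: Member, item: BookItem) -> bool:
        if item.status != ItemStatus.AVAILABLE:
            return False
        # Eligibility: under the limit and no unpaid fines.
        if len(member.checked_out) >= self.MAX_BOOKS or member.unpaid_fines > 0:
            return False
        item.status = ItemStatus.LOANED
        item.due_date = date.today() + timedelta(days=self.LOAN_DAYS)
        member.checked_out.append(item)
        return True

    def return_book(self, member: Member, item: BookItem) -> float:
        fine = self._calculate_fine(item)
        member.unpaid_fines += fine
        member.checked_out.remove(item)
        item.status = ItemStatus.AVAILABLE
        item.due_date = None
        return fine

    def _calculate_fine(self, item: BookItem) -> float:
        overdue_days = (date.today() - item.due_date).days
        return max(0, overdue_days) * self.FINE_PER_DAY
```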

E. Driver Code

Python


5. Key Interview Nuances

1. Fine Calculation (Strategy Pattern)

  • Question: What if books have different fine rates (e.g., Rare books = $10/day, Magazines = $0.50/day)?

  • Design: Don't hardcode _calculate_fine. Use the Strategy Pattern.

    • Interface FineStrategy with method calculate(days).

    • Strategies: StandardFine, PremiumFine.

    • The BookItem holds a reference to its specific strategy.

2. Concurrency

  • Question: Two users scan the QR code of the same book item at the exact same millisecond via the mobile app.

  • Solution: You need locking.

    • DB Level: SELECT * FROM book_item WHERE barcode = 'xyz' FOR UPDATE; (Row-level locking).

    • App Level: Python threading.Lock() inside the borrow_book method for that specific barcode.

3. Notification System (Observer Pattern)

  • Question: Notify user when a reserved book becomes available.

  • Design: Implement Observer Pattern.

    • User subscribes to BookMetadata.

    • When BookItem status changes from LOANED to AVAILABLE, the System checks the Reservation Queue and notifies the next observer (User).

4. Barcode Reading

  • In a machine coding round, mentioning that barcode is the unique identifier (Primary Key) for BookItem while ISBN is the identifier for Book is a strong signal of domain knowledge.


Lift

Here is a detailed Low Level Design (LLD) for an Elevator (Lift) System.

This is a very popular Machine Coding problem because it tests State Management, Scheduling Algorithms, and Concurrency.


1. Requirements Clarification

  • Setup: A building with N floors (0 to N-1) and K elevators.

  • Capacity: Each elevator has a max weight/person limit.

  • Buttons:

    • Inside: Panel to select specific floors (1, 2, 5...).

    • Outside (Hall): Up/Down buttons on each floor.

  • Behavior:

    • The elevator should not change direction violently. It should finish requests in the current direction before switching (The SCAN Algorithm).

    • Example: If going UP from 1 to 5, and a user at 3 presses UP, stop at 3. If a user at 4 presses DOWN, ignore them until the elevator reaches the top and starts coming down.


2. Core Algorithm: The SCAN (Look) Algorithm

Naive First-Come-First-Serve (FCFS) is bad for elevators (it moves like a yo-yo).

We use the SCAN Algorithm:

  1. Move in current_direction.

  2. Stop at all requested floors in that direction.

  3. If no more requests in that direction, check pending requests in the opposite direction.

  4. Reverse direction.

Data Structure: Two Priority Queues.

  • Up Queue (Min-Heap): Sorts stops in ascending order (1 → 2 → 5).

  • Down Queue (Max-Heap): Sorts stops in descending order (10 → 8 → 3).


3. Class Diagram

Code snippet


4. Python Implementation

A. Enums and Request

Python
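Assumed enums for the sketches that follow:

```python
from enum import Enum

class Direction(Enum):
    UP = 1
    DOWN = -1
    IDLE = 0

class RequestType(Enum):
    INTERNAL = "INTERNAL"  # cabin panel: a specific floor
    EXTERNAL = "EXTERNAL"  # hall call: a floor + desired direction
```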

B. The Elevator (The Worker)

This is where the complex state logic lives. We use two heaps to manage the path.

Python
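A sketch of the two-heap SCAN logic; down-stops are stored negated because Python's heapq is a min-heap:

```python
import heapq

class Elevator:
    def __init__(self, elevator_id: int):
        self.id = elevator_id
        self.current_floor = 0
        self.direction = Direction.IDLE
        self.up_stops = []    # min-heap of floors above us
        self.down_stops = []  # max-heap (negated floors) below us

    def add_stop(self, floor: int):
        if floor == self.current_floor:
            return  # already here; doors would just open
        if floor > self.current_floor:
            heapq.heappush(self.up_stops, floor)
        else:
            heapq.heappush(self.down_stops, -floor)
        if self.direction == Direction.IDLE:
            self.direction = (Direction.UP if floor > self.current_floor
                              else Direction.DOWN)

    def step(self):
        """Advance one floor, following the SCAN algorithm."""
        if self.direction == Direction.UP and self.up_stops:
            self.current_floor += 1
            self._maybe_stop(self.up_stops, self.current_floor)
        elif self.direction == Direction.DOWN and self.down_stops:
            self.current_floor -= 1
            self._maybe_stop(self.down_stops, -self.current_floor)
        elif self.up_stops:    # nothing left this way: reverse
            self.direction = Direction.UP
        elif self.down_stops:
            self.direction = Direction.DOWN
        else:
            self.direction = Direction.IDLE

    def _maybe_stop(self, heap, key):
        if heap and heap[0] == key:
            while heap and heap[0] == key:  # absorb duplicate presses
                heapq.heappop(heap)
            print(f"Elevator {self.id}: doors open at floor {self.current_floor}")
```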

C. The Controller (Dispatcher)

This decides which elevator gets the external "Hall Call".

Python
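A sketch of the dispatcher with a simple cost function; the +100 penalty is an arbitrary illustrative weight for elevators heading the wrong way:

```python
class ElevatorController:
    """Dispatcher: assigns hall calls to the cheapest elevator."""

    def __init__(self, num_elevators: int):
        self.elevators = [Elevator(i) for i in range(num_elevators)]

    def handle_hall_call(self, floor: int, direction: Direction):
        best = min(self.elevators, key=lambda e: self._cost(e, floor, direction))
        best.add_stop(floor)

    def _cost(self, e: Elevator, floor: int, direction: Direction) -> int:
        distance = abs(e.current_floor - floor)
        if e.direction == Direction.IDLE:
            return distance
        moving_toward = ((e.direction == Direction.UP and floor >= e.current_floor) or
                         (e.direction == Direction.DOWN and floor <= e.current_floor))
        if e.direction == direction and moving_toward:
            return distance        # on the way: cheap
        return distance + 100      # heading away: penalize
```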

D. Driver Code

Python


5. Key Design Challenges & Interview Answers

1. Dispatch Algorithm Strategy

  • Question: How do you decide which elevator to send?

  • Approaches:

    • Nearest Idle: Simple but inefficient during rush hour.

    • Sectoring: Divide building (Elevator A serves 0-10, B serves 11-20). Good for skyscrapers.

    • Destination Dispatch (Modern): Users type destination floor in the lobby before entering. The system groups people going to the same floor into the same elevator. This drastically reduces stops.

2. State Pattern

  • Question: The move() method has many if-else checks. Can we improve this?

  • Answer: Use the State Design Pattern.

    • Classes: IdleState, MovingUpState, MovingDownState.

    • Each state implements move() and handle_request() differently.

    • Example: MovingUpState.move() simply increments floor, while IdleState.handle_request() transitions the state to Moving.

3. Thread Safety

  • Question: add_request can happen while move is running.

  • Answer: Access to up_stops and down_stops (the priority queues) must be synchronized.

    • In Java: synchronized methods.

    • In Python: with self.lock: around the heap operations.

4. Handling "Emergency Stop"

  • We would add a boolean is_operational. If False, the dispatcher ignores this elevator.

  • If FireAlarm triggers, ElevatorSystem overrides all queues: clear up_stops, push 0 (Lobby) to down_stops, and force direction DOWN.

5. Why Priority Queues?

  • Using a simple List/Array requires sorting every time a button is pressed (O(N log N)).

  • A Priority Queue (Heap) allows insertion in O(log N) and always gives us the next closest floor in O(1).

