Advanced
AWS S3 Service
An end-to-end Low Level Design (LLD) for a simplified Object Storage Service (like AWS S3).
This problem is distinct from designing a standard file system because S3 is a Key-Value Store for Blobs. It decouples the Metadata (File info) from the Block Storage (Actual bytes).
1. Requirements Clarification
Core Entities:
Bucket: A container for objects. Names must be globally unique.
Object: The file itself (immutable content). Identified by a Key.
Operations:
create_bucket(name)
put_object(bucket, key, data)
get_object(bucket, key)
delete_object(bucket, key)
Key Behavior:
Flat Structure: S3 does not have real folders. "photos/2023/vacation.jpg" is just a long string key.
Immutability: You cannot edit a byte in the middle of a file. You must re-upload the whole object.
2. Core Architecture: Metadata vs. Data
To design this correctly, we must separate concerns:
Metadata Store: A database (e.g., DynamoDB/Cassandra) holding attributes:
{Key: "abc.jpg", Size: 2MB, StoragePath: "/disk1/block_99"}.Blob Store: The physical storage engine (e.g., HDD/SSD) that saves the raw bytes.
3. Class Diagram
We use the Strategy Pattern for the Storage Engine to allow swapping implementations (e.g., Local Disk vs. Distributed File System).
Code snippet

4. Python Implementation
A. Storage Backend (The Strategy)
This handles the raw bytes. In a real system, this would talk to a block server. Here, we simulate it with a local directory.
Python
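The original code block is not reproduced here; below is a minimal sketch of what the storage Strategy could look like, assuming a local `blob_storage/` directory as the stand-in for a block server. Class and method names are illustrative, not the document's actual implementation.

```python
import os
from abc import ABC, abstractmethod

class StorageBackend(ABC):
    """Strategy interface: knows only about raw bytes, not buckets or keys."""
    @abstractmethod
    def write(self, blob_id: str, data: bytes) -> str: ...
    @abstractmethod
    def read(self, blob_id: str) -> bytes: ...
    @abstractmethod
    def delete(self, blob_id: str) -> None: ...

class LocalDiskStorage(StorageBackend):
    """Simulates a block server using a local directory."""
    def __init__(self, root: str = "blob_storage"):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, blob_id: str) -> str:
        return os.path.join(self.root, blob_id)

    def write(self, blob_id: str, data: bytes) -> str:
        with open(self._path(blob_id), "wb") as f:
            f.write(data)
        return self._path(blob_id)   # acts as the storage_path stored in metadata

    def read(self, blob_id: str) -> bytes:
        with open(self._path(blob_id), "rb") as f:
            return f.read()

    def delete(self, blob_id: str) -> None:
        if os.path.exists(self._path(blob_id)):
            os.remove(self._path(blob_id))
```

Swapping in a distributed backend later only requires another subclass of the same interface.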
B. The Entities (Bucket & Object)
Python
C. The Service (Facade)
This coordinates the Metadata Store and the Blob Store.
Python
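As a rough sketch of how the facade could coordinate the two stores: metadata lives in in-memory dicts (a stand-in for DynamoDB) and the bytes go through the StorageBackend sketched above. Names are assumptions for illustration.

```python
import uuid

class ObjectStorageService:
    """Facade: metadata in dicts (mock metadata store), raw bytes in the StorageBackend."""
    def __init__(self, backend):
        self.backend = backend
        self.buckets = {}   # bucket_name -> {key -> metadata dict}

    def create_bucket(self, name: str):
        if name in self.buckets:
            raise ValueError("Bucket names must be globally unique")
        self.buckets[name] = {}

    def put_object(self, bucket: str, key: str, data: bytes):
        blob_id = uuid.uuid4().hex                       # objects are immutable: new blob per upload
        path = self.backend.write(blob_id, data)
        self.buckets[bucket][key] = {"size": len(data), "blob_id": blob_id, "storage_path": path}

    def get_object(self, bucket: str, key: str) -> bytes:
        meta = self.buckets[bucket][key]                 # metadata lookup first...
        return self.backend.read(meta["blob_id"])        # ...then fetch the raw bytes

    def delete_object(self, bucket: str, key: str):
        meta = self.buckets[bucket].pop(key)
        self.backend.delete(meta["blob_id"])             # physical delete; real S3 defers this (GC)

# svc = ObjectStorageService(LocalDiskStorage())
# svc.create_bucket("photos"); svc.put_object("photos", "2023/vacation.jpg", b"...")
```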
D. Driver Code
Python
5. Key Design Decisions & Interview Questions
1. Handling Large Files (Multipart Upload)
Question: How do you upload a 50GB file? You can't hold it all in RAM.
Design: Break the file into chunks (e.g., 10MB parts).
API:
initiate_multipart_upload(), upload_part(part_id, data), complete_upload().
The S3Object metadata will store a list of chunk IDs: [chunk1_uuid, chunk2_uuid, ...].
get_object reconstructs the file by reading chunks in order.
2. Logical vs Physical Deletion
Question: When I overwrite a file, what happens to the old data?
Answer: In the code above, the old data is orphaned on the disk (if we didn't explicitly delete it).
Real S3 Approach: S3 often uses a Garbage Collector process. It marks metadata as "Deleted" instantly, but a background process scans for physical blobs that have no metadata pointing to them and deletes them later (Async deletion).
3. Folder Structure
Question: Does S3 have folders?
Answer: No. It's a flat key-value store.
UI Illusion: The console shows folders by splitting keys by the / delimiter.
Implication: Renaming a "folder" is expensive. To rename photos/ to images/, S3 must copy every single object to the new key prefix and delete the old ones. It's an O(N) operation in the number of objects under the prefix.
4. Versioning
Feature: S3 allows keeping history.
Implementation: The objects map in Bucket changes from Key -> Object to Key -> List[Object].
put_object appends to the list instead of overwriting.
get_object returns the last item in the list (Current Version).
5. Distributed Scalability (System Design touch)
Metadata: Because bucket listings can be huge, metadata is usually sharded in a distributed DB (like DynamoDB or a specialized LSM-Tree engine).
Data: The storage_path uses Consistent Hashing to distribute bytes across thousands of servers.
Google Search Engine
An end-to-end Low Level Design (LLD) for a Search Engine (like a mini-Google).
Disclaimer: Designing "Google" is typically a System Design (HLD) question due to its scale. For Machine Coding (LLD), the interviewer expects you to build the core engine: The Inverted Index and the Ranking Algorithm.
1. Requirements Clarification
Goal: Ingest documents (web pages) and allow users to search for them using keywords.
Input: A list of "Web Pages" (Strings with ID and Content).
Output: A list of relevant page IDs sorted by relevance (Ranking).
Core Components:
Crawler: (Simulated) Adds documents to the system.
Indexer: Processing text to make it searchable.
Query Processor: Parsing user input.
Ranker: Sorting results (e.g., using Frequency).
2. Core Concept: The Inverted Index
This is the heart of any search engine.
Forward Index: Document ID -> List of words. (Great for storage, bad for search.)
Inverted Index: Word -> List of Document IDs. (Great for search.)
Example:
Doc1: "Apple banana"
Doc2: "Banana cherry"
Inverted Index:
"apple" [Doc1]
"banana" [Doc1, Doc2]
"cherry" [Doc2]
3. Class Diagram
We will use a Pipeline Architecture for ingestion and a Service for retrieval.
Code snippet

4. Python Implementation
A. The Data Structures (Document & Posting)
The Posting class is crucial. It doesn't just store "Doc ID", it stores metadata (frequency, position) needed for ranking.
Python
B. The Indexer (The Engine)
This builds the Inverted Index.
Python
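The indexer code itself is not shown here; the following is a minimal sketch of an inverted-index builder using a Posting that tracks frequency and positions, as described above. The naive tokenizer and class names are assumptions.

```python
import re
from collections import defaultdict

class Posting:
    """One (word, document) pair: tracks frequency and positions for ranking."""
    def __init__(self, doc_id: str):
        self.doc_id = doc_id
        self.frequency = 0
        self.positions = []

class Indexer:
    def __init__(self):
        self.index = defaultdict(dict)   # word -> {doc_id -> Posting}

    def add_document(self, doc_id: str, content: str):
        tokens = re.findall(r"[a-z0-9]+", content.lower())   # naive tokenizer
        for pos, token in enumerate(tokens):
            posting = self.index[token].setdefault(doc_id, Posting(doc_id))
            posting.frequency += 1
            posting.positions.append(pos)

    def search(self, word: str):
        """Return postings sorted by term frequency (simple relevance)."""
        postings = self.index.get(word.lower(), {})
        return sorted(postings.values(), key=lambda p: p.frequency, reverse=True)

# idx = Indexer()
# idx.add_document("Doc1", "Apple banana"); idx.add_document("Doc2", "Banana cherry")
# [p.doc_id for p in idx.search("banana")]   # ['Doc1', 'Doc2']
```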
C. The Search Service (Orchestrator)
This handles the user query, intersection of results, and ranking.
Python
D. Driver Code
Python
5. Key Design Challenges & Interview Answers
1. AND vs OR Search
Question: How do you handle "Apple AND Banana"?
Answer: Instead of taking the Union (set.add) of document IDs found, you take the Intersection.
List 1 (Apple): [Doc1, Doc2]
List 2 (Banana): [Doc3]
Intersection: [] (No results).
2. Ranking Algorithm (TF-IDF)
Question: Counting frequency is too simple. If I search "the apple", "the" appears 100 times, "apple" appears once. A doc with 100 "the"s will outrank a doc with 5 "apple"s.
Solution: Use TF-IDF (Term Frequency - Inverse Document Frequency).
TF: How often word appears in this doc. (High for "the" and "apple").
IDF: How rare is the word across all docs?
"The" is in every doc Low Score.
"Apple" is rare High Score.
We multiply TF * IDF. "The" gets penalized effectively.
3. Typeahead / Autocomplete
Question: How do you design the dropdown suggestions?
Structure: Use a Trie (Prefix Tree).
Optimization: Store the "Top 5 searched queries" at each node of the Trie so you don't have to traverse to the bottom to find suggestions.
4. Concurrency
Question: What if thousands of crawlers are adding documents while users are searching?
Design:
The Indexer shouldn't block the Searcher.
Use a Read-Write Lock.
Or, use Double Buffering (Shadow Indexing): Search queries go to Index_A (read-only). The Crawler updates Index_B. Once Index_B is ready, we atomically swap them.
5. Scalability (Sharding)
The index is too big for one machine.
Document Partitioning (Sharding by DocID): Each machine holds a subset of documents (e.g., Doc 1-1M).
Search: Broadcast query to ALL shards, merge results. (Parallel processing).
Term Partitioning (Sharding by Word): Each machine holds a subset of words (e.g., A-C).
Search: Query "Apple" goes to Shard 1.
Problem: Multi-word queries ("Apple Banana") require hitting multiple shards and transferring large lists over the network. Document partitioning is usually preferred.
Version Control (GitHub) - Database Modelling
An end-to-end Low Level Design (LLD) for a Version Control System (like Git).
This problem tests your understanding of Graph Theory (Directed Acyclic Graph - DAG), Hashing, and Immutable Data Structures.
1. Requirements Clarification
Core Entities: Files, Commits, Branches.
Key Operations:
commit(message): Save the current state of files.
create_branch(name): Create a new pointer to the current commit.
switch_branch(name): Switch the working context.
merge(branch_name): Merge changes from another branch (simplified).
log(): Show history.
Constraints:
Store file contents efficiently (deduplication is a bonus).
Handle the history as a graph (DAG).
2. Core Architecture: The Git Object Model
To design this correctly, you must think like Git. Git is not a file system; it's a Content-Addressable File System.
Blob: Represents the content of a file. If two files have the same text, they point to the same Blob.
Commit: A snapshot. It points to the file state and a parent commit.
Ref (Branch): A mutable pointer to a specific Commit Hash (e.g., master points to Commit A).
HEAD: A pointer to the current Branch (or Commit).
3. Class Diagram
Code snippet

4. Python Implementation
We will use hashlib to generate SHA-1 hashes, mimicking Git's ID generation.
A. The Commit Node (The Snapshot)
In a real system, we would have Tree objects (folders) and Blob objects. For an interview, we can simplify: A commit stores a flat map of {filename: content_hash}.
Python
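Since the original block is not included, here is a minimal sketch of the simplified commit node described above: a flat {filename: content_hash} map, SHA-1 ids via hashlib, and a deduplicating blob store. Structure and names are illustrative.

```python
import hashlib
import time

def sha1(text: str) -> str:
    return hashlib.sha1(text.encode()).hexdigest()

class Commit:
    """A snapshot: points to a flat file map and to its parent commit."""
    def __init__(self, message: str, file_map: dict, parent_id: str = None):
        self.message = message
        self.file_map = dict(file_map)          # {filename: content_hash}
        self.parent_id = parent_id
        self.timestamp = time.time()
        # The commit id is a hash of its own content, mimicking Git's content addressing.
        self.commit_id = sha1(f"{message}{self.file_map}{parent_id}{self.timestamp}")

class BlobStore:
    """Deduplicated content store: identical file contents share a single blob."""
    def __init__(self):
        self.blobs = {}                          # {content_hash: content}

    def put(self, content: str) -> str:
        h = sha1(content)
        self.blobs[h] = content
        return h
```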
B. The Version Control System (The Engine)
This manages the graph pointers (HEAD, master, etc.).
Python
C. Driver Code
Python
5. Key Design Challenges & Interview Answers
1. Storage Efficiency (Deduplication)
Problem: If I have a 100MB file and I change 1 character, Commit A has the old file and Commit B has the new file. Storing full copies is wasteful.
Git's Solution (The Blob): Git stores files by the Hash of their content.
File1: "Hello" -> Hash a1b2. Store Object a1b2.
Commit A points to a1b2.
If Commit B uses the same file "Hello", it also points to a1b2. No duplicate storage.
Interview Implementation: Use a separate BlobStore dictionary {content_hash: content}. The Commit object only stores {filename: content_hash}.
2. Merge Conflicts
Question: What if both branches modified line 10 of file1.txt?
Logic:
Find the Common Ancestor (the latest commit reachable from both branches).
Diff Ancestor vs Current -> MyChanges.
Diff Ancestor vs Incoming -> TheirChanges.
If both touched the same line with different content -> raise ConflictException.
The system stops and asks the user to manually edit the file and run add again.
3. Detached HEAD
Concept: Usually HEAD points to a Branch Name (master).
Detached: HEAD points directly to a Commit Hash (a1b2c).
Implication: If you commit now, the new commit has no branch pointing to it. If you switch away, you lose this commit (it becomes "garbage" and will be collected).
4. Differencing (Diffs vs Snapshots)
SVN/Old VCS: Stored "Deltas" (Change from V1 to V2). Slow to reconstruct the full file.
Git: Stores "Snapshots" (The full state). Fast to checkout, but requires smart compression (Packfiles) to save space. Stick to the Snapshot model for LLD interviews.
5. git revert vs git reset
Revert: Creates a new commit that undoes changes. (Safe for public branches.)
Reset: Moves the branch pointer backward. (Destructive, dangerous for shared history).
Implementation: reset is just branches[current] = target_commit_id. revert applies the reverse patch and calls commit().
Mentorship Platform like Preplaced
An end-to-end Low Level Design (LLD) for a Mentorship Platform (like Preplaced, Topmate, or ADPList).
This problem focuses on Two-Sided Marketplace Dynamics (matching Supply and Demand), Calendar/Availability Management, and Booking State Machines.
1. Requirements Clarification
Actors:
Mentor: Sets availability, sets price/free sessions, approves requests.
Mentee: Searches for mentors (by skill/company), books slots.
System: Handles payments, scheduling, and notifications.
Core Use Cases:
Onboarding: Mentors create profiles with specific tags (e.g., "Google", "System Design", "Python").
Availability: Mentors define "Slots" (e.g., Saturday 10 AM - 12 PM).
Booking: Mentee selects a slot -> Block Slot -> Payment -> Confirmation.
Search: Filter mentors by Domain, Company, or Price.
2. Core Architecture: The Booking Flow
The most critical part is handling Availability. We cannot allow two mentees to book the same mentor at the same time.
Slot Definition: A granular unit of time (e.g., 30 mins).
Booking Lifecycle:
CREATED -> PAYMENT_PENDING -> CONFIRMED -> COMPLETED.
3. Class Diagram
Code snippet

4. Python Implementation
A. The Models (Users & Slots)
We treat Slot as a resource that must be locked during booking.
Python
B. The Search Engine (Discovery)
A simple in-memory filter. In production, this would be an Elasticsearch query.
Python
C. The Booking System (Transaction Management)
This handles the critical "Check-Lock-Book" flow.
Python
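A minimal sketch of the "Check-Lock-Book" flow, assuming a Slot model with a per-slot lock and a simple status enum. The exact classes are assumptions for illustration, not the document's implementation.

```python
import threading
import uuid
from enum import Enum

class BookingStatus(Enum):
    CREATED = 1
    PAYMENT_PENDING = 2
    CONFIRMED = 3
    COMPLETED = 4

class Slot:
    def __init__(self, mentor_id: str, start_time: str):
        self.slot_id = uuid.uuid4().hex
        self.mentor_id = mentor_id
        self.start_time = start_time
        self.is_booked = False
        self.lock = threading.Lock()     # guards against two mentees grabbing the same slot

class BookingSystem:
    def __init__(self):
        self.bookings = {}               # booking_id -> booking record

    def book_slot(self, mentee_id: str, slot: Slot) -> str:
        # Check-Lock-Book: the availability check and the write happen under one lock.
        with slot.lock:
            if slot.is_booked:
                raise RuntimeError("Slot already taken")
            slot.is_booked = True
        booking_id = uuid.uuid4().hex
        self.bookings[booking_id] = {
            "mentee": mentee_id, "slot": slot.slot_id,
            "status": BookingStatus.PAYMENT_PENDING,
        }
        return booking_id

    def confirm_payment(self, booking_id: str):
        self.bookings[booking_id]["status"] = BookingStatus.CONFIRMED
```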
D. Driver Code
Python
5. Key Design Challenges & Interview Answers
1. Handling Timezones
Problem: Alice (Mentor) is in New York (EST). Bob (Mentee) is in India (IST).
Solution:
Backend: ALWAYS store times in UTC in the database (e.g., 2023-10-01T14:00:00Z).
Frontend: Convert UTC to the user's browser locale (Intl.DateTimeFormat).
Notification: Emails should say "Your session is at [Local Time]".
2. Integration with Google Calendar
Use Case: When a booking is confirmed, it should appear on both users' Google Calendars with a Meet link.
Design:
Use OAuth 2.0 to get write access to the user's calendar.
Upon BookingStatus.CONFIRMED, trigger an async job (Kafka/Celery).
Call the Google Calendar API events.insert with the attendees' emails.
Save the conferenceData (Google Meet link) back to the Booking entity.
3. Slot Generation (User Experience)
Problem: Mentors don't want to create 100 slots manually.
Solution: Implement Recurrence Rules.
UI: "I am available Every Saturday 10 AM - 2 PM".
Backend: Don't create infinite rows in the DB. Create "Availability Templates".
When a user views the profile for "Next Week", the system dynamically generates the "Virtual Slots" based on the template minus any existing bookings.
4. Ranking (Search Relevance)
Question: How do you decide which mentor appears first?
Factors:
Availability: Mentors with slots in the next 24h rank higher.
Response Rate: Mentors who reject requests rank lower.
Rating: Weighted average of last 20 reviews.
5. Dispute Resolution
Scenario: Mentee says "Mentor didn't show up". Mentor says "I was there".
Solution:
If using an integrated video platform (like internal WebRTC or Zoom API), log "Join Events".
Check logs: Did MentorID join room BookingID between Start and End?
If there are no logs (e.g., the call happened on WhatsApp), it requires manual Admin intervention (he said/she said). This is why platforms enforce on-platform calls.
Tinder Dating App
An end-to-end Low Level Design (LLD) for a Dating Application (like Tinder or Bumble).
This problem tests your ability to handle Geospatial Data (finding people near you) and Event-Driven Actions (User A likes B + User B likes A -> Match).
1. Requirements Clarification
Actors: User.
Core Entities: Profile (Bio, Photos), Location (Lat/Long), Preferences (Gender, Age Range, Distance Radius).
Core Operations:
create_profile()
update_location(lat, long)
get_recommendations(): Fetch users who match my preferences and I haven't swiped on yet.
swipe(user_id, direction): Right (Like) or Left (Pass).
The "Match" Condition: A match occurs if and only if Mutual Likes exist (A likes B AND B likes A).
2. Core Architecture: The "Double Opt-In"
The system relies on checking a "Reverse Lookup" upon every Like action.
Alice Likes Bob. System saves Swipe(Alice -> Bob).
System checks: Did Bob already like Alice?
No -> Do nothing (silent).
Yes -> It's a Match! Create a Match object and notify both.
3. Class Diagram
Code snippet

4. Python Implementation
A. Core Models (User, Location)
We need a robust way to calculate distance.
Python
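The models themselves are not reproduced here; below is a sketch of a Location with a standard great-circle (haversine) distance helper and a minimal User. The class layout is an assumption for illustration.

```python
import math

class Location:
    def __init__(self, lat: float, lon: float):
        self.lat = lat
        self.lon = lon

    def distance_km(self, other: "Location") -> float:
        """Haversine distance between two lat/long points, in kilometers."""
        r = 6371.0                                   # mean Earth radius (km)
        p1, p2 = math.radians(self.lat), math.radians(other.lat)
        dp = math.radians(other.lat - self.lat)
        dl = math.radians(other.lon - self.lon)
        a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

class User:
    def __init__(self, user_id: str, age: int, location: Location, max_distance_km: float = 50):
        self.user_id = user_id
        self.age = age
        self.location = location
        self.max_distance_km = max_distance_km
        self.likes = set()        # user_ids this user liked
        self.swiped = set()       # everyone already swiped on (either direction)

# Location(40.7128, -74.0060).distance_km(Location(40.7306, -73.9352))  # roughly 6 km
```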
B. The Recommendation Engine
This filters the "Feed". In a real system, this uses a QuadTree or Geohash. For LLD, iterating with filters is acceptable logic, provided you explain the scalability limits.
Python
C. The Service (Swipe Logic)
This is the brain of the application.
Python
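A minimal sketch of the double opt-in check performed on every right swipe, assuming the User model sketched above. Class names are illustrative.

```python
from enum import Enum

class SwipeDirection(Enum):
    LEFT = 0
    RIGHT = 1

class SwipeService:
    def __init__(self):
        self.users = {}          # user_id -> User
        self.matches = []        # list of (user_a, user_b) tuples

    def register(self, user):
        self.users[user.user_id] = user

    def swipe(self, from_id: str, to_id: str, direction: SwipeDirection):
        swiper, target = self.users[from_id], self.users[to_id]
        swiper.swiped.add(to_id)
        if direction is SwipeDirection.LEFT:
            return None                              # pass: nothing else happens
        swiper.likes.add(to_id)
        # Reverse lookup: did the target already like the swiper?
        if from_id in target.likes:
            match = (from_id, to_id)
            self.matches.append(match)               # a real system would notify both users here
            return match
        return None                                  # silent like
```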
D. Driver Code
Python
5. Key Design Challenges & Interview Answers
1. Spatial Indexing (The "Near Me" Query)
Problem: Iterating through all_users to calculate distance is O(N). If you have 10 million users, the feed fails.
Solution: Use Geohashing or a QuadTree.
Geohash: Divides the world into a grid of strings (e.g., "u4pru..."). Users sharing the same prefix are close to each other.
DB Query: SELECT * FROM Users WHERE geohash LIKE 'u4p%'. This reduces the search space from 10M users to roughly 1,000.
2. Sharding Strategies
Question: How do you shard the database?
Answer: Sharding by Location (Geosharding) is intuitive but risky (NYC is overloaded, Kansas is empty).
Better Approach: Shard by User ID for profile data. Store the "Location Index" (Geohash -> UserIDs) in a separate, specialized service (like Redis Geo or Elasticsearch).
3. "Super Like" Feature
Design: A "Super Like" is just a
SwipeType.SUPER_LIKE.Logic: It bypasses the "Mutual" requirement for visibility. If Alice Super Likes Bob, when Bob opens the app, the engine explicitly injects Alice at the top of Bob's feed with a special flag
is_super_like=True.
4. Unlimited vs Free Swipes
Design: Decorator Pattern or Proxy Pattern on the swipe method.
Check user.subscription_tier.
If Free, check daily_swipe_count < 10.
If Gold, allow unlimited swipes.
5. Profile Consistency (The "Ghost" Problem)
Problem: Alice swipes right on Bob. 1 second later, Bob deletes his account. Alice matches with a ghost.
Solution: Strong consistency is hard. Eventual consistency is better.
When Bob deletes his account, publish a UserDeletedEvent.
The Swipe Service listens and removes Bob from all likes sets.
During the _create_match step, perform a final "Live Check" (if user.is_active) before confirming.
WhatsApp Messenger
An end-to-end Low Level Design (LLD) for a Chat Application (like WhatsApp/Telegram).
This problem focuses on Real-Time Communication (WebSockets), Data Persistence (Chat History), and the State Management of messages (Sent -> Delivered -> Read).
1. Requirements Clarification
Core Features:
1:1 Chat: Send text messages between two users.
Group Chat: Broadcast messages to multiple users.
Receipts: Manage status updates:
SENT (1 tick), DELIVERED (2 ticks), READ (blue ticks).
User Status: Online/Last Seen.
Constraints:
Real-time: Low latency is crucial.
Offline Handling: If a user is offline, store the message and deliver when they reconnect.
2. Core Architecture: The WebSocket Gateway
Unlike standard REST APIs (Request-Response), chat apps use Full-Duplex Communication (WebSockets or MQTT). The server pushes messages to the client without the client asking.
Connection Map: The server must maintain a mapping: UserID -> WebSocketConnection.
Pub/Sub (For Scale): If User A is connected to Server 1 and User B is connected to Server 2, we need a Pub/Sub mechanism (Redis/Kafka) to route the message between servers.
3. Class Diagram
Code snippet

4. Python Implementation
A. The Models
We need a robust Message class that tracks who sent what to whom.
Python
B. The Connection Manager (The Socket Layer)
This mimics the WebSocket server. In a real implementation (e.g., using socket.io or websockets library), this holds the actual TCP socket objects.
Python
C. The Chat Service (Business Logic)
This handles 1:1 and Group logic, plus the "Store and Forward" mechanism for offline users.
Python
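A minimal sketch of the "Store and Forward" mechanism: delivery is immediate for online users, queued for offline ones. The connection manager here is a mock (a print callback instead of a real socket); names are assumptions.

```python
from collections import defaultdict

class ConnectionManager:
    """Maps user_id -> a 'socket' (here just a callback that prints)."""
    def __init__(self):
        self.connections = {}

    def connect(self, user_id: str):
        self.connections[user_id] = lambda msg: print(f"[push to {user_id}] {msg}")

    def disconnect(self, user_id: str):
        self.connections.pop(user_id, None)

    def is_online(self, user_id: str) -> bool:
        return user_id in self.connections

    def push(self, user_id: str, msg: str):
        self.connections[user_id](msg)

class ChatService:
    def __init__(self, conn: ConnectionManager):
        self.conn = conn
        self.offline_queue = defaultdict(list)    # user_id -> pending messages

    def send_message(self, sender: str, receiver: str, text: str):
        payload = f"{sender}: {text}"
        if self.conn.is_online(receiver):
            self.conn.push(receiver, payload)      # delivered right away
        else:
            self.offline_queue[receiver].append(payload)   # store, forward later

    def on_reconnect(self, user_id: str):
        self.conn.connect(user_id)
        for payload in self.offline_queue.pop(user_id, []):
            self.conn.push(user_id, payload)       # flush pending messages
```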
D. Driver Code
Python
5. Key Design Challenges & Interview Answers
1. Protocol: HTTP vs WebSocket
Question: Why can't we just use HTTP POST?
Answer: HTTP is request-initiated. To get new messages, the client would have to poll (setInterval every 1s). This kills battery and increases server load.
Solution: WebSockets provide a persistent connection. The server can push data anytime.
2. Message Ordering (The "Clock" Problem)
Problem: In a distributed system, Alice might send "Msg A" then "Msg B", but Bob receives "Msg B" first due to network lag.
Solution: Use Sequence Numbers (Logical Clocks) or timestamps. The Client App is responsible for re-ordering the messages in the UI buffer based on timestamp before displaying them.
3. End-to-End Encryption (E2EE)
Question: Does the server read the messages?
Answer: In WhatsApp, no.
Design: The LLD send_message method doesn't send plain text.
Alice generates a Symmetric Key.
Encrypts the message: EncryptedMsg = Encrypt(Message, Key).
Encrypts the key: EncryptedKey = Encrypt(Key, Bob_Public_Key).
The server routes the encrypted blob. It cannot decrypt it.
4. "Last Seen" Scalability
Problem: If I have 500 friends, and I go online, do I notify all 500 immediately? That's a "Thundering Herd".
Solution: Lazy Fetching or Subscription.
I only subscribe to the presence updates of people whose chat window I currently have open.
For the contact list, we poll infrequently (e.g., every 30s) or only update when scrolling into view.
5. Media Handling (Images/Video)
Do not send binary image data over the WebSocket. It blocks the chat channel.
Flow:
Client uploads Image to Object Storage (S3) via HTTP API.
Gets URL: https://s3.aws.com/img_123.jpg.
Sends the URL (text) via WebSocket as a normal message.
Receiver downloads the image from the URL.
Gmail
An end-to-end Low Level Design (LLD) for an Email Service (like Gmail).
This problem tests your understanding of Data Modeling (Labels vs. Folders), Search (indexing text), and State Management (Draft -> Sent -> Inbox).
1. Requirements Clarification
Core Features:
Compose/Send: Send emails to one or multiple recipients (To, CC, BCC).
Inbox management: Receive emails.
Organization: Apply Labels (Work, Personal) and standard folders (Inbox, Sent, Trash).
Search: Find emails by subject or sender.
Drafts: Save unfinished emails automatically.
Key Distinction (Gmail Specific):
Gmail uses Labels, not Folders.
In a "Folder" system (Outlook), an email exists in one location.
In a "Label" system (Gmail), an email is a single entity that can have multiple tags (e.g., "Inbox", "Work", "Urgent" all at once).
2. Core Architecture: The Label System
We need to decouple the Email Content from its Organization.
Email Storage: A global collection of unique Email objects.
User View: Users don't "hold" the email object; they hold a reference to it associated with a Label.
Sending Flow:
Create Email Object.
Add to Sender's "Sent" Label.
Add to Receiver's "Inbox" Label.
3. Class Diagram
Code snippet

4. Python Implementation
A. The Models
We need to handle the nuances of BCC (Blind Carbon Copy) — the recipients list stored in the email object usually hides BCC, but for this LLD, we store it to ensure delivery, then mask it during display.
Python
B. The Mailbox (User's View)
This class manages how a specific user sees their emails. It maps Label -> List[Email].
Python
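A minimal sketch of the label-based mailbox: labels hold references to shared Email objects, so one email can carry several tags without being copied. Class names are illustrative assumptions.

```python
from collections import defaultdict

class Email:
    def __init__(self, email_id: str, sender: str, to: list, subject: str, body: str):
        self.email_id = email_id
        self.sender = sender
        self.to = to
        self.subject = subject
        self.body = body

class Mailbox:
    """A user's view: labels map to lists of references to shared Email objects."""
    def __init__(self, owner: str):
        self.owner = owner
        self.labels = defaultdict(list)            # label name -> [Email]

    def add(self, email: Email, label: str = "Inbox"):
        self.labels[label].append(email)

    def apply_label(self, email: Email, label: str):
        if email not in self.labels[label]:
            self.labels[label].append(email)       # same object, extra tag: no copy made

    def get_emails(self, label: str = "Inbox"):
        return list(self.labels[label])

# mail = Email("1", "alice@x.com", ["bob@x.com"], "Hi", "Project is live")
# box = Mailbox("bob@x.com"); box.add(mail); box.apply_label(mail, "Work")
```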
C. The Service (Controller)
This handles the transaction of moving an email from Sender to Receiver.
Python
D. Driver Code
Python
5. Key Design Challenges & Interview Answers
1. Protocols (SMTP vs IMAP)
Question: How does the email actually travel?
Answer:
SMTP (Simple Mail Transfer Protocol): Used for Sending. It pushes data from Alice's Server -> Bob's Server.
IMAP/POP3: Used for Retrieving. It pulls data from Bob's Server -> Bob's Phone/Laptop.
LLD Context: Our GmailService.send_email mimics the SMTP delivery agent, and get_emails mimics the IMAP fetch command.
2. Spam Filters (Chain of Responsibility)
Design: Before adding an email to the
Inbox, pass it through a filter pipeline. Pattern: Chain of Responsibility.
VirusScanner -> SpamKeywordFilter -> BlacklistFilter -> Inbox. If any filter flags it, divert to the
Spam label instead of the Inbox.
3. Attachment Handling
Constraint: You cannot store 25MB files in a MySQL/Postgres row.
Solution:
DB: Stores Metadata (filename, size, S3_Key).
Object Store (S3): Stores the actual binary data.
When an email is sent, the attachment is uploaded to S3 first, and the URL is embedded in the email object.
4. Search Scalability (Inverted Index)
Question:
query in body.lower() is O(N) over the entire mailbox text. It's too slow for 10GB of text.
When email arrives: Tokenize body ("Project", "Live", "Team").
Update Index:
"project" -> [Email_ID_1, Email_ID_5].Search is now .
5. Threading (Conversation View)
Feature: Gmail groups replies together.
LLD: The
Email object needs a parent_email_id or thread_id. Logic: When Bob replies to Alice:
ReplyEmail.thread_id = OriginalEmail.thread_id. The UI groups all emails with the same
thread_id.
Game Engine like Unreal
An end-to-end Low Level Design (LLD) for a Game Engine (like a simplified Unity or Unreal Engine).
Designing a full engine is massive. For Machine Coding, the focus is strictly on the Core Architecture Pattern: The Entity-Component-System (ECS) or the Composition Pattern.
Traditional OOP (Inheritance) fails in games (e.g., class FlyingEnemy extends Enemy works, but what about FlyingSwimmingEnemy?). ECS solves this by using Composition over Inheritance.
1. Requirements Clarification
Core Entities:
GameObject (Entity): A generic container (e.g., Player, Tree, Camera).
Component: Data or Logic attached to an object (e.g., Transform, RigidBody, MeshRenderer).
Core Systems:
Scene Management: Holding the hierarchy of objects.
Game Loop: The heartbeat (Input -> Update -> Render).
Physics/Rendering: Subsystems that act on specific components.
Goal: Create a Mario-like object that has position, can move, and can be drawn.
2. Core Architecture: Entity-Component (EC)
Entity: Just an ID or a list of Components. It doesn't know what it is.
Component: Pure data (or logic in simpler Unity-style designs).
System: Iterates over entities that have specific components and performs actions.
3. Class Diagram
Code snippet

4. Python Implementation
A. The Core (Entity & Component)
This is the foundation. Every object in the game will inherit from or use these classes.
Python
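A minimal sketch of the Entity-Component foundation: a GameObject is just a container of components addressed by type, and each component owns its data and per-frame behavior. Names are illustrative assumptions.

```python
class Component:
    """Base class: components carry data/behavior, the GameObject is only a container."""
    def __init__(self):
        self.game_object = None

    def update(self, dt: float):
        pass

class GameObject:
    def __init__(self, name: str):
        self.name = name
        self.components = {}                       # component type -> instance

    def add_component(self, component: Component) -> Component:
        component.game_object = self
        self.components[type(component)] = component
        return component

    def get_component(self, component_type):
        return self.components.get(component_type)

    def update(self, dt: float):
        for component in self.components.values():
            component.update(dt)

class Transform(Component):
    """Position data that nearly every object needs."""
    def __init__(self, x: float = 0.0, y: float = 0.0):
        super().__init__()
        self.x, self.y = x, y

# mario = GameObject("mario"); mario.add_component(Transform(10, 0)); mario.update(0.016)
```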
B. Standard Components
These are the building blocks developers would use to make a game.
Python
C. The Game Engine (Loop & Scene)
This orchestrates the lifecycle.
Python
D. Driver Code (User creating a Game)
Python
5. Key Design Challenges & Interview Answers
1. ECS vs OOP
Question: Why not just make a
Playerclass?Answer:
OOP: If
Playerinherits fromCharacter, andBoatinherits fromVehicle, how do you make aPlayerdrive aBoat? You end up with deep, messy inheritance trees.ECS: A
Playeris just an ID with aControllerComponent. ABoatis an ID with aBuoyancyComponent. To drive, you simply parent the Player Entity to the Boat Entity. It's flexible and modular.
2. Update Order (The Dependency Problem)
Problem: The Camera follows the Player. If
Camera.update()runs beforePlayer.update(), the camera will jitter because it's tracking the old position.Solution: Introduce Phases.
Update(): For Logic/Input.LateUpdate(): For Camera/UI (Runs after all Updates are done).FixedUpdate(): For Physics (Runs at constant time intervals, e.g., 0.02s).
3. Resource Management (Flyweight Pattern)
Problem: If I have 1,000 trees, and each loads "tree.png" (5MB) into RAM, I crash memory.
Solution: ResourceManager.
The
SpriteRendererasksResourceManager.get_texture("tree.png").The Manager checks a Map. If loaded, return reference. If not, load from disk.
All 1,000 trees share the same single Texture object in memory.
4. Collision Detection (Spatial Partitioning)
Problem: Checking Mario against every object in the world is O(N) per frame, and O(N²) if every object checks every other object.
Solution: Use a QuadTree (2D) or Octree (3D).
Only check collisions with objects in the same grid cell/node.
5. Serialization (Save/Load)
Because we used Components (Data), saving is easy.
Loop through all GameObjects -> loop through all Components -> serialize to JSON.
{ "id": "mario", "components": [ {"type": "Transform", "x": 10}, {"type": "RigidBody", "mass": 10} ] }.
Real-Time Chat System with Millions of Users
Here is a detailed Low Level Design (LLD) for a Real-Time Chat System optimized for High Scalability (Millions of Users).
While a basic chat app can run on one server, a system for millions of users requires a distributed architecture. The core challenge is that WebSockets are stateful. If User A is connected to Server 1 and User B is connected to Server 2, Server 1 cannot directly send a message to User B.
1. Requirements Clarification
Scale: Support millions of concurrent connections.
Latency: Real-time delivery (< 100ms).
Features:
1:1 Chat & Group Chat.
Cross-Server Communication (User A on Node 1 -> User B on Node 2).
Persistent History (scalable storage).
Presence (Online/Offline status).
2. Core Architecture: The "Gateway + Pub/Sub" Pattern
To solve the stateful server problem, we use a Pub/Sub (Publish-Subscribe) mechanism (like Redis or Kafka).
WebSocket Gateway: A cluster of servers holding active TCP connections.
Session Store (Redis): Maps UserID -> GatewayID. (Where is User A connected?)
Pub/Sub: Used to route messages between Gateways.
The Flow:
User A sends message to User B.
Gateway 1 looks up User B in Session Store. Result: "User B is on Gateway 5".
Gateway 1 publishes message to Topic: Gateway_5.
Gateway 5 receives the message and pushes it down the WebSocket to User B.
3. Class Diagram
Code snippet

4. Python Implementation
We will simulate a distributed system by creating multiple instances of WebSocketGateway and a mock RedisPubSub.
A. Infrastructure Mocks (Redis & Session Store)
Python
B. The Gateway Node (The Server)
This represents one physical server in the fleet. It handles connections and listens to the Pub/Sub for messages destined for its connected users.
Python
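A minimal sketch of one gateway node wired to an in-memory pub/sub and session store (stand-ins for Redis); it registers local users and subscribes to its own topic so other gateways can route messages to it. Names are illustrative assumptions.

```python
from collections import defaultdict

class MockPubSub:
    """In-memory stand-in for Redis/Kafka pub-sub."""
    def __init__(self):
        self.subscribers = defaultdict(list)      # topic -> [callback]

    def subscribe(self, topic: str, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic: str, message: dict):
        for callback in self.subscribers[topic]:
            callback(message)

class WebSocketGateway:
    def __init__(self, gateway_id: str, pubsub: MockPubSub, session_store: dict):
        self.gateway_id = gateway_id
        self.pubsub = pubsub
        self.session_store = session_store         # user_id -> gateway_id (mock Redis)
        self.local_connections = {}                 # user_id -> fake socket
        pubsub.subscribe(f"gateway:{gateway_id}", self._on_routed_message)

    def connect(self, user_id: str):
        self.local_connections[user_id] = lambda m: print(f"[{self.gateway_id} -> {user_id}] {m}")
        self.session_store[user_id] = self.gateway_id

    def send(self, sender: str, receiver: str, text: str):
        target_gateway = self.session_store.get(receiver)
        if target_gateway is None:
            return                                  # offline: a real system would persist the message
        self.pubsub.publish(f"gateway:{target_gateway}",
                            {"to": receiver, "body": f"{sender}: {text}"})

    def _on_routed_message(self, message: dict):
        sock = self.local_connections.get(message["to"])
        if sock:
            sock(message["body"])

# pubsub, sessions = MockPubSub(), {}
# g1, g2 = WebSocketGateway("g1", pubsub, sessions), WebSocketGateway("g2", pubsub, sessions)
# g1.connect("alice"); g2.connect("bob"); g1.send("alice", "bob", "hi")   # pushed via g2
```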
C. Driver Code (Simulating Distributed Cluster)
Python
5. Key Design Decisions for Scale
1. Database Schema (NoSQL is mandatory)
RDBMS (MySQL) cannot handle the write throughput of millions of messages. We use Cassandra or DynamoDB.
Schema Design (Cassandra):
We prioritize Reading messages for a specific chat.
Partition Key: channel_id (or chat_id) -> keeps all messages for a chat on one node.
Clustering Key: message_id (TimeUUID) -> sorts messages by time.
SQL
2. Group Chat "Fan-out" (The Instagram Problem)
When a celebrity sends a message to a group with 1 million members:
Simple Loop: Iterating 1 million times to publish 1 million WebSocket frames will choke the server.
Solution: Batched Fan-out Service.
The Gateway sends the message to a Kafka topic GroupMessages.
A fleet of "Push Workers" consumes the topic.
Worker 1 handles users A-M, Worker 2 handles N-Z, etc.
3. Presence System (Heartbeats)
Problem: Updating the DB every time a user disconnects is expensive.
Solution: Redis with TTL (Time To Live).
Client sends a "Heartbeat" packet every 30 seconds.
Server does Redis.set(user_id, "Online", TTL=40s).
If the heartbeat stops (app crash, internet loss), the key expires automatically after 40s. The user appears "Offline".
4. Media Handling
Chat servers never handle binary files (Images/Videos).
Flow:
Client uploads image to S3 (Object Storage).
Client gets URL.
Client sends text message containing the URL.
Receiver downloads from S3.
5. ID Generation
We cannot use auto-increment IDs (MySQL) in a distributed system.
Use Snowflake ID (Twitter's approach) or UUID v1 (Time-based) to generate sortable, unique 64-bit integers across distributed nodes.
Design (LLD) Video Conferencing App like Zoom
An end-to-end Low Level Design (LLD) for a Video Conferencing Application (like Zoom, Google Meet, or Teams).
Disclaimer: In a Machine Coding round, you are not expected to write video compression algorithms (H.264) or implement raw socket programming for video streams. You are expected to design the Control Plane (Signaling, Meeting Management) and explain the Data Plane (how video flows).
1. Requirements Clarification
Core Entities:
Meeting (Room): A virtual space with a unique ID.
Participant: A user with permissions (Host/Guest).
Stream: Audio/Video tracks.
Operations:
create_meeting()
join_meeting(meeting_id)
leave_meeting()
mute_audio() / stop_video()
Technical Constraint:
Real-time: Latency must be minimal (< 200ms).
Topology: How do 10 people see each other? (We will discuss Mesh vs. SFU).
2. Core Architecture: WebRTC & Signaling
Video chat relies on WebRTC. However, WebRTC needs a Signaling Server to set up the connection.
Signaling (The Handshake): Before Alice and Bob can send video, they must exchange metadata via a server (HTTP/WebSocket):
"I am Alice, here is my IP address" (ICE Candidates).
"I support HD video" (SDP - Session Description Protocol).
Media (The Stream): Once connected, video packets usually flow via UDP.
P2P (Mesh): Every client connects to every other client. (Bad for >3 users).
SFU (Selective Forwarding Unit): Everyone sends video to a central server; the server forwards it to others. (Standard for Zoom).
3. Class Diagram
We will focus on the Signaling and Meeting Management layer.
Code snippet

4. Python Implementation
A. The Models (Meeting & Participant)
This manages the state of the conference.
Python
B. The Signaling Service (The Traffic Cop)
This simulates the WebSocket server that passes messages between clients.
Python
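A minimal sketch of the control-plane piece: the signaling service relays SDP offers between participants of a meeting, while the media itself would flow over WebRTC rather than through this class. Message queues and names are assumptions for illustration.

```python
from collections import defaultdict

class Meeting:
    def __init__(self, meeting_id: str, host_id: str):
        self.meeting_id = meeting_id
        self.host_id = host_id
        self.participants = {}                      # user_id -> media state

    def join(self, user_id: str):
        self.participants[user_id] = {"audio": True, "video": True}

class SignalingService:
    """Relays session metadata (SDP/ICE) between clients; it never touches media packets."""
    def __init__(self):
        self.meetings = {}
        self.inboxes = defaultdict(list)            # user_id -> pending signaling messages

    def create_meeting(self, meeting_id: str, host_id: str) -> Meeting:
        meeting = Meeting(meeting_id, host_id)
        self.meetings[meeting_id] = meeting
        return meeting

    def join_meeting(self, meeting_id: str, user_id: str):
        self.meetings[meeting_id].join(user_id)

    def send_offer(self, meeting_id: str, from_id: str, to_id: str, sdp: str):
        # In production this travels over a WebSocket; here we just queue it.
        self.inboxes[to_id].append({"type": "offer", "from": from_id,
                                    "meeting": meeting_id, "sdp": sdp})

    def poll(self, user_id: str):
        messages, self.inboxes[user_id] = self.inboxes[user_id], []
        return messages
```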
C. Driver Code (Simulating the Flow)
Python
5. Key Design Challenges & Interview Answers
1. TCP vs. UDP
Question: Why does video lag if we use TCP?
Answer: TCP guarantees delivery (reliability). If Packet 5 is lost, TCP pauses Packet 6, 7, 8 until 5 is re-sent. In a live call, this causes buffering (freezing).
Solution: Use UDP. If Packet 5 is lost, ignore it and show Packet 6. It's better to have a slight glitch in the image than to freeze the entire stream.
2. Architecture: Mesh vs. MCU vs. SFU
Mesh (Peer-to-Peer): Everyone connects to everyone.
N users = N*(N-1)/2 connections.
Pros: Cheap (no media server). Cons: Clients crash if > 4 users (upload bandwidth saturation).
MCU (Multipoint Control Unit): Server decodes everyone's video, mixes it into one video stream, and sends it out.
Pros: Client is happy (receives only 1 stream). Cons: Server CPU is on fire (Decoding/Encoding is expensive).
SFU (Selective Forwarding Unit) - The Winner:
Server acts as a smart router. It receives Alice's stream and forwards it to Bob, Charlie, and Dave without decoding it.
Pros: Scalable (Zoom uses this).
3. Adaptive Bitrate (Simulcast)
Problem: Alice has fast internet (4K video), Bob has slow 3G internet. If Alice sends 4K, Bob chokes.
Solution: Alice sends Simulcast (Three versions of her video: High, Med, Low quality) to the SFU.
The SFU checks Bob's bandwidth and forwards the Low quality packet to Bob.
The SFU sends the High quality packet to Charlie (who has fiber).
4. Handling Jitter
Problem: Packets arrive out of order (1, 3, 2, 4) due to UDP.
Solution: Jitter Buffer. The client waits a small amount of time (e.g., 50ms) to collect packets and reorders them before playing the video frame.
5. Security (E2EE)
WebRTC is encrypted by default (DTLS/SRTP).
However, in an SFU/MCU, the server usually decrypts the header to route packets. True End-to-End Encryption (where the server sees nothing) requires advanced key management (Insertable Streams).
Cryptocurrency Exchange Platform
An end-to-end Low Level Design (LLD) for a Cryptocurrency Exchange (like Binance, Coinbase, or Luno).
This problem is a classic example of designing a High-Frequency Trading Engine. The core complexity lies in the Matching Algorithm (matching Buy orders with Sell orders) and managing Concurrency (ensuring two people don't sell the same coin).
1. Requirements Clarification
Core Entities: Users, Wallets, Assets (BTC, ETH, USD).
Core Operations:
Limit Order: Buy/Sell at a specific price.
Market Order: Buy/Sell immediately at the best available price.
Cancel Order: Remove an active order.
The Matching Engine:
Price-Time Priority: Orders with better prices match first. If prices are equal, the earlier order matches first.
Partial Fills: If I want 10 BTC but only 2 are available, I buy 2 and keep waiting for 8.
2. Core Architecture: The Order Book
The heart of an exchange is the Order Book. It maintains two lists for every trading pair (e.g., BTC/USD):
Bids (Buys): People wanting to buy. Sorted by Price DESC (Highest bidder first).
Asks (Sells): People wanting to sell. Sorted by Price ASC (Lowest seller first).
Data Structure Choice:
Priority Queue (Heap): Ideal for matching because we always need the "Best" price (O(1) peek at the top, O(log N) insert).
Tree Map (Balanced BST): Better if we need random-access cancellation (O(log N) removal of an arbitrary order).
For this interview, we use Heaps for the Matching Engine efficiency.
3. Class Diagram
Code snippet

4. Python Implementation
A. The Order Model
We implement comparison methods (__lt__) so the Heap knows how to sort orders automatically.
Crucial Logic:
Sell Orders: Lower price is better. (Sort Ascending).
Buy Orders: Higher price is better. (Sort Descending).
Python
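A minimal sketch of how __lt__ can be defined so a heap always surfaces the best price with price-time priority, using Decimal for money as recommended later in this section. Field names are illustrative assumptions.

```python
import heapq
import itertools
import time
from decimal import Decimal
from enum import Enum

class Side(Enum):
    BUY = "BUY"
    SELL = "SELL"

_seq = itertools.count()          # monotonically increasing tie-breaker (time priority)

class Order:
    def __init__(self, user_id: str, side: Side, price: str, quantity: str):
        self.user_id = user_id
        self.side = side
        self.price = Decimal(price)          # never float for money
        self.quantity = Decimal(quantity)
        self.timestamp = time.time()
        self.seq = next(_seq)

    def __lt__(self, other: "Order") -> bool:
        if self.price != other.price:
            if self.side is Side.SELL:
                return self.price < other.price     # lowest ask first
            return self.price > other.price         # highest bid first
        return self.seq < other.seq                 # equal price: earlier order wins

# asks = []
# heapq.heappush(asks, Order("a", Side.SELL, "101.5", "2"))
# heapq.heappush(asks, Order("b", Side.SELL, "100.0", "1"))
# asks[0].price   # Decimal('100.0') -> the best (lowest) ask sits on top
```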
B. The Matching Engine (Order Book)
This is the complex algorithmic part.
Python
C. The Exchange Service (Facade)
This manages multiple trading pairs and user validation.
Python
D. Driver Code
Python
5. Key Design Challenges & Interview Answers
1. Floating Point Errors (The "Superman 3" Problem)
Problem: If you calculate 0.1 + 0.2 with Python/Java floats, you often get 0.30000000000000004. In finance, this creates phantom money or prevents trade matching.
Solution: ALWAYS use Decimal types (Python decimal, Java BigDecimal). Never use float or double for currency.
2. Concurrency (Race Conditions)
Problem: What if two threads try to match the same "Best Sell Order" at the same time?
Solution:
Single-Threaded Matching Engine: Most high-performance exchanges (like LMAX Disruptor) use a Single Thread for the matching core to avoid locking overhead. It's faster to queue inputs into a single processor than to manage locks.
Sharding: One thread per Trading Pair. (Thread A handles BTC/USD, Thread B handles ETH/USD). They don't overlap.
3. Market Orders
Question: How to handle "Buy 1 BTC at Any Price"?
Logic: A Market Order is treated as a Limit Order with a price of Infinity (for Buy) or 0 (for Sell). It crosses the spread immediately until the quantity is filled.
4. Cancellation Efficiency
Problem: heapq is good for popping the top, but deleting an item from the middle (a user cancels an order) is O(N).
Solution: Lazy Deletion.
Don't remove it from the heap immediately.
Mark the order object as status = CANCELLED.
When the heap pops the top during matching, if it sees a CANCELLED order, it discards it and pops again.
5. Auditability (Event Sourcing)
An exchange cannot lose data.
Instead of just updating UserBalance = 50, we record a ledger of events:
OrderPlaced(ID: 1)
TradeExecuted(ID: 1, Price: 50k)
BalanceDeducted(User: A, Amount: 1 BTC)
This allows replaying history to fix bugs or audit fraud.
Collaborative Document Editing (Google Docs)
An end-to-end Low Level Design (LLD) for a Collaborative Document Editor (like Google Docs or Etherpad).
This is widely considered one of the hardest LLD problems because it requires handling Concurrency Conflicts in real-time. The standard solution involves Operational Transformation (OT) or CRDTs (Conflict-Free Replicated Data Types).
1. Requirements Clarification
Core Entities: Document, User, Session.
Core Operations:
insert(index, char): User types a character.
delete(index): User deletes a character.
The Conflict Problem:
Initial Text: "AB"
User 1: Inserts 'X' at index 1 ("AXB").
User 2: Deletes index 1 ('B') at the same time ("A").
Result: Without special logic, User 1's "X" might overwrite User 2's deletion, or the document becomes corrupted.
Goal: Eventual Consistency (Everyone sees the same text).
2. Core Architecture: Operational Transformation (OT)
We cannot send the whole document text on every keypress (too slow). We send Operations.
The Server acts as the Single Source of Truth.
Optimistic UI: When you type, the character appears immediately on your screen.
Transformation: If the server receives an operation from User B while you were typing, the server transforms User B's operation so it makes sense in the context of your changes, and vice versa.
The "Transform" Logic:
If OpA inserts at index 5, and OpB inserts at index 10:
OpA does not affect OpB.
If OpA inserts at index 5, and OpB inserts at index 2:
OpA must now insert at index 6 (because OpB added a char before it).
3. Class Diagram
Code snippet

4. Python Implementation (Simplified OT)
We will implement a simplified OT Engine that handles INSERT vs INSERT conflicts.
A. The Operation Model
Python
B. The Transformation Engine
This is the most critical function. It takes two concurrent operations (op_new and op_previous) and adjusts op_new as if op_previous had happened first.
Python
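A minimal sketch of the insert-vs-insert transform described above: the newer operation is shifted right when the concurrent one inserted at an earlier index (ties broken by a site id so both replicas converge). Delete handling is omitted; names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Operation:
    kind: str          # only "insert" is handled in this sketch
    index: int
    char: str
    site_id: str       # used to break ties deterministically

def transform(op_new: Operation, op_prev: Operation) -> Operation:
    """Adjust op_new so it still makes sense after op_prev has already been applied."""
    if op_prev.kind == "insert" and op_new.kind == "insert":
        if (op_prev.index < op_new.index or
                (op_prev.index == op_new.index and op_prev.site_id < op_new.site_id)):
            # op_prev added a character at or before our position: shift right by one.
            return Operation("insert", op_new.index + 1, op_new.char, op_new.site_id)
    return op_new

def apply_op(text: str, op: Operation) -> str:
    return text[:op.index] + op.char + text[op.index:]

# doc = "AB"
# alice = Operation("insert", 1, "X", "alice")   # -> "AXB" locally
# bob   = Operation("insert", 1, "Y", "bob")     # concurrent edit at the same index
# apply_op(apply_op(doc, alice), transform(bob, alice))   # "AXYB" on every replica
```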
C. The Document Server
The server applies operations strictly in order and keeps a history to transform incoming "stale" operations.
Python
D. Driver Code (Simulation)
We simulate a race condition where Alice and Bob act on the same starting state before seeing each other's changes.
Python
5. Key Design Challenges & Interview Answers
1. OT vs. CRDT (The Big Debate)
Operational Transformation (OT): Used by Google Docs.
Pros: History-based, intuitive for text.
Cons: Extremely complex to implement correctly (requires a transformation function for every pair of operations: Insert vs Delete, Delete vs Insert, etc.). Needs a central server.
CRDT (Conflict-Free Replicated Data Types): Used by modern editors (Zed, Atom Teletype).
Concept: Give every character a unique, immutable ID (e.g.,
Start,0.1,0.5,End). To insert between A and B, generate ID0.3.Pros: Commutative (order doesn't matter). Works P2P (Peer-to-Peer).
Cons: IDs can grow large (memory overhead).
2. Handling Network Latency
Optimistic UI: The client applies the change locally immediately so the user feels no lag.
Rollback/Rebase: If the server rejects the change (rare) or transforms it drastically, the client might momentarily "jump" or undo/redo the text to match the server's truth.
3. Deleting Ranges
Deleting a range
(start=2, end=5)is tricky with concurrency.Solution: Treat it as 3 individual
Delete(index)operations or handleRangeDeleteas a specific atomic operation type in the OT Engine.
4. Garbage Collection
Problem: The history of operations grows infinitely.
Solution: When all active clients have acknowledged version 100, the server can "squash" history from 0-100 into a snapshot and delete the individual operation logs.
5. Collaborative Cursor (Presence)
Do not store cursor positions in the
Documentobject.Use a separate Ephemeral State channel (Redis Pub/Sub or WebSocket broadcast).
Cursor positions are transient; if they are lost, it doesn't corrupt the document.
Payment Recommendation System
An end-to-end Low Level Design (LLD) for a Payment Recommendation System (like the ones used by Razorpay, Stripe, or PhonePe/GPay Smart Routing).
This problem focuses on Rule Engines, Strategy Patterns, and Ranking Algorithms. The goal is not just to list payment methods, but to order them intelligently to maximize Success Rate (SR), User Rewards, or minimize Transaction Costs.
1. Requirements Clarification
Context: A user is on the checkout page of an E-commerce app (e.g., Swiggy/Amazon).
Inputs:
User: Contains stored payment methods (HDFC Credit Card, UPI ID, Paytm Wallet).
Transaction: Amount (e.g., ₹500), Merchant (e.g., Starbucks).
Business Goals (Priorities):
Reliability: Don't show banks that are currently down.
Rewards: Promote cards that give max cashback for this specific merchant.
Cost: Prefer low-cost methods (UPI) over high-cost (Amex) for the merchant (optional B2B toggle).
Output: A ranked list of payment instruments.
2. Core Architecture: The Scoring Pipeline
We use a Pipe & Filter or Chain of Responsibility pattern. Every payment instrument starts with a base score. It passes through a series of "Rule Filters" that modify its score.
The Pipeline:
Candidate Retrieval: Fetch all user instruments.
Eligibility Filter: Remove invalid options (e.g., Wallet balance < transaction amount).
Scoring Engine:
Health Check Rule: Is the bank API up?
Reward Rule: Is there a 5% cashback offer?
Affordability Rule: Is BNPL applicable?
Ranking: Sort by Score DESC.
3. Class Diagram
Code snippet

4. Python Implementation
A. The Models
We define the Payment Instruments and the Context (Cart).
Python
B. The Rules Engine (Strategy Pattern)
This is the core logic. Instead of hardcoding if-else blocks, we create pluggable Rule classes. This allows the business to add/remove rules without changing core code (Open-Closed Principle).
Python
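A minimal sketch of the pluggable scoring rules described above: each rule nudges an instrument's score and new rules can be added without touching the engine. The rule classes, weights, and context fields are assumptions for illustration.

```python
from abc import ABC, abstractmethod

class ScoringRule(ABC):
    """Each rule contributes a score delta; the engine just sums and sorts."""
    @abstractmethod
    def score(self, instrument: dict, context: dict) -> float: ...

class HealthRule(ScoringRule):
    def score(self, instrument, context):
        # Assumes bank health was cached ahead of time (e.g., in Redis).
        down = context.get("down_banks", set())
        return -100.0 if instrument.get("bank") in down else 0.0

class RewardRule(ScoringRule):
    def score(self, instrument, context):
        offer = context["offers"].get((instrument["type"], context["merchant"]))
        return 10.0 if offer else 0.0

class RecommendationEngine:
    def __init__(self, rules):
        self.rules = list(rules)

    def rank(self, instruments, context):
        scored = []
        for instrument in instruments:
            total = sum(rule.score(instrument, context) for rule in self.rules)
            scored.append((total, instrument))
        return [inst for _, inst in sorted(scored, key=lambda pair: pair[0], reverse=True)]

# engine = RecommendationEngine([HealthRule(), RewardRule()])
# context = {"merchant": "Starbucks", "down_banks": {"HDFC"},
#            "offers": {("UPI", "Starbucks"): "5% cashback"}}
# engine.rank([{"type": "CARD", "bank": "HDFC"}, {"type": "UPI"}], context)  # UPI ranks first
```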
C. The Recommendation Service
This orchestrates the flow.
Python
D. Driver Code
Python
5. Key Design Challenges & Interview Answers
1. Handling Latency (The 50ms Constraint)
Problem: If you make an external API call to "Check Bank Health" for every transaction, the checkout page will load slowly.
Solution: Caching & Asynchronous Health Checks.
Do not call the bank in real-time.
Have a background worker ("Heartbeat Service") that checks Bank Health every 10 seconds and updates a Redis Cache.
The HealthRule simply reads from Redis (an O(1) lookup).
2. A/B Testing
Problem: Marketing wants to test if "Boosting UPI" increases conversion rates for 10% of users.
Solution: The
RecommendationEngine should accept an ExperimentConfig. Before adding rules, check:
if user_id % 10 == 0: engine.add_rule(BoostUPIRule()).
3. Dynamic Filtering (Constraint Satisfaction)
Scenario: A merchant says "I do not accept AMEX".
Design: Add a Hard Filter Phase before the Scoring Phase.
Filter.filter(instruments) removes invalid items.
Scorer.score(instruments) ranks the remaining ones.
4. Cold Start / Guest Users
Question: What if the user has no saved cards?
Answer: Recommend based on global popularity ("Most people on Swiggy use GPay").
Return a generic list:
[Google Pay, PhonePe, Credit/Debit Card].
5. ML-Based Personalization (Advanced)
Instead of static rules (
+10 points), use a Logistic Regression Model.Inputs: Time of day, User History (Alice prefers Credit Cards 90% of the time).
Output: Probability of Success.
The
RecommendationEnginecalls the ML Model to get the final score.
Alexa
An end-to-end Low Level Design (LLD) for a Voice Assistant (like Amazon Alexa or Google Home).
This problem is unique because it bridges Hardware (Device) and Software (Cloud). The design must handle the Pipeline of Processing: Audio Text Intent Action Response.
1. Requirements Clarification
Actors: User, Alexa Device, Cloud Service.
Core Flow:
Wake Word: Device listens locally for "Alexa".
ASR (Automatic Speech Recognition): Convert Audio to Text.
NLU (Natural Language Understanding): Extract Intent (What to do) and Slots (Variables).
Execution: Trigger the correct "Skill" (Music, Smart Home, Weather).
TTS (Text to Speech): Convert response back to audio.
Key Design Goal: Extensibility. We must be able to add new capabilities (e.g., "Order Pizza") without rewriting the core engine.
2. Core Architecture: The Pipeline & Command Pattern
We need a Pipeline Pattern to process the data in stages, and a Command/Strategy Pattern to handle the diverse capabilities (Skills).
The "Device" (Client): Dumb terminal. It only processes the "Wake Word" locally (to save bandwidth/privacy) and streams audio to the cloud.
The "Orchestrator" (Cloud): The brain. It routes the text to the correct Skill.
The "Skill" (Plugin): A modular unit of logic (e.g., SpotifySkill, HueLightSkill).
3. Class Diagram
Code snippet

4. Python Implementation
A. The Data Models (Intent & Slots)
We need a structured way to pass user desires around.
Utterance: "Play Taylor Swift"
Intent:
PlayMusicSlots:
{artist: "Taylor Swift"}
Python
B. The Skill Interface (Strategy Pattern)
This is the most critical part for extensibility. Every capability is a Skill.
Python
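A minimal sketch of the Skill strategy interface plus one concrete skill and a registry that routes intents, assuming the Intent/slot structure from the models section above. Names are illustrative, not the document's implementation.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

@dataclass
class Intent:
    name: str
    slots: dict = field(default_factory=dict)

class Skill(ABC):
    """Strategy interface: every capability declares what it can handle and how to execute it."""
    @abstractmethod
    def can_handle(self, intent: Intent) -> bool: ...
    @abstractmethod
    def execute(self, intent: Intent) -> str: ...

class MusicSkill(Skill):
    def can_handle(self, intent: Intent) -> bool:
        return intent.name == "PlayMusic"

    def execute(self, intent: Intent) -> str:
        artist = intent.slots.get("artist", "something you like")
        return f"Playing {artist}."

class SkillRegistry:
    """The orchestrator routes each intent to the first skill that accepts it."""
    def __init__(self):
        self.skills = []

    def register(self, skill: Skill):
        self.skills.append(skill)

    def route(self, intent: Intent) -> str:
        for skill in self.skills:
            if skill.can_handle(intent):
                return skill.execute(intent)
        return "Sorry, I can't do that yet."

# registry = SkillRegistry(); registry.register(MusicSkill())
# registry.route(Intent("PlayMusic", {"artist": "Taylor Swift"}))  # "Playing Taylor Swift."
```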
C. The Orchestrator (The Cloud Brain)
This component wires everything together. It holds a registry of skills and routes the intent.
Python
D. The Device (Client)
The device handles the "Wake Word" loop.
Python
E. Driver Code
Python
5. Key Design Challenges & Interview Answers
1. Handling State & Context
Scenario:
User: "Who is the President of the US?" Alexa: "Joe Biden."
User: "How old is he?"
Problem: The second query has no context. "He" is ambiguous.
Design: The
AlexaCloudService must maintain a SessionContext object.
IntentandEntity.Pass
SessionContextinto theNLUServiceto resolve references (Coreference Resolution).
2. Latency vs. Throughput (Streaming)
Question: Do we wait for the user to finish the whole sentence before sending to Cloud?
Answer: No. That feels slow.
Design: Use Bi-directional Streaming (gRPC or WebSockets).
Device streams chunks of audio bytes.
Cloud ASR processes chunks in real-time (Intermediate Transcription).
This allows Alexa to determine "End of Sentence" (Silence Detection) dynamically.
3. Third-Party Skills (Isolation)
Problem: If the "Uber Skill" crashes, it shouldn't crash the whole Alexa server.
Design: Run Skills as Serverless Functions (AWS Lambda).
The
AlexaCloudServicedoesn't execute the skill code directly. It calls an external API/Lambda.This ensures sandboxing and security.
4. Privacy (Local vs. Cloud)
Design:
Wake Word Engine: Must run On-Device (using a tiny ML model). No audio leaves the house until this trigger fires.
Cloud: Only receives audio after the wake word.
5. Multi-Turn Conversations
Scenario: "Set an alarm." "For what time?" "7 AM."
Design:
The
Intent result can be a DialogDelegate. If a required slot (Time) is missing, the NLU returns a
SlotElicitation state, prompting the orchestrator to ask a clarifying question instead of executing the action.