Difficulty: Medium Topics: Database Schema, Hierarchical Data, Materialized Path, Recursion Key Concepts: Storing trees in SQL, Read vs Write optimization.
Phase 1: Requirements Gathering
Goals
Design a scalable comment system supporting infinite nesting.
Efficiently store and retrieve deep comment trees.
Handle high read volume (viral posts).
1. Who are the actors?
User: Posts comments, replies, upvotes/downvotes.
System: Renders comment threads, manages scores.
2. What are the must-have features? (Core)
Posting: Add a root comment or a reply.
Retrieval: Fetch comments for a post in hierarchical order (Threaded view).
Ranking: Order by score (Upvotes - Downvotes) or time.
Pagination: Load top-level comments and expand replies on demand.
3. What are the constraints?
Scale: Millions of comments per popular post.
Latency: Read latency must be low (< 200ms).
Depth: Support deep nesting (e.g., 50+ levels).
Phase 2: Use Cases
UC1: Post Comment/Reply
Actor: User Flow:
User submits content for a PostId (and optionally ParentId if reply).
System calculates Path and Depth based on Parent.
System saves comment to DB.
System updates denormalized counts (e.g., total comments on post).
UC2: View Comments
Actor: User Flow:
User requests comments for PostId.
System fetches comments ordered by Path (Materialized Path sorting gives DFS order).
Frontend renders tree structure using Depth.
Phase 3: Class Diagram
Step 1: Core Entities
Comment: The main unit of content.
User: Author.
Post: Context container.
Vote: User interaction.
UML Diagram
Database Schema
Table: Comments
Column
Type
Description
id
BIGINT
PK
post_id
BIGINT
The Post ID
user_id
BIGINT
The Author
parent_id
BIGINT
Direct Parent (NULL for Root)
path
VARCHAR
Lineage: 0001/0050/0101
depth
INT
Indentation Level
content
TEXT
Body
upvote_count
INT
Denormalized count
Index: (post_id, path) for fast retrieval.
Phase 4: Design Patterns
1. Materialized Path Pattern
Description: A hierarchical data model where each node stores its full ancestry path (e.g., "1/5/9") as a string.
Why used: Allows fetching an entire comment subtree or sorting by conversation thread depth with a simple ORDER BY path query, avoiding expensive recursive joins or CTEs in the database.
Description: Separates read and update operations for a data store.
Why used: Comment systems often have high read-to-write ratios (Viral posts). CQRS allows optimizing the Read model (denormalized, flat structure) differently from the Write model (normalized, relational) for performance.
Phase 5: Code Key Methods
Java Implementation
Phase 6: Discussion
Tree Storage Strategies (SDE-3 Concept)
Q: Why Materialized Path? What are the alternatives for Storing Trees in RDBMS?
Adjacency List (parent_id): Good for inserts, but fetching a whole tree requires Recursive CTEs (WITH RECURSIVE). Slow for deep trees ($O(N)$ depth queries).
Materialized Path: Storing string path allows simple ORDER BY path to get DFS tree traversal order. Fast reads. Downside: Moving a subtree requires rewriting paths for all descendants.
Closure Table (SDE-3 Alternative): Store a separate table CommentPaths (ancestor_id, descendant_id, depth). Every node maps to all its descendants.
Pros: Fully normalized, blazing fast to find all descendants of any comment at any depth. Moving subtrees is easier than Materialized Path.
Cons: Storage overhead ($O(N^2)$ rows in the worst case of a straight line discussion). Best trade-off for Reddit-style systems where reads vastly outnumber writes.
Scaling
Q: How to handle viral threads (100k+ comments)?
Pagination: Don't load everything. Fetch top-level comments first (depth=0). Load replies asynchronously when user clicks "Load More".
Caching: Cache the first page of comments for popular posts in Redis.
Denormalization
Q: Where to store vote counts?
A: Store upvote_count in the Comments table. Do not COUNT(*) from Votes table on every read. Update this counter asynchronously or in batch when votes occur.
SOLID Principles Checklist
S (Single Responsibility): Service handles logic, Repository handles DB.
O (Open/Closed): New retrieval strategies (e.g., Sort by Top) can be added.
D (Dependency Inversion): Service depends on Repository interface.