
Instagram

Problem Statement

Design a photo and video sharing social media platform like Instagram that allows users to upload media, follow other users, view feeds, like/comment on posts, and discover content through explore and hashtags.


Requirements

Functional Requirements

  1. Upload photos/videos (max 10 photos per post, 60s video)

  2. Follow/Unfollow users

  3. News Feed showing posts from followed users (chronological + algorithmic)

  4. Like, Comment, Share posts

  5. Stories (24-hour temporary posts)

  6. Direct Messaging (text, photos, videos)

  7. Search users and hashtags

  8. Explore page with personalized recommendations

Non-Functional Requirements

  1. High availability: 99.95% uptime

  2. Low latency: < 200ms for feed load

  3. Scalability: 1 billion users, 100M DAU

  4. Eventual consistency acceptable for likes/comments

  5. Global distribution: Multi-region deployment


Capacity Estimation

Traffic Estimates

  • Daily Active Users (DAU): 100 million

  • Posts uploaded/day: 50 million (0.5 posts per DAU)

  • Photos per post: Average 3 photos

  • Total photos/day: 150 million

  • Feed requests/user/day: 10

  • Total feed requests/day: 1 billion

Storage Estimates

  • Average photo size: 2 MB (compressed)

  • Average video size: ~20 MB (60s at 3 Mbps ≈ 22.5 MB, rounded down for estimation)

  • Daily storage:

    • Photos: 150M × 2 MB = 300 TB/day

    • Videos (10% of posts): 5M × 20 MB = 100 TB/day

    • Total: 400 TB/day

  • 5-year storage: 400 TB × 365 × 5 = 730 PB (originals only, before replication and resized variants)

Bandwidth Estimates

  • Upload: 400 TB / 86400s = 4.6 GB/sec

  • Feed views: 1B requests/day × 3 photos × 2 MB = 6 PB/day = 70 GB/sec

  • With CDN caching (80% cache hit): 14 GB/sec from origin
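The estimates above can be re-derived in a few lines, which makes it easy to rerun them under different assumptions. This is a minimal sketch that uses the same inputs, with decimal units (1 TB = 10^12 bytes). The variable names are illustrative.

```python
# Back-of-envelope capacity model using the inputs listed above.
# Decimal units throughout: 1 MB = 10**6 bytes, 1 TB = 10**12, 1 PB = 10**15.
MB, GB, TB, PB = 10**6, 10**9, 10**12, 10**15
SECONDS_PER_DAY = 86_400

dau = 100_000_000
posts_per_day = dau // 2                    # 0.5 posts per DAU -> 50M
photos_per_day = posts_per_day * 3          # 3 photos per post -> 150M
videos_per_day = posts_per_day // 10        # 10% of posts carry a video -> 5M

daily_bytes = photos_per_day * 2 * MB + videos_per_day * 20 * MB   # 300 TB + 100 TB
five_year_bytes = daily_bytes * 365 * 5                            # originals only
upload_rate = daily_bytes / SECONDS_PER_DAY                        # bytes/sec

feed_bytes_per_day = dau * 10 * 3 * 2 * MB      # 10 feeds/user, 3 photos, 2 MB each
feed_rate = feed_bytes_per_day / SECONDS_PER_DAY
origin_rate = feed_rate * (1 - 0.80)            # 80% CDN hit ratio

print(f"storage/day  : {daily_bytes / TB:.0f} TB")
print(f"5-year total : {five_year_bytes / PB:.0f} PB")
print(f"upload rate  : {upload_rate / GB:.1f} GB/s")
print(f"feed egress  : {feed_rate / GB:.1f} GB/s, origin {origin_rate / GB:.1f} GB/s")
```

The printed figures are 400 TB, 730 PB, 4.6 GB/s, 69.4 GB/s and 13.9 GB/s. They match the rounded values above.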


API Design

1. Upload Post

2. Get Feed

3. Like/Unlike Post

4. Post Comment


High-Level Architecture


Detailed Component Design

1. Image Upload & Processing Pipeline


Image Variants:

  • Thumbnail: 150×150 (profile grid)

  • Feed: 640×640 (mobile feed)

  • Full: 1080×1080 (detail view)

  • Story: 1080×1920 (9:16 aspect ratio)

Compression:

  • JPEG: 85% quality (balance size/quality)

  • WebP: For modern browsers (~30% smaller than JPEG at similar quality)
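The variant list does not say how sources with a different aspect ratio are handled. This sketch assumes each variant is a center crop to the target aspect ratio followed by a resize. That is an assumption; letterboxing or keeping the original ratio are alternatives. Under that assumption, the geometry for every variant can be computed without an image library. The helper names are hypothetical.

```python
from dataclasses import dataclass

# Target sizes from the variant list above: name -> (width, height).
VARIANTS = {
    "thumbnail": (150, 150),
    "feed": (640, 640),
    "full": (1080, 1080),
    "story": (1080, 1920),
}


@dataclass(frozen=True)
class ResizePlan:
    crop_box: tuple   # (left, upper, right, lower) in source pixels
    size: tuple       # output (width, height)


def plan_variant(src_w: int, src_h: int, dst_w: int, dst_h: int) -> ResizePlan:
    """Center-crop the source to the target aspect ratio, then scale to dst size."""
    # Compare aspect ratios by cross-multiplying integers (no float rounding).
    if src_w * dst_h > src_h * dst_w:
        # Source is wider than the target: keep full height, trim both sides.
        crop_w = src_h * dst_w // dst_h
        left = (src_w - crop_w) // 2
        box = (left, 0, left + crop_w, src_h)
    else:
        # Source is taller (or equal): keep full width, trim top and bottom.
        crop_h = src_w * dst_h // dst_w
        top = (src_h - crop_h) // 2
        box = (0, top, src_w, top + crop_h)
    return ResizePlan(box, (dst_w, dst_h))


def plan_all(src_w: int, src_h: int) -> dict:
    """Resize plan for every variant of one uploaded image."""
    return {name: plan_variant(src_w, src_h, w, h) for name, (w, h) in VARIANTS.items()}
```

The crop box uses (left, upper, right, lower) order, which is the convention Pillow's `Image.crop` uses. A plan could therefore be applied with `img.crop(plan.crop_box).resize(plan.size)` before saving as JPEG at quality 85 or as WebP.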

SDE-3 Deep Dive: Primary ID Generation (Snowflake)

  • Problem: Instagram generates millions of posts. Auto-incrementing DB IDs don't scale globally across shards. UUIDs are 128-bit (too large) and random (breaks index locality in DB B-Trees).

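  • Solution (Instagram's sharded ID, a Snowflake-style scheme): A 64-bit integer ID that is roughly time-sortable. (Twitter's Snowflake splits the bits differently: 41-bit timestamp, 10-bit machine ID, 12-bit sequence.)

    • 41 bits: Timestamp (milliseconds since a custom epoch, about 69 years of range) - orders IDs by creation time at millisecond granularity

    • 13 bits: Logical Shard ID - determines the DB shard for the post (8,192 logical shards)

    • 10 bits: Sequence Number - allows up to 1,024 IDs per shard per millisecond, so several posts hitting the same shard in the same millisecond do not collide.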
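Here is a sketch of the 41/13/10 bit packing in Python. The custom epoch and the class name are hypothetical. The generator runs in-process, with one instance per logical shard, which is also an assumption. Treat it as an illustration of the layout, not a production ID service.

```python
import threading
import time

# Hypothetical custom epoch: 2020-01-01T00:00:00Z in milliseconds.
CUSTOM_EPOCH_MS = 1_577_836_800_000

TIMESTAMP_BITS, SHARD_BITS, SEQUENCE_BITS = 41, 13, 10
MAX_SHARD = (1 << SHARD_BITS) - 1          # 8191
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1    # 1023


def compose_id(timestamp_ms: int, shard_id: int, sequence: int) -> int:
    """Pack timestamp | shard | sequence into one 64-bit integer."""
    elapsed = timestamp_ms - CUSTOM_EPOCH_MS
    if not 0 <= elapsed < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp outside the 41-bit window")
    if not 0 <= shard_id <= MAX_SHARD or not 0 <= sequence <= MAX_SEQUENCE:
        raise ValueError("shard or sequence out of range")
    return (elapsed << (SHARD_BITS + SEQUENCE_BITS)) | (shard_id << SEQUENCE_BITS) | sequence


def decompose_id(post_id: int) -> tuple:
    """Recover (timestamp_ms, shard_id, sequence) from an ID."""
    sequence = post_id & MAX_SEQUENCE
    shard_id = (post_id >> SEQUENCE_BITS) & MAX_SHARD
    timestamp_ms = (post_id >> (SHARD_BITS + SEQUENCE_BITS)) + CUSTOM_EPOCH_MS
    return timestamp_ms, shard_id, sequence


class ShardIdGenerator:
    """Issues strictly increasing IDs for one logical shard (hypothetical helper)."""

    def __init__(self, shard_id: int):
        if not 0 <= shard_id <= MAX_SHARD:
            raise ValueError("shard_id must fit in 13 bits")
        self.shard_id = shard_id
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = time.time_ns() // 1_000_000
            if now < self._last_ms:
                now = self._last_ms  # clock stepped back: stay on the last millisecond
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # All 1,024 sequence values used this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = time.time_ns() // 1_000_000
            else:
                self._sequence = 0
            self._last_ms = now
            return compose_id(now, self.shard_id, self._sequence)
```

The timestamp occupies the high bits, so sorting by ID sorts by creation time. Any service can also call `decompose_id` to find which shard holds a post without a lookup.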

2. Feed Generation Strategy

Two Approaches:

A. Fanout-on-Write (Push Model)


Pros:

  • Fast read (pre-computed)

  • Low read latency

Cons:

  • Slow write for celebrity users (millions of followers)

  • Write amplification and Redis hot spots: one celebrity post triggers millions of timeline writes at once

B. Fanout-on-Read (Pull Model)


Pros:

  • Handles celebrity users

  • No fanout overhead

Cons:

  • Slow read (compute on-the-fly)

  • High CPU usage

Hybrid Approach (Instagram's Strategy)
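The hybrid idea combines the two models: push posts from ordinary accounts, and pull posts from very large accounts at read time. This minimal in-memory sketch assumes a follower-count threshold decides which path an author takes. The threshold, the timeline cap, and the class names are hypothetical. Plain dicts stand in for the Redis timelines and the posts store.

```python
import heapq
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    post_id: int    # time-sortable ID, so a larger ID means a newer post
    author_id: int


class HybridFeed:
    """In-memory stand-in for Redis timelines plus a posts store (hypothetical)."""

    def __init__(self, celebrity_threshold: int = 1_000_000, timeline_cap: int = 500):
        self.celebrity_threshold = celebrity_threshold
        self.followers = defaultdict(set)        # author -> follower IDs
        self.following = defaultdict(set)        # user -> followed author IDs
        # Pushed post IDs per user, newest first; the oldest fall off at the cap.
        self.timelines = defaultdict(lambda: deque(maxlen=timeline_cap))
        self.posts_by_author = defaultdict(list)  # author -> post IDs, oldest first

    def follow(self, user: int, author: int) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author: int) -> bool:
        return len(self.followers[author]) >= self.celebrity_threshold

    def publish(self, post: Post) -> None:
        self.posts_by_author[post.author_id].append(post.post_id)
        if self.is_celebrity(post.author_id):
            return  # pull model: followers merge this post in at read time
        for follower in self.followers[post.author_id]:
            self.timelines[follower].appendleft(post.post_id)  # push model

    def get_feed(self, user: int, limit: int = 20) -> list:
        candidates = set(self.timelines[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                candidates.update(self.posts_by_author[author][-limit:])
        # The set removes duplicates, e.g. posts pushed before the author crossed the threshold.
        return heapq.nlargest(limit, candidates)
```

The write cost of a celebrity post stays constant. The price is a small merge on every read for users who follow such accounts.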

3. Database Schema

Users Table (PostgreSQL)

Posts Table (Sharded by user_id)

Social Graph (Neo4j or Dedicated Service)

Alternative: PostgreSQL with Adjacency List
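This is a minimal adjacency-list sketch. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs anywhere. The table and function names are hypothetical. `INSERT OR IGNORE` is SQLite syntax; PostgreSQL would use `INSERT ... ON CONFLICT DO NOTHING`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE follows (
    follower_id INTEGER NOT NULL,
    followee_id INTEGER NOT NULL,
    created_at  INTEGER NOT NULL,
    PRIMARY KEY (follower_id, followee_id)   -- serves "whom does X follow?"
);
-- Reverse index serves "who follows X?" without a full scan.
CREATE INDEX idx_follows_followee ON follows (followee_id, follower_id);
"""


def open_graph() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def follow(conn, follower: int, followee: int, created_at: int) -> None:
    # Re-following is a no-op thanks to the primary key.
    conn.execute("INSERT OR IGNORE INTO follows VALUES (?, ?, ?)",
                 (follower, followee, created_at))


def unfollow(conn, follower: int, followee: int) -> None:
    conn.execute("DELETE FROM follows WHERE follower_id = ? AND followee_id = ?",
                 (follower, followee))


def following(conn, user: int) -> list:
    rows = conn.execute("SELECT followee_id FROM follows WHERE follower_id = ? "
                        "ORDER BY followee_id", (user,))
    return [r[0] for r in rows]


def followers(conn, user: int) -> list:
    rows = conn.execute("SELECT follower_id FROM follows WHERE followee_id = ? "
                        "ORDER BY follower_id", (user,))
    return [r[0] for r in rows]


def mutuals(conn, user: int) -> list:
    # Each extra hop (mutuals, friends-of-friends) costs another self-join.
    rows = conn.execute(
        """SELECT a.followee_id FROM follows a
           JOIN follows b ON b.follower_id = a.followee_id AND b.followee_id = a.follower_id
           WHERE a.follower_id = ? ORDER BY a.followee_id""", (user,))
    return [r[0] for r in rows]
```

One-hop lookups are single index seeks. Multi-hop questions need another self-join for each hop, and that cost is the "query efficiency" side of the trade-off.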

Likes (Cassandra - Write-Heavy)

4. Timeline Cache (Redis)

Data Structure:

Memory Optimization:

  • Store only post IDs in Redis

  • Fetch full post details from DB in batch

  • Cache hot post metadata (JSON) separately
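The memory optimization above splits the work into three steps: read a page of post IDs from the timeline, serve what is possible from a hot-metadata cache, and fetch the rest in one batched query. This is a minimal sketch of that flow. An in-process LRU of JSON strings stands in for the separate Redis keyspace, and `fetch_batch` is a hypothetical DB call.

```python
import json
from collections import OrderedDict


class PostMetadataCache:
    """Tiny LRU of JSON-serialized post metadata (stand-in for a Redis keyspace)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()

    def get_many(self, post_ids) -> dict:
        found = {}
        for pid in post_ids:
            if pid in self._data:
                self._data.move_to_end(pid)           # mark as recently used
                found[pid] = json.loads(self._data[pid])
        return found

    def put(self, post_id: int, meta: dict) -> None:
        self._data[post_id] = json.dumps(meta)
        self._data.move_to_end(post_id)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)            # evict least recently used


def hydrate_feed(post_ids, cache, fetch_batch) -> list:
    """Turn a page of cached post IDs into full posts.

    fetch_batch(ids) is a hypothetical DB call returning {post_id: metadata}
    for the IDs that still exist, e.g. one SELECT ... WHERE id IN (...).
    """
    found = cache.get_many(post_ids)
    missing = [pid for pid in post_ids if pid not in found]
    if missing:
        for pid, meta in fetch_batch(missing).items():  # one round trip for all misses
            cache.put(pid, meta)
            found[pid] = meta
    # Keep timeline order and skip posts deleted since their ID was cached.
    return [found[pid] for pid in post_ids if pid in found]
```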

5. Direct Messaging


Message Schema (Cassandra):

WebSocket Connection:

  • Sticky sessions: User always connects to same gateway

  • Heartbeat: Ping every 30s to detect disconnects

  • Fallback: HTTP long-polling for poor connections
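Server-side disconnect detection for the 30s ping can be sketched as a last-seen table that a periodic task sweeps. The class name and the two-missed-pings tolerance are assumptions. Monotonic time is used so that wall-clock adjustments do not cause false disconnects.

```python
import time


class HeartbeatMonitor:
    """Tracks the last ping per connection and reports ones that went silent.

    Assumes the 30 s ping interval and (hypothetically) tolerates two missed
    pings before declaring a connection dead.
    """

    def __init__(self, interval_s: float = 30.0, missed_allowed: int = 2, clock=time.monotonic):
        self.timeout_s = interval_s * missed_allowed
        self.clock = clock                  # injectable for tests
        self.last_seen = {}                 # connection ID -> time of last ping

    def on_ping(self, conn_id: str) -> None:
        self.last_seen[conn_id] = self.clock()

    def reap(self) -> list:
        """Remove and return connections silent for longer than the timeout."""
        now = self.clock()
        dead = [c for c, seen in self.last_seen.items() if now - seen > self.timeout_s]
        for c in dead:
            del self.last_seen[c]           # caller would close the socket, mark user offline
        return sorted(dead)
```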


Scalability Strategies

1. Database Sharding

Posts Sharding:

Benefit: User's posts co-located on same shard → efficient queries
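One common way to implement user_id sharding is to hash each user onto a fixed number of logical shards, then map those logical shards onto physical servers. This sketch assumes 8,192 logical shards, to match the 13-bit shard field in the ID layout, and a BLAKE2b hash. Both choices and the helper names are illustrative.

```python
import hashlib

NUM_LOGICAL_SHARDS = 1 << 13   # 8,192, matching the 13-bit shard field of the post ID


def logical_shard(user_id: int) -> int:
    """Stable user -> logical shard mapping, identical on every host."""
    # Hashing first spreads IDs whose low bits are structured (e.g. time-based IDs).
    digest = hashlib.blake2b(user_id.to_bytes(8, "big"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % NUM_LOGICAL_SHARDS


class ShardMap:
    """Assigns many logical shards to a few physical databases (hypothetical helper).

    Rebalancing moves whole logical shards between servers; the user -> logical
    shard mapping never changes, so IDs that embed the shard stay valid.
    """

    def __init__(self, servers: list):
        self.assignment = {s: servers[s % len(servers)] for s in range(NUM_LOGICAL_SHARDS)}

    def server_for_user(self, user_id: int) -> str:
        return self.assignment[logical_shard(user_id)]

    def move(self, shard: int, new_server: str) -> None:
        self.assignment[shard] = new_server
```

With far more logical shards than servers, capacity can be added by reassigning some logical shards. No user has to be rehashed.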

2. CDN Strategy

  • Edge locations: 200+ PoPs globally

  • Cache control: Cache-Control: max-age=86400, immutable

  • Image optimization: Serve WebP to modern browsers, JPEG fallback
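Serving WebP with a JPEG fallback usually comes down to content negotiation on the request's `Accept` header. This minimal sketch assumes the edge or origin can run a small function per request. The function names are hypothetical.

```python
from typing import Optional


def accepts_webp(accept: Optional[str]) -> bool:
    """True if the Accept header explicitly lists image/webp with a non-zero q.

    Wildcards such as image/* are deliberately not treated as WebP support.
    """
    if not accept:
        return False
    for part in accept.split(","):
        media_type, _, params = part.strip().partition(";")
        if media_type.strip().lower() != "image/webp":
            continue
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key.strip().lower() == "q":
                try:
                    return float(value) > 0    # q=0 means "not acceptable"
                except ValueError:
                    return False
        return True
    return False


def image_response_headers(accept: Optional[str]) -> dict:
    fmt = "webp" if accepts_webp(accept) else "jpeg"   # JPEG fallback
    return {
        "Content-Type": f"image/{fmt}",
        "Cache-Control": "max-age=86400, immutable",
        # The body depends on Accept, so shared caches must key on it too.
        "Vary": "Accept",
    }
```

The `Vary: Accept` header keeps a CDN from handing a cached WebP body to a client that cannot decode it.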

3. Hot Post Handling

Problem: Viral post overwhelms like_counts table

Solution: Write-Behind Cache
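A write-behind counter absorbs each like in memory and periodically writes one aggregated delta per post, so a burst of likes on a viral post becomes a single update. This is a minimal sketch. `flush_fn` is a hypothetical sink, and the periodic flush scheduling is left to the caller.

```python
import threading
from collections import Counter


class WriteBehindLikeCounter:
    """Buffers like/unlike deltas in memory and writes them out in batches.

    flush_fn(batch) is a hypothetical sink that applies {post_id: delta}
    atomically, e.g. UPDATE like_counts SET count = count + delta per row.
    Deltas still in memory are lost if the process crashes, which is the
    durability cost of this pattern.
    """

    def __init__(self, flush_fn):
        self._flush_fn = flush_fn
        self._pending = Counter()
        self._lock = threading.Lock()

    def like(self, post_id: int) -> None:
        with self._lock:
            self._pending[post_id] += 1

    def unlike(self, post_id: int) -> None:
        with self._lock:
            self._pending[post_id] -= 1

    def pending(self, post_id: int) -> int:
        """Unflushed delta; add it to the stored count for a fresher read."""
        with self._lock:
            return self._pending[post_id]

    def flush(self) -> dict:
        """Call periodically (e.g. every few seconds): N likes become one write per post."""
        with self._lock:
            batch = {pid: d for pid, d in self._pending.items() if d != 0}
            self._pending = Counter()
        if batch:
            try:
                self._flush_fn(batch)
            except Exception:
                with self._lock:
                    self._pending.update(batch)   # put deltas back for the next attempt
                raise
        return batch
```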


Advanced Features

1. Ranking Algorithm (Explore Page)

Features:

  • User interests (from past interactions)

  • Post engagement rate (likes/followers ratio)

  • Recency (decay function)

  • Creator authority (follower count)

Model:
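The model itself is not specified here. As a stand-in, this sketch combines the listed features into a hand-weighted score with exponential recency decay. The weights, the 24-hour half-life, and the log damping of follower count are all hypothetical; a learned ranking model would replace them.

```python
import math
from dataclasses import dataclass

HALF_LIFE_HOURS = 24.0                                              # hypothetical
WEIGHTS = {"interest": 2.0, "engagement": 1.0, "authority": 0.5}    # hypothetical


@dataclass(frozen=True)
class Candidate:
    post_id: int
    topic: str
    likes: int
    author_followers: int
    age_hours: float


def score(c: Candidate, topic_affinity: dict) -> float:
    interest = topic_affinity.get(c.topic, 0.0)          # 0..1, from past interactions
    engagement = c.likes / max(c.author_followers, 1)    # likes/followers ratio
    authority = math.log10(1 + c.author_followers)       # log damps huge follower counts
    recency = 0.5 ** (c.age_hours / HALF_LIFE_HOURS)     # exponential decay
    base = (WEIGHTS["interest"] * interest
            + WEIGHTS["engagement"] * engagement
            + WEIGHTS["authority"] * authority)
    return base * recency


def rank(candidates: list, topic_affinity: dict, k: int = 10) -> list:
    ordered = sorted(candidates, key=lambda c: score(c, topic_affinity), reverse=True)
    return [c.post_id for c in ordered[:k]]
```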

2. Stories (24-hour Expiry)

Storage:

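  • S3 with a lifecycle expiration rule (1 day is the minimum; S3 rounds expiry up to the next midnight UTC and deletes asynchronously, so reads must still filter out stories older than 24h)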

  • Redis for active stories list
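The active-stories list can enforce the 24-hour window at read time: keep one time-ordered set per user and trim expired entries before returning it. This sketch simulates a Redis sorted set scored by creation time with a sorted Python list. The comments map each step to the corresponding Redis command (`ZADD`, `ZREMRANGEBYSCORE`). The class name is hypothetical.

```python
import bisect
from collections import defaultdict

STORY_TTL_S = 24 * 60 * 60


class ActiveStories:
    """In-memory stand-in for one Redis sorted set per user (score = created_at)."""

    def __init__(self):
        # user -> list of (created_at, story_id), kept sorted by time
        self._by_user = defaultdict(list)

    def add(self, user_id: int, story_id: int, created_at: float) -> None:
        bisect.insort(self._by_user[user_id], (created_at, story_id))   # ~ ZADD

    def active(self, user_id: int, now: float) -> list:
        """Story IDs younger than 24 h, oldest first; expired entries are pruned."""
        entries = self._by_user[user_id]
        cutoff = now - STORY_TTL_S
        # ~ ZREMRANGEBYSCORE key -inf cutoff: drop everything created at or before cutoff.
        # Story IDs are ints here, so (cutoff, inf) sorts after every (cutoff, id).
        del entries[:bisect.bisect_right(entries, (cutoff, float("inf")))]
        return [story_id for _, story_id in entries]
```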

Elasticsearch Index:


Trade-offs

| Aspect | Choice | Trade-off |
|---|---|---|
| Feed Generation | Hybrid (push + pull) | Complexity vs performance |
| Graph Storage | PostgreSQL adjacency list | Simplicity vs query efficiency |
| Likes Storage | Cassandra counters | Eventual consistency vs throughput |
| Image Storage | S3 + CDN | Cost vs latency |


Interview Discussion Points

Q: How to handle celebrity users with 100M followers?

  • Pull-based feed: Don't fanout to all followers

  • Separate queue: Prioritize celebrity posts

  • Eventual consistency: Followers see post within minutes (acceptable)

Q: Preventing duplicate photo uploads?

  • Perceptual hashing (pHash): Generate hash of image content

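  • Check the hash against a Bloom filter of known hashes (memory-efficient; a hit must be confirmed against the stored hashes, because Bloom filters can return false positives but never false negatives)

  • Trade-off: exact hash lookups miss near-duplicates (re-encodes, resizes, light edits) whose hashes differ by a few bits; catching those needs a Hamming-distance search, which is slower and costlier than a Bloom filter check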
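pHash itself derives its bits from a DCT of the image. This sketch swaps in the simpler average hash (aHash) to show the same idea: similar images produce hashes a small Hamming distance apart. It assumes the image has already been converted to grayscale and shrunk to 8×8, for example with an imaging library. The distance threshold of 5 is a hypothetical tuning value.

```python
def average_hash(pixels: list) -> int:
    """aHash of an 8x8 grayscale grid (values 0-255): 1 bit per pixel, set if above the mean."""
    flat = [p for row in pixels for p in row]
    if len(flat) != 64:
        raise ValueError("expected an 8x8 grid")
    mean = sum(flat) / 64
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two 64-bit hashes."""
    return bin(a ^ b).count("1")


def is_near_duplicate(h1: int, h2: int, max_distance: int = 5) -> bool:
    return hamming(h1, h2) <= max_distance
```

In the test case, a uniform brightness shift leaves the hash unchanged, while an inverted image differs in all 64 bits. An exact-match lookup only catches the first kind of case when the hashes are bit-identical; anything else needs the distance check.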

Q: Optimizing feed load time?

  • Prefetch: Load next page while user scrolls

  • Async rendering: Render text first, lazy load images

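  • Pagination: Cursor-based (not offset), so deep pages don't force the DB to scan and discard OFFSET rows, and newly inserted posts don't cause duplicates or skips between pages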
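Cursor-based pagination works well with time-sortable post IDs. The cursor records the last ID the client saw, and the next page starts strictly below it. This is a minimal sketch. The cursor encoding (URL-safe base64 of a small JSON object) and the function names are assumptions, and the list scan stands in for an indexed query.

```python
import base64
import json
from typing import Optional


def encode_cursor(last_post_id: int) -> str:
    """Opaque cursor: URL-safe base64 of the last ID the client has seen."""
    return base64.urlsafe_b64encode(json.dumps({"before": last_post_id}).encode()).decode()


def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor))["before"]


def get_feed_page(post_ids_desc: list, cursor: Optional[str] = None, limit: int = 20):
    """Return (page, next_cursor) from timeline IDs sorted newest first.

    SQL equivalent: WHERE id < :before ORDER BY id DESC LIMIT :limit, which an
    index can seek to directly instead of scanning and discarding OFFSET rows.
    The list scan here is only for illustration.
    """
    before = decode_cursor(cursor) if cursor else None
    page = [pid for pid in post_ids_desc if before is None or pid < before][:limit]
    next_cursor = encode_cursor(page[-1]) if len(page) == limit else None
    return page, next_cursor
```

Suppose post 110 arrives between the first and second request. With offset pagination, the second page (offset 2) would return 108 again. The cursor pins the position to an ID rather than a row count, so the second page correctly starts at 107.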
