
URL Shortener

Difficulty: Easy
Topics: Hashing, Base62 Encoding, Database Sharding
Time: 45-60 minutes
Companies: Google, Amazon, Meta, Microsoft

Problem Statement

Design a URL shortening service like bit.ly or TinyURL that:

  • Converts long URLs to short URLs

  • Redirects short URLs to original long URLs

  • Tracks click analytics (optional)

Example:

Long URL: https://www.example.com/very/long/url/with/many/parameters?id=12345
Short URL: https://short.ly/abc1234

Requirements Gathering

Functional Requirements

Must-Have:

  1. Generate short URL from long URL

  2. Redirect short URL to long URL (301/302)

  3. Short URLs never expire

Nice-to-Have:

  4. Custom aliases (e.g., short.ly/mycompany)

  5. Analytics (clicks, geography, referrers)

  6. Expiration time (TTL)

  7. Rate limiting

Non-Functional Requirements

Performance:

  • Latency: P99 < 100ms for redirects

  • Throughput: 10K writes/sec, 100K reads/sec (10:1 read:write ratio common)

Availability:

  • 99.99% uptime (53 minutes downtime/year)

  • No single point of failure

Scalability:

  • 100M new URLs/month

  • 10B redirects/month

Security:

  • Prevent spam/malicious URLs

  • Rate limiting per user/IP

Reliability:

  • URLs never lost (durable storage)

  • Eventual consistency acceptable


Capacity Estimation

Traffic Estimates

Storage Estimates

Bandwidth Estimates

Cache Requirements (80/20 Rule)
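The estimates above can be filled in with back-of-envelope math from numbers already stated (100M new URLs/month, 10B redirects/month). A sketch, assuming ~500 bytes per stored record (an assumption, not stated above); note the averages come out far below the 10K/100K QPS peak targets in the non-functional requirements, which is expected since those targets cover traffic spikes:

```python
# Back-of-envelope capacity math from the stated monthly figures.
# The 500-byte record size is an illustrative assumption.
SECONDS_PER_MONTH = 30 * 24 * 3600  # ~2.6M seconds

new_urls_per_month = 100_000_000
redirects_per_month = 10_000_000_000
bytes_per_record = 500  # assumed: short code + long URL + metadata

avg_write_qps = new_urls_per_month / SECONDS_PER_MONTH
avg_read_qps = redirects_per_month / SECONDS_PER_MONTH
storage_per_month_gb = new_urls_per_month * bytes_per_record / 1e9

print(f"avg writes/sec: {avg_write_qps:.0f}")            # ~39
print(f"avg reads/sec:  {avg_read_qps:.0f}")             # ~3858
print(f"storage/month:  {storage_per_month_gb:.0f} GB")  # ~50 GB
```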


API Design

REST Endpoints

1. Create Short URL

2. Redirect Short URL

Note for Interviews: Discuss 301 vs 302.

  • 301 (Moved Permanently): Browser caches the redirect. Reduces load on servers but prevents accurate analytics tracking.

  • 302 (Found/Moved Temporarily): Browser contacts our server for every click. Higher load but 100% accurate analytics. Use 302 if analytics is a functional requirement.

3. Get URL Info (Optional)

4. Delete URL
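The four endpoints above can be modeled in a few lines. This is an in-memory sketch of the request/response shapes only; the paths implied by the docstrings, the field names, and the seed counter value are all assumptions, not a spec. It returns 302 on redirect, consistent with the analytics discussion above:

```python
# Illustrative in-memory model of the four REST endpoints.
import itertools
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
_counter = itertools.count(1_000_000)  # seeded so codes are several chars long
_store = {}  # short_code -> long_url

def _encode(n):
    """Base62-encode a positive integer."""
    digits = []
    while n:
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits)) or "0"

def create_url(long_url):
    """POST /api/v1/urls -> 201 with the new short code."""
    code = _encode(next(_counter))
    _store[code] = long_url
    return {"status": 201, "short_code": code, "long_url": long_url}

def redirect(code):
    """GET /{code} -> 302 to the long URL, or 404."""
    if code in _store:
        return {"status": 302, "location": _store[code]}
    return {"status": 404}

def delete_url(code):
    """DELETE /api/v1/urls/{code} -> 204, or 404."""
    if _store.pop(code, None) is not None:
        return {"status": 204}
    return {"status": 404}
```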


Database Schema

SQL Schema (PostgreSQL)

NoSQL Schema (DynamoDB)


High-Level Design

Architecture Diagram


Data Flow

Write Flow (Create Short URL)

Read Flow (Redirect Short URL)


Deep Dive Topics

1. Short Code Generation

Requirements:

  • Unique (no collisions)

  • Short (7 characters = 62^7 = 3.5 trillion combinations)

  • URL-safe characters: [a-zA-Z0-9] (Base62)

Option A: Hash-based (MD5/SHA + Base62)

Pros: Deterministic (same URL → same short code)
Cons: Collisions possible (need retry logic)


Option B: Auto-Incrementing Counter + Base62 (Recommended)

Pros: Guaranteed unique, no collisions
Cons: Sequential (somewhat predictable)
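The core of Option B is just Base62 encoding of a monotonically increasing ID (which would come from the database or a dedicated ID service). A sketch; the digits-lowercase-uppercase alphabet order is a convention, not mandated:

```python
import string

# 62 URL-safe characters: 0-9, a-z, A-Z
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def base62(n):
    """Base62-encode a non-negative integer."""
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return "".join(reversed(out)) or "0"

def base62_decode(s):
    """Inverse of base62, useful for looking up the original ID."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Every ID below 62^7 (~3.5 trillion) fits in at most 7 characters, matching the capacity math above.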


Option C: Snowflake ID (Distributed ID Generator)

Pros: Distributed, no coordination, time-ordered
Cons: Requires a separate ID service
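A sketch of a Snowflake-style generator, using the common 41/10/12 bit layout (timestamp | machine ID | sequence); the custom epoch is an arbitrary assumption, and a production version must also handle sequence exhaustion within one millisecond and the clock moving backwards:

```python
import time

EPOCH_MS = 1_600_000_000_000  # custom epoch (assumed)

class SnowflakeGen:
    """Toy 64-bit time-ordered ID generator: 41-bit ms timestamp,
    10-bit machine ID, 12-bit per-millisecond sequence."""

    def __init__(self, machine_id):
        assert 0 <= machine_id < 1024  # must fit in 10 bits
        self.machine_id = machine_id
        self.sequence = 0
        self.last_ms = -1

    def next_id(self):
        now = int(time.time() * 1000) - EPOCH_MS
        if now == self.last_ms:
            self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit sequence
        else:
            self.sequence = 0
            self.last_ms = now
        return (now << 22) | (self.machine_id << 12) | self.sequence
```

The resulting integer can then be Base62-encoded into a short code exactly as in Option B.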


2. Scaling the Database

Sharding Strategy:

Pros: Even distribution, horizontal scaling
Cons: Cross-shard queries are difficult (e.g., "get all of a user's URLs")

Solution for user queries: use a Global Secondary Index (GSI) in DynamoDB, or replicate the data to a separate analytics DB.
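Routing a short code to a shard is a one-liner. A sketch, using CRC32 rather than Python's built-in `hash()` (which is randomized per process and would break cross-process routing); the shard count is an assumption:

```python
import zlib

NUM_SHARDS = 16  # assumed shard count

def shard_for(short_code):
    """Map a short code to a shard via a stable hash."""
    return zlib.crc32(short_code.encode()) % NUM_SHARDS
```

Plain modulo sharding reshuffles most keys when NUM_SHARDS changes; consistent hashing is the usual fix if resharding must be cheap.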


3. Caching Strategy and Optimizations

Cache-Aside (Lazy Loading):

Cache Eviction: LRU (Least Recently Used)
Cache Size: 32 GB (covers 6M hot URLs)
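The cache-aside read path with LRU eviction can be sketched in a few lines; the `LRUCache` and plain-dict `db` below stand in for Redis and the database:

```python
from collections import OrderedDict

class LRUCache:
    """Toy LRU cache standing in for Redis with an LRU eviction policy."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # ordered: oldest entry first

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)  # mark as recently used
            return self.data[key]
        return None

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

def resolve(code, cache, db):
    """Cache-aside: check cache, fall back to DB, lazy-load on miss."""
    url = cache.get(code)
    if url is None:               # cache miss
        url = db.get(code)        # hit the database
        if url is not None:
            cache.put(code, url)  # populate cache for next time
    return url
```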

SDE-3 Optimization: Bloom Filters

  • Problem: Malicious users might request millions of random/invalid short URLs, causing cache misses and hitting the DB.

  • Solution: Place a Bloom filter in front of the cache. It is a space-efficient probabilistic data structure: false positives are possible (the request falls through to the DB and returns 404), but false negatives are impossible, so a "definitely not present" answer can reject the request without touching cache or DB.
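A minimal Bloom filter sketch: k hash probes into an m-bit array, with double hashing to derive the probes from one SHA-256 digest. The sizes here are toy values; real deployments derive m and k from the expected key count and target false-positive rate:

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: add() every valid short code at creation time;
    might_contain() == False means the code definitely does not exist."""

    def __init__(self, m_bits=1 << 20, k=5):
        self.m = m_bits
        self.k = k
        self.bits = bytearray(m_bits // 8)

    def _probes(self, key):
        h = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(h[:8], "big")
        h2 = int.from_bytes(h[8:16], "big")
        for i in range(self.k):          # double hashing: h1 + i*h2 mod m
            yield (h1 + i * h2) % self.m

    def add(self, key):
        for p in self._probes(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._probes(key))
```

On a redirect request, a `might_contain` miss returns 404 immediately, so floods of random invalid codes never reach the cache or DB.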


4. Rate Limiting

Token Bucket Algorithm (per user/IP):

Limits:

  • 10 URLs/minute per user (free tier)

  • 1000 URLs/minute per user (paid tier)
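A sketch of the token bucket: each user's bucket holds up to `capacity` tokens, refilled continuously at a fixed rate; a request spends one token or is rejected. The free-tier parameters at the bottom map directly to the limits above:

```python
import time

class TokenBucket:
    """Toy per-user token bucket rate limiter."""

    def __init__(self, capacity, rate_per_sec):
        self.capacity = capacity
        self.rate = rate_per_sec
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Free tier above: 10 URLs/minute -> burst of 10, refill 10/60 tokens per second.
free_tier = TokenBucket(capacity=10, rate_per_sec=10 / 60)
```

In production the bucket state would live in Redis (keyed by user ID or IP) so all API servers share the same limits.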


Failure Scenarios & Mitigation

Scenario 1: Database Primary Failure

Impact: Cannot create new URLs; writes fail
Mitigation:

  • Promote read replica to primary (automated failover)

  • Reads continue from replicas (cached redirects unaffected)

  • RTO: < 5 minutes, RPO: < 1 minute of data loss

Scenario 2: Cache Failure (Redis Down)

Impact: All requests hit the database (performance degradation)
Mitigation:

  • Database read replicas handle load (12K QPS tolerable)

  • Degrade gracefully (slightly higher latency)

  • Auto-restart Redis, persistent storage (RDB/AOF)

Scenario 3: ID Generation Service Down

Impact: Cannot generate new short codes
Mitigation:

  • Pre-generate IDs and store in buffer (1M pre-generated IDs)

  • Fallback to hash-based generation

  • Multi-region ID service (active-active)
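The pre-generated ID buffer from the mitigation list can be sketched as a local queue that refills in batches before it runs dry; the batch size, low-water mark, and `fetch_batch` callable (standing in for a call to the ID service) are assumptions:

```python
from collections import deque

class IdBuffer:
    """Drain a local queue of pre-fetched IDs; refill in batches from the
    ID service before the buffer runs dry, so brief outages are absorbed."""

    def __init__(self, fetch_batch, low_water=1000):
        self.fetch_batch = fetch_batch  # callable returning a list of fresh IDs
        self.low_water = low_water
        self.ids = deque()

    def next_id(self):
        if len(self.ids) <= self.low_water:
            try:
                self.ids.extend(self.fetch_batch())
            except Exception:
                pass  # ID service down: keep serving from the remaining buffer
        return self.ids.popleft()  # raises only once the buffer is empty too
```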


Monitoring & Alerts

Key Metrics:

Alerts:


Cost Estimation (AWS)


Interview Tips

Common Questions:

  1. "How do you generate short codes?" → Base62 encoding of counter/Snowflake ID

  2. "What if the database goes down?" → Read replicas, automated failover

  3. "How do you prevent spam?" → Rate limiting, CAPTCHA, blacklist malicious domains

  4. "How do you scale to billions of URLs?" → Sharding, caching, read replicas

Good Answer Template:

"I'd use a counter-based approach with Base62 encoding to generate 7-character short codes (3.5 trillion combinations). Store URLs in PostgreSQL, shard by hash(short_code) for horizontal scaling. Use Redis cache for 95% hit rate on redirects. Implement token bucket rate limiting per user. Monitor success rate (99.99% SLA) and P99 latency (<100ms)."

Time Allocation:

  • Requirements: 5 min

  • Estimation: 5 min

  • API + Schema: 5 min

  • Architecture: 10 min

  • Deep dives (short code gen, sharding, caching): 20 min

  • Failure scenarios: 5 min
