#26 URL shortener
A complete, time-boxed, 1-hour interview-ready answer for designing a URL shortening service (like TinyURL or Bitly). It follows the standard system design interview structure: functional and non-functional requirements, APIs and data model, architecture, deep dive, and trade-offs.
0 – 5 min — Problem recap, scope & assumptions
Goal: Design a service that converts long URLs into short, unique URLs and allows users to access the original URLs using the short links.
Scope for interview:
Shorten long URLs to compact URLs.
Redirect short URLs to original URLs.
Handle high read traffic for redirects.
Optional: analytics (click counts, geolocation, referrers).
Assumptions:
Millions of URLs, billions of redirects.
Short URLs should be unique and collision-free.
Redirect latency <50 ms.
Service supports web and API access.
Optional: expiration for short URLs.
5 – 15 min — Functional & Non-Functional Requirements
Functional Requirements
Must
URL shortening: Generate a unique short URL for a given long URL.
Redirect: Accessing the short URL redirects to the original URL.
Idempotency: Decide explicitly whether shortening the same long URL twice returns the same short URL or a new one; either is valid, depending on design.
Analytics (optional): Track clicks, referrers, geolocation.
Should
Custom short URLs provided by users.
Expiration and deletion of URLs.
Rate limiting per user/API key.
Nice-to-have
QR code generation.
Support for vanity URLs.
A/B testing for short URLs (marketing).
Non-Functional Requirements
Latency: Redirect should be very fast (<50 ms).
Availability: 99.99% uptime; high availability is critical.
Scalability: Support billions of URLs and redirects.
Durability: Persistent storage of URL mappings.
Consistency: Strong (read-after-write) consistency for the short → long URL mapping, so a newly created short URL resolves immediately.
Monitoring: Track traffic, errors, latency, and system health.
15 – 25 min — API / Data Model
APIs
Data Models
URLMapping
User (optional)
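The notes list the API and data-model headings without details. The sketch below is one hypothetical shape that fits the requirements above; every endpoint path, field name, and type in it is an assumption for illustration, not part of the original answer.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical endpoints (assumed, not given in the notes):
#   POST   /api/v1/urls               body: {long_url, custom_alias?, ttl_seconds?}
#                                     -> 201 {short_url}
#   GET    /{short_code}              -> redirect with Location: long_url
#   DELETE /api/v1/urls/{short_code}  -> 204


@dataclass
class URLMapping:
    """One short -> long mapping; short_code is the primary key."""
    short_code: str
    long_url: str
    created_at: datetime
    expires_at: Optional[datetime] = None  # None means the link never expires
    user_id: Optional[str] = None          # None for anonymous links

    def is_expired(self, now: datetime) -> bool:
        return self.expires_at is not None and now >= self.expires_at


@dataclass
class User:
    """Optional account record, used for custom aliases and rate limiting."""
    user_id: str
    email: str
    api_key: str


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
mapping = URLMapping("abc1234", "https://example.com/long/path", now,
                     expires_at=now + timedelta(days=30))
```

Keeping short_code as the only lookup key is what lets the mapping live in a plain key-value store.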
25 – 40 min — High-level architecture & data flow
Components
URL Shortening Service: Generates unique short codes and stores mapping.
Database / Key-Value Store: Stores short URL → long URL mappings (Cassandra, DynamoDB, or MySQL).
Cache Layer: Hot short URLs in Redis for fast redirects.
Analytics Service: Counts clicks, stores metrics asynchronously.
API Gateway: Handles user requests, rate-limiting, and routing.
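Rate limiting at the API gateway is often done with a token bucket per user or API key. The following is a minimal single-process sketch under that assumption; the class name and parameters are hypothetical, and a real gateway would keep the bucket state in a shared store.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    capacity: float        # maximum burst size, in requests
    refill_per_sec: float  # sustained request rate
    tokens: float          # tokens currently available
    last: float            # timestamp of the previous call, in seconds

    def allow(self, now: float) -> bool:
        """Refill based on elapsed time, then spend one token if available."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# One bucket per API key: burst of 2, refilling 1 request per second.
bucket = TokenBucket(capacity=2, refill_per_sec=1, tokens=2, last=0.0)
```

Passing the clock in as an argument keeps the limiter deterministic and easy to test.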
Data Flow
User submits long URL → Shortening Service generates short URL → stores in DB → returns short URL.
User accesses short URL → Service checks cache → on a hit, redirect to long URL. On a cache miss → fetch from DB → populate cache → redirect (or return 404 if the code does not exist).
Analytics logged asynchronously via message queue.
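The redirect path above can be sketched as a cache-aside lookup. In this illustration plain dicts stand in for Redis and the database, a list stands in for the message queue, and the class and method names are hypothetical. The 302 status is also an assumption: a 301 lets browsers cache the redirect, which reduces load but hides repeat clicks from analytics.

```python
from typing import Dict, List, Optional, Tuple


class RedirectService:
    def __init__(self, db: Dict[str, str]) -> None:
        self.db = db                     # stand-in for the durable store
        self.cache: Dict[str, str] = {}  # stand-in for Redis
        self.events: List[str] = []      # stand-in for the analytics queue

    def resolve(self, short_code: str) -> Tuple[int, Optional[str]]:
        """Return (HTTP status, redirect target) for a short code."""
        long_url = self.cache.get(short_code)
        if long_url is None:
            # Cache miss: fall back to the database and populate the cache.
            long_url = self.db.get(short_code)
            if long_url is None:
                return 404, None
            self.cache[short_code] = long_url
        # Record the click without blocking the redirect.
        self.events.append(short_code)
        return 302, long_url


svc = RedirectService({"abc1234": "https://example.com/long/path"})
status, target = svc.resolve("abc1234")
```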
40 – 50 min — Deep dive — short URL generation & scaling
Short URL Generation
Base62 encoding: Convert auto-increment ID to short alphanumeric string.
Hash-based: Hash long URL → use first N characters. Handle collisions by rehashing with a salt or by taking additional characters from the hash.
Custom alias: Use provided alias, check uniqueness.
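The first two strategies can be sketched as follows. This is an illustration under stated assumptions: the alphabet order, the 7-character code length, SHA-256, and the salt-and-retry collision policy are all choices made here, and the `taken` dict stands in for a uniqueness check against the database.

```python
import hashlib
import string
from typing import Dict

# 0-9, a-z, A-Z: 62 symbols. The ordering is an arbitrary choice.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode(n: int) -> str:
    """Encode a non-negative integer (e.g. an auto-increment ID) in Base62."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def hash_code(long_url: str, taken: Dict[str, str], length: int = 7) -> str:
    """Derive a code from a hash of the URL, re-salting on collision."""
    attempt = 0
    while True:
        digest = hashlib.sha256(f"{long_url}#{attempt}".encode()).digest()
        number = int.from_bytes(digest[:8], "big")
        code = base62_encode(number)[:length].rjust(length, ALPHABET[0])
        owner = taken.get(code)
        if owner is None or owner == long_url:
            taken[code] = long_url  # claim the code (or reuse our own)
            return code
        attempt += 1  # the code belongs to a different URL: try a new salt


taken: Dict[str, str] = {}
code = hash_code("https://example.com/a", taken)
```

With this policy, shortening the same long URL twice returns the same code, while a counter-based Base62 scheme would hand out a new code each time.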
Scaling
Sharding: Split URLs by hash or prefix to multiple DB nodes.
Caching: Hot URLs in Redis for low-latency redirects.
Load Balancer: Distribute API traffic across multiple instances.
CDN (optional): Cache popular short URLs globally for faster access.
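Sharding by hash can be as simple as hashing the short code and taking it modulo the shard count. The function name and the use of SHA-256 below are assumptions for illustration.

```python
import hashlib


def shard_for(short_code: str, num_shards: int) -> int:
    """Map a short code to a shard index in [0, num_shards)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(short_code.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


shard = shard_for("abc1234", 16)
```

Plain modulo placement moves most keys when the shard count changes; consistent hashing is the usual alternative when shards are added or removed often.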
Fault Tolerance
Replicate DB across regions for durability.
Use retries for analytics events.
Cache replication for high availability.
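Retrying analytics events is typically done with a bounded number of attempts and exponential backoff. The helper below is a hypothetical sketch; the attempt count and delays are assumed values, and `send` stands in for whatever publishes the event to the queue.

```python
import time
from typing import Callable


def send_with_retry(
    send: Callable[[], None],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send() until it succeeds or max_attempts is reached."""
    for attempt in range(max_attempts):
        try:
            send()
            return True
        except Exception:
            # Back off 0.1s, 0.2s, 0.4s, ... but not after the final attempt.
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return False
```

Because analytics is off the redirect path, giving up after a few attempts costs a click count, not a user-facing failure.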
50 – 55 min — Back-of-the-envelope calculations
Assumptions
1B URLs, each short URL mapping ~200 bytes → ~200 GB storage.
Peak redirects: 10M/sec → caching required.
Peak writes (shorten URL requests): 100K/sec → NoSQL DB with high write throughput. Treat this as a burst figure: sustained, it would create 1B URLs in under three hours.
Cache size: store top 100M hot URLs → ~20 GB Redis cluster.
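The arithmetic behind these estimates can be checked in a few lines. The input figures come from the assumptions above; the 7-character code length is an extra assumption, used only to show that the Base62 keyspace comfortably covers 1B URLs.

```python
URLS = 1_000_000_000       # total mappings stored
BYTES_PER_MAPPING = 200    # approximate size of one mapping
HOT_URLS = 100_000_000     # mappings kept in cache
WRITES_PER_SEC = 100_000   # peak shorten requests per second
CODE_LENGTH = 7            # assumed short-code length

storage_gb = URLS * BYTES_PER_MAPPING / 1e9    # 200.0 GB
cache_gb = HOT_URLS * BYTES_PER_MAPPING / 1e9  # 20.0 GB
seconds_to_fill = URLS / WRITES_PER_SEC        # 10000.0 s, about 2.8 hours
keyspace = 62 ** CODE_LENGTH                   # 3,521,614,606,208 codes

print(f"storage: {storage_gb:.0f} GB, cache: {cache_gb:.0f} GB")
print(f"time to 1B URLs at peak write rate: {seconds_to_fill / 3600:.1f} h")
print(f"7-char Base62 keyspace: {keyspace:,}")
```

These totals exclude replication and index overhead, which would multiply the raw storage figure.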
55 – 58 min — Monitoring & ops
Monitoring
Redirect latency, cache hit/miss.
URL creation rate.
Analytics processing latency.
Error rates and system health.
Operational concerns
Handle high read-to-write ratio efficiently.
Prevent collisions in short URL generation.
Clean-up expired URLs.
58 – 60 min — Trade-offs, evolution & summary
Trade-offs
Base62 vs Hash: Base62 over a counter is simple and collision-free, but codes are sequential and therefore guessable; hashing reduces predictability but requires collision handling.
Cache vs DB: Cache reduces latency for redirects; DB ensures durability.
SQL vs NoSQL: NoSQL preferred for high write throughput; SQL may be simpler for small scale.
Evolution
MVP: Generate short URLs, redirect, basic DB storage.
Phase 2: Custom aliases, cache popular URLs, analytics service.
Phase 3: Global scaling, CDN caching, expiration, rate-limiting, vanity URLs.
Summary
System generates unique short URLs and performs fast redirects.
Uses DB + cache for scalability and low latency.
Event-driven analytics for clicks.
Horizontally scalable, fault-tolerant, and monitorable system.