Case Study: Designing a URL Shortener (TinyURL)
Case Study: Designing a URL Shortener (TinyURL)
In this first case study, we will design a service like TinyURL or Bitly. This is a classic system design interview question that covers many of the fundamentals we've learned: scaling, hashing, and database selection.
The Requirements
- Functional: Given a long URL, return a short, unique alias. When a user clicks the short URL, redirect them to the original long URL.
- Non-Functional: High availability (the links must always work), low latency (redirection should be instant), and scalability (handle billions of URLs).
Hashing and Uniqueness
The core of the system is converting a long URL into a short string (e.g., https://tiny.url/abc123).
- Base 62 Encoding: We use characters
[a-z, A-Z, 0-9]which gives us 62 possible characters for each position. - Length: A 7-character string gives us trillion unique combinations, more than enough for our needs.
- Key Generation Service (KGS): Instead of hashing the URL at runtime (which can lead to collisions), we can have a separate service that pre-generates billions of unique 7-character strings and stores them in a database. When a request comes in, we just "grab" one.
The URL Shortening Workflow
- 1Step 1
A user submits
https://very-long-domain.com/some/deep/path/page.htmlto our API. - 2Step 2
The App Server asks the Key Generation Service for the next available 7-character key (e.g.,
8xK9p2z). - 3Step 3
The App Server saves the mapping
{ '8xK9p2z': 'https://very-long-domain.com/...' }into our primary database and also into a Redis Cache for fast access. - 4Step 4
The server returns
https://tiny.url/8xK9p2zto the user. Total time: < 50ms. - 5Step 5
A different user clicks the short link. The Load Balancer sends the request to an App Server, which first checks Redis. Finding the hit, it returns a
302 Redirectto the original long URL.
Database Selection
- Type: A NoSQL Key-Value store or a simple Relational database (like PostgreSQL) both work well. Since we don't have complex relationships, a NoSQL store like DynamoDB or Cassandra is often easier to scale.
- Structure:
short_url(Primary Key, String)original_url(String)created_at(Timestamp)user_id(Optional, for analytics)
Scaling and Optimization
- Caching: 20% of the URLs will likely get 80% of the traffic. Use a Redis cache to store the most popular mappings, drastically reducing DB load.
- DB Sharding: Shard the database by the
short_urlhash to handle billions of rows across multiple servers. - Clean-up Service: Use a background worker to delete URLs that haven't been accessed in years to save space.
Common Mistakes
- Using MD5/SHA Hashing: MD5 produces a long string. If you truncate it to 7 characters, you will get collisions (two different long URLs having the same short URL). Using a KGS is safer.
- Blocking KGS: If the KGS is a single point of failure, your whole system stops. Ensure the KGS is highly available and pre-allocates keys in memory for speed.
- Missing Redirection Code: Forgetting that for SEO and analytics, you should use
301 (Permanent)or302 (Found)redirects appropriately.
Recap
- Use Base 62 for short, readable, unique IDs.
- A Key Generation Service prevents collisions and simplifies the logic.
- Redis Caching is vital for the 80/20 traffic distribution.
- Choose a database that scales horizontally (NoSQL).
Knowledge Check
Why is a Key Generation Service (KGS) preferred over hashing the long URL directly?