Case Study: Designing a News Feed
Designing a news feed like Facebook or Twitter is a challenge of Read Aggregation. You have millions of users, each following hundreds of other users. When a user opens their app, you must quickly find all the recent posts from everyone they follow, sort them by time (or by a ranking algorithm), and display them.
The Requirements
- Functional: Create posts. Follow/Unfollow users. Generate a news feed of posts from followed users.
- Non-Functional: Fast feed generation (< 200ms). High availability. Support for "Celebrity" accounts with millions of followers.
Feed Generation: Pull vs. Push
1. The Pull Model (Fan-out on Load)
When a user requests their feed, the system queries all the people they follow, fetches their latest posts, and merges them.
- Pros: Easy to implement; writes are cheap because each post is stored only once, and no work is wasted on users who rarely open the app.
- Cons: Extremely slow for the end user, because every app open triggers heavy database reads whose cost grows with the number of accounts the user follows.
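A minimal sketch of the pull model, assuming plain in-memory dicts stand in for the follow graph and the posts table (the names `follows`, `posts_by_author`, `Post`, and `pull_feed` are hypothetical). Each author's posts are assumed to be stored newest-first, so building the feed becomes a k-way merge at read time; the work grows with the number of followed accounts, which is the read cost described above.

```python
import heapq
from dataclasses import dataclass
from itertools import islice


@dataclass(frozen=True)
class Post:
    post_id: int
    author_id: int
    created_at: int  # e.g. a Unix timestamp


def pull_feed(user_id, follows, posts_by_author, limit=20):
    """Fan-out on load: gather posts from every followed author at read time."""
    # One lookup per followed author; in production each would be a database read.
    streams = [posts_by_author.get(author, []) for author in follows.get(user_id, set())]
    # Every stream is already newest-first, so a lazy k-way merge preserves that order.
    merged = heapq.merge(*streams, key=lambda p: p.created_at, reverse=True)
    return list(islice(merged, limit))
```

For example, if user 1 follows authors 2 and 3, the merge interleaves their posts by `created_at` and stops after `limit` items, so older posts from long streams are never materialized.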
2. The Push Model (Fan-out on Write)
When a user creates a post, the system immediately "pushes" that post into the pre-computed feed (cache) of all their followers.
- Pros: Blazing fast reads. The feed is already waiting for the user in a Redis cache.
- Cons: Extremely slow writes for "Celebrities." If a user with 50 million followers posts, the system must write to 50 million different caches.
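A sketch of fan-out on write, assuming a `defaultdict` of deques stands in for the per-follower feed lists that would live in Redis (`fan_out_on_write` and `read_feed` are hypothetical names). The write loop touches one feed per follower, which is why a 50-million-follower account is a problem, while the read path is a single lookup.

```python
from collections import defaultdict, deque
from itertools import islice


def fan_out_on_write(author_id, post_id, followers, feed_cache):
    """Push a new post ID into every follower's pre-computed feed.

    Returns the number of feed writes, which equals the author's follower count.
    """
    writes = 0
    for follower_id in followers.get(author_id, ()):
        feed_cache[follower_id].appendleft(post_id)  # newest post at the head
        writes += 1
    return writes


def read_feed(user_id, feed_cache, limit=20):
    """Read path: the feed is already assembled, so this is one lookup."""
    return list(islice(feed_cache[user_id], limit))
```

The cost is simply moved: reads become cheap, and the write count scales linearly with followers.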
A Hybrid Feed Generation Strategy
- Step 1: User A (a normal user with 200 followers) creates a post. The App Server saves it to the main database.
- Step 2: A background worker (Fan-out service) fetches User A's 200 followers and pushes the post ID into each follower's 'Feed Cache' in Redis.
- Step 3: User C (a celebrity with 10 million followers) creates a post. The Fan-out service detects they are a 'Celebrity' and stops the push process to avoid a 10-million-write spike.
- Step 4: When one of User C's followers (User B) opens their app, the system fetches their pre-computed feed from Redis AND separately fetches recent posts from all 'Celebrity' accounts User B follows.
- Step 5: The system merges the cached feed with the live celebrity posts, sorts them by time, and serves the result to User B. This hybrid approach keeps the system fast and stable.
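The five steps above can be sketched as one in-memory class. This assumes dicts stand in for the main database and the Redis feed caches, and it assumes a hypothetical follower-count cut-off (`CELEBRITY_THRESHOLD`) decides who counts as a celebrity; the text does not give a number. Feed entries are stored as `(created_at, post_id)` pairs rather than bare IDs so the read path can merge by time, and posts are assumed to arrive in timestamp order.

```python
import heapq
from collections import defaultdict, deque
from itertools import islice

CELEBRITY_THRESHOLD = 10_000  # hypothetical cut-off; not specified in the text


class HybridFeed:
    """Push for normal authors, pull for celebrities, merged at read time."""

    def __init__(self, threshold=CELEBRITY_THRESHOLD):
        self.threshold = threshold
        self.followers = defaultdict(set)     # author -> follower IDs
        self.following = defaultdict(set)     # user -> followed author IDs
        self.posts = defaultdict(list)        # "main DB": author -> [(created_at, post_id)], newest first
        self.feed_cache = defaultdict(deque)  # "Redis": user -> (created_at, post_id), newest first

    def follow(self, user_id, author_id):
        self.followers[author_id].add(user_id)
        self.following[user_id].add(author_id)

    def is_celebrity(self, author_id):
        return len(self.followers[author_id]) >= self.threshold

    def create_post(self, author_id, post_id, created_at):
        # Step 1: always persist the post to the main database.
        self.posts[author_id].insert(0, (created_at, post_id))
        # Step 3: celebrities skip fan-out entirely.
        if self.is_celebrity(author_id):
            return
        # Step 2: normal authors push into each follower's cached feed.
        for follower_id in self.followers[author_id]:
            self.feed_cache[follower_id].appendleft((created_at, post_id))

    def get_feed(self, user_id, limit=20):
        # Step 4: the pre-computed feed plus live posts from followed celebrities.
        cached = self.feed_cache[user_id]
        celebrity_streams = [
            self.posts[a] for a in self.following[user_id] if self.is_celebrity(a)
        ]
        # Step 5: merge newest-first by timestamp and return only the post IDs.
        merged = heapq.merge(cached, *celebrity_streams, reverse=True)
        return [post_id for _, post_id in islice(merged, limit)]
```

One design choice worth noting: celebrity status is checked both when a post is written and when a feed is read, so an account that crosses the threshold between the two can briefly have a post duplicated or missing. A production system would need an explicit rule for that transition.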
Storage Strategy
- Users and Follows: A Relational DB (PostgreSQL) or a Graph DB (Neo4j) is good for managing the "Following" relationships.
- Posts: A NoSQL document store (like MongoDB) or a wide-column store (like Cassandra) handles the high volume of content.
- Feed Caches: Redis is essential. Each user has a "Feed List" in Redis containing the IDs of the last 500-1,000 posts they should see.
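In Redis, a capped list like this is commonly maintained by pairing LPUSH with LTRIM. The sketch below mirrors those semantics in memory so it runs without a Redis server; the class name, the `FEED_CAP` value, and the `feed:<user>` key naming in the comment are assumptions for illustration, not a Redis client API.

```python
from collections import defaultdict

FEED_CAP = 1_000  # hypothetical; the text suggests keeping 500-1,000 IDs


class CappedFeedStore:
    """In-memory stand-in for per-user Redis feed lists.

    push() mirrors `LPUSH feed:<user> <id>` followed by `LTRIM feed:<user> 0 cap-1`
    (the key format is a hypothetical naming scheme).
    """

    def __init__(self, cap=FEED_CAP):
        self.cap = cap
        self._lists = defaultdict(list)

    def push(self, user_id, post_id):
        feed = self._lists[user_id]
        feed.insert(0, post_id)  # LPUSH: newest ID at the head
        del feed[self.cap:]      # LTRIM 0 cap-1: drop IDs beyond the cap

    def lrange(self, user_id, start, stop):
        """Like LRANGE with non-negative indices: `stop` is inclusive."""
        return self._lists[user_id][start:stop + 1]
```

Trimming on every push keeps each user's memory footprint bounded no matter how many posts flow through their feed.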
The "Celebrity Problem" (Hotspots)
In system design, a "Hotspot" is a piece of data that gets massive amounts of traffic compared to others. A celebrity post is a classic hotspot. By using the Hybrid Model (Push for normal users, Pull for celebrities), you balance the load and ensure no single event crashes the system.
Common Mistakes
- Pulling Everything: Trying to build a production news feed with a single SQL query: SELECT * FROM posts WHERE user_id IN (followed_ids). This will work for 10 users but fail miserably for 10,000.
- Infinite Feed Caches: Storing every post ever made in a user's Redis feed. Only store the last few hundred. If the user scrolls further, then you can go back to the main database.
- Ignoring Media: Forgetting that a "News Feed" is mostly images and videos. The "Post" record should only contain metadata and the CDN URL of the media.
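The fix for the "Infinite Feed Caches" mistake can be sketched as a paginated read that serves from the capped cache and falls back to the main database only for the part of a page that lies beyond it. `load_from_db` is a hypothetical callback, assumed to return post IDs in the same newest-first order the cache uses, starting at a given position in the full feed.

```python
def get_feed_page(user_id, offset, limit, cache, load_from_db):
    """Return up to `limit` post IDs starting at `offset` in the user's feed.

    cache: dict of user_id -> list of recent post IDs, newest first (capped).
    load_from_db: hypothetical callable (user_id, offset, limit) -> list of IDs.
    """
    cached = cache.get(user_id, [])
    page = cached[offset:offset + limit]
    if len(page) == limit:
        return page  # fully served from the cache
    # The page runs past the cached window: fetch only the missing tail.
    next_offset = offset + len(page)
    return page + load_from_db(user_id, next_offset, limit - len(page))
```

Most reads hit only the first few pages, so the database is touched only by the minority of users who scroll deep.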
Recap
- Fan-out on Write (Push) makes reads fast but writes slow.
- Fan-out on Load (Pull) makes writes fast but reads slow.
- Use a Hybrid Model to handle both normal users and celebrities.
- Redis is the backbone of the "pre-computed" feed.
Knowledge Check
What is the main drawback of the 'Push Model' (Fan-out on Write) for news feeds?