System Design for Software Engineers

Database Replication: Availability and Read Scaling

As your traffic grows, a single database server becomes a bottleneck. Even if you have enough CPU and RAM, a single disk can only perform so many I/O operations per second. Replication is the practice of keeping a copy of the same data on multiple machines.

Replication solves two major problems:

High Availability: If the primary server fails, a replica can take over.
Read Scalability: You can distribute read-only queries across multiple "read replicas."

Replication Architectures

Leader-Follower (Master-Slave)

This is the most common architecture. All writes go to a single Leader. The leader then sends these updates to one or more Followers.

Usage: Clients send writes to the Leader and reads to any Follower.
Pros: Simple to reason about, no write conflicts.
Cons: If the leader fails, you must promote a follower (Failover), which can be complex.

Multi-Leader

Multiple nodes can accept writes. They then replicate their changes to each other.

Usage: Useful for multi-region deployments where you want users to write to their nearest data center.
Pros: Can survive a whole data center outage.
Cons: High complexity; you must handle Write Conflicts (when two users update the same record in different regions).

Leaderless (Dynamo-style)

Every node can accept both reads and writes. Examples include Cassandra and Riak.

Usage: Often used for high-write-throughput systems.
Pros: High availability and fault tolerance.
Cons: Complex consistency models (Quorums).

Setting Up a Read-Heavy Architecture

1
Step 1
Check your application logs. If your reads outnumber writes (e.g., 100:1), you are a perfect candidate for read replicas.
2
Step 2
Create a new database instance with the same hardware specifications as your leader. Most managed services (like AWS RDS or Google Cloud SQL) allow you to do this with one click.
3
Step 3
Configure the leader to stream its transaction logs (e.g., binlog in MySQL) to the follower. Most production systems use Asynchronous replication to avoid slowing down writes on the leader.
4
Step 4
Modify your application code to use two different connection strings: one for the Leader (writes) and a load-balanced set for the Followers (reads).
5
Step 5
Be aware that followers might be a few milliseconds (or seconds) behind the leader. If a user updates their profile and immediately refreshes, they might see the old data if the read goes to a lagging follower.

Synchronous vs. Asynchronous Replication

Synchronous: The leader waits for the follower to confirm it received the write before telling the client "Success."
- Pro: Guaranteed consistency (no data loss if leader fails).
- Con: High latency; if the follower is slow or the network flickers, the whole system stops.
Asynchronous: The leader writes locally and immediately tells the client "Success," then sends the data to the follower in the background.
- Pro: Very fast; system stays available even if followers are down.
- Con: Potential data loss; if the leader crashes before the follower receives the update, that data is gone.

Common Mistakes

Reading Your Own Writes: A user posts a comment (write to leader) and is redirected to the page (read from follower). If the follower hasn't caught up, the comment is missing. Solution: Route the first read after a write to the leader.
Follower Overload: Sending too many complex analytical queries to a single follower, causing it to lag further and further behind.
Cascading Failures: If the leader fails, and the newly promoted follower isn't powerful enough to handle the traffic, it will also crash, taking the whole system down.

Recap

Replication increases availability and allows you to scale reads.
Leader-Follower is the standard; Multi-Leader is for multi-region; Leaderless is for high-write scale.
Asynchronous replication is faster but risks data loss during failover.
Always account for Replication Lag in your application logic.

Knowledge Check

Question 1 of 3

Q1Single choice

What is the primary benefit of adding 'Read Replicas' to a database cluster?

It makes writes faster

It increases the read throughput of the system by offloading queries from the leader

It automatically fixes corrupted data on the leader

It eliminates the need for a load balancer

Relational vs. NoSQL: Choosing the Right Tool

Database Sharding and Partitioning