Database Replication: Availability and Read Scaling
Database Replication: Availability and Read Scaling
As your traffic grows, a single database server becomes a bottleneck. Even if you have enough CPU and RAM, a single disk can only perform so many I/O operations per second. Replication is the practice of keeping a copy of the same data on multiple machines.
Replication solves two major problems:
- High Availability: If the primary server fails, a replica can take over.
- Read Scalability: You can distribute read-only queries across multiple "read replicas."
Replication Architectures
Leader-Follower (Master-Slave)
This is the most common architecture. All writes go to a single Leader. The leader then sends these updates to one or more Followers.
- Usage: Clients send writes to the Leader and reads to any Follower.
- Pros: Simple to reason about, no write conflicts.
- Cons: If the leader fails, you must promote a follower (Failover), which can be complex.
Multi-Leader
Multiple nodes can accept writes. They then replicate their changes to each other.
- Usage: Useful for multi-region deployments where you want users to write to their nearest data center.
- Pros: Can survive a whole data center outage.
- Cons: High complexity; you must handle Write Conflicts (when two users update the same record in different regions).
Leaderless (Dynamo-style)
Every node can accept both reads and writes. Examples include Cassandra and Riak.
- Usage: Often used for high-write-throughput systems.
- Pros: High availability and fault tolerance.
- Cons: Complex consistency models (Quorums).
Setting Up a Read-Heavy Architecture
- 1Step 1
Check your application logs. If your reads outnumber writes (e.g., 100:1), you are a perfect candidate for read replicas.
- 2Step 2
Create a new database instance with the same hardware specifications as your leader. Most managed services (like AWS RDS or Google Cloud SQL) allow you to do this with one click.
- 3Step 3
Configure the leader to stream its transaction logs (e.g., binlog in MySQL) to the follower. Most production systems use Asynchronous replication to avoid slowing down writes on the leader.
- 4Step 4
Modify your application code to use two different connection strings: one for the Leader (writes) and a load-balanced set for the Followers (reads).
- 5Step 5
Be aware that followers might be a few milliseconds (or seconds) behind the leader. If a user updates their profile and immediately refreshes, they might see the old data if the read goes to a lagging follower.
Synchronous vs. Asynchronous Replication
- Synchronous: The leader waits for the follower to confirm it received the write before telling the client "Success."
- Pro: Guaranteed consistency (no data loss if leader fails).
- Con: High latency; if the follower is slow or the network flickers, the whole system stops.
- Asynchronous: The leader writes locally and immediately tells the client "Success," then sends the data to the follower in the background.
- Pro: Very fast; system stays available even if followers are down.
- Con: Potential data loss; if the leader crashes before the follower receives the update, that data is gone.
Common Mistakes
- Reading Your Own Writes: A user posts a comment (write to leader) and is redirected to the page (read from follower). If the follower hasn't caught up, the comment is missing. Solution: Route the first read after a write to the leader.
- Follower Overload: Sending too many complex analytical queries to a single follower, causing it to lag further and further behind.
- Cascading Failures: If the leader fails, and the newly promoted follower isn't powerful enough to handle the traffic, it will also crash, taking the whole system down.
Recap
- Replication increases availability and allows you to scale reads.
- Leader-Follower is the standard; Multi-Leader is for multi-region; Leaderless is for high-write scale.
- Asynchronous replication is faster but risks data loss during failover.
- Always account for Replication Lag in your application logic.
Knowledge Check
What is the primary benefit of adding 'Read Replicas' to a database cluster?