The CAP Theorem Revisited
The CAP Theorem Revisited
In the world of distributed databases, you cannot have everything. The CAP Theorem, first proposed by Eric Brewer, states that a distributed system can only provide two out of the following three guarantees at any given time:
- Consistency (C): Every read receives the most recent write or an error.
- Availability (A): Every request receives a (non-error) response, without the guarantee that it contains the most recent write.
- Partition Tolerance (P): The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.
The "P" is Mandatory
In a distributed system, network partitions (P) are inevitable. Therefore, the choice isn't between C, A, and P; it's between Consistency (CP) or Availability (AP) when a partition occurs.
- CP (Consistency + Partition Tolerance): If a partition occurs, the system will refuse to accept writes or return errors for reads until the partition is resolved, ensuring data remains consistent. Examples: MongoDB (in default config), HBase, CockroachDB.
- AP (Availability + Partition Tolerance): If a partition occurs, the nodes will continue to accept reads and writes, even if they can't talk to each other. This leads to "Eventual Consistency." Examples: Cassandra, DynamoDB, CouchDB.
PACELC: The Modern Extension
The CAP theorem only describes what happens when there is a network partition. The PACELC theorem extends this by describing what happens during normal operation (when there is no partition):
- Partitioned: choose between Availability or Consistency.
- Else (no partition): choose between Latency or Consistency.
This explains why even when the network is fine, some databases (like Cassandra) still choose lower consistency to provide lower latency.
Selecting a Consistency Model
- 1Step 1
Is data correctness more important than the system being online? For a bank transfer, choose Strong Consistency (CP). For a social media 'like' count, choose Availability (AP).
- 2Step 2
Does the user expect a response in under 50ms? If so, you may need to choose Eventual Consistency (EL) to avoid the overhead of cross-node coordination, even when the network is healthy.
- 3Step 3
Many NoSQL databases allow you to tune consistency per request:
- ONE: Success if 1 node responds (Fastest, least consistent).
- QUORUM: Success if a majority (n/2 + 1) of nodes respond (Balanced).
- ALL: Success only if all replicas respond (Slowest, most consistent).
- 4Step 4
If you choose an AP system, ensure that the user who just made a change always sees their own update. This can be done by routing their next read to the same node they wrote to, or using a session-based consistency token.
Types of Consistency
- Strong Consistency: After a write, any subsequent read will return that value.
- Eventual Consistency: If no new updates are made, eventually all reads will return the same value.
- Monotonic Reads: If a user has seen a certain value, they will never see an older value in the future.
- Read-Your-Writes: A user will always see the updates they submitted themselves, even if other users haven't seen them yet.
Common Mistakes
- Assuming 'Consistent' means 'Correct': A system can be consistent but still contain stale data if the last 'consistent' write was 10 minutes ago.
- Ignoring PACELC: Forgetting that even without a network failure, choosing strong consistency increases the latency for every single user.
- Over-Engineering for Strong Consistency: Trying to make a global, multi-region system strongly consistent. The speed of light makes this physically impossible without massive latency.
Recap
- CAP Theorem forces a choice between Consistency and Availability during a network partition.
- P is mandatory; you are either CP or AP.
- PACELC reminds us that we also trade Latency for Consistency during normal operations.
- Quorums are a common tool for tuning the balance between speed and correctness.
Knowledge Check
In the CAP theorem, what does 'Partition Tolerance' mean?