Coursify

System Design for Software Engineers

The CAP Theorem Revisited

The CAP Theorem Revisited

In the world of distributed databases, you cannot have everything. The CAP Theorem, first proposed by Eric Brewer, states that a distributed system can only provide two out of the following three guarantees at any given time:

  1. Consistency (C): Every read receives the most recent write or an error.
  2. Availability (A): Every request receives a (non-error) response, without the guarantee that it contains the most recent write.
  3. Partition Tolerance (P): The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.

The "P" is Mandatory

In a distributed system, network partitions (P) are inevitable. Therefore, the choice isn't between C, A, and P; it's between Consistency (CP) or Availability (AP) when a partition occurs.

  • CP (Consistency + Partition Tolerance): If a partition occurs, the system will refuse to accept writes or return errors for reads until the partition is resolved, ensuring data remains consistent. Examples: MongoDB (in default config), HBase, CockroachDB.
  • AP (Availability + Partition Tolerance): If a partition occurs, the nodes will continue to accept reads and writes, even if they can't talk to each other. This leads to "Eventual Consistency." Examples: Cassandra, DynamoDB, CouchDB.

PACELC: The Modern Extension

The CAP theorem only describes what happens when there is a network partition. The PACELC theorem extends this by describing what happens during normal operation (when there is no partition):

  • Partitioned: choose between Availability or Consistency.
  • Else (no partition): choose between Latency or Consistency.

This explains why even when the network is fine, some databases (like Cassandra) still choose lower consistency to provide lower latency.

Selecting a Consistency Model

  1. 1
    Step 1

    Is data correctness more important than the system being online? For a bank transfer, choose Strong Consistency (CP). For a social media 'like' count, choose Availability (AP).

  2. 2
    Step 2

    Does the user expect a response in under 50ms? If so, you may need to choose Eventual Consistency (EL) to avoid the overhead of cross-node coordination, even when the network is healthy.

  3. 3
    Step 3

    Many NoSQL databases allow you to tune consistency per request:

    • ONE: Success if 1 node responds (Fastest, least consistent).
    • QUORUM: Success if a majority (n/2 + 1) of nodes respond (Balanced).
    • ALL: Success only if all replicas respond (Slowest, most consistent).
  4. 4
    Step 4

    If you choose an AP system, ensure that the user who just made a change always sees their own update. This can be done by routing their next read to the same node they wrote to, or using a session-based consistency token.

Types of Consistency

  1. Strong Consistency: After a write, any subsequent read will return that value.
  2. Eventual Consistency: If no new updates are made, eventually all reads will return the same value.
  3. Monotonic Reads: If a user has seen a certain value, they will never see an older value in the future.
  4. Read-Your-Writes: A user will always see the updates they submitted themselves, even if other users haven't seen them yet.

Common Mistakes

  • Assuming 'Consistent' means 'Correct': A system can be consistent but still contain stale data if the last 'consistent' write was 10 minutes ago.
  • Ignoring PACELC: Forgetting that even without a network failure, choosing strong consistency increases the latency for every single user.
  • Over-Engineering for Strong Consistency: Trying to make a global, multi-region system strongly consistent. The speed of light makes this physically impossible without massive latency.

Recap

  • CAP Theorem forces a choice between Consistency and Availability during a network partition.
  • P is mandatory; you are either CP or AP.
  • PACELC reminds us that we also trade Latency for Consistency during normal operations.
  • Quorums are a common tool for tuning the balance between speed and correctness.

Knowledge Check

Question 1 of 3
Q1Single choice

In the CAP theorem, what does 'Partition Tolerance' mean?