Software & Hardware Solutions to the Critical Section Problem

60 mins

Master Peterson's two-process software solution and its formal proof, then explore how hardware-level atomic instructions — test_and_set and compare_and_swap — provide the foundation for all modern locking mechanisms.

Learning Goals

Implement Peterson's Solution and formally prove it satisfies all three correctness requirements.
Explain why software-only solutions break on modern CPUs due to instruction reordering.
Distinguish between test_and_set and compare_and_swap, and implement mutual exclusion using each.
Identify which hardware solution satisfies Bounded Waiting and which does not.

Peterson's Solution: The Classic Software Approach

Before hardware gave us atomic instructions, computer scientists sought pure software solutions to the Critical Section Problem. Peterson's Solution (1981) is the most elegant and historically important of these. It is a correct 2-process solution that uses only two shared variables and no special hardware support.

Assumption: Peterson's Solution assumes that the LOAD and STORE machine instructions are atomic — they cannot be interrupted mid-execution. This was true on older hardware but is not guaranteed on modern CPUs (explained at the end of this section).

The Two Shared Variables

The solution requires exactly two shared data structures, accessible by both processes $P_i$ and $P_j$ :

1int turn;           // Whose turn it is to enter the Critical Section.
2                    // turn = i means P_i may enter. turn = j means P_j may enter.
3
4boolean flag[2];    // Whether a process WANTS to enter the Critical Section.
5                    // flag[i] = true means P_i is ready and wants to enter.
6                    // Initialized to: flag[0] = false, flag[1] = false

The key insight is the separation of intent and permission:

flag[i] expresses desire — "I want to go in."
turn expresses deference — "But if there's a conflict, I'll let the other go first."

Full Implementation

The complete solution for process $P_i$ (where $j = 1 - i$ , i.e., the other process):

1// ── ENTRY SECTION ──────────────────────────────────────────────────────────
2flag[i] = true;     // Step 1: Announce "I want to enter the CS."
3turn = j;           // Step 2: Yield — "But if you also want in, go ahead."
4
5// Step 3: Wait while the other process BOTH wants to enter AND it's their turn.
6while (flag[j] == true && turn == j) {
7    // Busy-wait (spin). Do nothing. Keep checking.
8}
9
10// ── CRITICAL SECTION ───────────────────────────────────────────────────────
11    /* ... access shared resource ... */
12
13// ── EXIT SECTION ───────────────────────────────────────────────────────────
14flag[i] = false;    // Step 4: Announce "I am done. I no longer want the CS."
15
16// ── REMAINDER SECTION ──────────────────────────────────────────────────────
17    /* ... other non-critical work ... */

Why the Order of Steps 1 and 2 Matters

Notice that Step 1 sets flag[i] = true before Step 2 sets turn = j. This ordering is crucial:

If $P_i$ does Step 2 first (turn = j) and then gets interrupted, $P_j$ could see flag[i] = false and turn = j and immediately enter the CS. When $P_i$ resumes and sets flag[i] = true, both could be in the CS simultaneously — violating Mutual Exclusion.
By announcing intent first, the process guarantees that if a conflict occurs, the turn variable correctly resolves it.

Tracing Peterson's Solution: Both Processes Want In Simultaneously

1
Step 1
flag[0] = false, flag[1] = false, turn = 0 (arbitrary). Neither process is in the CS. P0 and P1 are both in their Remainder Sections. Now, both decide to enter the CS at roughly the same time.
2
Step 2
P0 executes flag[0] = true — 'I want in.' Almost simultaneously (in the next few nanoseconds), P1 executes flag[1] = true — 'I also want in.' Now both flag values are true. There is a conflict. The turn variable will resolve it.
3
Step 3
P0 executes turn = 1 — 'I defer to P1 if there's a conflict.' Then P1 executes turn = 0 — 'I defer to P0.' The final value of turn is whichever was written LAST. Let's say P1 wrote last, so turn = 0. This means: 'P0 was the last to yield, so P1 gets priority.' Think of it as the last one to be polite wins.
4
Step 4
P0 checks its while condition: flag[1] == true (YES) AND turn == 1? NO — turn is 0, not 1. Wait... turn = 0 means it is P0's turn to wait. Let's re-read: P0 checks while(flag[1] && turn == 1). Since turn = 0, NOT 1, this condition is FALSE. P0 EXITS the loop. P0 enters the Critical Section. ✅
5
Step 5
P1 checks its while condition: while(flag[0] == true && turn == 0). flag[0] = true (YES) and turn == 0 (YES). The condition is TRUE. P1 is stuck in the busy-wait loop. It keeps checking and checking, burning CPU cycles, but cannot enter. It must wait for P0 to finish.
6
Step 6
P0 finishes its Critical Section work and executes its Exit Section: flag[0] = false. Now P1's while condition is re-evaluated: flag[0] == false — the first part is FALSE. The entire AND condition becomes FALSE. P1 exits the loop and immediately enters the Critical Section. The solution has correctly serialized access. ✅

Formal Proof: Why Peterson's Solution is Correct

An algorithm claiming to solve the CS Problem must be proven correct against all three requirements. Here is the formal proof for Peterson's Solution.

Proof 1: Mutual Exclusion ✅

Claim: $P_i$ and $P_j$ cannot both be executing in their Critical Sections simultaneously.

Proof by Contradiction: Suppose both $P_i$ and $P_j$ are in their Critical Sections at the same time.

For $P_i$ to have entered, its while loop condition must have been FALSE: $\text{NOT}(\text{flag}[j] == true \text{ AND } turn == j) \implies \text{flag}[j] == false \text{ OR } turn == i$

For $P_j$ to have entered, its while loop condition must have been FALSE: $\text{NOT}(\text{flag}[i] == true \text{ AND } turn == i) \implies \text{flag}[i] == false \text{ OR } turn == j$

But if both are in the CS, both must have set their flags to true before entering (Step 1 of Entry Section). So flag[i] = true AND flag[j] = true.

Given both flags are true, the only way the while loops could have been false is:

$P_i$ exited because turn == i (i.e., turn ≠ j)
$P_j$ exited because turn == j (i.e., turn ≠ i)

But turn is a single integer — it cannot simultaneously equal i AND j. Contradiction. Therefore, Mutual Exclusion holds. ∎

Proof 2: Progress ✅

Claim: If no process is in its CS and processes wish to enter, the selection cannot be postponed indefinitely.

If only $P_i$ wants to enter: flag[j] = false. $P_i$ 's while condition is immediately FALSE (first term fails). $P_i$ enters without waiting. ✅

If both want to enter: The turn variable is either i or j — exactly one of them. Exactly one process's while condition will be FALSE. That process enters immediately. Selection completes in finite time. ✅

Proof 3: Bounded Waiting ✅

Claim: A process will enter the CS within a bounded number of turns after requesting entry.

Once $P_i$ sets flag[i] = true and turn = j, $P_j$ will enter at most once before $P_i$ gets a turn. Here's why: When $P_j$ finishes its CS, it sets flag[j] = false. $P_i$ 's while condition immediately becomes FALSE, and $P_i$ enters. $P_i$ waits at most 1 turn of $P_j$ . The bound is 1. ✅

The Critical Limitation: Modern Hardware Breaks Peterson's

Peterson's Solution is provably correct — on paper. But modern CPUs and compilers reorder instructions for performance optimization. A CPU might execute turn = j before flag[i] = true if it determines no data dependency exists between them. This reordering destroys the carefully crafted ordering that the proof relies on.

// What you wrote:             // What the CPU might actually execute:
flag[i] = true;                turn = j;          ← reordered!
turn = j;                      flag[i] = true;    ← reordered!

This is why hardware-level atomic instructions were invented — to give programmers guarantees that software logic alone cannot provide.

Hardware Solution 1: Disabling Interrupts

The simplest hardware approach: prevent the CPU from being interrupted while inside the Critical Section.

1// ENTRY SECTION
2disable_interrupts();    // Block all timer and I/O interrupts
3
4// CRITICAL SECTION
5    /* ... access shared resource ... */
6
7// EXIT SECTION
8enable_interrupts();     // Re-enable interrupts

Why it works on uniprocessors: Without interrupts, the OS cannot trigger a context switch. The current process runs to completion inside the CS with no interference.

Why it FAILS on multiprocessors:

Disabling interrupts on CPU 1 has zero effect on CPU 2. $P_2$ on CPU 2 can freely enter the same Critical Section while $P_1$ is inside it. On a modern multicore machine, this approach provides no mutual exclusion at all.

Additional concern: A bug in the CS that causes an infinite loop will freeze the entire system because interrupts are disabled — the OS scheduler can never regain control.

Hardware Solution 2: The `test_and_set()` Instruction

Modern CPUs provide special instructions that perform a read-modify-write operation atomically — as a single, uninterruptible unit, visible to all processors simultaneously.

TestAndSet — Definition (Atomic):

1// This entire function executes as ONE indivisible hardware instruction.
2boolean TestAndSet(boolean *target) {
3    boolean rv = *target;    // Read the current value
4    *target = true;          // Set it to true (always)
5    return rv;               // Return the ORIGINAL value
6}

Implementation of Mutual Exclusion using TestAndSet:

1// Shared: boolean lock = false;  (initialized to false = "unlocked")
2
3do {
4    // ENTRY SECTION: Spin until we can acquire the lock
5    while (TestAndSet(&lock) == true) {
6        // lock was true (someone else holds it) → keep spinning
7    }
8    // TestAndSet returned false → lock was free, and we've now set it to true
9
10    // CRITICAL SECTION
11        /* ... access shared resource ... */
12
13    // EXIT SECTION
14    lock = false;    // Release the lock
15
16    // REMAINDER SECTION
17        /* ... */
18
19} while (true);

How it prevents race conditions: Even if two processes call TestAndSet(&lock) simultaneously, the hardware guarantees that one will see the original value of false (and acquire the lock) while the other sees true (and spins). They cannot both see false.

Requirement Analysis:

Requirement	Status	Reason
Mutual Exclusion	✅ Satisfied	Only one `TestAndSet` can return `false`
Progress	✅ Satisfied	When lock is released, a spinning process will acquire it
Bounded Waiting	❌ NOT Satisfied	No guarantee of order — a process can spin indefinitely while others keep acquiring the lock

Hardware Solution 3: `compare_and_swap()` with Bounded Waiting

To fix the starvation problem of TestAndSet, we use a more sophisticated atomic instruction and a waiting array to enforce order.

CompareAndSwap — Definition (Atomic):

1// Atomically: if *value equals expected, replace it with new_value.
2// Always returns the original value of *value.
3int CompareAndSwap(int *value, int expected, int new_value) {
4    int temp = *value;
5    if (*value == expected) {
6        *value = new_value;
7    }
8    return temp;
9}

Bounded-Waiting Mutual Exclusion using CompareAndSwap:

1// Shared:
2int lock = 0;                  // 0 = free, 1 = held
3boolean waiting[n];            // waiting[i] = true if P_i is waiting to enter
4// All waiting[] initialized to false
5
6do {
7    // ENTRY SECTION
8    waiting[i] = true;
9    int key = 1;
10
11    // Spin until either: (a) we grab the lock, or (b) another process wakes us
12    while (waiting[i] == true && key == 1) {
13        key = CompareAndSwap(&lock, 0, 1);
14        // If lock was 0 (free): CAS sets it to 1 and returns 0 → key=0 → exit loop
15        // If lock was 1 (held): CAS does nothing and returns 1 → key=1 → keep spinning
16    }
17    waiting[i] = false;    // We are no longer waiting; we are about to enter
18
19    // CRITICAL SECTION
20        /* ... access shared resource ... */
21
22    // EXIT SECTION — Find the next waiting process in circular order
23    int j = (i + 1) % n;
24    while (j != i && waiting[j] == false) {
25        j = (j + 1) % n;
26    }
27
28    if (j == i) {
29        lock = 0;          // No one waiting → simply release the lock
30    } else {
31        waiting[j] = false; // Wake up P_j directly (hand off without releasing lock)
32    }
33
34    // REMAINDER SECTION
35} while (true);

Why this achieves Bounded Waiting: The exit section performs a circular scan — it finds the next waiting process (in index order: i+1, i+2, ..., wrapping around). A process waiting at index $i$ will be woken up within at most $n-1$ turns of other processes entering and exiting the CS. The bound is $n - 1$ . ✅

Knowledge Check

Question 1 of 3

Q1Single choice

In Peterson's Solution, both P0 and P1 set their flags to true. P0 then sets turn = 1, and immediately after, P1 sets turn = 0. Which process is allowed to enter the Critical Section?

P0, because it set its flag first.

P1, because turn = 0 means it is P1's turn to enter.

P0, because turn = 0 means P0 is NOT being deferred — P1 wrote last and yielded to P0.

Neither — both will spin indefinitely in a deadlock.

OSTEP: Chapter 28 — Locks

pdf

The Synchronization Problem: Race Conditions and Mutual Exclusion

High-Level Synchronization: Semaphores and Monitors