Software & Hardware Solutions to the Critical Section Problem
Master Peterson's two-process software solution and its formal proof, then explore how hardware-level atomic instructions — test_and_set and compare_and_swap — provide the foundation for all modern locking mechanisms.
Learning Goals
- Implement Peterson's Solution and formally prove it satisfies all three correctness requirements.
- Explain why software-only solutions break on modern CPUs due to instruction reordering.
- Distinguish between test_and_set and compare_and_swap, and implement mutual exclusion using each.
- Identify which hardware solution satisfies Bounded Waiting and which does not.
Peterson's Solution: The Classic Software Approach
Before hardware gave us atomic instructions, computer scientists sought pure software solutions to the Critical Section Problem. Peterson's Solution (1981) is the most elegant and historically important of these. It is a correct 2-process solution that uses only two shared variables and no special hardware support.
Assumption: Peterson's Solution assumes that the
LOADandSTOREmachine instructions are atomic — they cannot be interrupted mid-execution. This was true on older hardware but is not guaranteed on modern CPUs (explained at the end of this section).
The Two Shared Variables
The solution requires exactly two shared data structures, accessible by both processes and :
1int turn; // Whose turn it is to enter the Critical Section. 2 // turn = i means P_i may enter. turn = j means P_j may enter. 3 4boolean flag[2]; // Whether a process WANTS to enter the Critical Section. 5 // flag[i] = true means P_i is ready and wants to enter. 6 // Initialized to: flag[0] = false, flag[1] = false
The key insight is the separation of intent and permission:
flag[i]expresses desire — "I want to go in."turnexpresses deference — "But if there's a conflict, I'll let the other go first."
Full Implementation
The complete solution for process (where , i.e., the other process):
1// ── ENTRY SECTION ────────────────────────────────────────────────────────── 2flag[i] = true; // Step 1: Announce "I want to enter the CS." 3turn = j; // Step 2: Yield — "But if you also want in, go ahead." 4 5// Step 3: Wait while the other process BOTH wants to enter AND it's their turn. 6while (flag[j] == true && turn == j) { 7 // Busy-wait (spin). Do nothing. Keep checking. 8} 9 10// ── CRITICAL SECTION ─────────────────────────────────────────────────────── 11 /* ... access shared resource ... */ 12 13// ── EXIT SECTION ─────────────────────────────────────────────────────────── 14flag[i] = false; // Step 4: Announce "I am done. I no longer want the CS." 15 16// ── REMAINDER SECTION ────────────────────────────────────────────────────── 17 /* ... other non-critical work ... */
Why the Order of Steps 1 and 2 Matters
Notice that Step 1 sets flag[i] = true before Step 2 sets turn = j. This ordering is crucial:
- If does Step 2 first (
turn = j) and then gets interrupted, could seeflag[i] = falseandturn = jand immediately enter the CS. When resumes and setsflag[i] = true, both could be in the CS simultaneously — violating Mutual Exclusion. - By announcing intent first, the process guarantees that if a conflict occurs, the
turnvariable correctly resolves it.
Tracing Peterson's Solution: Both Processes Want In Simultaneously
- 1Step 1
flag[0] = false, flag[1] = false, turn = 0 (arbitrary). Neither process is in the CS. P0 and P1 are both in their Remainder Sections. Now, both decide to enter the CS at roughly the same time.
- 2Step 2
P0 executes
flag[0] = true— 'I want in.' Almost simultaneously (in the next few nanoseconds), P1 executesflag[1] = true— 'I also want in.' Now both flag values are true. There is a conflict. Theturnvariable will resolve it. - 3Step 3
P0 executes
turn = 1— 'I defer to P1 if there's a conflict.' Then P1 executesturn = 0— 'I defer to P0.' The final value ofturnis whichever was written LAST. Let's say P1 wrote last, soturn = 0. This means: 'P0 was the last to yield, so P1 gets priority.' Think of it as the last one to be polite wins. - 4Step 4
P0 checks its while condition:
flag[1] == true(YES) ANDturn == 1? NO — turn is 0, not 1. Wait...turn = 0means it is P0's turn to wait. Let's re-read: P0 checkswhile(flag[1] && turn == 1). Since turn = 0, NOT 1, this condition is FALSE. P0 EXITS the loop. P0 enters the Critical Section. ✅ - 5Step 5
P1 checks its while condition:
while(flag[0] == true && turn == 0). flag[0] = true (YES) and turn == 0 (YES). The condition is TRUE. P1 is stuck in the busy-wait loop. It keeps checking and checking, burning CPU cycles, but cannot enter. It must wait for P0 to finish. - 6Step 6
P0 finishes its Critical Section work and executes its Exit Section:
flag[0] = false. Now P1's while condition is re-evaluated:flag[0] == false— the first part is FALSE. The entire AND condition becomes FALSE. P1 exits the loop and immediately enters the Critical Section. The solution has correctly serialized access. ✅
Formal Proof: Why Peterson's Solution is Correct
An algorithm claiming to solve the CS Problem must be proven correct against all three requirements. Here is the formal proof for Peterson's Solution.
Proof 1: Mutual Exclusion ✅
Claim: and cannot both be executing in their Critical Sections simultaneously.
Proof by Contradiction: Suppose both and are in their Critical Sections at the same time.
For to have entered, its while loop condition must have been FALSE:
For to have entered, its while loop condition must have been FALSE:
But if both are in the CS, both must have set their flags to true before entering (Step 1 of Entry Section). So flag[i] = true AND flag[j] = true.
Given both flags are true, the only way the while loops could have been false is:
- exited because
turn == i(i.e., turn ≠ j) - exited because
turn == j(i.e., turn ≠ i)
But turn is a single integer — it cannot simultaneously equal i AND j. Contradiction. Therefore, Mutual Exclusion holds. ∎
Proof 2: Progress ✅
Claim: If no process is in its CS and processes wish to enter, the selection cannot be postponed indefinitely.
If only wants to enter: flag[j] = false. 's while condition is immediately FALSE (first term fails). enters without waiting. ✅
If both want to enter: The turn variable is either i or j — exactly one of them. Exactly one process's while condition will be FALSE. That process enters immediately. Selection completes in finite time. ✅
Proof 3: Bounded Waiting ✅
Claim: A process will enter the CS within a bounded number of turns after requesting entry.
Once sets flag[i] = true and turn = j, will enter at most once before gets a turn. Here's why: When finishes its CS, it sets flag[j] = false. 's while condition immediately becomes FALSE, and enters. waits at most 1 turn of . The bound is 1. ✅
The Critical Limitation: Modern Hardware Breaks Peterson's
Peterson's Solution is provably correct — on paper. But modern CPUs and compilers reorder instructions for performance optimization. A CPU might execute turn = j before flag[i] = true if it determines no data dependency exists between them. This reordering destroys the carefully crafted ordering that the proof relies on.
// What you wrote: // What the CPU might actually execute: flag[i] = true; turn = j; ← reordered! turn = j; flag[i] = true; ← reordered!
This is why hardware-level atomic instructions were invented — to give programmers guarantees that software logic alone cannot provide.
Hardware Solution 1: Disabling Interrupts
The simplest hardware approach: prevent the CPU from being interrupted while inside the Critical Section.
1// ENTRY SECTION 2disable_interrupts(); // Block all timer and I/O interrupts 3 4// CRITICAL SECTION 5 /* ... access shared resource ... */ 6 7// EXIT SECTION 8enable_interrupts(); // Re-enable interrupts
Why it works on uniprocessors: Without interrupts, the OS cannot trigger a context switch. The current process runs to completion inside the CS with no interference.
Why it FAILS on multiprocessors:
Disabling interrupts on CPU 1 has zero effect on CPU 2. on CPU 2 can freely enter the same Critical Section while is inside it. On a modern multicore machine, this approach provides no mutual exclusion at all.
Additional concern: A bug in the CS that causes an infinite loop will freeze the entire system because interrupts are disabled — the OS scheduler can never regain control.
Hardware Solution 2: The test_and_set() Instruction
Modern CPUs provide special instructions that perform a read-modify-write operation atomically — as a single, uninterruptible unit, visible to all processors simultaneously.
TestAndSet — Definition (Atomic):
1// This entire function executes as ONE indivisible hardware instruction. 2boolean TestAndSet(boolean *target) { 3 boolean rv = *target; // Read the current value 4 *target = true; // Set it to true (always) 5 return rv; // Return the ORIGINAL value 6}
Implementation of Mutual Exclusion using TestAndSet:
1// Shared: boolean lock = false; (initialized to false = "unlocked") 2 3do { 4 // ENTRY SECTION: Spin until we can acquire the lock 5 while (TestAndSet(&lock) == true) { 6 // lock was true (someone else holds it) → keep spinning 7 } 8 // TestAndSet returned false → lock was free, and we've now set it to true 9 10 // CRITICAL SECTION 11 /* ... access shared resource ... */ 12 13 // EXIT SECTION 14 lock = false; // Release the lock 15 16 // REMAINDER SECTION 17 /* ... */ 18 19} while (true);
How it prevents race conditions: Even if two processes call TestAndSet(&lock) simultaneously, the hardware guarantees that one will see the original value of false (and acquire the lock) while the other sees true (and spins). They cannot both see false.
Requirement Analysis:
| Requirement | Status | Reason |
|---|---|---|
| Mutual Exclusion | ✅ Satisfied | Only one TestAndSet can return false |
| Progress | ✅ Satisfied | When lock is released, a spinning process will acquire it |
| Bounded Waiting | ❌ NOT Satisfied | No guarantee of order — a process can spin indefinitely while others keep acquiring the lock |
Hardware Solution 3: compare_and_swap() with Bounded Waiting
To fix the starvation problem of TestAndSet, we use a more sophisticated atomic instruction and a waiting array to enforce order.
CompareAndSwap — Definition (Atomic):
1// Atomically: if *value equals expected, replace it with new_value. 2// Always returns the original value of *value. 3int CompareAndSwap(int *value, int expected, int new_value) { 4 int temp = *value; 5 if (*value == expected) { 6 *value = new_value; 7 } 8 return temp; 9}
Bounded-Waiting Mutual Exclusion using CompareAndSwap:
1// Shared: 2int lock = 0; // 0 = free, 1 = held 3boolean waiting[n]; // waiting[i] = true if P_i is waiting to enter 4// All waiting[] initialized to false 5 6do { 7 // ENTRY SECTION 8 waiting[i] = true; 9 int key = 1; 10 11 // Spin until either: (a) we grab the lock, or (b) another process wakes us 12 while (waiting[i] == true && key == 1) { 13 key = CompareAndSwap(&lock, 0, 1); 14 // If lock was 0 (free): CAS sets it to 1 and returns 0 → key=0 → exit loop 15 // If lock was 1 (held): CAS does nothing and returns 1 → key=1 → keep spinning 16 } 17 waiting[i] = false; // We are no longer waiting; we are about to enter 18 19 // CRITICAL SECTION 20 /* ... access shared resource ... */ 21 22 // EXIT SECTION — Find the next waiting process in circular order 23 int j = (i + 1) % n; 24 while (j != i && waiting[j] == false) { 25 j = (j + 1) % n; 26 } 27 28 if (j == i) { 29 lock = 0; // No one waiting → simply release the lock 30 } else { 31 waiting[j] = false; // Wake up P_j directly (hand off without releasing lock) 32 } 33 34 // REMAINDER SECTION 35} while (true);
Why this achieves Bounded Waiting: The exit section performs a circular scan — it finds the next waiting process (in index order: i+1, i+2, ..., wrapping around). A process waiting at index will be woken up within at most turns of other processes entering and exiting the CS. The bound is . ✅
Knowledge Check
In Peterson's Solution, both P0 and P1 set their flags to true. P0 then sets turn = 1, and immediately after, P1 sets turn = 0. Which process is allowed to enter the Critical Section?