Coursify

Operating System

Software & Hardware Solutions to the Critical Section Problem

60 mins

Master Peterson's two-process software solution and its formal proof, then explore how hardware-level atomic instructions — test_and_set and compare_and_swap — provide the foundation for all modern locking mechanisms.

Learning Goals

  • Implement Peterson's Solution and formally prove it satisfies all three correctness requirements.
  • Explain why software-only solutions break on modern CPUs due to instruction reordering.
  • Distinguish between test_and_set and compare_and_swap, and implement mutual exclusion using each.
  • Identify which hardware solution satisfies Bounded Waiting and which does not.

Peterson's Solution: The Classic Software Approach

Before hardware gave us atomic instructions, computer scientists sought pure software solutions to the Critical Section Problem. Peterson's Solution (1981) is the most elegant and historically important of these. It is a correct 2-process solution that uses only two shared variables and no special hardware support.

Assumption: Peterson's Solution assumes that the LOAD and STORE machine instructions are atomic — they cannot be interrupted mid-execution. This was true on older hardware but is not guaranteed on modern CPUs (explained at the end of this section).

The Two Shared Variables

The solution requires exactly two shared data structures, accessible by both processes PiP_i and PjP_j:

1int turn; // Whose turn it is to enter the Critical Section. 2 // turn = i means P_i may enter. turn = j means P_j may enter. 3 4boolean flag[2]; // Whether a process WANTS to enter the Critical Section. 5 // flag[i] = true means P_i is ready and wants to enter. 6 // Initialized to: flag[0] = false, flag[1] = false

The key insight is the separation of intent and permission:

  • flag[i] expresses desire — "I want to go in."
  • turn expresses deference — "But if there's a conflict, I'll let the other go first."

Full Implementation

The complete solution for process PiP_i (where j=1ij = 1 - i, i.e., the other process):

1// ── ENTRY SECTION ────────────────────────────────────────────────────────── 2flag[i] = true; // Step 1: Announce "I want to enter the CS." 3turn = j; // Step 2: Yield — "But if you also want in, go ahead." 4 5// Step 3: Wait while the other process BOTH wants to enter AND it's their turn. 6while (flag[j] == true && turn == j) { 7 // Busy-wait (spin). Do nothing. Keep checking. 8} 9 10// ── CRITICAL SECTION ─────────────────────────────────────────────────────── 11 /* ... access shared resource ... */ 12 13// ── EXIT SECTION ─────────────────────────────────────────────────────────── 14flag[i] = false; // Step 4: Announce "I am done. I no longer want the CS." 15 16// ── REMAINDER SECTION ────────────────────────────────────────────────────── 17 /* ... other non-critical work ... */

Why the Order of Steps 1 and 2 Matters

Notice that Step 1 sets flag[i] = true before Step 2 sets turn = j. This ordering is crucial:

  • If PiP_i does Step 2 first (turn = j) and then gets interrupted, PjP_j could see flag[i] = false and turn = j and immediately enter the CS. When PiP_i resumes and sets flag[i] = true, both could be in the CS simultaneously — violating Mutual Exclusion.
  • By announcing intent first, the process guarantees that if a conflict occurs, the turn variable correctly resolves it.

Tracing Peterson's Solution: Both Processes Want In Simultaneously

  1. 1
    Step 1

    flag[0] = false, flag[1] = false, turn = 0 (arbitrary). Neither process is in the CS. P0 and P1 are both in their Remainder Sections. Now, both decide to enter the CS at roughly the same time.

  2. 2
    Step 2

    P0 executes flag[0] = true — 'I want in.' Almost simultaneously (in the next few nanoseconds), P1 executes flag[1] = true — 'I also want in.' Now both flag values are true. There is a conflict. The turn variable will resolve it.

  3. 3
    Step 3

    P0 executes turn = 1 — 'I defer to P1 if there's a conflict.' Then P1 executes turn = 0 — 'I defer to P0.' The final value of turn is whichever was written LAST. Let's say P1 wrote last, so turn = 0. This means: 'P0 was the last to yield, so P1 gets priority.' Think of it as the last one to be polite wins.

  4. 4
    Step 4

    P0 checks its while condition: flag[1] == true (YES) AND turn == 1? NO — turn is 0, not 1. Wait... turn = 0 means it is P0's turn to wait. Let's re-read: P0 checks while(flag[1] && turn == 1). Since turn = 0, NOT 1, this condition is FALSE. P0 EXITS the loop. P0 enters the Critical Section. ✅

  5. 5
    Step 5

    P1 checks its while condition: while(flag[0] == true && turn == 0). flag[0] = true (YES) and turn == 0 (YES). The condition is TRUE. P1 is stuck in the busy-wait loop. It keeps checking and checking, burning CPU cycles, but cannot enter. It must wait for P0 to finish.

  6. 6
    Step 6

    P0 finishes its Critical Section work and executes its Exit Section: flag[0] = false. Now P1's while condition is re-evaluated: flag[0] == false — the first part is FALSE. The entire AND condition becomes FALSE. P1 exits the loop and immediately enters the Critical Section. The solution has correctly serialized access. ✅

Formal Proof: Why Peterson's Solution is Correct

An algorithm claiming to solve the CS Problem must be proven correct against all three requirements. Here is the formal proof for Peterson's Solution.


Proof 1: Mutual Exclusion ✅

Claim: PiP_i and PjP_j cannot both be executing in their Critical Sections simultaneously.

Proof by Contradiction: Suppose both PiP_i and PjP_j are in their Critical Sections at the same time.

For PiP_i to have entered, its while loop condition must have been FALSE: NOT(flag[j]==true AND turn==j)    flag[j]==false OR turn==i\text{NOT}(\text{flag}[j] == true \text{ AND } turn == j) \implies \text{flag}[j] == false \text{ OR } turn == i

For PjP_j to have entered, its while loop condition must have been FALSE: NOT(flag[i]==true AND turn==i)    flag[i]==false OR turn==j\text{NOT}(\text{flag}[i] == true \text{ AND } turn == i) \implies \text{flag}[i] == false \text{ OR } turn == j

But if both are in the CS, both must have set their flags to true before entering (Step 1 of Entry Section). So flag[i] = true AND flag[j] = true.

Given both flags are true, the only way the while loops could have been false is:

  • PiP_i exited because turn == i (i.e., turn ≠ j)
  • PjP_j exited because turn == j (i.e., turn ≠ i)

But turn is a single integer — it cannot simultaneously equal i AND j. Contradiction. Therefore, Mutual Exclusion holds. ∎


Proof 2: Progress ✅

Claim: If no process is in its CS and processes wish to enter, the selection cannot be postponed indefinitely.

If only PiP_i wants to enter: flag[j] = false. PiP_i's while condition is immediately FALSE (first term fails). PiP_i enters without waiting. ✅

If both want to enter: The turn variable is either i or j — exactly one of them. Exactly one process's while condition will be FALSE. That process enters immediately. Selection completes in finite time. ✅


Proof 3: Bounded Waiting ✅

Claim: A process will enter the CS within a bounded number of turns after requesting entry.

Once PiP_i sets flag[i] = true and turn = j, PjP_j will enter at most once before PiP_i gets a turn. Here's why: When PjP_j finishes its CS, it sets flag[j] = false. PiP_i's while condition immediately becomes FALSE, and PiP_i enters. PiP_i waits at most 1 turn of PjP_j. The bound is 1. ✅


The Critical Limitation: Modern Hardware Breaks Peterson's

Peterson's Solution is provably correct — on paper. But modern CPUs and compilers reorder instructions for performance optimization. A CPU might execute turn = j before flag[i] = true if it determines no data dependency exists between them. This reordering destroys the carefully crafted ordering that the proof relies on.

// What you wrote:             // What the CPU might actually execute:
flag[i] = true;                turn = j;          ← reordered!
turn = j;                      flag[i] = true;    ← reordered!

This is why hardware-level atomic instructions were invented — to give programmers guarantees that software logic alone cannot provide.


Hardware Solution 1: Disabling Interrupts

The simplest hardware approach: prevent the CPU from being interrupted while inside the Critical Section.

1// ENTRY SECTION 2disable_interrupts(); // Block all timer and I/O interrupts 3 4// CRITICAL SECTION 5 /* ... access shared resource ... */ 6 7// EXIT SECTION 8enable_interrupts(); // Re-enable interrupts

Why it works on uniprocessors: Without interrupts, the OS cannot trigger a context switch. The current process runs to completion inside the CS with no interference.

Why it FAILS on multiprocessors:

Disabling interrupts on CPU 1 has zero effect on CPU 2. P2P_2 on CPU 2 can freely enter the same Critical Section while P1P_1 is inside it. On a modern multicore machine, this approach provides no mutual exclusion at all.

Additional concern: A bug in the CS that causes an infinite loop will freeze the entire system because interrupts are disabled — the OS scheduler can never regain control.


Hardware Solution 2: The test_and_set() Instruction

Modern CPUs provide special instructions that perform a read-modify-write operation atomically — as a single, uninterruptible unit, visible to all processors simultaneously.

TestAndSet — Definition (Atomic):

1// This entire function executes as ONE indivisible hardware instruction. 2boolean TestAndSet(boolean *target) { 3 boolean rv = *target; // Read the current value 4 *target = true; // Set it to true (always) 5 return rv; // Return the ORIGINAL value 6}

Implementation of Mutual Exclusion using TestAndSet:

1// Shared: boolean lock = false; (initialized to false = "unlocked") 2 3do { 4 // ENTRY SECTION: Spin until we can acquire the lock 5 while (TestAndSet(&lock) == true) { 6 // lock was true (someone else holds it) → keep spinning 7 } 8 // TestAndSet returned false → lock was free, and we've now set it to true 9 10 // CRITICAL SECTION 11 /* ... access shared resource ... */ 12 13 // EXIT SECTION 14 lock = false; // Release the lock 15 16 // REMAINDER SECTION 17 /* ... */ 18 19} while (true);

How it prevents race conditions: Even if two processes call TestAndSet(&lock) simultaneously, the hardware guarantees that one will see the original value of false (and acquire the lock) while the other sees true (and spins). They cannot both see false.

Requirement Analysis:

RequirementStatusReason
Mutual Exclusion✅ SatisfiedOnly one TestAndSet can return false
Progress✅ SatisfiedWhen lock is released, a spinning process will acquire it
Bounded WaitingNOT SatisfiedNo guarantee of order — a process can spin indefinitely while others keep acquiring the lock

Hardware Solution 3: compare_and_swap() with Bounded Waiting

To fix the starvation problem of TestAndSet, we use a more sophisticated atomic instruction and a waiting array to enforce order.

CompareAndSwap — Definition (Atomic):

1// Atomically: if *value equals expected, replace it with new_value. 2// Always returns the original value of *value. 3int CompareAndSwap(int *value, int expected, int new_value) { 4 int temp = *value; 5 if (*value == expected) { 6 *value = new_value; 7 } 8 return temp; 9}

Bounded-Waiting Mutual Exclusion using CompareAndSwap:

1// Shared: 2int lock = 0; // 0 = free, 1 = held 3boolean waiting[n]; // waiting[i] = true if P_i is waiting to enter 4// All waiting[] initialized to false 5 6do { 7 // ENTRY SECTION 8 waiting[i] = true; 9 int key = 1; 10 11 // Spin until either: (a) we grab the lock, or (b) another process wakes us 12 while (waiting[i] == true && key == 1) { 13 key = CompareAndSwap(&lock, 0, 1); 14 // If lock was 0 (free): CAS sets it to 1 and returns 0 → key=0 → exit loop 15 // If lock was 1 (held): CAS does nothing and returns 1 → key=1 → keep spinning 16 } 17 waiting[i] = false; // We are no longer waiting; we are about to enter 18 19 // CRITICAL SECTION 20 /* ... access shared resource ... */ 21 22 // EXIT SECTION — Find the next waiting process in circular order 23 int j = (i + 1) % n; 24 while (j != i && waiting[j] == false) { 25 j = (j + 1) % n; 26 } 27 28 if (j == i) { 29 lock = 0; // No one waiting → simply release the lock 30 } else { 31 waiting[j] = false; // Wake up P_j directly (hand off without releasing lock) 32 } 33 34 // REMAINDER SECTION 35} while (true);

Why this achieves Bounded Waiting: The exit section performs a circular scan — it finds the next waiting process (in index order: i+1, i+2, ..., wrapping around). A process waiting at index ii will be woken up within at most n1n-1 turns of other processes entering and exiting the CS. The bound is n1n - 1. ✅

Knowledge Check

Question 1 of 3
Q1Single choice

In Peterson's Solution, both P0 and P1 set their flags to true. P0 then sets turn = 1, and immediately after, P1 sets turn = 0. Which process is allowed to enter the Critical Section?

Software & Hardware Solutions to the Critical Section Problem | Operating System | Coursify