Coursify

Operating System

Detection and Recovery

45 mins

Learning Goals

  • Construct a Wait-for Graph from a Resource-Allocation Graph for single-instance resource systems.
  • Apply the Banker's Detection Algorithm to detect deadlocks in multi-instance resource systems.
  • Analyze the trade-offs of different detection frequencies (periodic vs. on-demand).
  • Evaluate recovery strategies: process termination (abort all vs. abort one-by-one) and resource preemption (victim selection, rollback, starvation prevention).

Deadlock Detection: An Overview

Deadlock detection is the process of determining whether a deadlock currently exists in the system. Unlike prevention and avoidance, detection allows the system to enter deadlock states — but then detects them and takes corrective action.

When to use detection:

  • Prevention is too costly or restrictive.
  • Avoidance is impossible because maximum resource needs are unknown.
  • The system is a general-purpose OS (like Linux or Windows) with unpredictable workloads.

Detection methods depend on resource type:

Resource TypeDetection MethodComplexity
Single-instance (each resource type has exactly 1 instance)Wait-for Graph (WFG) — check for cyclesO(n²) using DFS
Multi-instance (each resource type can have multiple instances)Banker's Detection Algorithm — matrix reductionO(m × n²)

Detection for Single-Instance Resources: Wait-for Graph

For systems where each resource type has exactly one instance, a deadlock can be detected by constructing a Wait-for Graph (WFG) from the Resource-Allocation Graph (RAG).

Transforming RAG to WFG

The WFG is derived from the RAG by removing resource nodes and connecting processes directly:

  1. For each pair (Pᵢ, Rₖ, Pⱼ) where Pᵢ is waiting for Rₖ (request edge Pᵢ → Rₖ) and Rₖ is allocated to Pⱼ (assignment edge Rₖ → Pⱼ), draw a wait-for edge from Pᵢ → Pⱼ.
  2. Remove all resource nodes from the graph.
  3. The resulting graph has only process nodes and directed edges representing "is waiting for".

Before (RAG): The Resource-Allocation Graph shows processes connected through resources.

After (WFG): The Wait-for Graph removes resource nodes and connects processes directly.

The cycle P₁ → P₂ → P₁ in the WFG indicates a deadlock.

Cycle Detection

Graph propertyMeaning
WFG has no cyclesNo deadlock exists.
WFG contains one or more cyclesDeadlock exists. Every process in the cycle is deadlocked.

Example Transformation:

RAG EdgesWFG EdgeMeaning
P₁ → R₂ (request) + R₂ → P₂ (assignment)P₁ → P₂P₁ is waiting for P₂
P₂ → R₁ (request) + R₁ → P₁ (assignment)P₂ → P₁P₂ is waiting for P₁
Result: Cycle P₁ → P₂ → P₁ — Deadlock detected.

Detection Algorithm (DFS-Based)

  1. For each process Pᵢ in the WFG, start a Depth-First Search (DFS).
  2. Mark nodes as: White (unvisited), Gray (in current DFS path), Black (fully explored).
  3. If during DFS from Pᵢ we encounter a node that is already Gray, a cycle (deadlock) has been detected.
  4. If no Gray node is encountered during any DFS, no deadlock exists.

Deadlock Detection and Recovery — Animated Explanation

Detection for Multi-Instance Resources: Banker's Detection Algorithm

When resource types have multiple instances, the Wait-for Graph approach is insufficient (a cycle does not guarantee deadlock). Instead, we use the Banker's Detection Algorithm, which is similar to the Safety Algorithm but uses current request information rather than maximum need.

Data Structures

StructureDescription
Available[m]Number of available instances of each resource type.
Allocation[n][m]Number of instances currently allocated to each process.
Request[n][m]Current outstanding request of each process (what each process is currently waiting for).

The Detection Algorithm

1. Let Work[m] = Available[m]   // Temporary copy
2. Let Finish[n] = false for all processes
3. For each process Pᵢ where Finish[i] == false:
      if Request[i] ≤ Work, then:
         Work = Work + Allocation[i]   // Pretend Pᵢ finishes and releases
         Finish[i] = true
         Go to step 3 and restart
4. After scanning all processes:
      if Finish[i] == false for any process, then Pᵢ is deadlocked.

Result Interpretation

ConditionMeaning
All Finish[i] == trueNo deadlock exists. The system can recover.
Some Finish[i] == falseThose processes are deadlocked. They will not be able to complete.

Example: 3 Processes, 2 Resources

Given:

  • Available = (0, 0)
  • Allocation = P₀: (0, 1), P₁: (2, 0), P₂: (3, 0)
  • Request = P₀: (0, 0), P₁: (2, 0), P₂: (0, 1)

Execution:

  1. Work = (0, 0). P₀: Request (0,0) ≤ (0,0) ✓ → P₀ finishes → Work = (0+0, 0+1) = (0, 1). Finish[0] = true.
  2. P₁: Request (2,0) ≤ (0,1) ✗ → P₁ blocked.
  3. P₂: Request (0,1) ≤ (0,1) ✓ → P₂ finishes → Work = (0+3, 1+0) = (3, 1). Finish[2] = true.
  4. P₁: Request (2,0) ≤ (3,1) ✓ → P₁ finishes → Work = (3+2, 1+0) = (5, 1). Finish[1] = true.
  5. All Finish[i] == true → No deadlock. Sequence: <P₀, P₂, P₁>.

PYQ Variation: Detection Using Allocation + Request Only (No Max) [2022 Q4b]

Some exam problems provide Allocation and Request matrices directly without a Max matrix. In this case, you use the same detection algorithm but treat Request as the current outstanding needs. Let's solve the 2022 PYQ.

Problem [2022 Q4b]: Single processor system, 3 resource types (X, Y, Z) with 5 units each, 3 processes. Find which process finishes first and last.

ProcessAllocation (X Y Z)Request (X Y Z)
P₀1 2 11 0 3
P₁2 0 10 1 2
P₂2 2 11 2 0

Step 1: Calculate Available = Total − Sum(Allocation) = (5,5,5) − (1+2+2, 2+0+2, 1+1+1) = (5,5,5) − (5,4,3) = (0, 1, 2).

Step 2: Apply Detection Algorithm. Work = (0, 1, 2).

  • P₁: Request (0,1,2) ≤ Work (0,1,2) ✓ → P₁ finishes → Work = (0+2, 1+0, 2+1) = (2, 1, 3). P₁ finishes first.
  • P₀: Request (1,0,3) ≤ Work (2,1,3) ✓ → P₀ finishes → Work = (2+1, 1+2, 3+1) = (3, 3, 4).
  • P₂: Request (1,2,0) ≤ Work (3,3,4) ✓ → P₂ finishes → Work = (3+2, 3+2, 4+1) = (5, 5, 5). P₂ finishes last.

Result: All processes complete. Safe sequence: <P₁, P₀, P₂>. P₁ finishes first, P₂ finishes last.

Constructing a Wait-for Graph and Detecting Deadlocks

  1. 1
    Step 1

    Given a system with 4 processes (P₁, P₂, P₃, P₄) and 4 resource types (R₁, R₂, R₃, R₄) each with one instance. RAG edges: P₁ requests R₁, R₁ assigned to P₂. P₂ requests R₂, R₂ assigned to P₃. P₃ requests R₃, R₃ assigned to P₄. P₄ requests R₄, R₄ assigned to P₁. P₂ requests R₁, R₁ assigned to P₂ (self-loop — P₂ wants R₁ which it already has? Actually this is redundant).

  2. 2
    Step 2

    For each pair (Pᵢ requests Rₖ, Rₖ assigned to Pⱼ), draw Pᵢ → Pⱼ. Eliminate all resource nodes. The WFG now has only process nodes and directed edges.

  3. 3
    Step 3

    P₁ requests R₁ (assigned to P₂) → P₁ → P₂. P₂ requests R₂ (assigned to P₃) → P₂ → P₃. P₃ requests R₃ (assigned to P₄) → P₃ → P₄. P₄ requests R₄ (assigned to P₁) → P₄ → P₁. So edges: P₁ → P₂, P₂ → P₃, P₃ → P₄, P₄ → P₁.

  4. 4
    Step 4

    Start DFS from P₁: visit P₁ (gray), then P₂ (gray), then P₃ (gray), then P₄ (gray), then P₄ → P₁ — but P₁ is already gray! Cycle detected. A cycle in a single-instance WFG means deadlock. Processes P₁, P₂, P₃, P₄ are all deadlocked.

  5. 5
    Step 5

    Now consider a different scenario: P₁ → P₂, P₂ → P₃, P₃ → P₄ (no P₄ → P₁). DFS from P₁: P₁ → P₂ → P₃ → P₄. No gray node encountered. No cycle. No deadlock.

  6. 6
    Step 6

    Remember: The WFG method works ONLY for single-instance resources. For multi-instance resources, you must use the Banker's Detection Algorithm (matrix reduction method) described above.

Detection Frequency: When to Run Detection

Running deadlock detection is not free — it consumes CPU time and can degrade system performance. The frequency of detection involves a trade-off.

ApproachDescriptionProsCons
PeriodicRun detection at fixed time intervals (e.g., every 30 seconds, every hour).Simple to implement; predictable overhead.Deadlock may go undetected for the entire interval; resources tied up longer.
On-demand (event-driven)Run detection only when a resource request fails (i.e., a process is blocked waiting for a resource).Deadlocks are detected immediately when they occur.Overhead spikes under contention; may run detection many times unnecessarily.
HybridRun detection periodically, but also trigger it if too many processes become blocked within a short window.Balances overhead and responsiveness.More complex to implement.

Guidelines:

  • If deadlocks are rare → run detection infrequently (e.g., every hour or on admin request).
  • If deadlocks are frequent → run detection more often (e.g., every few seconds or on each block).
  • The cost of detection must be weighed against the cost of the deadlock (tied-up resources, frustrated users).

Deadlock Recovery: What to Do After Detection

Once a deadlock is detected, the system must recover — break the deadlock and allow processes to continue. There are two main approaches:


Option 1: Process Termination

StrategyDescriptionProsCons
Abort all deadlocked processesKill every process involved in the deadlock simultaneously.Simple, guaranteed to break the deadlock.Wasted computation — all processes lose their work. High restart cost.
Abort one process at a timeKill one process, check if deadlock is resolved; repeat if not.Minimizes number of processes killed.High overhead — re-run detection after each abort. Starvation possible if the same process is repeatedly selected.

Selection factors for choosing which process to abort:

  1. Priority of the process (lower priority → kill first).
  2. Runtime — how long the process has been running (more runtime → more work lost).
  3. Resources held — how many and what type of resources the process holds (more resources → more expensive to restart).
  4. Resources still needed — how many more resources the process needs to complete.
  5. Number of processes that will need to be terminated.

Option 2: Resource Preemption

StrategyDescriptionProsCons
Select a victimChoose a process to preempt (take resources from).No processes killed — work is not lost entirely.Rollback/restart required; overhead of saving state.
RollbackThe preempted process must be rolled back to a safe checkpoint and restarted from there.Minimizes reprocessing compared to full abort.Requires checkpointing infrastructure; may need to reacquire locks.
Starvation preventionEnsure the same process is not repeatedly chosen as the victim (e.g., include number of rollbacks in the selection cost factor).Fairness — all processes eventually make progress.Adds complexity to victim selection.

Comparison: Termination vs. Preemption

FactorProcess TerminationResource Preemption
Work lostEntire process execution up to the deadlock.Only execution since the last checkpoint.
ComplexitySimple — just kill the process(es).Complex — requires checkpoints, rollback, resource tracking.
OverheadRestart cost (re-create process, reload data).Checkpointing overhead (periodic state saving).
FairnessMay starve low-priority processes.Can include rollback count in selection to ensure fairness.
When to useWhen processes are short-lived or work can be easily re-created.When processes are long-running and restarting is expensive (e.g., database transactions).

Knowledge Check

Question 1 of 4
Q1Single choice

What is a Wait-for Graph (WFG) used for?