Detection and Recovery

45 mins

Learning Goals

Construct a Wait-for Graph from a Resource-Allocation Graph for single-instance resource systems.
Apply the Banker's Detection Algorithm to detect deadlocks in multi-instance resource systems.
Analyze the trade-offs of different detection frequencies (periodic vs. on-demand).
Evaluate recovery strategies: process termination (abort all vs. abort one-by-one) and resource preemption (victim selection, rollback, starvation prevention).

Deadlock Detection: An Overview

Deadlock detection is the process of determining whether a deadlock currently exists in the system. Unlike prevention and avoidance, detection allows the system to enter deadlock states — but then detects them and takes corrective action.

When to use detection:

Prevention is too costly or restrictive.
Avoidance is impossible because maximum resource needs are unknown.
The system is a general-purpose OS (like Linux or Windows) with unpredictable workloads.

Detection methods depend on resource type:

Resource Type	Detection Method	Complexity
Single-instance (each resource type has exactly 1 instance)	Wait-for Graph (WFG) — check for cycles	O(n²) using DFS
Multi-instance (each resource type can have multiple instances)	Banker's Detection Algorithm — matrix reduction	O(m × n²)

Detection for Single-Instance Resources: Wait-for Graph

For systems where each resource type has exactly one instance, a deadlock can be detected by constructing a Wait-for Graph (WFG) from the Resource-Allocation Graph (RAG).

Transforming RAG to WFG

The WFG is derived from the RAG by removing resource nodes and connecting processes directly:

For each pair (Pᵢ, Rₖ, Pⱼ) where Pᵢ is waiting for Rₖ (request edge Pᵢ → Rₖ) and Rₖ is allocated to Pⱼ (assignment edge Rₖ → Pⱼ), draw a wait-for edge from Pᵢ → Pⱼ.
Remove all resource nodes from the graph.
The resulting graph has only process nodes and directed edges representing "is waiting for".

Before (RAG): The Resource-Allocation Graph shows processes connected through resources.

After (WFG): The Wait-for Graph removes resource nodes and connects processes directly.

The cycle P₁ → P₂ → P₁ in the WFG indicates a deadlock.

Cycle Detection

Graph property	Meaning
WFG has no cycles	No deadlock exists.
WFG contains one or more cycles	Deadlock exists. Every process in the cycle is deadlocked.

Example Transformation:

RAG Edges	WFG Edge	Meaning
P₁ → R₂ (request) + R₂ → P₂ (assignment)	P₁ → P₂	P₁ is waiting for P₂
P₂ → R₁ (request) + R₁ → P₁ (assignment)	P₂ → P₁	P₂ is waiting for P₁
Result: Cycle P₁ → P₂ → P₁ — Deadlock detected.

Detection Algorithm (DFS-Based)

For each process Pᵢ in the WFG, start a Depth-First Search (DFS).
Mark nodes as: White (unvisited), Gray (in current DFS path), Black (fully explored).
If during DFS from Pᵢ we encounter a node that is already Gray, a cycle (deadlock) has been detected.
If no Gray node is encountered during any DFS, no deadlock exists.

Deadlock Detection and Recovery — Animated Explanation

Detection for Multi-Instance Resources: Banker's Detection Algorithm

When resource types have multiple instances, the Wait-for Graph approach is insufficient (a cycle does not guarantee deadlock). Instead, we use the Banker's Detection Algorithm, which is similar to the Safety Algorithm but uses current request information rather than maximum need.

Data Structures

Structure	Description
Available[m]	Number of available instances of each resource type.
Allocation[n][m]	Number of instances currently allocated to each process.
Request[n][m]	Current outstanding request of each process (what each process is currently waiting for).

The Detection Algorithm

1. Let Work[m] = Available[m]   // Temporary copy
2. Let Finish[n] = false for all processes
3. For each process Pᵢ where Finish[i] == false:
      if Request[i] ≤ Work, then:
         Work = Work + Allocation[i]   // Pretend Pᵢ finishes and releases
         Finish[i] = true
         Go to step 3 and restart
4. After scanning all processes:
      if Finish[i] == false for any process, then Pᵢ is deadlocked.

Result Interpretation

Condition	Meaning
All Finish[i] == true	No deadlock exists. The system can recover.
Some Finish[i] == false	Those processes are deadlocked. They will not be able to complete.

Example: 3 Processes, 2 Resources

Given:

Available = (0, 0)
Allocation = P₀: (0, 1), P₁: (2, 0), P₂: (3, 0)
Request = P₀: (0, 0), P₁: (2, 0), P₂: (0, 1)

Execution:

Work = (0, 0). P₀: Request (0,0) ≤ (0,0) ✓ → P₀ finishes → Work = (0+0, 0+1) = (0, 1). Finish[0] = true.
P₁: Request (2,0) ≤ (0,1) ✗ → P₁ blocked.
P₂: Request (0,1) ≤ (0,1) ✓ → P₂ finishes → Work = (0+3, 1+0) = (3, 1). Finish[2] = true.
P₁: Request (2,0) ≤ (3,1) ✓ → P₁ finishes → Work = (3+2, 1+0) = (5, 1). Finish[1] = true.
All Finish[i] == true → No deadlock. Sequence: <P₀, P₂, P₁>.

PYQ Variation: Detection Using Allocation + Request Only (No Max) [2022 Q4b]

Some exam problems provide Allocation and Request matrices directly without a Max matrix. In this case, you use the same detection algorithm but treat Request as the current outstanding needs. Let's solve the 2022 PYQ.

Problem [2022 Q4b]: Single processor system, 3 resource types (X, Y, Z) with 5 units each, 3 processes. Find which process finishes first and last.

Process	Allocation (X Y Z)	Request (X Y Z)
P₀	1 2 1	1 0 3
P₁	2 0 1	0 1 2
P₂	2 2 1	1 2 0

Step 1: Calculate Available = Total − Sum(Allocation) = (5,5,5) − (1+2+2, 2+0+2, 1+1+1) = (5,5,5) − (5,4,3) = (0, 1, 2).

Step 2: Apply Detection Algorithm. Work = (0, 1, 2).

P₁: Request (0,1,2) ≤ Work (0,1,2) ✓ → P₁ finishes → Work = (0+2, 1+0, 2+1) = (2, 1, 3). P₁ finishes first.
P₀: Request (1,0,3) ≤ Work (2,1,3) ✓ → P₀ finishes → Work = (2+1, 1+2, 3+1) = (3, 3, 4).
P₂: Request (1,2,0) ≤ Work (3,3,4) ✓ → P₂ finishes → Work = (3+2, 3+2, 4+1) = (5, 5, 5). P₂ finishes last.

Result: All processes complete. Safe sequence: <P₁, P₀, P₂>. P₁ finishes first, P₂ finishes last.

Constructing a Wait-for Graph and Detecting Deadlocks

1
Step 1
Given a system with 4 processes (P₁, P₂, P₃, P₄) and 4 resource types (R₁, R₂, R₃, R₄) each with one instance. RAG edges: P₁ requests R₁, R₁ assigned to P₂. P₂ requests R₂, R₂ assigned to P₃. P₃ requests R₃, R₃ assigned to P₄. P₄ requests R₄, R₄ assigned to P₁. P₂ requests R₁, R₁ assigned to P₂ (self-loop — P₂ wants R₁ which it already has? Actually this is redundant).
2
Step 2
For each pair (Pᵢ requests Rₖ, Rₖ assigned to Pⱼ), draw Pᵢ → Pⱼ. Eliminate all resource nodes. The WFG now has only process nodes and directed edges.
3
Step 3
P₁ requests R₁ (assigned to P₂) → P₁ → P₂. P₂ requests R₂ (assigned to P₃) → P₂ → P₃. P₃ requests R₃ (assigned to P₄) → P₃ → P₄. P₄ requests R₄ (assigned to P₁) → P₄ → P₁. So edges: P₁ → P₂, P₂ → P₃, P₃ → P₄, P₄ → P₁.
4
Step 4
Start DFS from P₁: visit P₁ (gray), then P₂ (gray), then P₃ (gray), then P₄ (gray), then P₄ → P₁ — but P₁ is already gray! Cycle detected. A cycle in a single-instance WFG means deadlock. Processes P₁, P₂, P₃, P₄ are all deadlocked.
5
Step 5
Now consider a different scenario: P₁ → P₂, P₂ → P₃, P₃ → P₄ (no P₄ → P₁). DFS from P₁: P₁ → P₂ → P₃ → P₄. No gray node encountered. No cycle. No deadlock.
6
Step 6
Remember: The WFG method works ONLY for single-instance resources. For multi-instance resources, you must use the Banker's Detection Algorithm (matrix reduction method) described above.

Detection Frequency: When to Run Detection

Running deadlock detection is not free — it consumes CPU time and can degrade system performance. The frequency of detection involves a trade-off.

Approach	Description	Pros	Cons
Periodic	Run detection at fixed time intervals (e.g., every 30 seconds, every hour).	Simple to implement; predictable overhead.	Deadlock may go undetected for the entire interval; resources tied up longer.
On-demand (event-driven)	Run detection only when a resource request fails (i.e., a process is blocked waiting for a resource).	Deadlocks are detected immediately when they occur.	Overhead spikes under contention; may run detection many times unnecessarily.
Hybrid	Run detection periodically, but also trigger it if too many processes become blocked within a short window.	Balances overhead and responsiveness.	More complex to implement.

Guidelines:

If deadlocks are rare → run detection infrequently (e.g., every hour or on admin request).
If deadlocks are frequent → run detection more often (e.g., every few seconds or on each block).
The cost of detection must be weighed against the cost of the deadlock (tied-up resources, frustrated users).

Deadlock Recovery: What to Do After Detection

Once a deadlock is detected, the system must recover — break the deadlock and allow processes to continue. There are two main approaches:

Option 1: Process Termination

Strategy	Description	Pros	Cons
Abort all deadlocked processes	Kill every process involved in the deadlock simultaneously.	Simple, guaranteed to break the deadlock.	Wasted computation — all processes lose their work. High restart cost.
Abort one process at a time	Kill one process, check if deadlock is resolved; repeat if not.	Minimizes number of processes killed.	High overhead — re-run detection after each abort. Starvation possible if the same process is repeatedly selected.

Selection factors for choosing which process to abort:

Priority of the process (lower priority → kill first).
Runtime — how long the process has been running (more runtime → more work lost).
Resources held — how many and what type of resources the process holds (more resources → more expensive to restart).
Resources still needed — how many more resources the process needs to complete.
Number of processes that will need to be terminated.

Option 2: Resource Preemption

Strategy	Description	Pros	Cons
Select a victim	Choose a process to preempt (take resources from).	No processes killed — work is not lost entirely.	Rollback/restart required; overhead of saving state.
Rollback	The preempted process must be rolled back to a safe checkpoint and restarted from there.	Minimizes reprocessing compared to full abort.	Requires checkpointing infrastructure; may need to reacquire locks.
Starvation prevention	Ensure the same process is not repeatedly chosen as the victim (e.g., include number of rollbacks in the selection cost factor).	Fairness — all processes eventually make progress.	Adds complexity to victim selection.

Comparison: Termination vs. Preemption

Factor	Process Termination	Resource Preemption
Work lost	Entire process execution up to the deadlock.	Only execution since the last checkpoint.
Complexity	Simple — just kill the process(es).	Complex — requires checkpoints, rollback, resource tracking.
Overhead	Restart cost (re-create process, reload data).	Checkpointing overhead (periodic state saving).
Fairness	May starve low-priority processes.	Can include rollback count in selection to ensure fairness.
When to use	When processes are short-lived or work can be easily re-created.	When processes are long-running and restarting is expensive (e.g., database transactions).

Knowledge Check

Question 1 of 4

Q1Single choice

What is a Wait-for Graph (WFG) used for?

Detecting deadlocks in systems with multi-instance resources.

Detecting deadlocks in systems with single-instance resources.

Preventing deadlocks by breaking the circular wait condition.

Allocating resources using the Banker's Algorithm.

Deadlock Detection and Recovery in OS — GeeksforGeeks

article

Deadlock Prevention and Avoidance

PYQ Analysis and Exam Preparation