Process Concept, States, and PCB
An in-depth, rigorous exploration of the process lifecycle, process memory layout, the structure of the Process Control Block, and the low-level mechanics of context switching.
Learning Goals
- Deconstruct the memory layout of a running process (Text, Data, Heap, Stack).
- Analyze the 7-state process model, including suspended states and swapping mechanisms.
- Detail the internal data structures of a Process Control Block (PCB).
- Understand the low-level hardware and software mechanics of a context switch and its associated overhead.
The Process: Beyond the Program
A fundamental concept in operating systems is the distinction between a program and a process.
- Program: A passive entity. It is a file containing a list of instructions stored on disk (an executable file).
- Process: An active entity. It is a program in execution. A program becomes a process when its executable file is loaded into memory.
Two processes can be associated with the same program (e.g., two users each running their own instance of a web browser). They are separate execution sequences and thus separate processes.
Process Memory Layout
When a process is loaded into RAM, the OS allocates a specific memory space for it. This address space is logically divided into several segments:
- Text Section: The compiled machine code. It is usually marked as read-only to prevent a program from accidentally modifying its instructions.
- Data Section: Contains global and static variables that are initialized by the programmer.
- BSS (Block Started by Symbol): Contains uninitialized global and static variables, which are zeroed out by the OS before the process starts.
- Heap: Memory dynamically allocated at run time (e.g., using malloc() in C or new in Java). The heap grows upwards.
- Stack: Used for local variables, function parameters, return addresses, and control flow. Each time a function is called, a new "stack frame" is pushed. The stack grows downwards.
Note: The stack and heap grow towards each other. The OS must ensure they never overlap; a collision would cause a stack overflow or heap corruption.
Process Creation: The fork() System Call
In systems like UNIX, a process is created using the fork() system call. The creator is the Parent, and the new process is the Child.
The fork() Logic:
- fork() creates a new process by duplicating the calling process.
- The child process is a near-exact copy of the parent (same code, data, and open file descriptors), but it has its own unique PID.
- Return Values:
  - In the parent process, fork() returns the PID of the child.
  - In the child process, fork() returns 0.
  - If creation fails, fork() returns -1 in the parent and no child is created.
The 2^n Rule:
A very common exam question asks how many processes are created by a loop of fork() calls.
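If fork() is called n times in sequence (e.g., in a loop with no early exit), every existing process executes each call, so the number of processes doubles each time. The total number of processes is therefore 2^n (including the original), and the number of newly created (child) processes is 2^n - 1.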
Code Example: Parent-Child Relationship
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>   // declares wait()
#include <unistd.h>

int main(void) {
    pid_t pid = fork();

    if (pid < 0) {            // Error: no child was created
        fprintf(stderr, "Fork Failed\n");
        return 1;
    } else if (pid == 0) {    // Child process
        printf("I am the child. My PID is %d\n", (int)getpid());
    } else {                  // Parent process
        printf("I am the parent. My child's PID is %d\n", (int)pid);
        wait(NULL);           // Parent waits for child to finish
        printf("Child Complete\n");
    }
    return 0;
}
- Zombie Process: A child process that has terminated, but whose parent has not yet called wait() to read its exit status. The child's PCB remains in the OS process table until the parent collects that status.
- Orphan Process: A child process whose parent has terminated or crashed. The OS (usually the init process) "adopts" the orphan to ensure it is properly cleaned up.
Process State Models
As a process executes, it continuously changes state depending on its current activity. Two models are commonly studied: the classic 5-state model (used in introductory texts and frequently tested in exams) and the 7-state model (which adds swapping for modern memory management).
The 5-State Model (Exam Favorite)
This is the model most frequently asked in exams [2023 Q3a, 2022 Q5a]. The five states are:
- New: The process is being created.
- Ready: The process is in main memory, waiting to be assigned to the CPU.
- Running: Instructions are currently being executed.
- Waiting (Blocked): The process is waiting for some event to occur (e.g., I/O completion, signal).
- Terminated: The process has finished execution.
Key Transitions to Remember:
- Ready → Running: The scheduler selects this process (Dispatch).
- Running → Ready: The process's time quantum expired (preemption), or another process with higher priority became ready.
- Running → Waiting: The process requests I/O or a resource that is not immediately available.
- Waiting → Ready: The I/O operation completes, or the resource becomes available.
- New → Ready: The process is admitted into the ready queue after creation.
The 7-State Model (Adding Suspension)
The 7-state model extends the 5-state model to cover situations where main memory (RAM) becomes full, forcing the OS to swap processes out to disk.
The 7 States:
- New: The process is being created and admitted to the system.
- Ready: The process is in memory, waiting to be assigned to a CPU.
- Running: Instructions are currently being executed by the CPU.
- Waiting / Blocked: The process cannot execute until some event occurs (e.g., I/O completion, a lock becomes available).
- Terminated: The process has finished execution and its resources are being deallocated.
- Suspended Ready: When main memory is full, the OS moves a 'Ready' process to secondary storage (disk) to free up RAM.
- Suspended Blocked: A 'Waiting' process is moved to disk. If the event it was waiting for occurs while on disk, it transitions to Suspended Ready.
Deconstructing the Process Control Block (PCB)
To manage processes, the OS requires detailed metadata. Every process is represented by a Process Control Block (PCB); in the Linux kernel, for example, this is struct task_struct. The PCB is a large data structure residing in kernel memory.
Critical Fields of a PCB:
- Process Identification:
- PID (Process ID): A unique integer identifying the process.
- PPID (Parent Process ID): Points to the creator of the process.
- User/Group ID: Identifies the user who owns the process for security checks.
- CPU State (Context):
- Program Counter (PC): The memory address of the next instruction to be executed when the process resumes.
- CPU Registers: Copies of the accumulators, index registers, stack pointers, and condition-code information. These must be saved exactly when an interrupt occurs so that the process can later resume correctly.
- CPU Scheduling Information:
- Priority: Defines the process's importance relative to others.
- Pointers to Scheduling Queues: Links connecting the PCB to the Ready Queue or specific I/O Wait Queues.
- Memory-Management Information:
- Pointers to the page tables or segment tables that map the process's logical address space to physical RAM.
- Values of the base and limit registers.
- Accounting Information:
- Amount of CPU time consumed, time limits, execution time.
- I/O Status Information:
- A list of I/O devices allocated to the process.
- A list of open file descriptors.
The Mechanics and Cost of Context Switching
A Context Switch is the procedure by which the CPU stops executing one process and begins executing another. It is the core mechanism that allows multiprogramming and time-sharing.
Step-by-Step Execution of a Context Switch:
- Trigger: A hardware interrupt occurs (e.g., the timer goes off), or the running process makes a blocking system call (e.g., requesting disk I/O).
- Mode Switch: The CPU switches from User Mode to Kernel Mode.
- Save State: The OS saves the exact state of the CPU (all registers, program counter, flags) into the PCB of the currently running (outgoing) process.
- Update PCB: The OS updates the state of the outgoing process (e.g., from Running to Ready or Blocked) and moves its PCB to the appropriate queue.
- Invoke Scheduler: The CPU scheduler runs to select the next process to run (the incoming process).
- Restore State: The OS loads the CPU registers and program counter with the saved values from the PCB of the incoming process.
- Resume: The CPU switches back to User Mode and resumes execution of the incoming process from the exact point at which it was previously stopped.
The Cost (Overhead)
Context switching is pure overhead; the system does no useful computational work while switching.
- Direct Costs: Saving and restoring registers, executing the scheduler.
- Indirect Costs (Cache Degradation): When a new process is loaded, the data currently in the CPU cache (L1/L2) and the Translation Lookaside Buffer (TLB) most likely belongs to the old process. Because TLB entries map the old process's virtual addresses, the TLB is usually flushed (unless the hardware tags entries with an address-space identifier), and the cache will experience many "misses" until the new process warms it up.
Hardware support, such as processors with multiple sets of registers (e.g., Sun UltraSPARC), allows the OS to simply change a pointer to the current register set, vastly reducing context switch time.
Knowledge Check
During a context switch, why is the TLB (Translation Lookaside Buffer) usually flushed?