Virtual Memory: Demand Paging, Locality, and Working Sets
Understand how virtual memory creates the illusion of infinite RAM through demand paging, master the complete page fault handling mechanism, and learn why locality of reference is the phenomenon that makes virtual memory practical.
Learning Goals
- Explain how demand paging allows a process's logical address space to exceed physical RAM.
- Trace the complete 8-step page fault handling sequence from trap to instruction restart.
- Calculate Effective Access Time (EAT) given a page fault rate and service time, and interpret the result.
- Define temporal and spatial locality and explain the Working Set model.
- Identify the conditions that cause thrashing and describe the OS strategy to prevent it.
The Virtual Memory Illusion
Section 2 introduced paging: a process's pages can be scattered across any available physical frames. But paging still assumed that all of a process's pages must be in RAM simultaneously. Virtual memory removes even this constraint.
The key observation: At any point during execution, a process only actively uses a small fraction of its total address space. Functions that handle rare error conditions, initialization code that runs once and never again, large arrays where only a few elements are accessed at a time — all of this takes up space in the logical address space but sits idle most of the time.
Virtual memory's promise: Keep only the actively needed pages in RAM. Store the rest on disk. Move pages in and out of RAM on demand. The process never knows this is happening — it still sees a continuous, full logical address space.
Demand Paging: The Lazy Loading Strategy
Demand paging is the most widely used virtual memory scheme. Its principle is simple:
Never load a page into RAM until a process actually references it.
When a process starts, none of its pages need to be in RAM. The OS only loads a page when the process generates a memory access to that page. This is sometimes called lazy loading (as opposed to pre-paging, where the OS tries to predict and pre-load pages it thinks will be needed soon).
The Mechanism: The Valid/Invalid Bit (Revisited)
Recall from Section 2 that each page table entry has a valid/invalid bit:
- Valid (1): The page is in RAM. The frame number in the entry is correct and usable.
- Invalid (0): The page is either on disk (swapped out) or is not part of the process's legal address space.
With demand paging, when a process starts, all page table entries are initialized to invalid. Pages are loaded on demand, and their bits are flipped to valid one by one as they are needed.
Benefits of Demand Paging
- Larger logical address spaces: A process can have a 4GB logical address space on a machine with only 512MB of RAM.
- Higher degree of multiprogramming: Since each process only uses a fraction of RAM, more processes can be in memory simultaneously, improving CPU utilization.
- Faster program startup: A program begins executing immediately (with no pages loaded) rather than waiting for its entire image to be copied to RAM.
- Efficient RAM usage: RAM is only consumed by pages that are actually needed, not by code that is never reached.
The Page Fault: When a Required Page is Not in RAM
When the CPU generates a logical address and the MMU checks the page table entry for that page — and finds the valid/invalid bit = 0 (page not in RAM) — the hardware triggers a page fault trap. Control transfers immediately to the OS's page fault handler.
The Complete Page Fault Handling Sequence
The key insight — restarting the instruction: The faulting instruction is restarted from scratch (not resumed mid-execution). This is possible because the OS saved the process's exact state (PC, registers) before handling the fault. When the page is now in RAM, the instruction executes successfully without the process ever knowing a disk access occurred.
Performance of Demand Paging: The EAT Formula
Every page fault involves a disk I/O operation, which is millions of times slower than RAM access. Even rare page faults drastically degrade performance.
Effective Access Time with Page Faults:
Where:
- = page fault rate (probability that a given memory access causes a fault; )
- = memory access time (without a fault; e.g., 200 ns)
- Page fault service time = time to handle the fault, load the page from disk, and restart = typically 8–20 milliseconds (dominated by disk seek + rotational latency)
Worked Example:
Given: ns, page fault service time = ms = ns, (1 fault per 1000 accesses)
Without any page faults: EAT = 200 ns. With just 1 fault per 1000 accesses: EAT ≈ 8,200 ns — a 41× slowdown.
| Page Fault Rate | EAT | Slowdown Factor |
|---|---|---|
| 0 (no faults) | 200 ns | 1× |
| 1 / 10,000 | 1,000 ns | 5× |
| 1 / 1,000 | 8,200 ns | 41× |
| 1 / 100 | 80,200 ns | 401× |
Conclusion: To keep demand paging practical, the page fault rate must be kept extremely small (typically to limit slowdown to under 10%). This is achieved by locality of reference — explained next.
Handling a Page Fault: Complete Trace from Trap to Instruction Restart
- 1Step 1
Process P is executing normally. Its current instruction is
MOV R1, [3200]— load the value at logical address 3200. With page size = 1024B, this maps to page number 3 (3200 ÷ 1024 = 3), offset 128. The CPU sends page number 3 to the MMU. - 2Step 2
The MMU looks up page 3 in process P's page table. The entry shows: valid/invalid bit = 0 (invalid). Page 3 is NOT currently in RAM — it was never loaded (demand paging) or was swapped out to disk. The MMU immediately raises a PAGE FAULT TRAP interrupt.
- 3Step 3
The hardware automatically saves the process's complete state into its PCB: the Program Counter (pointing to the MOV instruction that caused the fault), all CPU registers, and the stack pointer. The CPU switches to Kernel Mode. Process P is now in the Blocked state. The OS page fault handler begins executing.
- 4Step 4
The OS checks its internal memory map for process P (not the page table, but a separate virtual address space descriptor). Is logical address 3200 a legal part of P's address space? YES — it falls within P's declared stack region. The fault was not a programming error. The OS proceeds to retrieve the page.
- 5Step 5
The OS checks its free frame list. It finds that frame 7 is available. If no free frames were available, the OS would run a page replacement algorithm (FIFO, LRU, etc. — covered in Section 4) to evict an existing page to disk. For this trace, frame 7 is free and selected.
- 6Step 6
The OS issues a disk read request: load the contents of page 3 of process P from the swap partition on disk into physical frame 7 in RAM. Process P remains Blocked while waiting for I/O. Other processes run on the CPU during this time (the disk I/O takes ~8ms — millions of CPU cycles). When the disk signals completion (via interrupt), the OS resumes.
- 7Step 7
The OS updates process P's page table entry for page 3: frame number = 7, valid/invalid bit = 1 (valid). If the page was loaded into the TLB during a previous access, the TLB entry must also be invalidated (or updated). Process P is moved from the Blocked queue back to the Ready queue.
- 8Step 8
When the OS scheduler dispatches process P again, it restores P's saved CPU state from the PCB. The Program Counter points back to the faulting MOV instruction. The instruction executes again from the start: the MMU now finds page 3 in the page table (valid bit = 1, frame = 7), translates the address to frame 7, offset 128 → physical address 7×1024 + 128 = 7,296. The data is read from RAM and loaded into R1. ✅ The process continues without any awareness of what happened.
Locality of Reference: Why Demand Paging Works in Practice
The EAT formula shows that even rare page faults are catastrophic. Yet demand paging works extremely well in practice, with CPUs spending very little time on page fault handling. Why?
The answer is Locality of Reference — a fundamental property of almost all programs.
Temporal Locality
A resource that was recently used is very likely to be used again in the near future.
- A loop iterates over the same instructions hundreds of times → same pages accessed repeatedly.
- A recently-called function's local variables are likely accessed again → same stack page accessed repeatedly.
- Implication: Once a page is loaded, it tends to be accessed many times before being needed from disk again.
Spatial Locality
If a resource at address X is accessed, resources at addresses near X are likely to be accessed soon.
- Sequential instruction execution: after instruction at address A, instruction at A+4 is next → same or adjacent page.
- Array traversal: after accessing
arr[0],arr[1]is next → same page (for most array sizes). - Implication: Loading a full page (4KB) at once captures dozens of nearby variables and instructions that will be needed shortly, amortizing the disk I/O cost.
Together, these two properties mean: The "active" set of pages a program needs at any point in time is small and changes slowly. This keeps the page fault rate extremely low.
The Working Set Model
The Working Set is the formal model that captures locality of reference. It is used by OS designers to decide how many frames each process needs.
Definition: The Working Set of process at time with window is:
- (the working set window) is typically measured in page references (e.g., references).
- = the working set size = number of distinct pages in the working set.
Intuition: The working set captures the pages a process is "actively using" right now. Pages that were used long ago (more than references back) fall out of the working set.
Thrashing: When Working Sets Exceed Available RAM
Let = the total memory demand across all processes:
And let = total number of available physical frames.
If : The OS cannot keep every process's working set in RAM simultaneously. Processes constantly page fault. Each fault causes a disk I/O. Processes are constantly suspended, waiting for pages to be loaded. The CPU becomes idle — not because processes are done, but because they're all waiting for disk I/O.
This vicious cycle is called thrashing. CPU utilization collapses despite the system being "busy."
The CPU Utilization vs. Degree of Multiprogramming Curve:
| Degree of Multiprogramming | CPU Utilization |
|---|---|
| Low (few processes) | Low (CPU often idle) |
| Optimal (working sets fit in RAM) | High ✅ |
| Too high (D > m) | Collapses — Thrashing ❌ |
The Working Set Strategy: Preventing Thrashing
The OS uses the working set model to control the degree of multiprogramming:
- Monitor the working set size of each process.
- Calculate total demand .
- If : Suspend (swap out) one or more processes to free frames until . The suspended process's pages are written to disk. When a frame becomes available, resume a suspended process.
- If significantly: Admit a new process to improve CPU utilization.
This ensures every process in memory has its working set fully resident in RAM, keeping page fault rates low and CPU utilization high.
Exam note: The working set strategy is essentially a dynamic form of process scheduling at the memory level. The OS is deciding not just who gets the CPU, but who gets to stay in RAM.
Knowledge Check
A system has a memory access time of 150 ns. Disk page fault service time is 10 ms. If the page fault rate is 1 in every 5,000 accesses, what is the Effective Access Time (EAT)?