I/O Systems and Controllers

60 mins

Learning Goals

Explain the role of device controllers and how they offload low-level hardware management from the CPU.
Differentiate between programmed I/O, interrupt-driven I/O, and Direct Memory Access (DMA).
Trace the full lifecycle of an I/O request from user application to device hardware and back.
Describe the layered architecture of device drivers in the Linux and Windows kernels.

I/O Hardware: The CPU's View of Devices

The CPU does not interact directly with devices like keyboards, disks, or printers. Instead, each device is managed by a Device Controller — a specialized electronic circuit (on the motherboard or the device itself) that handles low-level hardware details.

Device Controllers

A device controller is responsible for:

Converting the byte stream from the OS into the electrical signals required by the device.
Managing the device's internal buffer (e.g., the 512-byte sector buffer on a hard drive).
Reporting status (busy, ready, error) via status registers.
Accepting commands (read, write, format) via command registers.

Component	Purpose
Data Register	Holds data being transferred to/from the device (e.g., a byte for a serial port)
Status Register	Contains flags: busy, ready, error, interrupt-enabled
Command Register	Tells the controller what operation to perform (e.g., READ_SECTOR, WRITE_SECTOR)
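
The register roles in the table can be sketched in C. This is a minimal model, not any real controller's layout: the bit positions, command codes, and the helper name `controller_can_transfer` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit layout for the status register; real controllers differ. */
enum {
    STATUS_BUSY        = 1u << 0,
    STATUS_READY       = 1u << 1,
    STATUS_ERROR       = 1u << 2,
    STATUS_IRQ_ENABLED = 1u << 3
};

/* Hypothetical command codes written to the command register. */
enum { CMD_READ_SECTOR = 0x20, CMD_WRITE_SECTOR = 0x30 };

/* The three registers from the table, modelled as plain fields. */
struct controller_regs {
    uint8_t data;
    uint8_t status;
    uint8_t command;
};

/* A transfer may start only when the device is ready, not busy, and not in error. */
bool controller_can_transfer(const struct controller_regs *regs)
{
    assert(regs != NULL);
    return (regs->status & STATUS_READY) != 0 &&
           (regs->status & (STATUS_BUSY | STATUS_ERROR)) == 0;
}
```

A driver would typically test the status flags this way before writing a command code such as `CMD_READ_SECTOR` into the command register.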

Memory-Mapped I/O vs Port-Mapped I/O

Feature	Memory-Mapped I/O (MMIO)	Port-Mapped I/O (PMIO)
How it works	Device registers are mapped into the CPU's memory address space	Device registers accessed via special I/O instructions (IN, OUT in x86)
Advantage	No special instructions needed — use regular memory read/write	Does not consume memory address space
Used by	Most modern hardware (ARM SoCs, PCI/PCIe device registers on x86)	Legacy x86 devices, embedded systems
Example	The x86 I/O APIC registers are mapped at physical address 0xFEC00000 by default	`IN AL, 0x60` reads a keyboard scancode
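
With MMIO, a register access is an ordinary load or store through a `volatile` pointer. The sketch below uses a plain array as a stand-in for a mapped device region so it can run anywhere; the register offsets and the names `mmio_read32` and `mmio_write32` are hypothetical. Port-mapped I/O is not shown because the `IN` and `OUT` instructions are privileged on x86.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register offsets within a device's mapped region. */
enum { REG_DATA = 0, REG_STATUS = 1, REG_COUNT = 2 };

/* Stand-in for a mapped device region. On real hardware this would be a
   fixed physical address mapped by firmware or the kernel. */
static volatile uint32_t fake_device[REG_COUNT];

/* volatile forces a real load on every call, so the compiler cannot
   cache a register value that the device may change. */
static inline uint32_t mmio_read32(const volatile uint32_t *base, unsigned reg)
{
    assert(reg < REG_COUNT);
    return base[reg];
}

/* volatile likewise forces the store to be emitted. */
static inline void mmio_write32(volatile uint32_t *base, unsigned reg, uint32_t value)
{
    assert(reg < REG_COUNT);
    base[reg] = value;
}
```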

Three I/O Techniques: From Polling to DMA

Historically, three techniques have been used to perform I/O. Each trades simplicity for efficiency.

1. Programmed I/O (Polling)

The CPU does all the work — it continuously checks the device's status register to see if it is ready.

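// Example: reading a byte from a serial port via polling.
// STATUS_PORT, DATA_PORT, and BUSY are device-specific constants;
// inb(port) reads one byte from an I/O port.
while ((inb(STATUS_PORT) & BUSY) != 0) {
    // Spin: wait for the device to become ready
}
uint8_t data = inb(DATA_PORT);   // Read the data byte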

Pros	Cons
Simple to implement	CPU is wasted — it spins in a loop instead of doing useful work
No special hardware needed	Polling frequency must be high enough to avoid missing data
Works for very simple/slow devices	Cannot handle high-speed devices like disks efficiently

2. Interrupt-Driven I/O

The CPU issues a command and then continues executing other processes. The device sends an interrupt when the operation is complete.

// Example: reading a sector from disk via interrupts.
// outb(port, value) and inb(port) are illustrative port helpers
// (note that Linux's outb takes the value first).

// Step 1: Select the sector, then issue the command
outb(LBA_PORT, sector_number);   // Specify which sector
outb(COMMAND_PORT, READ_SECTOR);

// Step 2: The CPU is free to do other work
// ... process scheduling, computation, etc. ...

// Step 3: When the disk finishes, it raises an interrupt.
// The interrupt handler executes:
static uint8_t sector_buffer[512];   // Outlives the handler call

void disk_interrupt_handler(void) {
    for (int i = 0; i < 512; i++) {
        sector_buffer[i] = inb(DATA_PORT);
    }
    // Signal to the waiting process that data is ready
}

Pros	Cons
CPU is not wasted — processes execute during I/O	Interrupt overhead — saving/restoring context for every I/O operation
Good for slow/medium devices (keyboard, network)	High-speed devices (disk, SSD) generate thousands of interrupts per second

3. Direct Memory Access (DMA)

For high-speed devices, even interrupt-driven I/O is too slow because the CPU must still copy data between the device buffer and memory byte-by-byte. DMA solves this by letting the device controller transfer data directly to/from RAM without CPU involvement.

How DMA Works (Step by Step):

The CPU programs the DMA controller with: source address (device buffer), destination address (RAM buffer), and transfer size.
The CPU is free to execute other processes while the DMA controller manages the transfer.
The DMA controller takes control of the system bus and transfers data word-by-word from device to RAM (or RAM to device).
When the full transfer is done, the DMA controller raises a single interrupt to signal completion.
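
The four steps above can be sketched as a small software simulation. This is only a model under stated assumptions: the struct and function names are hypothetical, one "bus cycle" moves a single byte rather than a word, and the completion interrupt is represented by a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated DMA controller state (hypothetical layout). */
struct dma_controller {
    const uint8_t *src;   /* device buffer */
    uint8_t *dst;         /* RAM buffer */
    size_t remaining;     /* bytes still to transfer */
    bool irq_pending;     /* set once when the whole block is done */
};

/* Step 1: the CPU programs source, destination, and size. */
void dma_program(struct dma_controller *dma, const uint8_t *src,
                 uint8_t *dst, size_t size)
{
    assert(dma != NULL && src != NULL && dst != NULL);
    dma->src = src;
    dma->dst = dst;
    dma->remaining = size;
    dma->irq_pending = false;
}

/* Step 3: one bus cycle moves one unit without CPU involvement.
   Step 4: the last cycle raises the single completion interrupt.
   Returns true while more data remains to be moved. */
bool dma_bus_cycle(struct dma_controller *dma)
{
    assert(dma != NULL);
    if (dma->remaining == 0)
        return false;
    *dma->dst++ = *dma->src++;
    if (--dma->remaining == 0)
        dma->irq_pending = true;
    return dma->remaining != 0;
}
```

Between `dma_program` and the moment `irq_pending` becomes true, the simulated CPU never touches the data, which corresponds to step 2.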

Comparison of I/O Techniques:

Aspect	Programmed I/O	Interrupt-Driven I/O	DMA
Data path	Device register → CPU → RAM (CPU polls, then copies)	Device register → CPU → RAM (CPU copies on interrupt)	Device → RAM (direct)
CPU efficiency	Very low (busy-waiting)	Moderate (one interrupt per transfer unit)	High (one interrupt per block)
Best for	Simple/slow devices (PS/2 keyboard)	Medium-speed devices (network card)	High-speed devices (disk, SSD, GPU)
Overhead	CPU spins continuously	Context switch per interrupt	DMA controller setup + one interrupt

[Figure: Direct Memory Access (DMA), how it works]

Interrupt Handlers: The Kernel's Urgent Response

An interrupt is a hardware signal that tells the CPU that an event requiring immediate attention has occurred. When a device raises an interrupt, the CPU must stop its current execution, handle the device's needs, and then resume.

Interrupt Vector Table (IVT) / Interrupt Descriptor Table (IDT)

Every interrupt type is assigned a unique number (the interrupt vector). The CPU uses this number as an index into a table of handler addresses:

x86 Real Mode: IVT at address 0x0000:0000 (256 entries × 4 bytes = 1024 bytes).
x86 Protected Mode: IDT — up to 256 entries, each 8 bytes, location specified by the IDTR register.
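
The table lookup is simple address arithmetic: the entry for a vector sits at the table base plus the vector number times the entry size. The sketch below uses the entry sizes quoted above; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { IVT_ENTRY_SIZE = 4, IDT_ENTRY_SIZE = 8, VECTOR_COUNT = 256 };

/* Real mode: the IVT starts at physical address 0, 4 bytes per entry. */
uint32_t ivt_entry_address(unsigned vector)
{
    assert(vector < VECTOR_COUNT);
    return vector * IVT_ENTRY_SIZE;
}

/* Protected mode: the IDT base comes from the IDTR register, 8 bytes per entry. */
uint32_t idt_entry_address(uint32_t idt_base, unsigned vector)
{
    assert(vector < VECTOR_COUNT);
    return idt_base + vector * IDT_ENTRY_SIZE;
}
```

For example, the last real-mode entry (vector 255) starts at byte 1020 and ends at byte 1024, matching the 1024-byte table size.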

Vector Range	Type	Examples
0–31	Exceptions (internal CPU events)	Divide by zero (0), Page fault (14), General protection fault (13)
32–255	Interrupts (external device events and software interrupts)	Timer (IRQ0), Keyboard (IRQ1), Disk (IRQ14); the OS maps each IRQ line to a vector in this range
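
Conceptually, the vector table is an array of handler addresses indexed by vector number. A minimal sketch in C using function pointers follows; the names are hypothetical, and placing the timer at vector 32 is an assumption for the example (the mapping from IRQ lines to vectors is chosen by the OS).

```c
#include <assert.h>
#include <stddef.h>

enum { VECTOR_COUNT = 256 };

typedef void (*handler_fn)(void);

/* One slot per vector; NULL means no handler is installed. */
static handler_fn handler_table[VECTOR_COUNT];

static int timer_ticks;
static void timer_handler(void) { timer_ticks++; }

/* Install a handler for a vector. */
void register_handler(unsigned vector, handler_fn fn)
{
    assert(vector < VECTOR_COUNT);
    handler_table[vector] = fn;
}

/* Look up the vector and call its handler.
   Returns 1 if a handler ran, 0 if the slot was empty. */
int dispatch(unsigned vector)
{
    assert(vector < VECTOR_COUNT);
    if (handler_table[vector] == NULL)
        return 0;
    handler_table[vector]();
    return 1;
}
```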

Maskable vs Non-Maskable Interrupts

Type	Can the CPU ignore it?	Example
Maskable	Yes — if interrupts are disabled (`cli` instruction)	Timer interrupt, disk interrupt
Non-Maskable (NMI)	No — always handled immediately	Hardware failure, memory ECC error, watchdog timer
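
The difference between the two types comes down to one flag check. The sketch below models the x86 interrupt flag as a boolean; `sim_cli` and `sim_sti` are hypothetical stand-ins for the `cli` and `sti` instructions, and `would_deliver` is an illustrative helper, not a hardware interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Models the CPU's interrupt-enable flag (IF on x86). */
struct cpu_state { bool interrupts_enabled; };

void sim_cli(struct cpu_state *cpu) { cpu->interrupts_enabled = false; }
void sim_sti(struct cpu_state *cpu) { cpu->interrupts_enabled = true; }

/* A maskable interrupt is delivered only while the flag is set;
   an NMI is delivered regardless of the flag. */
bool would_deliver(const struct cpu_state *cpu, bool is_nmi)
{
    assert(cpu != NULL);
    return is_nmi || cpu->interrupts_enabled;
}
```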

Top Half vs Bottom Half

Modern OS kernels split interrupt handling into two parts to minimize the time interrupts are disabled:

Part	Description	Runs with interrupts?
Top Half (Hardware Interrupt Handler)	Acknowledges the interrupt, saves minimal data, schedules the bottom half. Must be extremely fast.	Interrupts disabled — runs immediately
Bottom Half (Softirq / Tasklet / Work Queue)	Performs the heavy processing (e.g., copying data out of device buffers, waking up waiting processes). Deferred until after the top half returns.	Interrupts enabled; runs later
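
The split can be sketched as two functions sharing a small queue: the top half only records what happened, and the bottom half does the real work later. This is a single-threaded model with hypothetical names; a real kernel would use its own deferral mechanisms and proper synchronization.

```c
#include <assert.h>
#include <stddef.h>

enum { QUEUE_CAPACITY = 8 };

/* Work recorded by the top half and consumed by the bottom half. */
static int pending[QUEUE_CAPACITY];
static size_t pending_count;
static int processed_sum;   /* stands in for the "heavy processing" result */

/* Top half: store the minimum and return quickly.
   Returns 0 on success, -1 if the queue is full. */
int top_half(int device_data)
{
    if (pending_count == QUEUE_CAPACITY)
        return -1;
    pending[pending_count++] = device_data;
    return 0;
}

/* Bottom half: runs later and drains the queue.
   Returns the number of items handled. */
size_t bottom_half(void)
{
    assert(pending_count <= QUEUE_CAPACITY);
    size_t handled = pending_count;
    for (size_t i = 0; i < pending_count; i++)
        processed_sum += pending[i];
    pending_count = 0;
    return handled;
}
```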

Device Drivers: The Kernel's Device Abstraction

A device driver is a kernel module that understands the specific protocol of a device controller and presents a uniform interface to the rest of the OS.

The Layered I/O Architecture

Key Insight: The VFS allows user applications to issue read() and write() calls without knowing whether the underlying device is a hard disk, SSD, USB drive, or network file system. The driver handles the conversion.
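
One common way to get this uniformity is a table of function pointers per device type, loosely in the spirit of Linux's `struct file_operations`. The sketch below is a heavily simplified illustration: the `file_ops` struct, the two fake "drivers", and `vfs_read` are hypothetical and only show the dispatch idea.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Operations every driver must provide (only read, for brevity). */
struct file_ops {
    size_t (*read)(char *buf, size_t len);
};

/* Two fake drivers that fill the buffer with a marker byte. */
static size_t disk_read(char *buf, size_t len) { memset(buf, 'D', len); return len; }
static size_t net_read(char *buf, size_t len)  { memset(buf, 'N', len); return len; }

static const struct file_ops disk_ops = { disk_read };
static const struct file_ops net_ops  = { net_read };

/* An open file remembers which driver's operations to use. */
struct file { const struct file_ops *ops; };

/* Uniform entry point: the caller never sees which driver runs. */
size_t vfs_read(const struct file *f, char *buf, size_t len)
{
    assert(f != NULL && f->ops != NULL && f->ops->read != NULL);
    return f->ops->read(buf, len);
}
```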

Driver Characteristics

Aspect	User-Space Driver	Kernel-Space Driver
Location	Runs in user mode as a separate process	Runs in kernel mode, loaded into kernel space
Example	FUSE (Filesystem in Userspace) — e.g., sshfs	Linux SCSI/NVMe drivers, Windows WDDM
Advantage	Crash doesn't bring down the OS	Maximum performance, direct hardware access
Disadvantage	Slower — context switches required per call	Bug can crash the entire system

The Lifecycle of a read() System Call — From User Space to Disk

Step 1: A user-space process calls read(fd, buffer, 512), where fd is an open file descriptor. The C library wrapper issues the system call (sys_read on Linux, entered via int 0x80 on 32-bit x86 or the syscall instruction on x86-64). The CPU switches from User Mode (Ring 3) to Kernel Mode (Ring 0).
Step 2: The kernel's VFS layer receives the request. It uses the file descriptor to look up the open file, verifies that the file was opened for reading, and dispatches the call to the file system (ext4, NTFS, etc.) that owns the file.
Step 3: The file system (e.g., ext4) uses the file's inode to map the requested file offset to the disk blocks that hold the data. If the data is not in the page cache, the file system issues a block I/O request to the generic block layer.
Step 4: The block I/O layer receives the request and places it in the I/O scheduler queue (e.g., CFQ, Deadline, or NOOP in older Linux kernels; mq-deadline, BFQ, or Kyber in current ones). The scheduler may merge adjacent requests and sort them by sector number to reduce disk head movement.
Step 5: The block layer passes the request to the device driver (e.g., ahci for SATA, nvme for NVMe). The driver sets up the DMA transfer: source = disk sector, destination = kernel buffer, size = 512 bytes. It then writes the command to the device controller's command register and returns.
Step 6: The disk controller reads the sector from the spinning platter (or NAND flash) and transfers the 512 bytes via DMA directly into the kernel buffer in RAM. When the transfer completes, the controller raises an interrupt. The interrupt handler marks the I/O as complete and wakes the waiting process; the kernel then copies the data from the kernel buffer to the user-space buffer, and read() returns.
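
From the application's side, this whole lifecycle hides behind a single `read()` call. One detail worth showing is that `read()` may legitimately return fewer bytes than requested, so callers often loop. The helper below is a minimal POSIX sketch; the name `read_fully` is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to len bytes from fd, retrying on short reads and EINTR.
   Returns the number of bytes read (less than len only at end of file),
   or -1 on error. */
ssize_t read_fully(int fd, void *buf, size_t len)
{
    assert(buf != NULL);
    size_t total = 0;
    while (total < len) {
        ssize_t n = read(fd, (char *)buf + total, len - total);
        if (n == 0)
            break;              /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

Each iteration of the loop is one trip through the six steps above, although a read served from the page cache skips the device entirely.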

Spooling

SPOOLing stands for Simultaneous Peripheral Operations On-Line. It is a technique that makes an exclusive (non-sharable) device appear sharable by buffering its output on a high-speed storage device (usually disk).

Aspect	Detail
Full Form	Simultaneous Peripheral Operations On-Line
Mechanism	Instead of each process directly writing to a slow device (like a printer), processes write to a high-speed disk buffer (the spool). A dedicated daemon (e.g., `lpd` for line printer) reads from the spool and writes to the physical device.
Analogy	A restaurant order counter: customers (processes) place orders (output) at a counter (the spool). The chef (daemon) processes orders one at a time from the counter.
Benefit	Multiple processes can "print" almost instantly because they only write to the spool. The physical device is still slow, but no process blocks waiting for it.
Relation to Deadlocks	Spooling breaks the Mutual Exclusion condition for deadlocks (as discussed in Module 4).
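
At its core a spool is a first-in, first-out queue between many producers and one consumer. The sketch below models it as an in-memory ring buffer of job IDs; a real spooler keeps job files on disk, and the struct and function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { SPOOL_CAPACITY = 16 };

/* Ring buffer of job IDs waiting for the device. */
struct spool {
    int jobs[SPOOL_CAPACITY];
    size_t head;    /* index of the oldest job */
    size_t count;   /* number of queued jobs */
};

/* Process side: queue a job and return immediately.
   Returns 0 if queued, -1 if the spool is full. */
int spool_submit(struct spool *s, int job_id)
{
    assert(s != NULL);
    if (s->count == SPOOL_CAPACITY)
        return -1;
    s->jobs[(s->head + s->count) % SPOOL_CAPACITY] = job_id;
    s->count++;
    return 0;
}

/* Daemon side: take the oldest job for the physical device.
   Returns the job ID, or -1 if the spool is empty. */
int spool_next(struct spool *s)
{
    assert(s != NULL);
    if (s->count == 0)
        return -1;
    int job = s->jobs[s->head];
    s->head = (s->head + 1) % SPOOL_CAPACITY;
    s->count--;
    return job;
}
```

Because only the daemon ever calls the device, processes never compete for it directly.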

RAID: Redundant Array of Independent Disks [2024 Q1c]

RAID stands for Redundant Array of Independent Disks. It is a technique that combines multiple physical disk drives into a single logical unit to improve performance and/or reliability.

Level	Description	Min Drives	Performance	Reliability
RAID 0	Striping — data split across disks	2	Excellent read/write	None — any drive fails, all data lost
RAID 1	Mirroring — exact copy on two disks	2	Good read, slower write	Excellent — one drive can fail
RAID 5	Striping + distributed parity	3	Good read, moderate write	Good — one drive can fail; parity rebuilds data
RAID 6	Striping + dual distributed parity	4	Good read, slower write	Very good — two drives can fail
RAID 10	Mirroring + striping (combination)	4	Excellent	Excellent: survives multiple failures as long as no mirrored pair loses both drives

Key Insight: RAID 0 provides performance (parallel I/O) but zero redundancy. RAID 1 provides redundancy by duplication. RAID 5/6 provide redundancy with less space overhead by using parity calculations.
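
The parity used by RAID 5 is a byte-wise XOR across the data blocks of a stripe. XOR-ing the surviving blocks with the parity block reproduces the missing block. The sketch below shows only that arithmetic; the function name is hypothetical, and real implementations also handle stripe layout and parity rotation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XOR block_count blocks of block_size bytes into out.
   Used both to compute parity and to rebuild a lost block. */
void xor_blocks(const uint8_t *const *blocks, size_t block_count,
                size_t block_size, uint8_t *out)
{
    assert(blocks != NULL && out != NULL);
    for (size_t i = 0; i < block_size; i++) {
        uint8_t acc = 0;
        for (size_t b = 0; b < block_count; b++)
            acc ^= blocks[b][i];
        out[i] = acc;
    }
}
```

To rebuild a failed drive's block, call the same function with the surviving data blocks plus the parity block as input.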

CLV vs CAV: Disk Rotation Strategies [2023 Q8a]

Feature	Constant Angular Velocity (CAV)	Constant Linear Velocity (CLV)
Rotation Speed	Disk rotates at constant RPM regardless of which track is being read	Disk rotation varies — slower for outer tracks, faster for inner tracks
Data Density	All tracks hold the same number of sectors, so bits are packed more densely on inner tracks and more sparsely on outer tracks	Linear density is uniform, so outer tracks hold more sectors than inner tracks (more circumference)
Data Transfer Rate	Constant in pure CAV (the same number of sectors passes under the head per rotation); with ZBR, higher in outer zones	Constant across all tracks
Used By	Hard disk drives (modern HDDs add zone-bit recording (ZBR), a hybrid) and standard floppy disks	Optical drives (CD/DVD)
Seek Complexity	Simple — just position the head	Complex — must also adjust rotation speed
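
The CLV speed adjustment follows from the relation between linear and angular speed: rpm = v / (2πr) × 60, where v is the linear velocity under the head and r is the track radius. The sketch below evaluates that formula; the function name is hypothetical.

```c
#include <assert.h>

/* Rotation speed (rpm) needed to keep a linear velocity of
   linear_velocity_m_s at a track of radius radius_m. */
double clv_rpm(double linear_velocity_m_s, double radius_m)
{
    assert(radius_m > 0.0);
    const double pi = 3.14159265358979323846;
    return linear_velocity_m_s / (2.0 * pi * radius_m) * 60.0;
}
```

As an illustration with assumed values, a linear velocity of 1.3 m/s gives about 497 rpm at a radius of 25 mm and about 214 rpm at 58 mm, so the disc must slow down as the head moves outward.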

I/O Buffering and Caching

To improve I/O performance, the OS uses multiple levels of buffering and caching.

Technique	Description	Benefit
Page Cache	Caches file data blocks in RAM	Subsequent reads hit RAM instead of disk
Buffer Cache	Caches disk blocks in RAM (separate from page cache in older kernels)	Reduces physical disk I/O
Double Buffering	Two buffers: one being filled by device, one being read by the process	Prevents the process from waiting when it's ready for the next block
Spooling	Queue of output jobs for a shared device (printer)	Allows multiple processes to 'print' without waiting — the spooler daemon manages the physical device
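
Double buffering can be sketched with two buffers and an index that flips after each fill. This single-threaded model uses hypothetical names and a tiny fixed block size; in practice the device would fill one buffer concurrently while the process reads the other.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { BUF_SIZE = 4 };

struct double_buffer {
    char bufs[2][BUF_SIZE];
    int filling;   /* index the device writes into; the other is for the process */
};

/* Device side: fill the current buffer with one block, then swap roles. */
void device_fill_and_swap(struct double_buffer *db, const char *block)
{
    assert(db != NULL && block != NULL);
    memcpy(db->bufs[db->filling], block, BUF_SIZE);
    db->filling = 1 - db->filling;
}

/* Process side: the most recently completed buffer. */
const char *process_buffer(const struct double_buffer *db)
{
    assert(db != NULL);
    return db->bufs[1 - db->filling];
}
```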

Knowledge Check

Q1 (single choice)

Which I/O technique allows the CPU to execute other processes while a data transfer between device and memory takes place without CPU involvement?

Programmed I/O (Polling)

Interrupt-Driven I/O

Direct Memory Access (DMA)

Memory-Mapped I/O
