File Management and Allocation Methods
Learning Goals
- Classify file types (regular, directory, special) and distinguish between sequential, direct, and indexed access methods.
- Design and compare single-level, two-level, tree-structured, and acyclic-graph directory structures.
- Analyze the performance trade-offs of contiguous, linked, and indexed file allocation methods.
- Calculate disk space utilization and fragmentation for each allocation method.
File Concepts: What is a File?
A file is a named collection of related data that is treated as a unit by the operating system. Files are the primary mechanism for persistent storage — data that survives beyond the lifetime of the process that created it.
File Attributes
Every file has metadata (data about the file) stored in the file system. The exact attributes vary by OS, but the core set includes:
| Attribute | Description | Example |
|---|---|---|
| Name | Human-readable identifier | report.pdf, index.html |
| Identifier | Unique numeric tag within the file system | inode number: 123456 |
| Type | The kind of content | Regular file, directory, symbolic link |
| Location | Pointer to where the file lives on the device | Block number on disk |
| Size | Current size (bytes) | 45,678 bytes |
| Protection | Access control permissions | rwxr-xr-x (755) |
| Timestamps | Creation, last access, last modification | 2026-05-13 23:45:00 |
| User/Group ID | Owner and group identifiers | uid=1000, gid=1000 |
File Types
| Type | Description | Examples |
|---|---|---|
| Regular File | Contains user data (text, binary, executable) | .txt, .pdf, .exe, .jpg |
| Directory File | Contains a list of file names and pointers to their inodes | /home, /etc |
| Special File | Represents a hardware device (block or character) | /dev/sda (disk), /dev/tty (terminal) |
| Symbolic Link | A pointer (shortcut) to another file | Creates an alternative pathname |
File Operations
The OS provides a standard set of operations that can be performed on files:
| Operation | Description |
|---|---|
| Create | Allocate space for the file and record its entry in the directory |
| Open | Bring the file's metadata (inode) into memory for faster access |
| Read | Read bytes from the current file position |
| Write | Write bytes at the current file position |
| Seek | Reposition the file pointer within the file |
| Delete | Release all allocated space and remove the directory entry |
| Truncate | Delete the file's contents but keep its attributes |
| Close | Flush any buffered data and release the in-memory file handle |
File Access Methods
How does a process read data from a file? The answer depends on the access method supported by the file system.
1. Sequential Access
The most common and simplest model. The file is read one block at a time, in order, from beginning to end. The OS automatically advances a current-file-position pointer after each read/write.
read → next block → read → next block → ...
- Pros: Simple, efficient for streaming data (logs, tape backups).
- Cons: Cannot skip to arbitrary positions.
- Used by: Tape drives, early file systems, pipes.
2. Direct (Random) Access
A file is composed of fixed-size logical blocks (typically 512 bytes or 4 KB). The process can access any block by specifying its block number directly.
read block 5 → read block 2 → read block 9 → ...
- Pros: Fast random access — databases rely on this.
- Cons: More complex file system support needed.
- Used by: Almost all modern file systems (ext4, NTFS, FAT32). The
lseek()system call enables direct access by moving the file pointer to any byte position.
3. Indexed Access
An extension of direct access where an index block contains pointers to the file's data blocks. To read the i-th block, the process looks up the i-th entry in the index, then reads that block.
- Pros: No external fragmentation, supports very large files via multi-level indexing.
- Cons: Index blocks consume storage space.
- Used by: UNIX inode-based file systems (ext2, ext3, ext4).
Directory Structures
A directory is a special file that maps file names to their corresponding inodes (or file control blocks). The structure of this mapping has evolved over time.
1. Single-Level Directory
The simplest structure — all files in one global directory.
| Pros | Cons |
|---|---|
| Simple, easy to implement | Naming collision — two users cannot have files with the same name |
| One lookup step | Becomes unmanageable with hundreds of files |
2. Two-Level Directory
Each user gets their own directory. System files may be in a separate master directory.
| Pros | Cons |
|---|---|
| No naming collisions between users | Users cannot share files easily |
| Simple isolation and security | No sub-directory nesting |
3. Tree-Structured Directory
The most common modern approach. Users can create subdirectories to any depth, forming a tree.
| Pros | Cons |
|---|---|
| Natural hierarchical organization | Pathname traversal can be slow for deep trees |
| Each user can structure files as needed | Cannot share files between branches (without links) |
| Used by: UNIX, Linux, Windows (C:\Users...) |
4. Acyclic-Graph Directory
An extension of the tree that allows shared subdirectories. A file or subdirectory can appear in multiple parent directories via links.
- Hard Link: A direct pointer to the same inode data structure. Deleting one link does not delete the file — the inode's reference count decrements; only when the count reaches 0 is the file actually deleted. Cannot link across file systems or to directories.
- Soft Link (Symbolic Link): A special file that contains the pathname of another file. If the target file is deleted, the symbolic link becomes a dangling link. Can link to directories and across file systems.
| Pros | Cons |
|---|---|
| Enables sharing and collaboration | Dangling links — soft link points to deleted file |
| Flexible organization | Cycles possible if links are not carefully managed (affects traversal/backup) |
How an inode Points to Data Blocks — Maximum File Size Calculation
- 1Step 1
An inode (index node) is the data structure in UNIX-style file systems that stores all metadata about a file except its name. Critically, the inode contains block pointers — addresses of the disk blocks where the file's actual data is stored.
- 2Step 2
A typical inode contains a fixed number of direct block pointers (e.g., 12 pointers), plus one pointer each for: single indirect, double indirect, and triple indirect blocks. Each pointer is 4 bytes (32-bit addressing) or 8 bytes (64-bit). Block size is typically 4 KB (4096 bytes).
- 3Step 3
Direct blocks: The first 12 block pointers point directly to 12 data blocks = 12 × 4 KB = 48 KB of data. Single indirect: The pointer points to a block containing more block pointers. A 4 KB block with 4-byte pointers = 4096/4 = 1024 pointers. This adds 1024 × 4 KB = 4 MB. Double indirect: A block of pointers to single-indirect blocks. 1024 × 1024 × 4 KB = 4 GB. Triple indirect: A block of pointers to double-indirect blocks. 1024 × 1024 × 1024 × 4 KB = 4 TB.
- 4Step 4
Your inode looks like this:
- 5Step 5
Maximum file size = Direct + Single Indirect + Double Indirect + Triple Indirect = (12 × 4 KB) + (1024 × 4 KB) + (1024² × 4 KB) + (1024³ × 4 KB) = 48 KB + 4 MB + 4 GB + 4 TB ≈ 4 TB + 4 GB + 4 MB + 48 KB
For ext4 with 4 KB blocks and 48-bit block addressing, the actual limit is larger, but this conceptual model is the core exam topic.
- 6Step 6
The formula to remember: Each indirect block contains (block size / pointer size) pointers.
- Single indirect block =
(B / P)data blocks - Double indirect =
(B / P)²data blocks - Triple indirect =
(B / P)³data blocks
Where B = block size (bytes), P = pointer size (bytes). So
B/P= number of pointers per indirect block. - Single indirect block =
File Allocation Methods
When a file is created, the OS must allocate disk blocks to store its data. Three primary methods exist, each with distinct trade-offs.
1. Contiguous Allocation
Each file occupies a contiguous set of blocks on the disk — blocks 4, 5, 6, 7, 8 for a file needing 5 blocks.
| Aspect | Detail |
|---|---|
| How it works | The file's directory entry stores the starting block number and the total length (in blocks) |
| Sequential access | Very fast — the disk head reads blocks in order with almost no seeking |
| Direct access | Fast — to access block i, read start + i |
| Problem | External fragmentation — as files are created and deleted, free space becomes broken into small gaps. A new file may not find a contiguous gap large enough even if total free space is sufficient |
| Problem 2 | File size must be declared in advance — if a file grows beyond its allocated space, it cannot expand into neighboring occupied blocks |
Compaction: The OS can periodically shuffle blocks to consolidate free space into one large contiguous region — but this is expensive and takes the disk offline.
2. Linked Allocation
Each block contains a pointer to the next block in the file. The directory entry stores the starting block number and the ending block number.
| Aspect | Detail |
|---|---|
| How it works | Each block points to the next. The last block has a null pointer (−1) |
| Sequential access | Reasonable — follow the chain of pointers |
| Direct access | Very slow — to access block i, you must read all i preceding blocks sequentially |
| Space overhead | Each block loses a few bytes to the pointer (e.g., 4 bytes per 512-byte block = 0.8% overhead) |
| Reliability | A single broken pointer destroys the entire file chain |
| Used by | FAT (File Allocation Table) file systems solve the pointer problem by storing the chain in a separate table (FAT) in a reserved area of the disk |
3. Indexed Allocation
All pointers are collected into one location: the index block (called an inode in UNIX). The directory entry points to the index block, which contains an array of pointers to all data blocks.
| Aspect | Detail |
|---|---|
| How it works | The inode contains direct pointers + single/double/triple indirect pointers (see Step-by-Step above) |
| Sequential access | Moderate — need to read the inode/index block first to find data block addresses |
| Direct access | Very fast — the i-th pointer in the inode gives the exact block address. No need to traverse a chain |
| Space overhead | Index blocks consume space. For small files, the index block may be larger than the data itself |
| Solution | UNIX inode: Inline direct pointers for small files, indirect pointers only for large files. This hybrid gives the best of both worlds |
Allocation Methods Comparison
| Feature | Contiguous | Linked (FAT) | Indexed (inode) |
|---|---|---|---|
| External fragmentation | Yes | No | No |
| Internal fragmentation | Minimal | Per-block pointer overhead | Index block overhead |
| Sequential access | Excellent | Good | Good |
| Direct access | Excellent | Very poor | Excellent |
| File growth | Difficult (must pre-declare or copy) | Easy (add another block) | Easy (add another block + indirect pointer) |
| Reliability | Good | Chain breaks = file lost | Index block corruption = file lost |
| Used by | Old systems, some ISOs | FAT32, exFAT (USB drives) | ext4, NTFS, XFS (modern OS) |
Free Space Management
The OS must track which disk blocks are free and which are allocated. Several methods are used.
| Method | Description | Space Overhead | Performance |
|---|---|---|---|
| Bit Vector | A bitmap with 1 bit per block: 1 = free, 0 = allocated | 1 bit per block. For a 1 TB disk with 4 KB blocks: 32 MB for the bitmap | Very fast — finding a free block is a bit-scan operation |
| Linked List | A linked list of free block numbers, stored in free blocks themselves | Minimal — uses free blocks to store the list | Slow — must traverse the list to find free blocks |
| Grouping | Store addresses of n free blocks in one free block. The first n-1 are free; the last points to the next group | Low | Fast — can allocate many blocks at once |
| Counting | Store (block address, count) pairs for contiguous runs of free blocks | Best when free space has few large contiguous regions | Efficient for SSD wear-leveling |
Knowledge Check
Which file allocation method suffers from external fragmentation?