File Management and Allocation Methods

60 mins

Learning Goals

Classify file types (regular, directory, special) and distinguish between sequential, direct, and indexed access methods.
Design and compare single-level, two-level, tree-structured, and acyclic-graph directory structures.
Analyze the performance trade-offs of contiguous, linked, and indexed file allocation methods.
Calculate disk space utilization and fragmentation for each allocation method.

File Concepts: What is a File?

A file is a named collection of related data that is treated as a unit by the operating system. Files are the primary mechanism for persistent storage — data that survives beyond the lifetime of the process that created it.

File Attributes

Every file has metadata (data about the file) stored in the file system. The exact attributes vary by OS, but the core set includes:

Attribute	Description	Example
Name	Human-readable identifier	`report.pdf`, `index.html`
Identifier	Unique numeric tag within the file system	inode number: 123456
Type	The kind of content	Regular file, directory, symbolic link
Location	Pointer to where the file lives on the device	Block number on disk
Size	Current size (bytes)	45,678 bytes
Protection	Access control permissions	`rwxr-xr-x` (755)
Timestamps	Creation, last access, last modification	`2026-05-13 23:45:00`
User/Group ID	Owner and group identifiers	uid=1000, gid=1000

File Types

Type	Description	Examples
Regular File	Contains user data (text, binary, executable)	`.txt`, `.pdf`, `.exe`, `.jpg`
Directory File	Contains a list of file names and pointers to their inodes	`/home`, `/etc`
Special File	Represents a hardware device (block or character)	`/dev/sda` (disk), `/dev/tty` (terminal)
Symbolic Link	A pointer (shortcut) to another file	Creates an alternative pathname

File Operations

The OS provides a standard set of operations that can be performed on files:

Operation	Description
Create	Allocate space for the file and record its entry in the directory
Open	Bring the file's metadata (inode) into memory for faster access
Read	Read bytes from the current file position
Write	Write bytes at the current file position
Seek	Reposition the file pointer within the file
Delete	Release all allocated space and remove the directory entry
Truncate	Delete the file's contents but keep its attributes
Close	Flush any buffered data and release the in-memory file handle

File Access Methods

How does a process read data from a file? The answer depends on the access method supported by the file system.

1. Sequential Access

The most common and simplest model. The file is read one block at a time, in order, from beginning to end. The OS automatically advances a current-file-position pointer after each read/write.

read → next block → read → next block → ...

Pros: Simple, efficient for streaming data (logs, tape backups).
Cons: Cannot skip to arbitrary positions.
Used by: Tape drives, early file systems, pipes.

2. Direct (Random) Access

A file is composed of fixed-size logical blocks (typically 512 bytes or 4 KB). The process can access any block by specifying its block number directly.

read block 5 → read block 2 → read block 9 → ...

Pros: Fast random access — databases rely on this.
Cons: More complex file system support needed.
Used by: Almost all modern file systems (ext4, NTFS, FAT32). The lseek() system call enables direct access by moving the file pointer to any byte position.

3. Indexed Access

An extension of direct access where an index block contains pointers to the file's data blocks. To read the i-th block, the process looks up the i-th entry in the index, then reads that block.

Pros: No external fragmentation, supports very large files via multi-level indexing.
Cons: Index blocks consume storage space.
Used by: UNIX inode-based file systems (ext2, ext3, ext4).

Directory Structures

A directory is a special file that maps file names to their corresponding inodes (or file control blocks). The structure of this mapping has evolved over time.

1. Single-Level Directory

The simplest structure — all files in one global directory.

Pros	Cons
Simple, easy to implement	Naming collision — two users cannot have files with the same name
One lookup step	Becomes unmanageable with hundreds of files

2. Two-Level Directory

Each user gets their own directory. System files may be in a separate master directory.

Pros	Cons
No naming collisions between users	Users cannot share files easily
Simple isolation and security	No sub-directory nesting

3. Tree-Structured Directory

The most common modern approach. Users can create subdirectories to any depth, forming a tree.

Pros	Cons
Natural hierarchical organization	Pathname traversal can be slow for deep trees
Each user can structure files as needed	Cannot share files between branches (without links)
Used by: UNIX, Linux, Windows (C:\Users...)

4. Acyclic-Graph Directory

An extension of the tree that allows shared subdirectories. A file or subdirectory can appear in multiple parent directories via links.

Hard Link: A direct pointer to the same inode data structure. Deleting one link does not delete the file — the inode's reference count decrements; only when the count reaches 0 is the file actually deleted. Cannot link across file systems or to directories.
Soft Link (Symbolic Link): A special file that contains the pathname of another file. If the target file is deleted, the symbolic link becomes a dangling link. Can link to directories and across file systems.

Pros	Cons
Enables sharing and collaboration	Dangling links — soft link points to deleted file
Flexible organization	Cycles possible if links are not carefully managed (affects traversal/backup)

How an inode Points to Data Blocks — Maximum File Size Calculation

1
Step 1
An inode (index node) is the data structure in UNIX-style file systems that stores all metadata about a file except its name. Critically, the inode contains block pointers — addresses of the disk blocks where the file's actual data is stored.
2
Step 2
A typical inode contains a fixed number of direct block pointers (e.g., 12 pointers), plus one pointer each for: single indirect, double indirect, and triple indirect blocks. Each pointer is 4 bytes (32-bit addressing) or 8 bytes (64-bit). Block size is typically 4 KB (4096 bytes).
3
Step 3
Direct blocks: The first 12 block pointers point directly to 12 data blocks = 12 × 4 KB = 48 KB of data. Single indirect: The pointer points to a block containing more block pointers. A 4 KB block with 4-byte pointers = 4096/4 = 1024 pointers. This adds 1024 × 4 KB = 4 MB. Double indirect: A block of pointers to single-indirect blocks. 1024 × 1024 × 4 KB = 4 GB. Triple indirect: A block of pointers to double-indirect blocks. 1024 × 1024 × 1024 × 4 KB = 4 TB.
4
Step 4
Your inode looks like this:
5
Step 5
Maximum file size = Direct + Single Indirect + Double Indirect + Triple Indirect = (12 × 4 KB) + (1024 × 4 KB) + (1024² × 4 KB) + (1024³ × 4 KB) = 48 KB + 4 MB + 4 GB + 4 TB ≈ 4 TB + 4 GB + 4 MB + 48 KB

For ext4 with 4 KB blocks and 48-bit block addressing, the actual limit is larger, but this conceptual model is the core exam topic.
6
Step 6
The formula to remember: Each indirect block contains (block size / pointer size) pointers.

Single indirect block = (B / P) data blocks

Double indirect = (B / P)² data blocks

Triple indirect = (B / P)³ data blocks

Where B = block size (bytes), P = pointer size (bytes). So B/P = number of pointers per indirect block.

File Allocation Methods

When a file is created, the OS must allocate disk blocks to store its data. Three primary methods exist, each with distinct trade-offs.

1. Contiguous Allocation

Each file occupies a contiguous set of blocks on the disk — blocks 4, 5, 6, 7, 8 for a file needing 5 blocks.

Aspect	Detail
How it works	The file's directory entry stores the starting block number and the total length (in blocks)
Sequential access	Very fast — the disk head reads blocks in order with almost no seeking
Direct access	Fast — to access block `i`, read `start + i`
Problem	External fragmentation — as files are created and deleted, free space becomes broken into small gaps. A new file may not find a contiguous gap large enough even if total free space is sufficient
Problem 2	File size must be declared in advance — if a file grows beyond its allocated space, it cannot expand into neighboring occupied blocks

Compaction: The OS can periodically shuffle blocks to consolidate free space into one large contiguous region — but this is expensive and takes the disk offline.

2. Linked Allocation

Each block contains a pointer to the next block in the file. The directory entry stores the starting block number and the ending block number.

Aspect	Detail
How it works	Each block points to the next. The last block has a null pointer (−1)
Sequential access	Reasonable — follow the chain of pointers
Direct access	Very slow — to access block `i`, you must read all `i` preceding blocks sequentially
Space overhead	Each block loses a few bytes to the pointer (e.g., 4 bytes per 512-byte block = 0.8% overhead)
Reliability	A single broken pointer destroys the entire file chain
Used by	FAT (File Allocation Table) file systems solve the pointer problem by storing the chain in a separate table (FAT) in a reserved area of the disk

3. Indexed Allocation

All pointers are collected into one location: the index block (called an inode in UNIX). The directory entry points to the index block, which contains an array of pointers to all data blocks.

Aspect	Detail
How it works	The inode contains direct pointers + single/double/triple indirect pointers (see Step-by-Step above)
Sequential access	Moderate — need to read the inode/index block first to find data block addresses
Direct access	Very fast — the i-th pointer in the inode gives the exact block address. No need to traverse a chain
Space overhead	Index blocks consume space. For small files, the index block may be larger than the data itself
Solution	UNIX inode: Inline direct pointers for small files, indirect pointers only for large files. This hybrid gives the best of both worlds

Allocation Methods Comparison

Feature	Contiguous	Linked (FAT)	Indexed (inode)
External fragmentation	Yes	No	No
Internal fragmentation	Minimal	Per-block pointer overhead	Index block overhead
Sequential access	Excellent	Good	Good
Direct access	Excellent	Very poor	Excellent
File growth	Difficult (must pre-declare or copy)	Easy (add another block)	Easy (add another block + indirect pointer)
Reliability	Good	Chain breaks = file lost	Index block corruption = file lost
Used by	Old systems, some ISOs	FAT32, exFAT (USB drives)	ext4, NTFS, XFS (modern OS)

Free Space Management

The OS must track which disk blocks are free and which are allocated. Several methods are used.

Method	Description	Space Overhead	Performance
Bit Vector	A bitmap with 1 bit per block: 1 = free, 0 = allocated	1 bit per block. For a 1 TB disk with 4 KB blocks: 32 MB for the bitmap	Very fast — finding a free block is a bit-scan operation
Linked List	A linked list of free block numbers, stored in free blocks themselves	Minimal — uses free blocks to store the list	Slow — must traverse the list to find free blocks
Grouping	Store addresses of n free blocks in one free block. The first n-1 are free; the last points to the next group	Low	Fast — can allocate many blocks at once
Counting	Store (block address, count) pairs for contiguous runs of free blocks	Best when free space has few large contiguous regions	Efficient for SSD wear-leveling

Knowledge Check

Question 1 of 4

Q1Single choice

Which file allocation method suffers from external fragmentation?

Contiguous allocation

Linked allocation (FAT)

Indexed allocation (inode)

All three methods

Inode Structure — GeeksforGeeks

article

I/O Systems and Controllers

Disk Management and Scheduling