Coursify

Operating System

File Management and Allocation Methods

60 mins

Learning Goals

  • Classify file types (regular, directory, special) and distinguish between sequential, direct, and indexed access methods.
  • Design and compare single-level, two-level, tree-structured, and acyclic-graph directory structures.
  • Analyze the performance trade-offs of contiguous, linked, and indexed file allocation methods.
  • Calculate disk space utilization and fragmentation for each allocation method.

File Concepts: What is a File?

A file is a named collection of related data that is treated as a unit by the operating system. Files are the primary mechanism for persistent storage — data that survives beyond the lifetime of the process that created it.

File Attributes

Every file has metadata (data about the file) stored in the file system. The exact attributes vary by OS, but the core set includes:

AttributeDescriptionExample
NameHuman-readable identifierreport.pdf, index.html
IdentifierUnique numeric tag within the file systeminode number: 123456
TypeThe kind of contentRegular file, directory, symbolic link
LocationPointer to where the file lives on the deviceBlock number on disk
SizeCurrent size (bytes)45,678 bytes
ProtectionAccess control permissionsrwxr-xr-x (755)
TimestampsCreation, last access, last modification2026-05-13 23:45:00
User/Group IDOwner and group identifiersuid=1000, gid=1000

File Types

TypeDescriptionExamples
Regular FileContains user data (text, binary, executable).txt, .pdf, .exe, .jpg
Directory FileContains a list of file names and pointers to their inodes/home, /etc
Special FileRepresents a hardware device (block or character)/dev/sda (disk), /dev/tty (terminal)
Symbolic LinkA pointer (shortcut) to another fileCreates an alternative pathname

File Operations

The OS provides a standard set of operations that can be performed on files:

OperationDescription
CreateAllocate space for the file and record its entry in the directory
OpenBring the file's metadata (inode) into memory for faster access
ReadRead bytes from the current file position
WriteWrite bytes at the current file position
SeekReposition the file pointer within the file
DeleteRelease all allocated space and remove the directory entry
TruncateDelete the file's contents but keep its attributes
CloseFlush any buffered data and release the in-memory file handle

File Access Methods

How does a process read data from a file? The answer depends on the access method supported by the file system.

1. Sequential Access

The most common and simplest model. The file is read one block at a time, in order, from beginning to end. The OS automatically advances a current-file-position pointer after each read/write.

read → next block → read → next block → ...
  • Pros: Simple, efficient for streaming data (logs, tape backups).
  • Cons: Cannot skip to arbitrary positions.
  • Used by: Tape drives, early file systems, pipes.

2. Direct (Random) Access

A file is composed of fixed-size logical blocks (typically 512 bytes or 4 KB). The process can access any block by specifying its block number directly.

read block 5 → read block 2 → read block 9 → ...
  • Pros: Fast random access — databases rely on this.
  • Cons: More complex file system support needed.
  • Used by: Almost all modern file systems (ext4, NTFS, FAT32). The lseek() system call enables direct access by moving the file pointer to any byte position.

3. Indexed Access

An extension of direct access where an index block contains pointers to the file's data blocks. To read the i-th block, the process looks up the i-th entry in the index, then reads that block.

  • Pros: No external fragmentation, supports very large files via multi-level indexing.
  • Cons: Index blocks consume storage space.
  • Used by: UNIX inode-based file systems (ext2, ext3, ext4).

Directory Structures

A directory is a special file that maps file names to their corresponding inodes (or file control blocks). The structure of this mapping has evolved over time.

1. Single-Level Directory

The simplest structure — all files in one global directory.

ProsCons
Simple, easy to implementNaming collision — two users cannot have files with the same name
One lookup stepBecomes unmanageable with hundreds of files

2. Two-Level Directory

Each user gets their own directory. System files may be in a separate master directory.

ProsCons
No naming collisions between usersUsers cannot share files easily
Simple isolation and securityNo sub-directory nesting

3. Tree-Structured Directory

The most common modern approach. Users can create subdirectories to any depth, forming a tree.

ProsCons
Natural hierarchical organizationPathname traversal can be slow for deep trees
Each user can structure files as neededCannot share files between branches (without links)
Used by: UNIX, Linux, Windows (C:\Users...)

4. Acyclic-Graph Directory

An extension of the tree that allows shared subdirectories. A file or subdirectory can appear in multiple parent directories via links.

  • Hard Link: A direct pointer to the same inode data structure. Deleting one link does not delete the file — the inode's reference count decrements; only when the count reaches 0 is the file actually deleted. Cannot link across file systems or to directories.
  • Soft Link (Symbolic Link): A special file that contains the pathname of another file. If the target file is deleted, the symbolic link becomes a dangling link. Can link to directories and across file systems.
ProsCons
Enables sharing and collaborationDangling links — soft link points to deleted file
Flexible organizationCycles possible if links are not carefully managed (affects traversal/backup)

How an inode Points to Data Blocks — Maximum File Size Calculation

  1. 1
    Step 1

    An inode (index node) is the data structure in UNIX-style file systems that stores all metadata about a file except its name. Critically, the inode contains block pointers — addresses of the disk blocks where the file's actual data is stored.

  2. 2
    Step 2

    A typical inode contains a fixed number of direct block pointers (e.g., 12 pointers), plus one pointer each for: single indirect, double indirect, and triple indirect blocks. Each pointer is 4 bytes (32-bit addressing) or 8 bytes (64-bit). Block size is typically 4 KB (4096 bytes).

  3. 3
    Step 3

    Direct blocks: The first 12 block pointers point directly to 12 data blocks = 12 × 4 KB = 48 KB of data. Single indirect: The pointer points to a block containing more block pointers. A 4 KB block with 4-byte pointers = 4096/4 = 1024 pointers. This adds 1024 × 4 KB = 4 MB. Double indirect: A block of pointers to single-indirect blocks. 1024 × 1024 × 4 KB = 4 GB. Triple indirect: A block of pointers to double-indirect blocks. 1024 × 1024 × 1024 × 4 KB = 4 TB.

  4. 4
    Step 4

    Your inode looks like this:

  5. 5
    Step 5

    Maximum file size = Direct + Single Indirect + Double Indirect + Triple Indirect = (12 × 4 KB) + (1024 × 4 KB) + (1024² × 4 KB) + (1024³ × 4 KB) = 48 KB + 4 MB + 4 GB + 4 TB ≈ 4 TB + 4 GB + 4 MB + 48 KB

    For ext4 with 4 KB blocks and 48-bit block addressing, the actual limit is larger, but this conceptual model is the core exam topic.

  6. 6
    Step 6

    The formula to remember: Each indirect block contains (block size / pointer size) pointers.

    • Single indirect block = (B / P) data blocks
    • Double indirect = (B / P)² data blocks
    • Triple indirect = (B / P)³ data blocks

    Where B = block size (bytes), P = pointer size (bytes). So B/P = number of pointers per indirect block.

File Allocation Methods

When a file is created, the OS must allocate disk blocks to store its data. Three primary methods exist, each with distinct trade-offs.

1. Contiguous Allocation

Each file occupies a contiguous set of blocks on the disk — blocks 4, 5, 6, 7, 8 for a file needing 5 blocks.

AspectDetail
How it worksThe file's directory entry stores the starting block number and the total length (in blocks)
Sequential accessVery fast — the disk head reads blocks in order with almost no seeking
Direct accessFast — to access block i, read start + i
ProblemExternal fragmentation — as files are created and deleted, free space becomes broken into small gaps. A new file may not find a contiguous gap large enough even if total free space is sufficient
Problem 2File size must be declared in advance — if a file grows beyond its allocated space, it cannot expand into neighboring occupied blocks

Compaction: The OS can periodically shuffle blocks to consolidate free space into one large contiguous region — but this is expensive and takes the disk offline.

2. Linked Allocation

Each block contains a pointer to the next block in the file. The directory entry stores the starting block number and the ending block number.

AspectDetail
How it worksEach block points to the next. The last block has a null pointer (−1)
Sequential accessReasonable — follow the chain of pointers
Direct accessVery slow — to access block i, you must read all i preceding blocks sequentially
Space overheadEach block loses a few bytes to the pointer (e.g., 4 bytes per 512-byte block = 0.8% overhead)
ReliabilityA single broken pointer destroys the entire file chain
Used byFAT (File Allocation Table) file systems solve the pointer problem by storing the chain in a separate table (FAT) in a reserved area of the disk

3. Indexed Allocation

All pointers are collected into one location: the index block (called an inode in UNIX). The directory entry points to the index block, which contains an array of pointers to all data blocks.

AspectDetail
How it worksThe inode contains direct pointers + single/double/triple indirect pointers (see Step-by-Step above)
Sequential accessModerate — need to read the inode/index block first to find data block addresses
Direct accessVery fast — the i-th pointer in the inode gives the exact block address. No need to traverse a chain
Space overheadIndex blocks consume space. For small files, the index block may be larger than the data itself
SolutionUNIX inode: Inline direct pointers for small files, indirect pointers only for large files. This hybrid gives the best of both worlds

Allocation Methods Comparison

FeatureContiguousLinked (FAT)Indexed (inode)
External fragmentationYesNoNo
Internal fragmentationMinimalPer-block pointer overheadIndex block overhead
Sequential accessExcellentGoodGood
Direct accessExcellentVery poorExcellent
File growthDifficult (must pre-declare or copy)Easy (add another block)Easy (add another block + indirect pointer)
ReliabilityGoodChain breaks = file lostIndex block corruption = file lost
Used byOld systems, some ISOsFAT32, exFAT (USB drives)ext4, NTFS, XFS (modern OS)

Free Space Management

The OS must track which disk blocks are free and which are allocated. Several methods are used.

MethodDescriptionSpace OverheadPerformance
Bit VectorA bitmap with 1 bit per block: 1 = free, 0 = allocated1 bit per block. For a 1 TB disk with 4 KB blocks: 32 MB for the bitmapVery fast — finding a free block is a bit-scan operation
Linked ListA linked list of free block numbers, stored in free blocks themselvesMinimal — uses free blocks to store the listSlow — must traverse the list to find free blocks
GroupingStore addresses of n free blocks in one free block. The first n-1 are free; the last points to the next groupLowFast — can allocate many blocks at once
CountingStore (block address, count) pairs for contiguous runs of free blocksBest when free space has few large contiguous regionsEfficient for SSD wear-leveling

Knowledge Check

Question 1 of 4
Q1Single choice

Which file allocation method suffers from external fragmentation?

File Management and Allocation Methods | Operating System | Coursify