Coursify

Compiler Design (CD)

Symbol Table Management

35 mins

The central data structure of a compiler. Learn how the symbol table stores identifier attributes, its role in Semantic Analysis, and how it is constructed and maintained.

Learning Goals

  • Define the symbol table as a mapping from identifier names to their attributes and explain why every compiler phase relies on it.
  • List the exact information stored in the symbol table for variables, functions, arrays, and classes/structs with concrete examples.
  • Explain hash table implementation with hash functions, bucket chaining, collision resolution, and why O(1) lookup is preferred.
  • Describe scope management using a stack of symbol tables for nested scopes (global → function → block).
  • Trace the symbol table stack operations (push/pop) through a concrete code fragment with nested blocks.
  • Describe how identifiers are stored, updated, and searched during each compiler phase.
  • Write a comprehensive short note on the Symbol Table as frequently required in exams.

What is the Symbol Table?

The Symbol Table is the central nervous system of the compiler. Formally, it is a mapping from identifier names to their attributes — a dictionary that lets the compiler answer "what is this name and what do we know about it?" at any point during compilation.

Every phase of the compiler depends on it:

  • Lexer — inserts raw identifier names.
  • Parser — structures scope relationships.
  • Semantic Analyzer — updates type and declaration information, performs type checking and scope validation.
  • Code Generator — looks up offsets and sizes to emit correct machine instructions.

Without the symbol table, the compiler cannot verify that a variable is declared before use, that types match in expressions, or that a function is called with the right number of arguments.

Concrete Symbol Table Example

Attributes Stored

The symbol table stores different attributes depending on the kind of identifier. Below is a detailed breakdown:

Identifier KindAttributeExamplePurpose
VariableNamecountIdentifies the symbol
VariableData TypeintType checking expressions
VariableScope Level2Scope validation
VariableMemory Offset0Code generation address
VariableSize (bytes)4Allocation in activation record
VariableLine Number5Error reporting
FunctionNamecalculateSumIdentifies the symbol
FunctionReturn TypeintType checking return values
FunctionNumber of Params2Validate call argument count
FunctionParam Types[int, float]Type checking arguments
FunctionParam Passing Modeby valueCode generation for calls
ArrayNamearrIdentifies the symbol
ArrayElement TypeintType checking array accesses
ArrayDimensions[10]Bounds checking
ArrayLower/Upper Bounds0 / 9Index validation
ArrayTotal Size40Memory allocation
Class/StructNamePointIdentifies the symbol
Class/StructField List{x: int, y: int}Member access checking
Class/StructField Offsets{x: 0, y: 4}Code generation for member access
Class/StructParent ClassShapeInheritance resolution
Class/StructVisibilitypublicAccess control enforcement

How the Type Attribute Drives Type Checking: When the compiler encounters an expression like x = y + z, it looks up the types of y and z in the symbol table. If y is float and z is int, the semantic analyzer determines the result type is float (promotion rules) and checks whether assigning a float to x is legal by looking up x's declared type. Every type decision traces back to the symbol table.

Symbol Table Implementation: Data Structures

Structure: A simple array or linked list of entries. New identifiers are appended at the end.

OperationComplexity
InsertO(1)O(1)
LookupO(n)O(n) — must scan every entry

When to use: Only for tiny programs with fewer than ~20 identifiers. For any real program, O(n)O(n) lookup is too slow — a compiler processing 1000 identifiers would need 1000 comparisons per lookup on average.

Hash Table with Collision Resolution

Consider a hash table with M=5M = 5 buckets and the hash function h(name)=(sum of ASCII values)mod5h(name) = (\text{sum of ASCII values}) \mod 5:

IdentifierASCII Sumh(name)h(name)Bucket
x120120mod5=0120 \mod 5 = 00
count529529mod5=4529 \mod 5 = 44
y121121mod5=1121 \mod 5 = 11
a9797mod5=297 \mod 5 = 22
b9898mod5=398 \mod 5 = 33
z122122mod5=2122 \mod 5 = 22 (collision with a)

Bucket 2 chains a → z as a linked list. Lookup for z: compute h(z)=2h(z) = 2, scan the chain at bucket 2, find z. Still O(1)O(1) on average because chains are short when the table is appropriately sized.

Why Hash Tables?

A compiler may process thousands of identifiers in a single source file. Each identifier is looked up multiple times — during declaration processing, every expression it appears in, and during code generation. With a linear list, each lookup is O(n)O(n), so total lookup cost grows quadratically. A hash table gives O(1)O(1) average lookup, making total cost linear. This difference is the reason hash tables are the standard implementation for symbol tables in production compilers like GCC and LLVM.

Scope Management with Symbol Table Stack

Most programming languages support nested scopes: a variable declared inside a function or block is visible only within that function or block. The compiler manages this by maintaining a stack of symbol tables, where each table corresponds to one active scope.

Scope hierarchy:

  1. Global scope — contains globally visible identifiers (global variables, function declarations).
  2. Function scope — contains parameters and local variables of the currently compiled function.
  3. Block scope — contains variables declared inside { } blocks within a function (e.g., inside if, for, or standalone blocks).

When a new scope is entered, a new symbol table is pushed onto the stack. When the scope is exited, its table is popped (and its entries become invisible). To resolve a name, the compiler searches from the top of the stack (current scope) downward until it finds the name — this implements the "innermost scope wins" rule.

Tracing Scope Stack Operations on a Code Fragment

  1. 1
    Step 1
  2. 2
    Step 2
  3. 3
    Step 3
  4. 4
    Step 4
  5. 5
    Step 5
  6. 6
    Step 6

Symbol Table Stack State at Each Point

Lifecycle Across Compiler Phases

The symbol table is not created in one single step. It is a living data structure maintained across multiple compiler phases:

PhaseOperation on Symbol TableDetails
Lexical AnalysisInsertWhen the lexer encounters a new identifier, it inserts the raw name into the symbol table. Type and scope are unknown at this point.
Syntax AnalysisStructureThe parser establishes the scope of each identifier based on the block structure ({ } braces, function definitions). It may create scope markers.
Semantic AnalysisUpdateThe compiler reads declarations and fills in type, size, offset, and parameter information. It also performs type checking and scope validation using lookups.
Code GenerationLookupThe code generator repeatedly looks up offsets, sizes, and types to emit correct instructions (e.g., LOAD SP + offset).
OptimizationLookup + UpdateOptimizers may read alias info, update constant values, or mark dead variables.

The Symbol Table Lifecycle:

Common Questions on Symbol Table Management

Knowledge Check

Question 1 of 7
Q1Single choice

Which data structure is most commonly used to implement a symbol table in a production compiler? [PYQ 2023]