Coursify

Compiler Design (CD)

Intermediate Code Forms — TAC, Quadruples and Triples

45 mins

Learn how a compiler represents a program in a machine-independent form as a bridge between parsing and code generation. Master Three-Address Code and its three concrete representations: Quadruples, Triples, and Indirect Triples.

Learning Goals

  • Define Three-Address Code (TAC) and explain its advantages as an intermediate representation.
  • Generate TAC for arithmetic expressions, relational expressions, and assignment statements.
  • Represent a TAC sequence using Quadruples, Triples, and Indirect Triples data structures.
  • Generate TAC for array accesses (both reading and writing elements).
  • Apply backpatching to produce correct conditional and unconditional jump instructions.

Why Intermediate Representation (IR)?

A compiler typically translates source code into an Intermediate Representation (IR) before generating the final target machine code. This "middle-man" approach provides two major advantages:

  1. Portability: The front-end (Lexical/Syntax/Semantic analysis) can be shared across multiple CPUs. You only need to write a new back-end for each target architecture.
  2. Optimization: It is much easier to perform complex code improvements (like removing redundant calculations) on a clean, machine-independent IR than on messy, machine-specific assembly.

The most common IR is Three-Address Code (TAC).


Three-Address Code (TAC)

In Three-Address Code, every instruction has at most one operator and at most three addresses. Two addresses hold the operands, and one holds the result.

General Form: x = y op z

Key Address Types:

  • Names: Identifiers from the source code (e.g., a, b, position).
  • Constants: Literal values (e.g., 3.14, 100).
  • Compiler-Generated Temporaries: Temporary variables created by the compiler (e.g., t1, t2) to hold intermediate results of a large expression.

Example: The expression a + b * c + d is broken down into:

1t1 = b * c 2t2 = a + t1 3t3 = t2 + d

A Quadruple is a record with four fields: op, arg1, arg2, and result.

Example: a = b * -c + b * -c

#oparg1arg2result
(0)uminusct1
(1)*bt1t2
(2)uminusct3
(3)*bt3t4
(4)+t2t4t5
(5)=t5a

Pros: Explicitly names every result. Easy to move and reorder during optimization. Cons: Requires extra memory to store the names of temporary variables.

Numerical: Generating TAC for Array Accesses

  1. 1
    Step 1

    Translating an array access like x = A[i] is not a single instruction. The compiler must calculate the effective address using the base address and the element width.

    Formula for 1D Array: Addr(A[i])=Base(A)+(ilow)×widthAddr(A[i]) = Base(A) + (i - low) \times width (Assuming low=0low=0, this simplifies to: Base(A)+i×widthBase(A) + i \times width)

  2. 2
    Step 2

    Consider the assignment: x = a[i] + 10 (Assume a is an array of integers, where each integer is 4 bytes wide)

  3. 3
    Step 3

    First, we calculate the byte offset from the start of the array: t1 = i * 4

    Here, t1 represents how many bytes away from the base address the ii-th element is located.

  4. 4
    Step 4

    We use a special TAC operator for indexed loading: t2 = a[t1]

    Note: In some textbooks, this is written as t2 = contents(base_a + t1).

  5. 5
    Step 5

    Finally, perform the addition and store the result: t3 = t2 + 10 x = t3

  6. 6
    Step 6
    1t1 = i * 4 2t2 = a[t1] 3t3 = t2 + 10 4x = t3

TAC for Control Flow Statements [PYQ 2023]

  1. 1
    Step 1

    High-level control flow constructs (if-else, while, for) must be lowered to TAC using conditional jumps, unconditional jumps, and labels. The key TAC instructions are:

    InstructionMeaning
    ifFalse x goto LIf x is false (0), jump to label L
    if x goto LIf x is true (non-zero), jump to label L
    if x relop y goto LIf x < y (or other relop), jump to L
    goto LUnconditional jump to label L
    L:Label declaration (target of jumps)

    Labels are symbolic names like L1, L2, LEND, LTRUE, LFALSE.

  2. 2
    Step 2

    Structure: Evaluate the condition. Jump to the else part if false. Execute the true part, then jump over the else part.

    Example: Translate if (a < b) x = 10; else x = 20;

    1 ifFalse a < b goto LELSE ← jump to else if condition fails 2 x = 10 ← true branch 3 goto LEND ← skip the else block 4LELSE: 5 x = 20 ← false branch 6LEND:

    Explanation: If a < b is false, control jumps to LELSE where x = 20 is executed. If a < b is true, x = 10 executes, then the unconditional goto LEND skips over the else block.

  3. 3
    Step 3

    Example: if (a < b) x = 10;

    1 ifFalse a < b goto LEND ← jump to end if condition is false 2 x = 10 ← true branch only 3LEND:
  4. 4
    Step 4

    Structure: Place a label at the beginning for looping back. Evaluate the condition before each iteration. Jump out if false.

    Example: Translate while (i < 10) { sum = sum + i; i = i + 1; }

    1LSTART: 2 ifFalse i < 10 goto LEND ← exit loop if condition fails 3 sum = sum + i ← loop body 4 i = i + 1 ← update loop variable 5 goto LSTART ← jump back to condition check 6LEND:

    Explanation: The label LSTART marks the beginning of the loop. ifFalse i < 10 goto LEND exits when i >= 10. The body executes, i increments, and goto LSTART restarts the evaluation.

  5. 5
    Step 5

    Translate: if (x > 0 && y < 10) { z = x + y; } else { z = x - y; }

    Step 1 — Break the boolean condition into TAC:

    1 t1 = x > 0 ← relational comparison → boolean 2 ifFalse t1 goto LSHORT ← short-circuit: if first condition false, skip && 3 t2 = y < 10 4 ifFalse t2 goto LELSE ← second condition: if false, go to else 5 goto LTRUE ← both conditions true 6LSHORT: 7 goto LELSE ← first condition was false, go to else 8LTRUE: 9 z = x + y ← true branch 10 goto LEND 11LELSE: 12 z = x - y ← else branch 13LEND:

TAC for Procedure Calls and Returns

When translating function/procedure calls to TAC, the compiler must handle parameter passing, control transfer, and return value retrieval.

The Standard TAC Instructions for Calls:

InstructionMeaning
param xPush argument x as a parameter (one per argument, in order)
call p, nCall procedure p with n arguments
return yReturn from the current procedure, optionally with value y
x = call p, nCall procedure p, store return value in x

Example: Translating result = factorial(n) to TAC

Source code: result = factorial(n)

1 param n ← push argument n 2 call factorial, 1 ← invoke factorial with 1 parameter 3 result = return_val ← capture the returned value

The value returned by factorial is stored in a special implicit temporary (often called return_val or the return register), which is then assigned to result.

Multi-parameter example:

Source: compute(x, y + z, 5)

1 param x ← arg 1 2 t1 = y + z ← compute arg 2 3 param t1 ← arg 2 4 param 5 ← arg 3 5 call compute, 3 ← call with 3 arguments

Calling Convention: Parameters are evaluated and pushed in left-to-right order. The call instruction transfers control to the procedure's entry label. After the procedure executes return, control resumes at the instruction following call.

Returning a value from a procedure:

1beginfunc factorial: 2 ifFalse param > 1 goto BASE 3 t1 = param - 1 4 param t1 5 call factorial, 1 6 t2 = param * return_val 7 return t2 8BASE: 9 return 1

The Backpatching Advantage

When generating code for boolean expressions (like if (a < b || c < d)), the compiler often doesn't know the jump target yet.

Backpatching is a technique where the compiler leaves the jump address blank in the TAC (e.g., if a < b goto _), maintains a list of these 'open' jumps, and 'patches' them with the correct line number as soon as the target label is reached during parsing. This allows for efficient single-pass IR generation.

Backpatching for Boolean Expressions [PYQ 2019, 2022]

  1. 1
    Step 1

    Consider the expression a < b || c < d && e < f. In a single-pass compiler, when we first see a < b, we don't yet know where to jump if it's true (because || means short-circuit: if true, skip everything). Similarly, we don't know where to jump for the false case until we've finished parsing the entire expression.

    Backpatching solves this by generating incomplete jumps and filling them later.

  2. 2
    Step 2

    The grammar is augmented with marker non-terminals M that generate quads and track lists:

    1B → B₁ || M B₂ { backpatch(B₁.falselist, M.quad); 2 B.truelist = merge(B₁.truelist, B₂.truelist); 3 B.falselist = B₂.falselist; } 4B → B₁ && M B₂ { backpatch(B₁.truelist, M.quad); 5 B.falselist = merge(B₁.falselist, B₂.falselist); 6 B.truelist = B₂.truelist; } 7B → ! B₁ { B.truelist = B₁.falselist; 8 B.falselist = B₁.truelist; } 9B → ( B₁ ) { B.truelist = B₁.truelist; 10 B.falselist = B₁.falselist; } 11B → E₁ rel E₂ { B.truelist = makelist(nextquad); 12 B.falselist = makelist(nextquad + 1); 13 emit('if' E₁ relop E₂ 'goto _'); 14 emit('goto _'); } 15M → ε { M.quad = nextquad; }

    M (marker) captures the current quad index for later patching. nextquad points to the next available instruction slot.

  3. 3
    Step 3

    Three functions drive the algorithm:

    FunctionPurpose
    makelist(i)Creates a new list containing quad index i
    merge(p₁, p₂)Concatenates two lists; returns the merged list
    backpatch(p, i)For every instruction q in list p, fills its target address with i

    Lists are typically implemented as linked lists threaded through the quad array itself.

  4. 4
    Step 4

    Translate a < b || c < d && e < f (Precedence: && binds tighter than ||)

    Initial: nextquad = 100

    Subexpression a < b:

    1100: if a < b goto _ ← truelist: {100} 2101: goto _ ← falselist: {101}

    Subexpression c < d:

    1102: if c < d goto _ ← truelist: {102} 2103: goto _ ← falselist: {103}

    Subexpression e < f:

    1104: if e < f goto _ ← truelist: {104} 2105: goto _ ← falselist: {105}
  5. 5
    Step 5

    When reducing B && M B:

    • B₁ = [c < d], B₂ = [e < f]
    • M.quad = 104 (captured by the ε-production before parsing e < f)
    • backpatch(B₁.truelist {102}, 104) → fills quad 102 with 104
    • B.falselist = merge({103}, {105}) = {103, 105}
    • B.truelist = B₂.truelist = {104}

    Result so far:

    1100: if a < b goto _ ← truelist: {100}, falselist: {101} 2101: goto _ 3102: if c < d goto 104 ← PATCHED! 4103: goto _ ← falselist member 5104: if e < f goto _ ← truelist member 6105: goto _ ← falselist member
  6. 6
    Step 6

    Now reduce a < b || (c < d && e < f):

    • B₁.truelist = {100}, B₁.falselist = {101} (from a < b)
    • M.quad = 102 (from marker before c < d)
    • B₂.truelist = {104}, B₂.falselist = {103, 105} (from the && reduction)
    • backpatch(B₁.falselist {101}, 102) → fills quad 101 with 102
    • B.truelist = merge({100}, {104}) = {100, 104}
    • B.falselist = {103, 105}

    Before backpatching (incomplete):

    1100: if a < b goto _ 2101: goto _ 3102: if c < d goto 104 4103: goto _ 5104: if e < f goto _ 6105: goto _

    After backpatching (only internal jumps filled):

    1100: if a < b goto _ ← jumps to T (true exit) 2101: goto 102 ← on false of a < b, try c < d 3102: if c < d goto 104 ← on true of c < d, check e < f 4103: goto _ ← on false of c < d, whole expr is false 5104: if e < f goto _ ← jumps to T (true exit) 6105: goto _ ← on false of e < f, whole expr is false

    The _ blanks for {100, 104} (truelist) and {103, 105} (falselist) will be backpatched by the parent construct (e.g., the if statement) with the addresses of the true and false code blocks.

Common Questions

Intermediate Representations Beyond TAC

Syntax Trees (ASTs) are the natural output of parsing, preserving the hierarchical structure of expressions and statements. They can be converted directly to TAC via post-order traversal.

DAGs (Directed Acyclic Graphs) are compressed ASTs where identical subtrees are shared—this directly reveals common subexpressions. When converted to TAC, a DAG eliminates redundant computations. Example:

AST for a + a * b:

    +
   / \
  a   *
     / \
    a   b

DAG for a + a * b — the two a leaves are merged:

    +
   / \
  |   *
  |  / \
  a-+   b

TAC from AST (3 instructions): t1 = a * b; t2 = a + t1 TAC from DAG (no redundancy — optimized).

Directed Acyclic Graphs (DAG) for Basic Blocks [PYQ 2022, 2023]

A DAG is a powerful representation for a single basic block that serves two purposes:

  1. It identifies Common Subexpressions (expressions computed more than once).
  2. It identifies values that are computed but never used within the block.

Structure of a DAG:

  • Leaves: Represent the initial values of variables or constants at the start of the block.
  • Internal Nodes: Represent operators.
  • Labels: An internal node can have multiple labels (variable names). If multiple variables point to the same node, it means they all hold the same value (a common subexpression).

Example: a = b + c; d = b + c; Instead of two separate nodes for +, the DAG creates one node for b + c and labels it with both a and d.

Numerical: Constructing a DAG [PYQ 2023]

  1. 1
    Step 1
    1(1) d = b * c 2(2) e = a + b 3(3) b = b * c 4(4) a = e - d
  2. 2
    Step 2

    Create leaf nodes for b and c. Create an internal node * and label it d.

  3. 3
    Step 3

    Create a leaf node for a. Create an internal node + using the existing leaf b and the new leaf a. Label it e.

  4. 4
    Step 4

    The expression b * c was already computed in statement (1). We don't create a new node. We simply add the label b to the existing * node.

    Important: Since b is redefined here, any future uses of b will refer to this node, not the original leaf node.

  5. 5
    Step 5

    Create an internal node -. Connect its left child to node e and its right child to node d. Label it a.

Knowledge Check

Question 1 of 7
Q1Single choice

Which field is present in a Quadruple but missing in a Triple?

Intermediate Code Forms — TAC, Quadruples and Triples | Compiler Design (CD) | Coursify