PYQ Analysis and Exam Preparation

60 mins

Comprehensive analysis of Previous Year Questions (PYQs) for Module 2. Review frequently tested topics, learn strategies for solving massive 14-mark parsing table questions, and master concepts like Handle Pruning and Left Recursion elimination.

Learning Goals

Identify the high-yield topics from Module 2 that consistently appear in university exams.
Understand the exact grading rubrics for 7-mark and 14-mark numerical parsing problems.
Practice drawing complex Syntax Trees using Mermaid diagrams as required by exams.
Write concise, high-scoring short notes on Handle Pruning, YACC, and Operator Precedence.

📊 Exam Weightage & Pattern Analysis

Module 2 (Syntax Analysis) is historically the heaviest and most mathematical module in Compiler Design. You can expect a massive 20 to 25 marks from this module alone. Because top-down and bottom-up parsing algorithms are entirely deterministic, these numericals offer guaranteed full marks if you show the correct traces.

Historical Question Frequency (Module 2)

Analysis of topic appearances across recent university exam cycles.

Bottom-Up Parsing Tables (14 marks) are essentially guaranteed to appear.

Exam Strategy: The 14-Mark Behemoth

Every year features either a 14-mark question or two 7-mark questions asking you to build an SLR(1), CLR(1), or LALR(1) parsing table for a given grammar and trace a string. Prioritize mastering the closure operations and item sets for LR(0) as this forms the foundation for all three.

Part A: 2-Mark Objective Concepts

Part B: Syntax Tree Construction (Standard 3-Mark Question)

1
Step 1
Draw the syntax tree for the following arithmetic expression: a * -(b + c / d)
2
Step 2
To draw the Abstract Syntax Tree (AST) correctly, we must apply standard math precedence:

Parentheses ( ) evaluate first.

Inside parentheses, division / evaluates before addition +.

Unary minus - evaluates before multiplication *.
3
Step 3
Here is the exact tree examiners expect. The root is the final operation evaluated (the *).

Note: In an exam, explicitly label the Unary Minus to distinguish it from binary subtraction.

Target Code Generation from Syntax Tree (7 Marks)

Target code generation for expressions involves translating a syntax tree or DAG (Directed Acyclic Graph) into register-based machine instructions. The key data structures are:

1. Register Descriptor: Tracks which variables are currently in which registers. It tells the code generator what values are available without loading from memory. 2. Address Descriptor: Tracks the memory location(s) where a variable's current value can be found. A variable may be in a register, on the stack, in main memory, or in multiple places simultaneously.

GetReg(I) Algorithm — Selecting a Register for Operation I:

If a register R already holds a value needed by operation I, use R.
If an empty register exists, allocate it.
If no empty register is available, spill a register whose value is either (a) already stored in memory, (b) not needed for the longest time (farthest next use), or (c) loaded using the smallest number of instructions.

Code Generation Function for x = y + z:

1Function GetReg(Instruction x = y op z):
2  1. If y is in register R and R contains ONLY y (address descriptor says y is nowhere else),
3     and y is not live after this instruction: use R for result x.
4  2. Else: find a register R using GetReg(I) heuristic.
5  3. Generate: LD R, y  (if y not already in register)
6  4. Generate: op R, z  (ADD/SUB/MUL/DIV with z)
7  5. Generate: ST x, R  (store result to x if needed elsewhere)
8  6. Update descriptors: Register descriptor marks R as holding x;
9     Address descriptor records x's location(s).

Example — Generating Code for a = b + c * d:

Step	Instruction	Why
1	`LD R1, c`	Load c into free register R1
2	`MUL R1, d`	R1 = c * d
3	`LD R2, b`	Load b into free register R2
4	`ADD R2, R1`	R2 = b + (c * d)
5	`ST a, R2`	Store result to a's memory location

The instruction selection respects operator precedence: multiplication before addition. If the variables are temporary (used only once) and the result is immediately consumed by the next instruction, the STORE instruction can be omitted — the result stays in the register for immediate reuse. This is called register caching.

Part C: 7-Mark & 14-Mark Theoretical & Numerical Strategies

Full SLR(1) Parsing Table Construction (14 Marks)

1
Step 1
E → E + T | T T → T * F | F F → (E) | id

The classic expression grammar that tests operator precedence and associativity in bottom-up parsing.
2
Step 2
Add a new start symbol E' with the production E' → E.

Augmented Grammar:

E' → E

E → E + T | T

T → T * F | F

F → (E) | id
3
Step 3
FIRST Sets:

FIRST(E) = FIRST(T) = FIRST(F) = { ( , id }

FIRST(+) = { + }

FIRST(*) = { ***** }

FOLLOW Sets:

FOLLOW(E') = { $ }

FOLLOW(E) = { $ , + , ) }

FOLLOW(T) = { $ , + , ***** , ) }

FOLLOW(F) = { $ , + , ***** , ) }
4
Step 4
Closure of E' → •E (I₀):

E' → •E

E → •E + T

E → •T

T → •T * F

T → •F

F → •(E)

F → •id

From I₀, transitions create states I₁ through I₁₁.

All 12 States:

I₀: Initial (7 items) — Closure of E' → •E

I₁: E' → E• [Accept]

I₂: E → T•, T → T•*F [S-R in LR(0), resolved in SLR(1)]

I₃: T → F•

I₄: F → (•E), E → •E+T, E → •T, ..., F → •(E), F → •id

I₅: F → id•

I₆: E → E+•T, T → •T*F, T → •F, F → •(E), F → •id

I₇: T → T*•F, F → •(E), F → •id

I₈: F → (E•), E → E•+T

I₉: E → E+T•, T → T•*F

I₁₀: T → T*F•

I₁₁: F → (E)•

Draw the full DFA on your answer sheet showing all GOTO and SHIFT transitions.
5
Step 5
Construction Rules:

If A → α•aβ ∈ Iᵢ and GOTO(Iᵢ, a) = Iⱼ → ACTION[i, a] = sⱼ

If A → α• ∈ Iᵢ and a ∈ FOLLOW(A) → ACTION[i, a] = rⱼ (j = production number)

If E' → E• ∈ I₁ → ACTION[1, $] = accept

SLR(1) Parsing Table:

State id + * ( ) $ E T F
0 s5 s4 1 2 3
1 s6 acc
2 r2 s7 r2 r2
3 r4 r4 r4 r4
4 s5 s4 8 2 3
5 r6 r6 r6 r6
6 s5 s4 9 3
7 s5 s4 10
8 s6 s11
9 r1 s7 r1 r1
10 r3 r3 r3 r3
11 r5 r5 r5 r5

Production mapping: (1) E→E+T, (2) E→T, (3) T→TF, (4) T→F, (5) F→(E), (6) F→id*

Conflict Check: No cell has more than one entry → Grammar IS SLR(1). State I₂ resolves the classic LR(0) conflict by restricting r2 to FOLLOW(E) = {$, +, )}, which excludes *.

Step 6

Input: id + id * id$

Step	Stack	Input	Action
0	0	id+id*id$	s5
1	0 id 5	+id*id$	r6 (F→id); GOTO(0,F)=3
2	0 F 3	+id*id$	r4 (T→F); GOTO(0,T)=2
3	0 T 2	+id*id$	r2 (E→T); GOTO(0,E)=1
4	0 E 1	+id*id$	s6
5	0 E 1 + 6	id*id$	s5
6	0 E 1 + 6 id 5	*id$	r6 (F→id); GOTO(6,F)=3
7	0 E 1 + 6 F 3	*id$	r4 (T→F); GOTO(6,T)=9
8	0 E 1 + 6 T 9	*id$	s7
9	0 E 1 + 6 T 9 * 7	id$	s5
10	0 E 1 + 6 T 9 * 7 id 5	$	r6 (F→id); GOTO(7,F)=10
11	0 E 1 + 6 T 9 * 7 F 10	$	r3 (T→T*F); GOTO(6,T)=9
12	0 E 1 + 6 T 9	$	r1 (E→E+T); GOTO(0,E)=1
13	0 E 1	$	accept

Result: String ACCEPTED. The trace correctly handles operator precedence (* over +) and left-associativity.

Check if Grammar is LL(1) (7 Marks)

1
Step 1
S → aBDh B → cC C → bC | ε D → EF E → g | ε F → f | ε

Goal: Compute FIRST/FOLLOW, build the LL(1) table, and check for conflicts.
2
Step 2
Rules for FIRST:

If X is terminal, FIRST(X) = {X}

If X → ε, add ε to FIRST(X)

If X → Y₁Y₂...Yₖ, add FIRST(Y₁)-{ε}; if ε ∈ FIRST(Y₁), continue with Y₂, etc.

Computed FIRST Sets:

FIRST(S) = { a }

FIRST(B) = { c }

FIRST(C) = { b , ε }

FIRST(D) = FIRST(E) ∪ (FIRST(F)-{ε}) = { g , f , ε }

FIRST(E) = { g , ε }

FIRST(F) = { f , ε }

Critical detail for D → EF: FIRST(E) = {g, ε}. Since ε ∈ FIRST(E), we check FIRST(F) = {f, ε}. Therefore FIRST(D) = (FIRST(E)-{ε}) ∪ (FIRST(F)-{ε}) ∪ {ε} = {g} ∪ {f} ∪ {ε} = {g, f, ε}.
3
Step 3
Rules for FOLLOW:

$ ∈ FOLLOW(Start Symbol)

If A → αBβ, add FIRST(β)-{ε} to FOLLOW(B)

If A → αBβ and ε ∈ FIRST(β), add FOLLOW(A) to FOLLOW(B)

If A → αB, add FOLLOW(A) to FOLLOW(B)

Computed FOLLOW Sets:

FOLLOW(S) = { $ }

FOLLOW(B) = FIRST(D)-{ε} ∪ {h} = { g , f , h } (D can be ε → also add h)

FOLLOW(C) = FOLLOW(B) = { g , f , h } (from B → cC, rule 4)

FOLLOW(D) = { h } (from S → aBDh, rule 2)

FOLLOW(E) = FIRST(F)-{ε} ∪ FOLLOW(D) = { f , h } (D → EF; if F→ε, add FOLLOW(D)=h)

FOLLOW(F) = FOLLOW(D) = { h } (from D → EF, rule 4)
4
Step 4
Table Construction Rules: For each production A → α:

For each a ∈ FIRST(α)-{ε}, add A → α to M[A, a]

If ε ∈ FIRST(α), for each b ∈ FOLLOW(A), add A → α to M[A, b]

LL(1) Parsing Table:

NT a b c f g h $
S S→aBDh
B B→cC
C C→bC C→ε C→ε C→ε
D D→EF D→EF D→EF
E E→ε E→g E→ε
F F→f F→ε

Key placement decisions:

C→ε placed in columns f, g, h (FOLLOW(C) = {g, f, h})

E→ε placed in columns f, h (FOLLOW(E) = {f, h})

F→ε placed in column h (FOLLOW(F) = {h})
5
Step 5
Inspect each cell systematically:

Row D:

M[D, g]: D→EF (FIRST(E)={g}). Single ✓

M[D, f]: D→EF (FIRST(F)={f}, E nullable). Single ✓

M[D, h]: D→EF (ε ∈ FIRST(D), h ∈ FOLLOW(D)). Single ✓

Row E:

M[E, g]: E→g (FIRST(g)={g}). Single ✓

M[E, f]: E→ε (f ∈ FOLLOW(E)). Single ✓

M[E, h]: E→ε (h ∈ FOLLOW(E)). Single ✓

Row F:

M[F, f]: F→f (FIRST(f)={f}). Single ✓

M[F, h]: F→ε (h ∈ FOLLOW(F)). Single ✓

Row C:

M[C, b]: C→bC (FIRST(bC)={b}). Single ✓

M[C, f]: C→ε (f ∈ FOLLOW(C)). Single ✓

M[C, g]: C→ε (g ∈ FOLLOW(C)). Single ✓

M[C, h]: C→ε (h ∈ FOLLOW(C)). Single ✓

Every cell has at most ONE entry. No conflicts anywhere.
6
Step 6
The grammar IS LL(1).

No cell in the parsing table has multiple entries, so the parser can deterministically choose the correct production with a single lookahead token.

Key observations for exam answers:

ε-productions in C, E, and F do NOT cause conflicts because FIRST(X) and FOLLOW(X) are disjoint for all nullable non-terminals.

FOLLOW(B) = {g, f, h} correctly propagates from the nullable D prefix.

D → EF: because both E and F can derive ε, FIRST(D) = {g, f, ε} — always trace through all nullable symbols.

Required check for LL(1): For every A → α | β, verify FIRST(α) ∩ FIRST(β) = Ø. For nullable α, verify FIRST(α) ∩ FOLLOW(A) = Ø.

Common 7-Mark Short Note Templates

Knowledge Check

Question 1 of 4

Q1Single choice

In an SLR(1) parsing table, where is the 'accept' action placed?

In state I₀, under the $ column

In the state containing S' → S•, under the $ column

In every state, under the $ column

In the state containing S' → S•, under any FOLLOW(S) column

Final Exam Traps for Module 2

Do NOT guess FIRST/FOLLOW sets. Write down the rules. A single mistake in a FOLLOW set will cause you to place a Reduce action in the wrong column of your parsing table, ruining a 14-mark question.
$\epsilon$ in FIRST, but NEVER in FOLLOW. Remember this golden rule. The end of file marker $ goes in FOLLOW sets, never $\epsilon$ .
LALR(1) Proofs: If a 14-mark question asks you to construct an LALR(1) table and prove it is not SLR(1), you must calculate the exact LR(1) lookaheads for the items. You will find that SLR(1) attempts to place a Reduce action based on a global FOLLOW set that causes a conflict, but the precise LR(1) lookaheads avoid that conflict.

Ambiguity, Operator Precedence, and YACC

Attribute Grammars and Syntax-Directed Definitions

State	id	+	*	(	)	$	E	T	F
0	s5			s4			1	2	3
1		s6				acc
2		r2	s7		r2	r2
3		r4	r4		r4	r4
4	s5			s4			8	2	3
5		r6	r6		r6	r6
6	s5			s4				9	3
7	s5			s4					10
8		s6			s11
9		r1	s7		r1	r1
10		r3	r3		r3	r3
11		r5	r5		r5	r5

NT	a	b	c	f	g	h
S	S→aBDh
B			B→cC
C		C→bC		C→ε	C→ε	C→ε
D				D→EF	D→EF	D→EF
E				E→ε	E→g	E→ε
F				F→f		F→ε

State	id	+	*	(	)	$	E	T	F
0	s5			s4			1	2	3
1		s6				acc
2		r2	s7		r2	r2
3		r4	r4		r4	r4
4	s5			s4			8	2	3
5		r6	r6		r6	r6
6	s5			s4				9	3
7	s5			s4					10
8		s6			s11
9		r1	s7		r1	r1
10		r3	r3		r3	r3
11		r5	r5		r5	r5

State	id	+	*	(	)	$	E	T	F
0	s5			s4			1	2	3
1		s6				acc
2		r2	s7		r2	r2
3		r4	r4		r4	r4
4	s5			s4			8	2	3
5		r6	r6		r6	r6
6	s5			s4				9	3
7	s5			s4					10
8		s6			s11
9		r1	s7		r1	r1
10		r3	r3		r3	r3
11		r5	r5		r5	r5