Bottom-Up Parsing: LR(0), SLR(1), LR(1), LALR(1)

Shift your perspective from Top-Down to Bottom-Up. Learn how Shift-Reduce parsers work, how to identify parsing conflicts, and understand the hierarchy of LR parsing algorithms used in modern compilers.

Learning Goals

Explain the core mechanism of Shift-Reduce parsing (Shift, Reduce, Accept, Error).
Identify and differentiate between Shift-Reduce and Reduce-Reduce conflicts.
Construct the Canonical Collection of LR(0) items and build a parsing state machine (DFA).
Describe the hierarchical relationship between LR(0), SLR(1), LALR(1), and LR(1) parsers.
Explain how SLR(1) uses FOLLOW sets to resolve conflicts present in LR(0) parsing tables.

Introduction to Bottom-Up Parsing

While Top-Down (LL) parsing starts at the root and attempts to derive the input string, Bottom-Up Parsing starts with the input string (the leaves) and reduces it, step by step, back to the Start Symbol (the root).

This process is equivalent to tracing a Rightmost Derivation in reverse, a core property frequently evaluated in advanced compiler design assessments. The most powerful class of deterministic bottom-up parsers is the family of LR Parsers.

What does LR(1) stand for?

L: Scans the input from Left to right.
R: Produces a Rightmost derivation (in reverse).
(1): Uses exactly 1 token of lookahead.

Handle Pruning: A 'handle' is a substring that matches the right side of a production rule and represents one step backwards in a rightmost derivation. The entire basis of bottom-up parsing is Handle Pruning—finding this handle on the stack and 'pruning' (reducing) it to its left-hand side non-terminal.

LR parsers are strictly more powerful than LL parsers: for the same amount of lookahead, every LL grammar is also an LR grammar, but not the other way around. Parser generators like Yacc/Bison use LR parsing because it can handle a much larger class of grammars (including left-recursive ones) without requiring major structural rewrites.

An LR Parser maintains a Stack (which stores states and grammar symbols) and an Input Buffer. At any step, it looks up the top state on the stack and the current input token in its parsing table. It will perform one of four actions:

Shift ( $s_n$ ): Push the current input token onto the stack, and transition to a new state $n$ . Advance the input pointer.
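Reduce ( $r_k$ ): The parser has found a complete right-hand side (a 'handle') matching grammar rule $k$ ( $A \to \beta$ ). It pops $\beta$ (and the states stored with it) off the stack, pushes the left-hand side $A$ , and enters the state given by the GOTO entry for the newly exposed state and $A$ . The input pointer does not move.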
Accept: The parser has successfully reduced the entire input back to the Start Symbol and the input is empty.
Error: The table entry is blank, meaning a syntax error was found in the input.

Parsing Conflicts: When the Parser gets confused

An LR parser is deterministic. If a cell in the parsing table contains more than one action, the grammar cannot be parsed by that specific algorithm. There are two types of conflicts:

Shift-Reduce Conflict: The parser sees a valid handle it could reduce, but it also sees a valid terminal it could shift. It doesn't know whether to reduce now, or keep shifting to find a larger handle. (This often happens with the classic 'dangling-else' problem).
Reduce-Reduce Conflict: The parser sees a handle on the stack that matches the right-hand side of two different grammar rules. It doesn't know which rule to reduce by.

Building the Foundation: LR(0) Items

Step 1: LR(0) items
An LR(0) item is simply a grammar production with a dot ( $\bullet$ ) inserted somewhere in the right side. The dot indicates how much of a rule the parser has seen so far.

For the rule $A \to XYZ$ , the possible items are:

$A \to \bullet XYZ$ (Haven't seen anything yet)

$A \to X \bullet YZ$ (Seen $X$ , expecting $Y$ next)

$A \to XY \bullet Z$ (Seen $XY$ , expecting $Z$ next)

$A \to XYZ \bullet$ (Seen everything — ready to Reduce!)
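
The dot positions listed above can be enumerated mechanically. The following is a minimal sketch; the function name `lr0_items` and the string format are illustrative choices, not part of any standard tool.

```python
def lr0_items(lhs, rhs):
    """Return every LR(0) item of the production lhs -> rhs as a string.

    A production with n right-hand-side symbols has n + 1 items,
    one per possible dot position.
    """
    items = []
    for dot in range(len(rhs) + 1):
        # Insert the dot marker between the symbols already seen and those expected.
        symbols = list(rhs[:dot]) + ["."] + list(rhs[dot:])
        items.append(f"{lhs} -> {' '.join(symbols)}")
    return items


ITEMS = lr0_items("A", ["X", "Y", "Z"])
for item in ITEMS:
    print(item)
```

The last item, with the dot at the far right, is the one that signals a possible reduction.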

Step 2: The augmented grammar
Before we start, we add a brand new start symbol $S'$ to the grammar, with a single rule pointing to the old start symbol $S$ :

$S' \to S$

This is called the Augmented Grammar. Its sole purpose is to give the parser exactly one clear state where it knows it should halt and Accept (when it reaches $S' \to S \bullet$ ).

Step 3: The CLOSURE operation
If there is a dot immediately before a non-terminal, it means we are expecting to see that non-terminal. Therefore, we must add all the rules for that non-terminal to our current state (with the dot at the beginning), and repeat until no new items appear. This expansion is the $\text{CLOSURE}()$ operation.

Example: Suppose state $I_0$ contains $E \to \bullet T E'$ .

Since the dot is before $T$ , we must add all $T$ rules to $I_0$ : $T \to \bullet F T'$

Now the dot is before $F$ , so we must add all $F$ rules: $F \to \bullet ( E )$ and $F \to \bullet id$
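
The expansion in this example can be sketched as a worklist loop. This is an illustrative sketch only: items are represented as hypothetical `(lhs, rhs, dot)` tuples, and the grammar dictionary contains just the productions quoted in the example (the rules for $E'$ and $T'$ are not needed here and are left out).

```python
# Only the productions quoted in the example; E' and T' have no entries.
GRAMMAR = {
    "E": [("T", "E'")],
    "T": [("F", "T'")],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items):
    """Expand a set of LR(0) items (lhs, rhs, dot) until nothing new appears."""
    result = set(items)
    worklist = list(items)
    while worklist:
        lhs, rhs, dot = worklist.pop()
        if dot == len(rhs):
            continue  # dot at the end: nothing is expected next
        expected = rhs[dot]
        # Terminals (and non-terminals without listed rules) contribute nothing.
        for production in GRAMMAR.get(expected, []):
            new_item = (expected, production, 0)
            if new_item not in result:
                result.add(new_item)
                worklist.append(new_item)
    return result


I0 = closure({("E", ("T", "E'"), 0)})
for lhs, rhs, dot in sorted(I0):
    print(lhs, "->", " ".join(rhs[:dot] + (".",) + rhs[dot:]))
```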

Step 4: The GOTO function
The $\text{GOTO}(I, X)$ function computes the new state the parser enters when it is in state $I$ and sees grammar symbol $X$ (which can be a terminal or non-terminal).

Algorithm:

Look at all items in $I$ where the dot is right before $X$ (e.g., $A \to \alpha \bullet X \beta$ ).

Move the dot past $X$ : $A \to \alpha X \bullet \beta$ .

Create a new state with these items, and apply the $\text{CLOSURE}()$ operation to expand it.
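
The three steps of the algorithm translate almost line for line into code. The sketch below assumes the small grammar $S' \to S$, $S \to CC$, $C \to cC \mid d$ and represents items as `(lhs, rhs, dot)` tuples; both choices are for illustration only.

```python
GRAMMAR = {
    "S'": [("S",)],
    "S": [("C", "C")],
    "C": [("c", "C"), ("d",)],
}


def closure(items):
    """Add the rules of every non-terminal that appears right after a dot."""
    result = set(items)
    worklist = list(items)
    while worklist:
        lhs, rhs, dot = worklist.pop()
        if dot == len(rhs):
            continue
        for production in GRAMMAR.get(rhs[dot], []):
            new_item = (rhs[dot], production, 0)
            if new_item not in result:
                result.add(new_item)
                worklist.append(new_item)
    return result


def goto(state, symbol):
    """Move the dot past `symbol` in every item that allows it, then close."""
    moved = {
        (lhs, rhs, dot + 1)
        for lhs, rhs, dot in state
        if dot < len(rhs) and rhs[dot] == symbol
    }
    return closure(moved)


I0 = closure({("S'", ("S",), 0)})
AFTER_C_TERMINAL = goto(I0, "c")   # state reached on the terminal c
AFTER_S = goto(I0, "S")            # state reached on the non-terminal S
print(sorted(AFTER_C_TERMINAL))
print(sorted(AFTER_S))
```

Note that the same function handles terminals and non-terminals; only the table column it later fills in (ACTION versus GOTO) differs.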

Step 5: LR(1) items
To build a canonical LR(1) (CLR(1)) or LALR(1) table, we must compute LR(1) Items. An LR(1) item looks like this: $[A \to \alpha \bullet \beta, a]$ , where $a$ is the lookahead terminal: once the dot reaches the end, the parser reduces by this rule only if the next input token is $a$ .

The Golden Rule for LR(1) Closures: If you have an item $[A \to \alpha \bullet B \beta, a]$ in your state, you must add the closure for $B$ . Each new $B$ item gets one lookahead terminal from $\text{FIRST}(\beta a)$ , and every terminal in that set produces its own item.

Example: Suppose State $I_0$ contains: $[S \to \bullet CC, \char36]$

The dot is before $C$ . We must add $C \to \bullet cC$ and $C \to \bullet d$ .

What is the lookahead? Here, $\alpha$ is empty, $B$ is $C$ , $\beta$ is $C$ , and $a$ is $\char36$ .

Lookahead = $\text{FIRST}(C\char36)$ . Since $C$ starts with 'c' or 'd', $\text{FIRST}(C\char36)$ is $\{c, d\}$ .

So we add: $[C \to \bullet cC, c/d]$ and $[C \to \bullet d, c/d]$ .
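
The lookahead computation in this example can be checked with a short sketch. It assumes the grammar has no ε-productions (true for $S \to CC$, $C \to cC \mid d$), so $\text{FIRST}(\beta a)$ is simply the FIRST set of the first symbol of $\beta a$. Items are hypothetical `(lhs, rhs, dot, lookahead)` tuples, so the combined notation $c/d$ appears as two separate items.

```python
GRAMMAR = {
    "S": [("C", "C")],
    "C": [("c", "C"), ("d",)],
}


def first_of_symbol(symbol, seen=frozenset()):
    """FIRST of one symbol, assuming the grammar has no epsilon productions."""
    if symbol not in GRAMMAR:          # terminal, or the end marker "$"
        return {symbol}
    if symbol in seen:                 # guard against left recursion
        return set()
    result = set()
    for production in GRAMMAR[symbol]:
        result |= first_of_symbol(production[0], seen | {symbol})
    return result


def closure_lr1(items):
    """LR(1) closure: new items for B get lookaheads from FIRST(beta a)."""
    result = set(items)
    worklist = list(items)
    while worklist:
        lhs, rhs, dot, lookahead = worklist.pop()
        if dot == len(rhs) or rhs[dot] not in GRAMMAR:
            continue
        beta_a = rhs[dot + 1:] + (lookahead,)
        for b in first_of_symbol(beta_a[0]):
            for production in GRAMMAR[rhs[dot]]:
                new_item = (rhs[dot], production, 0, b)
                if new_item not in result:
                    result.add(new_item)
                    worklist.append(new_item)
    return result


STATE = closure_lr1({("S", ("C", "C"), 0, "$")})
for item in sorted(STATE):
    print(item)
```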

Constructing a Canonical LR(1) Parsing Table

Step 1: Augment the grammar
Start with the augmented grammar by adding $S' \to S$ . This ensures exactly one accepting state. We'll trace the classic grammar:

$S' \to S$ $S \to CC$ $C \to cC \mid d$

Begin with the initial item $[S' \to \bullet S, \char36]$ and apply CLOSURE.

Step 2: Build the initial state $I_0$
The LR(1) Closure Rule: If $[A \to \alpha \bullet B \beta, a]$ is in the state, add all $B$ productions as $[B \to \bullet \gamma, b]$ where $b \in \text{FIRST}(\beta a)$ .

Start: $[S' \to \bullet S, \char36]$

Dot is before $S$ (non-terminal), so add $S$ rules with lookahead = $\text{FIRST}(\varepsilon \cdot \char36) = \{\char36\}$ → Add $[S \to \bullet CC, \char36]$

Now dot is before $C$ . Lookahead = $\text{FIRST}(C\char36)$ . Since $C$ derives c or d, $\text{FIRST}(C\char36) = \{c, d\}$ . → Add $[C \to \bullet cC, c/d]$ and $[C \to \bullet d, c/d]$

State $I_0$ = $\{[S' \to \bullet S, \char36],\ [S \to \bullet CC, \char36],\ [C \to \bullet cC, c/d],\ [C \to \bullet d, c/d]\}$

Step 3: Compute GOTO from $I_0$
For each state, compute $\text{GOTO}$ on every grammar symbol that appears after a dot. Apply $\text{CLOSURE}$ after moving the dot.

GOTO( $I_0$ , $S$ ) = $I_1$ : $\{[S' \to S \bullet, \char36]\}$ (No closure needed — dot at end)

GOTO( $I_0$ , $C$ ) = $I_2$ : Move dot past $C$ in $[S \to \bullet CC, \char36]$ → $[S \to C \bullet C, \char36]$ . Apply closure on $C$ with lookahead $\text{FIRST}(\char36) = \{\char36\}$ : $I_2 = \{[S \to C \bullet C, \char36],\ [C \to \bullet cC, \char36],\ [C \to \bullet d, \char36]\}$

GOTO( $I_0$ , $c$ ) = $I_3$ : Move dot past c in $[C \to \bullet cC, c/d]$ → $[C \to c \bullet C, c/d]$ . Nothing follows $C$ after the dot, so the closure items for $C$ inherit the item's own lookaheads, $\text{FIRST}(\varepsilon \cdot a) = \{a\}$ for $a \in \{c, d\}$ : $I_3 = \{[C \to c \bullet C, c/d],\ [C \to \bullet cC, c/d],\ [C \to \bullet d, c/d]\}$

GOTO( $I_0$ , $d$ ) = $I_4$ : $\{[C \to d \bullet, c/d]\}$ (Reduce state — dot at end)

Step 4: Compute the remaining states
Continue computing $\text{GOTO}$ for every state-symbol pair until no new states emerge.

GOTO( $I_2$ , $C$ ) = $I_5$ : $\{[S \to CC \bullet, \char36]\}$ (Reduce state)

GOTO( $I_2$ , $c$ ) = $I_6$ : $\{[C \to c \bullet C, \char36],\ [C \to \bullet cC, \char36],\ [C \to \bullet d, \char36]\}$

GOTO( $I_2$ , $d$ ) = $I_7$ : $\{[C \to d \bullet, \char36]\}$

GOTO( $I_3$ , $C$ ) = $I_8$ : $\{[C \to cC \bullet, c/d]\}$

GOTO( $I_3$ , $c$ ) = $I_3$ (loop on c)

GOTO( $I_3$ , $d$ ) = $I_4$ (existing state)

GOTO( $I_6$ , $C$ ) = $I_9$ : $\{[C \to cC \bullet, \char36]\}$

GOTO( $I_6$ , $c$ ) = $I_6$ (loop), GOTO( $I_6$ , $d$ ) = $I_7$ (existing)

Total: 10 distinct LR(1) states ( $I_0$ through $I_9$ ). Contrast this with only 7 LR(0) states for the same grammar—the lookahead information splits states apart.

Step 5: Fill in the parsing table

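The final LR(1) parsing table for grammar $S \to CC,\ C \to cC \mid d$ . Reduce entries refer to the productions numbered 1: $S' \to S$ , 2: $S \to CC$ , 3: $C \to cC$ , 4: $C \to d$ ; the numeric entries under $S$ and $C$ are GOTO targets.

State	c	d	$\char36$	S	C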
$I_0$	$s_3$	$s_4$	—	1	2
$I_1$	—	—	acc	—	—
$I_2$	$s_6$	$s_7$	—	—	5
$I_3$	$s_3$	$s_4$	—	—	8
$I_4$	$r_4$	$r_4$	—	—	—
$I_5$	—	—	$r_2$	—	—
$I_6$	$s_6$	$s_7$	—	—	9
$I_7$	—	—	$r_4$	—	—
$I_8$	$r_3$	$r_3$	—	—	—
$I_9$	—	—	$r_3$	—	—

Key observation: State $I_4$ (reduce $C \to d$ ) has lookahead $\{c, d\}$ , while $I_7$ (reduce $C \to d$ ) has lookahead $\{\char36\}$ . LR(1) splits into two states what LR(0) keeps as a single state.
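
To see the table in action, here is a minimal driver sketch that encodes the table above and runs the Shift / Reduce / Accept / Error loop. It is a simplification: the stack holds only state numbers (the grammar symbols are implied by the states), and the names `ACTION`, `GOTO`, `RULES` and `parse` are illustrative.

```python
# ACTION and GOTO entries copied from the table above.
ACTION = {
    (0, "c"): ("shift", 3), (0, "d"): ("shift", 4),
    (1, "$"): ("accept", None),
    (2, "c"): ("shift", 6), (2, "d"): ("shift", 7),
    (3, "c"): ("shift", 3), (3, "d"): ("shift", 4),
    (4, "c"): ("reduce", 4), (4, "d"): ("reduce", 4),
    (5, "$"): ("reduce", 2),
    (6, "c"): ("shift", 6), (6, "d"): ("shift", 7),
    (7, "$"): ("reduce", 4),
    (8, "c"): ("reduce", 3), (8, "d"): ("reduce", 3),
    (9, "$"): ("reduce", 3),
}
GOTO = {(0, "S"): 1, (0, "C"): 2, (2, "C"): 5, (3, "C"): 8, (6, "C"): 9}
# Rule number -> (left-hand side, length of right-hand side).
RULES = {2: ("S", 2), 3: ("C", 2), 4: ("C", 1)}


def parse(tokens):
    """Return the rule numbers used, in order; raise SyntaxError on a blank cell."""
    stack = [0]
    tokens = list(tokens) + ["$"]
    pos = 0
    reductions = []
    while True:
        state, token = stack[-1], tokens[pos]
        action = ACTION.get((state, token))
        if action is None:
            raise SyntaxError(f"unexpected {token!r} in state {state}")
        kind, arg = action
        if kind == "shift":
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            lhs, length = RULES[arg]
            del stack[-length:]                      # pop the handle
            stack.append(GOTO[(stack[-1], lhs)])     # GOTO on the exposed state
            reductions.append(arg)
        else:
            return reductions


print(parse("cdd"))
```

Reading the returned rule numbers from last to first gives the rightmost derivation of the input.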

Operator Precedence Parsing

Operator Precedence Parsing is a classical shift-reduce parsing technique designed specifically for operator grammars—grammars where no production has two adjacent non-terminals.

Instead of building a full DFA of states, operator precedence parsers define three precedence relations that can hold between a pair of terminal symbols (a pair with no relation signals a syntax error):

$a \lessdot b$ (yields precedence): Terminal $a$ has lower precedence than $b$ . When $a$ is the topmost terminal on the stack and $b$ is next in input, the parser must shift $b$ .
$a \doteq b$ (equal precedence): $a$ and $b$ belong to the same handle, as with a matching pair of parentheses. The parser shifts $b$ .
$a \gtrdot b$ (takes precedence): Terminal $a$ has higher precedence than $b$ . When $a$ is the topmost terminal on the stack and $b$ is next, the parser must reduce the handle ending with $a$ before $b$ can be processed.

The $\char36$ marker: The end-of-input marker $\char36$ has the lowest possible precedence: $\char36 \lessdot a$ and $a \gtrdot \char36$ for any terminal $a$ .
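
The relations can be exercised with a small sketch. The precedence table below is an assumption made for illustration: it encodes the usual arithmetic conventions (`*` binds tighter than `+`, both left-associative) for the terminals `id`, `+`, `*`. Non-terminals are left off the stack, since the relations are defined between terminals only.

```python
# PREC[a][b] is the relation between stack terminal a and input terminal b.
# Missing entries mean "no relation", which is a syntax error.
PREC = {
    "id": {"+": ">", "*": ">", "$": ">"},
    "+": {"id": "<", "+": ">", "*": "<", "$": ">"},
    "*": {"id": "<", "+": ">", "*": ">", "$": ">"},
    "$": {"id": "<", "+": "<", "*": "<"},
}


def parse(tokens):
    """Return the handles (as terminal strings) in the order they are reduced."""
    stack = ["$"]
    tokens = list(tokens) + ["$"]
    pos = 0
    handles = []
    while True:
        top, incoming = stack[-1], tokens[pos]
        if top == "$" and incoming == "$":
            return handles                      # accept
        relation = PREC[top].get(incoming)
        if relation in ("<", "="):
            stack.append(incoming)              # shift
            pos += 1
        elif relation == ">":
            handle = []
            while True:                         # reduce: pop back to the "<" boundary
                popped = stack.pop()
                handle.append(popped)
                if PREC[stack[-1]].get(popped) == "<":
                    break
            handles.append(" ".join(reversed(handle)))
        else:
            raise SyntaxError(f"no relation between {top!r} and {incoming!r}")


print(parse(["id", "+", "id", "*", "id"]))
```

In the printed order, `*` is reduced before `+`, which is how the table enforces that multiplication binds tighter.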

Comparing State Counts Across LR Variants

For a given grammar, the number of states in the canonical collection varies dramatically depending on the LR variant. Consider our running grammar $S \to CC,\ C \to cC \mid d$ :

LR Variant	Number of States	Reason
LR(0)	7	Items track only core — no lookahead information.
SLR(1)	7	Uses LR(0) state machine; only $\text{FOLLOW}$ sets differ in the table.
LALR(1)	7	Merges LR(1) states with identical cores back into fewer states.
Canonical LR(1)	10	Each state is split by distinct lookahead tokens.
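
The counts for the canonical LR(1) and merged-core rows can be reproduced with a short sketch. It builds the canonical LR(1) collection for the running grammar and then groups states by core, which is the merge that LALR(1) performs. The representation (tuples, frozensets) and the shortcut of taking FIRST from the first symbol only (valid because this grammar has no ε-productions) are assumptions of the sketch.

```python
GRAMMAR = {
    "S'": [("S",)],
    "S": [("C", "C")],
    "C": [("c", "C"), ("d",)],
}


def first_of_symbol(symbol, seen=frozenset()):
    """FIRST of one symbol, assuming no epsilon productions."""
    if symbol not in GRAMMAR:
        return {symbol}
    if symbol in seen:
        return set()
    result = set()
    for production in GRAMMAR[symbol]:
        result |= first_of_symbol(production[0], seen | {symbol})
    return result


def closure(items):
    result, worklist = set(items), list(items)
    while worklist:
        lhs, rhs, dot, lookahead = worklist.pop()
        if dot == len(rhs) or rhs[dot] not in GRAMMAR:
            continue
        for b in first_of_symbol((rhs[dot + 1:] + (lookahead,))[0]):
            for production in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], production, 0, b)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)


def goto(state, symbol):
    return closure({(l, r, d + 1, a) for l, r, d, a in state
                    if d < len(r) and r[d] == symbol})


def canonical_collection():
    start = closure({("S'", ("S",), 0, "$")})
    states, worklist = {start}, [start]
    while worklist:
        state = worklist.pop()
        for symbol in {r[d] for _, r, d, _ in state if d < len(r)}:
            target = goto(state, symbol)
            if target not in states:
                states.add(target)
                worklist.append(target)
    return states


LR1_STATES = canonical_collection()
# A core is a state's items with the lookaheads stripped off.
CORES = {frozenset((l, r, d) for l, r, d, _ in state) for state in LR1_STATES}
print(len(LR1_STATES), "LR(1) states,", len(CORES), "distinct cores")
```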

For real programming languages, the state explosion is much more severe. A grammar that needs a few hundred SLR(1) or LALR(1) states can need several thousand canonical LR(1) states.

Why LALR(1) is the Industry Sweet Spot

LALR(1) achieves the ideal engineering compromise:

Same number of states as SLR(1): Merging LR(1) states with identical cores collapses the state machine back to the LR(0) automaton that SLR(1) also uses.
High Power: The lookaheads per merged state are the union of the lookaheads from the constituent LR(1) states. This precise context resolves the vast majority of conflicts that SLR(1) cannot.
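The Caveat: Merging takes the union of lookahead sets that canonical LR(1) kept in separate states, so two different completed items can end up sharing a lookahead in the merged state. That introduces a new Reduce-Reduce conflict. However, Shift-Reduce conflicts are never introduced by merging, because shift actions depend only on the core, which the merged states already share.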
Yacc and Bison default to LALR(1) because it fits cleanly in memory while handling virtually all practical language constructs.
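
The caveat about new Reduce-Reduce conflicts can be made concrete. The sketch below uses a commonly cited textbook grammar that is not part of this lesson: $S \to aAd \mid bBd \mid aBe \mid bAe$, $A \to c$, $B \to c$. Its canonical LR(1) automaton has two states with the same core $\{A \to c \bullet,\ B \to c \bullet\}$ but different lookaheads. The states are written out by hand here rather than computed.

```python
# Completed items and their lookaheads in the two LR(1) states with the same core.
STATE_AFTER_AC = {"A -> c .": {"d"}, "B -> c .": {"e"}}   # reached after reading "a c"
STATE_AFTER_BC = {"A -> c .": {"e"}, "B -> c .": {"d"}}   # reached after reading "b c"


def reduce_reduce_conflicts(state):
    """Return the lookaheads on which more than one completed item would reduce."""
    seen, conflicts = set(), set()
    for lookaheads in state.values():
        conflicts |= seen & lookaheads
        seen |= lookaheads
    return conflicts


def merge(first, second):
    """LALR(1) merge: same core, lookahead sets combined by union."""
    if first.keys() != second.keys():
        raise ValueError("only states with identical cores can be merged")
    return {core: first[core] | second[core] for core in first}


MERGED = merge(STATE_AFTER_AC, STATE_AFTER_BC)
print(reduce_reduce_conflicts(STATE_AFTER_AC))   # no conflict before merging
print(reduce_reduce_conflicts(STATE_AFTER_BC))   # no conflict before merging
print(sorted(reduce_reduce_conflicts(MERGED)))   # conflicts appear after merging
```

Each state on its own is conflict-free, so this grammar is LR(1); only the merged state has two reductions competing on the same lookahead, so it is not LALR(1).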

The Power Hierarchy of LR Parsers

Not all LR parsers are created equal. As we attempt to resolve conflicts by integrating more specific lookahead data, the algorithms grow mathematically stronger:

The relationship forms a strict subset hierarchy: $LR(0) \subset SLR(1) \subset LALR(1) \subset LR(1)$ . (If a grammar is LR(0), it is automatically SLR(1), but not vice versa).

LR(0)

Power: Weakest. Table Size: Small.

How it reduces: If state $I$ contains a completed item $A \to \alpha \bullet$ , LR(0) blindly puts the reduce action for that rule in every terminal (ACTION) column of that state's row in the parsing table.

The Problem: It doesn't look ahead at the next input token. It just assumes if a rule is complete, it must reduce. This creates a Shift-Reduce conflict whenever the same state also contains an item that can shift a terminal, and a Reduce-Reduce conflict whenever it contains a second completed item.
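
The effect of this blind placement, and how restricting reductions to FOLLOW sets (the SLR(1) approach) removes it, can be shown on one table row. The grammar here is an assumption chosen for illustration: $E' \to E$, $E \to T + E \mid T$, $T \to id$, for which $\text{FOLLOW}(E) = \{\char36\}$. The state examined is the one containing $E \to T \bullet + E$ and $E \to T \bullet$, written out by hand.

```python
TERMINALS = ["+", "id", "$"]
# Items are (lhs, rhs, dot); this state holds one shift item and one completed item.
STATE = [("E", ("T", "+", "E"), 1), ("E", ("T",), 1)]
FOLLOW = {"E": {"$"}}   # worked out by hand for the assumed grammar


def action_row(state, reduce_on):
    """Build one ACTION row; reduce_on(lhs) lists the columns that get the reduce."""
    row = {terminal: set() for terminal in TERMINALS}
    for lhs, rhs, dot in state:
        if dot < len(rhs):
            if rhs[dot] in row:                     # dot before a terminal: shift
                row[rhs[dot]].add("shift")
        else:                                       # completed item: reduce
            for terminal in reduce_on(lhs):
                row[terminal].add(f"reduce {lhs} -> {' '.join(rhs)}")
    return row


def conflicts(row):
    return [terminal for terminal, actions in row.items() if len(actions) > 1]


LR0_ROW = action_row(STATE, lambda lhs: TERMINALS)     # reduce in every column
SLR_ROW = action_row(STATE, lambda lhs: FOLLOW[lhs])   # reduce only on FOLLOW(lhs)
print("LR(0) conflicts:", conflicts(LR0_ROW))
print("SLR(1) conflicts:", conflicts(SLR_ROW))
```

Under LR(0) the `+` column holds both a shift and a reduce; with FOLLOW sets the reduce lands only in the end-marker column and the row becomes deterministic.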

Knowledge Check

Q1 (single choice)

Which of the following describes a Shift-Reduce conflict?

The parser has two different rules it can use to reduce the stack.

The parser cannot decide whether to push the input token to the stack or reduce the current stack handle.

The parser encounters an unrecognized character.

The parser attempts to pop from an empty stack.

Heuristics for Exam Classification

How to quickly determine which LR class a grammar belongs to:

Build the LR(0) automaton first. This DFA forms the backbone of ALL LR parsers.
Check LR(0): If filling the table (placing 'reduce' in ALL columns for completed states) yields any cell with multiple actions → the grammar is NOT LR(0).
Check SLR(1): Recompute reduce logic using FOLLOW sets. If conflicts vanish → the grammar IS SLR(1). If conflicts persist → it's NOT SLR(1).
Check LR(1): Build the full canonical LR(1) DFA. If conflicts exist here → the grammar is NOT LR(1). It may be ambiguous, or it may be unambiguous but need more than one token of lookahead.
Check LALR(1): If LR(1) has no conflicts, merge states with identical cores. If the compressed table has no Reduce-Reduce conflicts → the grammar IS LALR(1).

Golden Rule regarding DFA states: $\text{States in LR(0)} = \text{States in SLR(1)} = \text{States in LALR(1)} \le \text{States in LR(1)}$ (the counts are equal only when no state needs to be split by lookahead).

Stanford CS143: Bottom-Up Parsing Slides
