Data Flow Analysis
The mathematical foundation of global optimization. Learn to construct Control Flow Graphs from basic blocks and apply iterative data flow equations to determine reaching definitions and live variable information across an entire program.
Learning Goals
- Partition a TAC sequence into basic blocks and construct a Control Flow Graph (CFG).
- Define reaching definitions and set up the Gen/Kill data flow equations.
- Solve reaching definitions analysis iteratively using the worklist algorithm.
- Define liveness analysis and set up the Use/Def data flow equations.
- Solve live variable analysis and use the results to identify dead code.
Introduction to Data Flow Analysis
Data Flow Analysis (DFA) is the mathematical technique used by compilers to gather information about how data values are calculated and used throughout a program. This information is essential for Global Optimization (optimizing across multiple basic blocks).
The analyzer represents the program as a Control Flow Graph (CFG) and propagates information through the graph using Data Flow Equations.
Basic Blocks and Leaders
Before we can build a CFG, we must partition the Three-Address Code into Basic Blocks. A basic block is a sequence of instructions where:
- Flow of control enters only at the first instruction.
- Flow of control leaves only at the last instruction (no branching in the middle).
Rules to find Leaders (The first instruction of a block):
- The first statement of the entire program is a leader.
- Any statement that is the target of a conditional or unconditional jump is a leader.
- Any statement that immediately follows a conditional or unconditional jump is a leader.
Trace: Partitioning Basic Blocks
- 1Step 1
Consider this 10-line segment of Three-Address Code:
1(1) i = 1 2(2) j = 1 3(3) t1 = 10 * i 4(4) t2 = t1 + j 5(5) a[t2] = 0 6(6) j = j + 1 7(7) if j <= 10 goto (3) 8(8) i = i + 1 9(9) if i <= 10 goto (2) 10(10) halt - 2Step 2
Applying the leader rules:
- Rule 1: Line (1) is a leader.
- Rule 2 (Targets): (3) is target of
goto(3), (2) is target ofgoto(2). - Rule 3 (Successors): (8) follows jump at (7), (10) follows jump at (9).
Final Leaders: (1), (2), (3), (8), (10).
- 3Step 3
Block B1: (1) Block B2: (2) Block B3: (3)-(7) Block B4: (8)-(9) Block B5: (10)
- 4Step 4
The Reaching Definitions Framework
A definition of a variable x (e.g., ) reaches a point if there is a path from to such that is not redefined along that path.
The Data Flow Equations: For each block :
| Term | Meaning |
|---|---|
| GEN[B] | Definitions generated within that reach the end of . |
| KILL[B] | Definitions outside of that define variables also defined within . |
Numerical: Solving Reaching Definitions
- 1Step 1
First, label every assignment with a unique ID (). Then calculate the static GEN and KILL sets for every block.
- 2Step 2
Set for all blocks. In Reaching Definitions, we perform a Forward-Any (Union) analysis.
- 3Step 3
Apply the equations and repeatedly through all blocks. In each pass, the sets grow until they reach a fixed point where no more changes occur.
- 4Step 4
If a variable has only one definition reaching a point, and is a constant, the compiler can perform Constant Propagation.
| Property | Details |
|---|---|
| Direction | Forward |
| Operator | Union () |
| Purpose | Constant/Copy Propagation |
| Initialization |
Definitions 'reach' if at least one path exists where the variable is not redefined.
Union vs. Intersection
Union () is used for 'May-Analysis' (e.g., a definition might reach).
Intersection () is used for 'Must-Analysis' (e.g., an expression must be available on all paths).
Frequently Asked Questions
Knowledge Check
Which of the following instructions always marks the beginning of a new Basic Block?