Structured Analysis: DFD and Data Dictionary
This section introduces structured analysis artifacts used to model information flow and data definitions in software systems, emphasizing clarity, consistency, and traceability from requirements to design.
Learning Goals
- Construct context-level and leveled DFDs that correctly represent processes, data stores, external entities, and data flows.
- Differentiate among DFD elements and validate whether a diagram follows balancing and decomposition rules.
- Develop a data dictionary that defines data elements, composite structures, aliases, and data flow contents precisely.
- Trace information from DFD processes and flows to corresponding entries in a data dictionary.
- Evaluate the consistency and completeness of structured analysis models for a given software problem.
Structured Analysis is a top-down modeling discipline that describes what a system does by tracing how data enters, transforms, is stored, and leaves the system. Two artifacts anchor this approach. The Data Flow Diagram (DFD) captures the movement and transformation of information, while the Data Dictionary captures the precise composition and meaning of every data item the diagram references.
The power of structured analysis lies in the disciplined coupling of these two artifacts. A DFD names data flows like CUSTOMER_INVOICE, but it deliberately stays silent on what fields that invoice contains. The data dictionary supplies that definition, establishing a single source of truth so that every analyst, developer, and stakeholder shares one consistent vocabulary. This separation of flow from content keeps diagrams readable while preserving full traceability from requirements through to design.
DFDs are built hierarchically: a single context diagram expands into progressively finer levels, and each refinement must remain consistent with the level above it. The dictionary grows in parallel, so that as soon as a flow appears on a diagram, its structure is defined in the dictionary.
A DFD is built from exactly four element types. Confusing them is a common modeling error, so it pays to learn each one's distinct meaning and its strict set of connection rules.
| Element | Meaning | Notation (Yourdon/DeMarco) |
|---|---|---|
| Process | Transforms data | Circle / bubble |
| Data Store | Holds data at rest | Two parallel lines (drawn as an open-ended rectangle in Gane-Sarson notation) |
| External Entity | Source or sink | Square |
| Data Flow | Moves data between elements | Labelled arrow |
The connection rules follow from these meanings. A data flow must touch at least one process; flows never run directly from one external entity to another, nor directly from one data store to another, because nothing would transform the data in between. The single most important diagnostic is the process integrity rule. A process that produces outputs without any inputs (a "miracle"), that consumes inputs without producing any output (a "black hole"), or whose inputs are insufficient to produce its outputs (a "gray hole") signals a modeling defect.
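The connection rules above lend themselves to a mechanical check. The following is a minimal sketch, assuming a diagram is stored as a mapping from element name to kind plus a list of (source, target, label) flows; every element and flow name here is hypothetical. It checks only the structural extremes (a process with outputs and no inputs, or with inputs and no outputs); judging whether a given set of inputs is sufficient to produce an output still needs the dictionary and a human reviewer.

```python
from collections import defaultdict

# Hypothetical diagram, used only for illustration.
KINDS = {
    "CUSTOMER": "entity",
    "WAREHOUSE": "entity",
    "1.0 Take Order": "process",
    "2.0 Ship Order": "process",
    "ORDERS_FILE": "store",
}

FLOWS = [
    ("CUSTOMER", "1.0 Take Order", "CUSTOMER_ORDER"),
    ("1.0 Take Order", "ORDERS_FILE", "VALID_ORDER"),
    ("ORDERS_FILE", "2.0 Ship Order", "VALID_ORDER"),
    ("CUSTOMER", "WAREHOUSE", "PHONE_CALL"),  # illegal: entity to entity
]


def check_connections(kinds, flows):
    """Return a list of human-readable rule violations."""
    problems = []
    inputs = defaultdict(list)
    outputs = defaultdict(list)
    for source, target, label in flows:
        # Rule: every flow must touch at least one process.
        if "process" not in (kinds[source], kinds[target]):
            problems.append(
                f"{label}: {kinds[source]} -> {kinds[target]} bypasses every process"
            )
        outputs[source].append(label)
        inputs[target].append(label)
    for name, kind in kinds.items():
        if kind != "process":
            continue
        if outputs[name] and not inputs[name]:
            problems.append(f"{name}: miracle (outputs but no inputs)")
        if inputs[name] and not outputs[name]:
            problems.append(f"{name}: black hole (inputs but no outputs)")
    return problems


for problem in check_connections(KINDS, FLOWS):
    print(problem)
```

In this sample the checker reports the entity-to-entity flow PHONE_CALL and flags process 2.0 as a black hole, because nothing leaves it.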
Constructing a Leveled DFD
- Step 1: Represent the entire system as a single process numbered 0, surrounded only by external entities and the net data flows crossing the system boundary. No data stores appear here. This establishes the scope and shows what the system touches.
- Step 2: Break the single process into 3 to 7 major sub-processes numbered 1.0, 2.0, 3.0 and so on. Internal data stores now become visible, along with the flows connecting the sub-processes.
- Step 3: Expand any sub-process that remains too complex into its own child diagram. A process numbered 4.0 expands into 4.1, 4.2, 4.3. Stop decomposing once a process performs a single primitive function; such a process is a functional primitive and is not exploded further.
- Step 4: Verify that the net inputs and outputs of a parent process exactly match the net inputs and outputs of its child diagram. Any mismatch must be resolved before proceeding deeper.
- Step 5: As soon as a flow is named on any diagram, create its dictionary entry so that no flow remains an undefined label.
Always start at the boundary
Begin with the context diagram before any internal detail. Fixing the system boundary and its external entities first prevents scope creep and keeps every lower level anchored to what the system is actually responsible for.
Two rules govern whether a decomposition is valid: balancing and decomposition consistency. Balancing requires that the data flows entering and leaving a parent process are conserved when that process is exploded into a child diagram. If process 4.0 receives two inputs and emits one output on the parent, its child Diagram 4 must show those same two inputs entering and that same one output leaving.
A subtlety worth internalizing: a single composite flow on a parent may legitimately split into its component flows on the child, and the diagram is still balanced. If EMPLOYEE_RECORD enters parent process 5.0, the child Diagram 5 may show WAGE_INFORMATION and PERSONAL_DETAILS entering separately, provided the dictionary defines EMPLOYEE_RECORD as the sum of exactly those parts. Balancing is therefore checked against the dictionary, not against labels alone.
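The dictionary-aware balancing check described above can be sketched as follows. This is an illustration under stated assumptions, not a standard tool: the dictionary is reduced to a mapping from composite flow to its set of components, it is assumed to contain no circular definitions, and PAYCHECK is a hypothetical output flow added to make the example complete.

```python
# Hypothetical dictionary: composite flow -> set of component flows.
DICTIONARY = {
    "EMPLOYEE_RECORD": {"WAGE_INFORMATION", "PERSONAL_DETAILS"},
}


def expand(flows, dictionary):
    """Replace each composite flow by its components, recursively."""
    result = set()
    for flow in flows:
        if flow in dictionary:
            result |= expand(dictionary[flow], dictionary)
        else:
            result.add(flow)
    return result


def is_balanced(parent_in, parent_out, child_in, child_out, dictionary):
    """True when parent and child agree once composites are expanded."""
    return (
        expand(parent_in, dictionary) == expand(child_in, dictionary)
        and expand(parent_out, dictionary) == expand(child_out, dictionary)
    )


# Process 5.0 on the parent versus the net flows of child Diagram 5.
parent_in, parent_out = {"EMPLOYEE_RECORD"}, {"PAYCHECK"}
child_in = {"WAGE_INFORMATION", "PERSONAL_DETAILS"}
child_out = {"PAYCHECK"}

print(is_balanced(parent_in, parent_out, child_in, child_out, DICTIONARY))  # True
# Dropping PERSONAL_DETAILS on the child leaves the pair unbalanced.
print(is_balanced(parent_in, parent_out, {"WAGE_INFORMATION"}, child_out, DICTIONARY))  # False
```

Comparing labels alone would reject the first case, because EMPLOYEE_RECORD never appears on the child; expanding both sides through the dictionary first is what makes the split legitimate.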
Because each process may explode into several child processes, the amount of detail grows multiplicatively with each level. In practice, decomposition rarely needs to go beyond Level 2 or 3.
Unbalanced diagrams break traceability
If a child diagram introduces an input or output that does not appear on its parent process, the model is unbalanced. This is not a cosmetic issue: it means a flow exists that no higher level accounts for, severing the traceability chain back to requirements.
Distinguishing Elements and Catching Rule Violations
| Symbol | Meaning |
|---|---|
| = | consists of / is composed of |
| + | and (sequence, concatenation) |
| [ ] | either-or (selection of one alternative) |
| \| | separates alternatives inside [ ] |
| { } | iteration (repeated element or group) |
| ( ) | optional element (may be absent) |
| * * | enclosed text is a comment |
| " " | a literal value |
The data dictionary uses an algebraic notation to define data precisely. Three composition structures cover every case: sequence (components in fixed order, joined with +), selection (exactly one alternative, written with [ ] and |), and iteration (a repeated element, written with { }). Bounds may be attached to iteration, so 1{ORDER_LINE}10 means between one and ten order lines inclusive.
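To make the three composition structures concrete, here is a small sketch that models them as Python objects and renders them back into the algebraic notation. The class names (Seq, Sel, Iter), the ORDER definition, and the "CASH" and "CARD" literals are all hypothetical; the bounded iteration of one to ten order lines is written as 1{ORDER_LINE}10, with the lower bound before the braces and the upper bound after.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Seq:
    """Sequence: A + B + C."""
    parts: tuple


@dataclass(frozen=True)
class Sel:
    """Selection: [A | B], exactly one alternative."""
    options: tuple


@dataclass(frozen=True)
class Iter:
    """Iteration: lo{A}hi; hi=None means no upper bound."""
    item: object
    lo: int = 0
    hi: Optional[int] = None


def render(definition):
    """Render a definition in the algebraic dictionary notation."""
    if isinstance(definition, str):
        return definition
    if isinstance(definition, Seq):
        return " + ".join(render(p) for p in definition.parts)
    if isinstance(definition, Sel):
        return "[" + " | ".join(render(o) for o in definition.options) + "]"
    if isinstance(definition, Iter):
        lo = str(definition.lo) if definition.lo else ""
        hi = str(definition.hi) if definition.hi is not None else ""
        return lo + "{" + render(definition.item) + "}" + hi
    raise TypeError(f"unsupported definition: {definition!r}")


def count_ok(iteration, n):
    """Check a repetition count against the iteration bounds (inclusive)."""
    return n >= iteration.lo and (iteration.hi is None or n <= iteration.hi)


ORDER_LINES = Iter("ORDER_LINE", lo=1, hi=10)
ORDER = Seq(("CUSTOMER_ID", ORDER_LINES, Sel(('"CASH"', '"CARD"'))))

print("ORDER =", render(ORDER))
print(count_ok(ORDER_LINES, 10), count_ok(ORDER_LINES, 11))
```

The rendered line reads ORDER = CUSTOMER_ID + 1{ORDER_LINE}10 + ["CASH" | "CARD"], and the bounds check accepts ten lines but rejects eleven.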
Every entry records more than structure. A complete element definition carries an alias list, allowable values or ranges, length, and an English description. Aliases matter because the sales team's CLIENT_ID and finance's ACCOUNT_NUMBER may denote the same element; recording the alias prevents the system from modeling one real-world concept as two.
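One way to picture an element entry with its aliases is a record plus a lookup index, sketched below. The canonical name CUSTOMER_ID, the length, the value range, and the description are assumptions made for illustration; only the two alias names come from the example above.

```python
# Hypothetical element entry; field values are illustrative assumptions.
ELEMENTS = {
    "CUSTOMER_ID": {
        "aliases": {"CLIENT_ID", "ACCOUNT_NUMBER"},
        "length": 8,
        "values": "00000001 to 99999999",
        "description": "Unique identifier assigned to a customer.",
    },
}


def build_alias_index(elements):
    """Map every canonical name and alias to its canonical element name."""
    index = {}
    for canonical, entry in elements.items():
        for name in {canonical, *entry["aliases"]}:
            # The same name must never point at two different elements.
            if name in index and index[name] != canonical:
                raise ValueError(
                    f"{name} is claimed by both {index[name]} and {canonical}"
                )
            index[name] = canonical
    return index


ALIAS_INDEX = build_alias_index(ELEMENTS)
print(ALIAS_INDEX["CLIENT_ID"] == ALIAS_INDEX["ACCOUNT_NUMBER"])  # True
```

Because both departmental names resolve to the same canonical entry, a flow labelled with either one is checked against a single definition.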
| Dictionary entry kind | What it defines | Example |
|---|---|---|
| Data flow | Composition of a flow on the DFD | INVOICE = INVOICE_NO + DATE + 1{LINE_ITEM} |
| Data store | Composition of a repository | ORDERS_FILE = {CUSTOMER_ORDER} |
| Composite element | A structure built from other elements | ADDRESS = STREET + CITY + ZIP |
| Primitive element | An atomic, indivisible item | ZIP = * 6 numeric digits * |
A primitive element cannot be decomposed further; a composite is defined entirely in terms of other dictionary entries, which lets the analyst build definitions top-down in lockstep with the diagram.
Tracing Information from DFD to Data Dictionary
- Step 1: Spot a flow on the diagram. On Diagram 0 you find a flow labelled CUSTOMER_ORDER entering process 2.0.
- Step 2: Locate its dictionary entry. The dictionary defines CUSTOMER_ORDER as a sequence of CUSTOMER_ID, repeated ORDER_LINE, and PAYMENT_TYPE.
- Step 3: Follow composites down to primitives. ORDER_LINE resolves to PRODUCT_CODE plus QUANTITY, each defined as a primitive with a type and range. Every component is now grounded.
- Step 4: Check balance against the child diagram. When process 2.0 explodes into Diagram 2, confirm the component flows there sum back to CUSTOMER_ORDER, closing the traceability loop.
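The trace above can be sketched as a recursive lookup that follows each composite down to its primitives. This is a simplified illustration: composites are plain lists of component names (repetition is ignored), primitives are small records, and the types and ranges shown are hypothetical placeholders rather than values from any real system.

```python
# Composites map to a list of component names; primitives map to a record.
DICTIONARY = {
    "CUSTOMER_ORDER": ["CUSTOMER_ID", "ORDER_LINE", "PAYMENT_TYPE"],
    "ORDER_LINE": ["PRODUCT_CODE", "QUANTITY"],
    # Hypothetical primitive definitions.
    "CUSTOMER_ID": {"type": "string", "range": "8 digits"},
    "PAYMENT_TYPE": {"type": "string", "range": "CASH or CARD"},
    "PRODUCT_CODE": {"type": "string", "range": "6 characters"},
    "QUANTITY": {"type": "integer", "range": "1 to 999"},
}


def trace(name, dictionary, path=()):
    """Return the primitives reachable from name, in first-seen order."""
    if name in path:
        raise ValueError(f"circular definition: {' -> '.join(path + (name,))}")
    if name not in dictionary:
        raise KeyError(f"{name} has no dictionary entry")
    entry = dictionary[name]
    if isinstance(entry, dict):  # primitive: nothing further to resolve
        return [name]
    primitives = []
    for part in entry:
        for primitive in trace(part, dictionary, path + (name,)):
            if primitive not in primitives:
                primitives.append(primitive)
    return primitives


print(trace("CUSTOMER_ORDER", DICTIONARY))
# ['CUSTOMER_ID', 'PRODUCT_CODE', 'QUANTITY', 'PAYMENT_TYPE']
```

A component with no entry raises an error instead of being skipped, which mirrors the requirement that every component be grounded before the trace counts as complete.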
Consistency is a two-way obligation
Completeness means every diagram flow has a dictionary entry; consistency means every dictionary entry traces to a real flow or store. A model passes review only when both directions hold, with no orphaned labels and no defined items that never appear.
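Both directions of this review can be approximated with a reachability check, sketched below under simplifying assumptions: the diagram contributes only a set of flow and store names, and the dictionary maps each name to a list of component names (an empty list for a primitive). All names, including SHIPPING_NOTICE and FAX_NUMBER, are hypothetical.

```python
def review(diagram_names, dictionary):
    """Return (undefined, orphaned) names for a diagram/dictionary pair."""
    # Completeness: names used on the diagram that lack a dictionary entry.
    undefined = sorted(n for n in diagram_names if n not in dictionary)

    # Consistency: entries that no diagram flow or store leads to,
    # directly or through a chain of composite definitions.
    reachable = set()
    stack = [n for n in diagram_names if n in dictionary]
    while stack:
        name = stack.pop()
        if name in reachable:
            continue
        reachable.add(name)
        stack.extend(c for c in dictionary[name] if c in dictionary)
    orphaned = sorted(set(dictionary) - reachable)
    return undefined, orphaned


DIAGRAM_NAMES = {"CUSTOMER_ORDER", "ORDERS_FILE", "SHIPPING_NOTICE"}
DICTIONARY = {
    "ORDERS_FILE": ["CUSTOMER_ORDER"],
    "CUSTOMER_ORDER": ["CUSTOMER_ID", "ORDER_LINE"],
    "ORDER_LINE": ["PRODUCT_CODE", "QUANTITY"],
    "CUSTOMER_ID": [],
    "PRODUCT_CODE": [],
    "QUANTITY": [],
    "FAX_NUMBER": [],  # defined but never used
}

undefined, orphaned = review(DIAGRAM_NAMES, DICTIONARY)
print("undefined on diagram:", undefined)   # ['SHIPPING_NOTICE']
print("orphaned in dictionary:", orphaned)  # ['FAX_NUMBER']
```

A model would pass this sketch's review only when both lists come back empty. Components that are referenced inside a definition but never defined are not reported here and would need a separate check.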
Knowledge Check
On a context (Level 0) diagram, which DFD element is intentionally absent?