Type Systems and Data Abstraction
Explore the compiler's role in enforcing type safety. Understand how type systems are formally defined, how type checking and type inference work, and how abstract data types present new compilation challenges.
Learning Goals
- Define a type system and explain its role in preventing runtime errors.
- Distinguish between strong and weak typing, and static vs. dynamic type checking.
- Describe how a compiler performs type coercion (implicit type conversion) and type casting.
- Explain polymorphism (overloading vs. parametric) and its compilation implications.
- Describe the compilation of abstract data types and encapsulation.
What Is a Type System?
A type system is a collection of rules that assigns a type to every expression in a program. Types classify values — int, float, bool, string, pointer types, and user-defined types — and the type system defines which operations are legal on values of each type.
Why does a compiler need a type system? Consider this code:
1int x = "hello" + 3.14;
Without a type system, the compiler would blindly attempt to add a string and a floating-point integer. The type system catches this at compile time, before the program ever runs.
Formally, a type system consists of:
- A set of base types (e.g.,
int,float,char) - Type constructors to build composite types (e.g.,
array,pointer,struct,function) - Type rules that assign types to every syntactic construct
- Type relations — type compatibility and conversion rules
A program that satisfies all type rules is called well-typed. A well-typed program cannot produce certain classes of runtime errors — this is the fundamental guarantee a type system provides.
Static vs. Dynamic Type Checking
The most important axis along which type systems vary is when type checking occurs:
| Feature | Static Type Checking | Dynamic Type Checking |
|---|---|---|
| When | At compile time | At runtime |
| Languages | C, C++, Java, Rust | Python, Ruby, JavaScript |
| Error detection | Errors caught before execution | Errors manifest during execution |
| Performance | No runtime type checks needed | Overhead per operation |
| Flexibility | Less (must declare types) | More (types determined at runtime) |
A language can use both: Java statically types the vast majority of code but uses dynamic checks for downcasts ((Subclass) obj).
Strong vs. Weak typing is a different axis — it concerns how strictly types are enforced at runtime:
| Feature | Strong Typing | Weak Typing |
|---|---|---|
| Implicit conversions | Few or none | Many and permissive |
| Example | Python refuses "hello" + 3 | C allows char* ptr = 3.14; (dangerous) |
| Safety | Higher | Lower |
C is statically typed but weakly typed — the compiler knows the types but allows dangerous implicit conversions. Python is dynamically typed but strongly typed — types are checked at runtime, but the language refuses nonsensical operations.
Type Coercion vs. Type Casting
When a programmer writes float x = 3 + 4.5;, the integer 3 must be converted to float before the addition. The compiler handles this automatically — this is the coercion vs. casting distinction.
Type Coercion (Implicit Conversion): The compiler silently converts a value from one type to another when the types in an expression don't match. Rules are built into the language specification.
int a = 5; float b = a + 2.7; // a is coerced from int to float before addition
The coercion hierarchy in most languages:
char → short → int → long → float → double → long double
This is called widening conversion — it never loses precision and is always safe.
Narrowing conversions (e.g., double → int) may lose information and typically require an explicit cast:
Type Casting (Explicit Conversion): The programmer explicitly instructs the compiler to convert a type:
1int total = 10, count = 3; 2float avg = (float) total / count; // Explicitly cast total to float 3 // Without the cast: integer division = 3 4 // With the cast: floating-point division = 3.33
In the compiler, inserted coercion operations become explicit conversion nodes in the annotated AST — just as the semantic analyzer adds type annotations, it also inserts coercion nodes where the language rules require them.
Type Inference
Some languages require every variable to have an explicit type annotation. Others can infer types from context. This is type inference:
1-- Haskell: compiler infers that x is Int and add takes Int → Int → Int 2add x y = x + y
1// C++ (C++11+): auto keyword triggers type inference 2auto result = computeValue(); // Compiler deduces the type from the return type
Hindley-Milner type inference is the classical algorithm used in functional languages (Haskell, ML). It works by:
- Generating type constraints from the program structure
- Solving the constraints using unification — the same algorithm used in Prolog
- Checking that the solution is consistent (no contradictions)
From a compiler perspective, type inference is essentially an extended form of type checking: instead of verifying that annotations are correct, the compiler synthesizes the annotations from how the value is used.
Ad-hoc polymorphism means the same function name behaves differently depending on its argument types.
1int add(int a, int b); // Version 1 2float add(float a, float b); // Version 2
How the compiler resolves overloading (name mangling): At compile time, the compiler performs name mangling — it encodes the parameter types into the function's internal name:
add(int, int)→_Z3addiiadd(float, float)→_Z3addff
The linker sees these as completely separate functions. The correct version is selected at compile time based on the static types of the arguments.
This is resolved entirely at compile time — zero runtime cost.
Abstract Data Types and Encapsulation
An Abstract Data Type (ADT) separates the interface (what operations are available) from the implementation (how data is stored). Classes in C++/Java/Absolute are the primary mechanism:
1class Stack { 2private: 3 int data[100]; // Hidden implementation 4 int top; 5public: 6 void push(int x); // Public interface 7 int pop(); 8};
How the compiler enforces encapsulation:
- Access specifiers (
private,protected,public) are enforced entirely at compile time. The compiler rejects any external code that accessesdata[]ortopdirectly. - Name mangling for member functions encodes the class name, enabling overloaded methods across different classes.
- The
thispointer — every non-static member callobj.push(5)is transformed by the compiler intoStack::push(&obj, 5). The hiddenthisparameter gives the method access to the specific instance's private data.
From the compiler's perspective, ADTs add:
- A scope layer for member names (class scope nested inside namespace scope)
- Access checking as part of name resolution in the semantic analyzer
- Implicit
thisparameter in every member function call
The private keyword generates no runtime protection — private is purely a compile-time contract. Once the code is compiled, all fields are equally accessible in memory.
Common Exam Traps
-
Coercion ≠ Casting: Coercion is implicit (the compiler does it silently). Casting is explicit (the programmer writes it). If a question asks about implicit conversion, the answer is coercion.
-
Overloading ≠ Overriding: Overloading is compile-time (same name, different parameters, same class). Overriding is runtime (subclass redefines a virtual method from the base class). Overloading is name mangling; overriding is vtable dispatch.
-
Static vs. Dynamic typing ≠ Strong vs. Weak typing: These are two independent axes. C is static but weak. Python is dynamic but strong. Java is static and strong. Know the difference.
-
Type inference skips annotations but not checking: Even in languages with full type inference, the compiler still checks all types — it just doesn't require you to write them out.
-
privateis compile-time only: There is no hardware or runtime enforcement of private members. It is purely a semantic analysis check.
Knowledge Check
What happens when a compiler encounters int x = 3.7 + 2; in a statically-typed language?