Type Systems and Data Abstraction

35 mins

Explore the compiler's role in enforcing type safety. Understand how type systems are formally defined, how type checking and type inference work, and how abstract data types present new compilation challenges.

Learning Goals

Define a type system and explain its role in preventing runtime errors.
Distinguish between strong and weak typing, and static vs. dynamic type checking.
Describe how a compiler performs type coercion (implicit type conversion) and type casting.
Explain polymorphism (overloading vs. parametric) and its compilation implications.
Describe the compilation of abstract data types and encapsulation.

What Is a Type System?

A type system is a collection of rules that assigns a type to every expression in a program. Types classify values — int, float, bool, string, pointer types, and user-defined types — and the type system defines which operations are legal on values of each type.

Why does a compiler need a type system? Consider this code:

1int x = "hello" + 3.14;

Without a type system, the compiler would blindly attempt to add a string and a floating-point integer. The type system catches this at compile time, before the program ever runs.

Formally, a type system consists of:

A set of base types (e.g., int, float, char)
Type constructors to build composite types (e.g., array, pointer, struct, function)
Type rules that assign types to every syntactic construct
Type relations — type compatibility and conversion rules

A program that satisfies all type rules is called well-typed. A well-typed program cannot produce certain classes of runtime errors — this is the fundamental guarantee a type system provides.

Static vs. Dynamic Type Checking

The most important axis along which type systems vary is when type checking occurs:

Feature	Static Type Checking	Dynamic Type Checking
When	At compile time	At runtime
Languages	C, C++, Java, Rust	Python, Ruby, JavaScript
Error detection	Errors caught before execution	Errors manifest during execution
Performance	No runtime type checks needed	Overhead per operation
Flexibility	Less (must declare types)	More (types determined at runtime)

A language can use both: Java statically types the vast majority of code but uses dynamic checks for downcasts ((Subclass) obj).

Strong vs. Weak typing is a different axis — it concerns how strictly types are enforced at runtime:

Feature	Strong Typing	Weak Typing
Implicit conversions	Few or none	Many and permissive
Example	Python refuses `"hello" + 3`	C allows `char* ptr = 3.14;` (dangerous)
Safety	Higher	Lower

C is statically typed but weakly typed — the compiler knows the types but allows dangerous implicit conversions. Python is dynamically typed but strongly typed — types are checked at runtime, but the language refuses nonsensical operations.

Type Coercion vs. Type Casting

When a programmer writes float x = 3 + 4.5;, the integer 3 must be converted to float before the addition. The compiler handles this automatically — this is the coercion vs. casting distinction.

Type Coercion (Implicit Conversion): The compiler silently converts a value from one type to another when the types in an expression don't match. Rules are built into the language specification.

int a = 5;
float b = a + 2.7;    // a is coerced from int to float before addition

The coercion hierarchy in most languages:

char → short → int → long → float → double → long double

This is called widening conversion — it never loses precision and is always safe.

Narrowing conversions (e.g., double → int) may lose information and typically require an explicit cast:

Type Casting (Explicit Conversion): The programmer explicitly instructs the compiler to convert a type:

1int total = 10, count = 3;
2float avg = (float) total / count;    // Explicitly cast total to float
3                                      // Without the cast: integer division = 3
4                                      // With the cast: floating-point division = 3.33

In the compiler, inserted coercion operations become explicit conversion nodes in the annotated AST — just as the semantic analyzer adds type annotations, it also inserts coercion nodes where the language rules require them.

Type Inference

Some languages require every variable to have an explicit type annotation. Others can infer types from context. This is type inference:

1-- Haskell: compiler infers that x is Int and add takes Int → Int → Int
2add x y = x + y

1// C++ (C++11+): auto keyword triggers type inference
2auto result = computeValue();    // Compiler deduces the type from the return type

Hindley-Milner type inference is the classical algorithm used in functional languages (Haskell, ML). It works by:

Generating type constraints from the program structure
Solving the constraints using unification — the same algorithm used in Prolog
Checking that the solution is consistent (no contradictions)

From a compiler perspective, type inference is essentially an extended form of type checking: instead of verifying that annotations are correct, the compiler synthesizes the annotations from how the value is used.

Ad-hoc polymorphism means the same function name behaves differently depending on its argument types.

1int add(int a, int b);      // Version 1
2float add(float a, float b); // Version 2

How the compiler resolves overloading (name mangling): At compile time, the compiler performs name mangling — it encodes the parameter types into the function's internal name:

add(int, int) → _Z3addii
add(float, float) → _Z3addff

The linker sees these as completely separate functions. The correct version is selected at compile time based on the static types of the arguments.

This is resolved entirely at compile time — zero runtime cost.

Abstract Data Types and Encapsulation

An Abstract Data Type (ADT) separates the interface (what operations are available) from the implementation (how data is stored). Classes in C++/Java/Absolute are the primary mechanism:

1class Stack {
2private:
3    int data[100];    // Hidden implementation
4    int top;
5public:
6    void push(int x); // Public interface
7    int pop();
8};

How the compiler enforces encapsulation:

Access specifiers (private, protected, public) are enforced entirely at compile time. The compiler rejects any external code that accesses data[] or top directly.
Name mangling for member functions encodes the class name, enabling overloaded methods across different classes.
The this pointer — every non-static member call obj.push(5) is transformed by the compiler into Stack::push(&obj, 5). The hidden this parameter gives the method access to the specific instance's private data.

From the compiler's perspective, ADTs add:

A scope layer for member names (class scope nested inside namespace scope)
Access checking as part of name resolution in the semantic analyzer
Implicit this parameter in every member function call

The private keyword generates no runtime protection — private is purely a compile-time contract. Once the code is compiled, all fields are equally accessible in memory.

Common Exam Traps

Coercion ≠ Casting: Coercion is implicit (the compiler does it silently). Casting is explicit (the programmer writes it). If a question asks about implicit conversion, the answer is coercion.
Overloading ≠ Overriding: Overloading is compile-time (same name, different parameters, same class). Overriding is runtime (subclass redefines a virtual method from the base class). Overloading is name mangling; overriding is vtable dispatch.
Static vs. Dynamic typing ≠ Strong vs. Weak typing: These are two independent axes. C is static but weak. Python is dynamic but strong. Java is static and strong. Know the difference.
Type inference skips annotations but not checking: Even in languages with full type inference, the compiler still checks all types — it just doesn't require you to write them out.
private is compile-time only: There is no hardware or runtime enforcement of private members. It is purely a semantic analysis check.

Knowledge Check

Question 1 of 5

Q1Single choice

What happens when a compiler encounters int x = 3.7 + 2; in a statically-typed language?

The value 3.7 is coerced to int (3) before the addition

The integer 2 is coerced to float (2.0), the addition yields 5.7, which is narrowed to 5 for x

A compile-time error is raised because types don't match

The expression is evaluated at runtime with dynamic typing

PYQ Analysis and Exam Preparation

Compilation of Object-Oriented Features