Threads and Multithreading Models

60 mins

An in-depth analysis of thread architecture, the trade-offs between user and kernel-level management, and the evolution of multithreading models for modern multicore systems.

Learning Goals

Differentiate between the shared and private resources of a thread.
Evaluate the pros and cons of User-Level Threads (ULT) vs. Kernel-Level Threads (KLT).
Deconstruct the Many-to-One, One-to-One, and Many-to-Many multithreading models.
Understand the challenges of multicore programming, including data and task parallelism.

Thread Fundamentals: The Lightweight Process

In modern computing, a Thread is the basic unit of CPU utilization. It comprises a thread ID, a program counter (PC), a register set, and a stack. However, unlike a process, a thread is not isolated; it shares resources with other threads belonging to the same process.

Process vs. Thread: A Comparison of Resources

A traditional (heavyweight) process has a single thread of control. In a multithreaded process, multiple threads coexist within the same address space.

Resource Type	Shared among Threads	Private to each Thread
Memory	Code section, Data section (Global variables), Heap	Stack (Local variables, return addresses)
Context	OS Resources (Open files, signals)	Register set, Program Counter (PC)
Identifiers	Process ID (PID)	Thread ID (TID)
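The split in the table can be observed directly. The following is a minimal sketch using Python's standard `threading` module; the names `shared_log`, `worker`, and `all_running` are illustrative and not part of any API. Each thread computes a value in a local variable (private, on its own stack) and then appends it to a module-level list (shared by all threads in the process).

```python
import os
import threading

shared_log = []                      # module-level data: one copy, visible to every thread
log_lock = threading.Lock()          # serializes writes to the shared list
all_running = threading.Barrier(3)   # keeps all threads alive together so IDs are not reused

def worker(n):
    local_total = sum(range(n))      # local variable: private to this thread's stack
    all_running.wait()
    with log_lock:
        shared_log.append((os.getpid(), threading.get_ident(), local_total))

threads = [threading.Thread(target=worker, args=(n,)) for n in (10, 100, 1000)]
for t in threads:
    t.start()
for t in threads:
    t.join()

pids = {pid for pid, _, _ in shared_log}
tids = {tid for _, tid, _ in shared_log}
totals = sorted(total for _, _, total in shared_log)
print(len(pids), len(tids), totals)
```

All three entries carry the same process ID but different thread IDs, and each thread's `local_total` is untouched by the others. The lock is needed precisely because the list is shared.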

Why Multithread?

Responsiveness: Allows a program to continue running even if part of it is blocked (e.g., a GUI remains responsive while a background thread loads data).
Resource Sharing: Threads share the memory and resources of the process by default, which is much easier than Inter-Process Communication (IPC).
Economy: Allocating memory and resources for process creation is costly. Since threads share resources, it is much more economical to create and context-switch threads.
Scalability: In a multiprocessor architecture, threads can run in parallel on different processing cores.

User-Level vs. Kernel-Level Threads

The "who" and "where" of thread management significantly affect performance and behavior.

1. User-Level Threads (ULT)

Threads are managed by a thread library in user space, without kernel intervention. The kernel is unaware of these threads and treats the entire process as a single unit of execution.

Pros: Thread switching does not require kernel-mode privileges, so it is extremely fast. Scheduling can be customized for the application.
Cons: If a single user-level thread performs a blocking system call (e.g., reading from disk), the entire process is blocked by the kernel, even if other threads in the process are ready to run.
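To make the idea of a user-space thread library concrete, here is a toy sketch that uses Python generators as stand-ins for user-level threads. The functions `user_thread` and `run_round_robin` are hypothetical names invented for this illustration. It is an analogy rather than a real thread library: the "threads" switch cooperatively at each `yield`, and the operating system sees only one thread of execution.

```python
from collections import deque

def user_thread(name, steps, trace):
    """A toy user-level thread: each yield hands control back to the scheduler."""
    for i in range(steps):
        trace.append(f"{name}{i}")
        yield

def run_round_robin(threads):
    """A user-space scheduler: switching threads involves no system call."""
    ready = deque(threads)
    while ready:
        current = ready.popleft()
        try:
            next(current)           # run the thread until its next yield
            ready.append(current)   # still runnable, so requeue it
        except StopIteration:
            pass                    # the thread has finished

trace = []
run_round_robin([user_thread("A", 2, trace), user_thread("B", 3, trace)])
print(trace)  # ['A0', 'B0', 'A1', 'B1', 'B2']
```

The sketch also shows the weakness described above: if any of these toy threads called a blocking function such as `time.sleep`, the scheduler loop itself would stop, and every other thread would wait with it.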

2. Kernel-Level Threads (KLT)

The OS kernel directly manages the threads. The kernel maintains context information for the process as a whole and for individual threads within the process.

Pros: The kernel can simultaneously schedule multiple threads from the same process on different CPUs. If one thread blocks, the kernel can schedule another thread of the same process.
Cons: Thread creation and context switching require a mode switch to the kernel, adding significant overhead compared to ULTs.
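The blocking behavior of kernel-managed threads can be sketched as follows. This assumes CPython, where each `threading.Thread` is backed by a native operating-system thread; `time.sleep` stands in for a blocking system call, and the helper names are illustrative only.

```python
import threading
import time

events = []
events_lock = threading.Lock()
io_started = threading.Event()

def record(message):
    with events_lock:
        events.append(message)

def io_thread():
    record("io: start")
    io_started.set()
    time.sleep(0.5)          # stands in for a blocking system call
    record("io: done")

def compute_thread():
    io_started.wait()        # begin only once the other thread is about to block
    total = sum(range(1000))
    record(f"compute: done ({total})")

threads = [threading.Thread(target=io_thread), threading.Thread(target=compute_thread)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(events)
```

The compute thread finishes while the I/O thread is still blocked, because the kernel schedules the two threads independently.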

Explicit Comparison: Key Differences [2024 Q2b, 2023 Q3c]

Feature	User-Level Threads (ULT)	Kernel-Level Threads (KLT)
Management	Thread library in user space (no kernel intervention)	OS kernel directly manages threads
Blocking behavior	If one thread blocks (I/O), the entire process blocks	If one thread blocks, the kernel can schedule another thread of the same process
Context switch speed	Fast — no mode switch to kernel required	Slow — requires mode switch to kernel
Parallelism	Cannot run in parallel on multiple CPUs — kernel sees only one thread per process	Can run in parallel on multiple CPUs — kernel schedules each thread independently
Creation overhead	Low — library manages threads without system calls	High — each thread creation requires a system call

When is one type better than the other?

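ULT is better when the application creates many short-lived threads that rarely block and the system is single-core. The low creation and switch overhead makes it efficient for high thread counts.

KLT is better when threads frequently make blocking system calls or when the workload must run in parallel on multiple cores, because the kernel schedules each thread independently.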

Multithreading Models

The relationship between user threads and kernel threads is defined by one of several multithreading models.

1. Many-to-One Model

Maps many user-level threads to one kernel thread.

Behavior: Thread management is done in user space.
Limitation: The entire process blocks if a thread makes a blocking system call. Only one thread can access the kernel at a time, so multiple threads cannot run in parallel on multicore systems.

2. One-to-One Model

Maps each user thread to a kernel thread.

Behavior: Provides more concurrency than the many-to-one model by allowing another thread to run when a thread makes a blocking system call.
Pros: Allows multiple threads to run in parallel on multiprocessors.
Cons: Creating a user thread requires creating a corresponding kernel thread, which can burden system performance. Windows and Linux use this model.
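One way to observe a one-to-one mapping is to ask each thread for its kernel-assigned ID. The sketch below assumes CPython 3.8 or later on a platform that supports `threading.get_native_id()` (for example Linux, Windows, or macOS). The barrier keeps all threads alive at the same time so that the operating system cannot recycle an ID between them.

```python
import threading

NUM_THREADS = 4
all_alive = threading.Barrier(NUM_THREADS)
native_ids = []
ids_lock = threading.Lock()

def worker():
    tid = threading.get_native_id()   # thread ID assigned by the kernel
    with ids_lock:
        native_ids.append(tid)
    all_alive.wait()                  # stay alive until every thread has reported

threads = [threading.Thread(target=worker) for _ in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

main_id = threading.get_native_id()
print(len(set(native_ids)), main_id in native_ids)
```

Each thread reports a different kernel thread ID, and none of them matches the main thread, which is what a one-to-one model predicts.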

3. Many-to-Many Model

Multiplexes many user-level threads to a smaller or equal number of kernel threads.

Behavior: The number of kernel threads may be specific to either a particular application or a particular machine.
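Pros: Developers can create as many user threads as necessary, and the corresponding kernel threads can run in parallel on a multiprocessor. When a thread performs a blocking system call, the kernel can schedule another thread for execution.
Cons: Difficult to implement in practice, because the thread library and the kernel must coordinate their scheduling decisions.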
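The multiplexing idea can be approximated with a work queue. In this sketch, which is an analogy and not a real many-to-many runtime, twelve lightweight tasks play the role of user threads and three worker threads play the role of kernel threads. `NUM_TASKS`, `NUM_WORKERS`, and `worker` are illustrative names, and unlike real user threads these tasks run to completion once picked up.

```python
import queue
import threading

NUM_TASKS = 12     # stand-ins for user-level threads
NUM_WORKERS = 3    # stand-ins for the smaller set of kernel threads

tasks = queue.Queue()
results = {}
workers_used = set()
state_lock = threading.Lock()

def worker():
    while True:
        task_id = tasks.get()
        if task_id is None:           # sentinel: no more work for this worker
            break
        with state_lock:
            results[task_id] = task_id * task_id
            workers_used.add(threading.current_thread().name)

for task_id in range(NUM_TASKS):
    tasks.put(task_id)
for _ in range(NUM_WORKERS):
    tasks.put(None)                   # one sentinel per worker

pool = [threading.Thread(target=worker, name=f"worker-{i}") for i in range(NUM_WORKERS)]
for t in pool:
    t.start()
for t in pool:
    t.join()

print(len(results), len(workers_used))
```

All twelve tasks complete even though at most three threads ever run them, which is the essence of mapping many units of work onto fewer schedulable threads.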

Multicore Programming and Implicit Threading

As hardware evolved from single-core to multicore, the focus shifted from Concurrency (multiple tasks making progress by interleaving on one core) to Parallelism (multiple tasks executing at the same instant on different cores).

Types of Parallelism:

Data Parallelism: Distributes subsets of the same data across multiple cores and performs the same operation on each (e.g., adding two large arrays).
Task Parallelism: Distributes different tasks (threads) across multiple cores. Each thread performs a unique operation.
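The two styles differ in how work is divided, which a short sketch can show. The example uses `concurrent.futures.ThreadPoolExecutor` from the Python standard library; the arrays `a` and `b` and the helper `add_chunk` are made up for illustration. Note that in CPython the global interpreter lock generally prevents threads from executing Python bytecode in parallel, so this demonstrates the structure of each approach rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

a = list(range(8))
b = [10 * x for x in a]

def add_chunk(bounds):
    """Add the slices a[lo:hi] and b[lo:hi] element by element."""
    lo, hi = bounds
    return [a[i] + b[i] for i in range(lo, hi)]

chunks = [(0, 2), (2, 4), (4, 6), (6, 8)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # Data parallelism: the same operation applied to different slices of the data.
    parts = list(pool.map(add_chunk, chunks))
    summed = [x for part in parts for x in part]

    # Task parallelism: different operations running side by side.
    total_future = pool.submit(sum, a)
    max_future = pool.submit(max, b)
    total, largest = total_future.result(), max_future.result()

print(summed, total, largest)
```

In the first half every worker runs `add_chunk` on its own slice; in the second half one worker computes a sum while another computes a maximum.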

Implicit Threading: The Modern Approach

Manually managing hundreds of threads is error-prone. Modern systems use Implicit Threading, where the runtime environment or compiler handles the creation and management of threads.

Thread Pools: A collection of threads created at startup that sit and wait for work. This avoids the overhead of frequent thread creation and caps the number of concurrent threads so the system is not overwhelmed.
Fork-Join: A parent thread creates (forks) several child threads and then waits (joins) for them to finish before continuing.
OpenMP: A set of compiler directives and a runtime API for C, C++, and Fortran that identifies parallel regions. For example, #pragma omp parallel for tells the compiler to divide the iterations of the following for loop among multiple threads.
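Thread pools and fork-join often appear together, as in this minimal sketch built on Python's `concurrent.futures.ThreadPoolExecutor`. The function `count_primes` and the chosen ranges are illustrative. Submitting work to the pool plays the role of the fork, and collecting each result plays the role of the join.

```python
from concurrent.futures import ThreadPoolExecutor

def count_primes(lo, hi):
    """Count the primes in [lo, hi) by trial division."""
    def is_prime(n):
        if n < 2:
            return False
        i = 2
        while i * i <= n:
            if n % i == 0:
                return False
            i += 1
        return True
    return sum(1 for n in range(lo, hi) if is_prime(n))

ranges = [(0, 25), (25, 50), (50, 75), (75, 100)]

# The pool owns two reusable threads; four pieces of work share them.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(count_primes, lo, hi) for lo, hi in ranges]  # fork
    counts = [f.result() for f in futures]                              # join

total = sum(counts)
print(counts, total)  # [9, 6, 6, 4] 25
```

The caller never creates or destroys a thread itself. It only describes the work, and the pool decides which of its existing threads runs each piece.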

Knowledge Check

Q1 (single choice)

In a multithreaded process, which of the following is NOT shared among threads?

Code section

Global variables

Stack

Open files
