Threads are virtual units of execution that represent a single sequence of instructions to be processed by a processor core. With multithreading, we can use threads to perform multiple computations concurrently within a single process.

Each thread has its own program counter register. Threads are often executing different functions, and are almost always executing a different instruction from other threads. They also have their own stacks and stack pointers,1 but they otherwise share the same memory space (heap, program code, global variables, file descriptors). We need to explicitly state if any memory is specific to a thread (thread-local storage).
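
For example, here is a minimal sketch of thread-local storage, assuming C11 (for the _Thread_local keyword) and pthreads; the worker function and loop count are just illustrative. Each thread increments its own private copy of counter, so both print 5:

    #include <stdio.h>
    #include <pthread.h>

    /* Each thread gets its own copy of this variable instead of
     * sharing a single global. */
    _Thread_local int counter = 0;

    void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 5; ++i)
            ++counter; /* only touches this thread's copy */
        printf("counter = %d\n", counter); /* prints 5 in each thread */
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }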

Why not processes? The shared memory space is what makes threads useful, since they can share data. Processes are used when we don’t need to share things.

Because of this, we have to be careful that threads don’t conflict in reads/writes to the same non-atomic memory: if two threads update the same variable at the same time, the behaviour is undefined. Note also that creating threads is less expensive than creating processes (all we need is a new stack).
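
As a sketch of the problem (pthreads; the iteration count is arbitrary), the two threads below increment the same global without any synchronization. The final value is usually well below 2000000 because concurrent read-modify-write updates get lost, and formally the program has undefined behaviour:

    #include <stdio.h>
    #include <pthread.h>

    long shared = 0; /* global, so visible to both threads */

    void *increment(void *arg) {
        (void)arg;
        for (int i = 0; i < 1000000; ++i)
            ++shared; /* non-atomic read-modify-write: a data race */
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, increment, NULL);
        pthread_create(&t2, NULL, increment, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("shared = %ld\n", shared); /* rarely 2000000 */
        return 0;
    }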

Basics

We use thread control blocks (TCBs) to store the state of each thread of a process. There are two main types of threads:

  • Joinable threads (usually the default) — hold onto their resources until someone calls a join function on them, at which point the resources are released.
    • These easily become zombie threads if nobody ever joins them.
  • Detached threads — release their resources as soon as they terminate.
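
A small pthreads sketch of both kinds (the task function is just illustrative). Note that main ends with pthread_exit rather than return, so the process doesn’t die before the detached thread finishes:

    #include <stdio.h>
    #include <pthread.h>

    void *task(void *arg) {
        printf("running %s\n", (const char *)arg);
        return NULL;
    }

    int main(void) {
        pthread_t joinable, detached;

        /* Joinable (the default): resources are held until joined. */
        pthread_create(&joinable, NULL, task, "joinable");
        pthread_join(joinable, NULL); /* skip this and it becomes a zombie */

        /* Detached: resources are released as soon as it terminates. */
        pthread_create(&detached, NULL, task, "detached");
        pthread_detach(detached); /* must never be joined after this */

        pthread_exit(NULL); /* exit main without killing other threads */
    }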

Threads generally complicate things for the kernel. There are a few important cases:

  • Forking a process with multiple threads only copies the calling thread into the new process. If that thread then invokes pthread_exit, the process always exits with status 0, since it was the last remaining thread.
  • When a process receives a signal, an arbitrary thread is selected to handle it. This makes concurrent code difficult, since any thread could be interrupted.
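
A rough sketch of the fork behaviour on Linux (the sleeps are only there to order the output): the worker thread keeps printing in the parent but never appears in the child, since only the forking thread was copied. This is also why mixing fork with threads is risky in general: a lock held by a thread that wasn’t copied stays locked forever in the child.

    #include <stdio.h>
    #include <unistd.h>
    #include <pthread.h>

    void *spin(void *arg) {
        (void)arg;
        for (;;) {
            printf("worker alive in pid %d\n", (int)getpid());
            sleep(1);
        }
    }

    int main(void) {
        pthread_t worker;
        pthread_create(&worker, NULL, spin, NULL);
        sleep(1);

        if (fork() == 0) {
            /* Child: only the thread that called fork() was copied,
             * so spin() never prints from this pid. */
            sleep(3);
            printf("child %d never saw the worker\n", (int)getpid());
            return 0;
        }

        sleep(3); /* parent: the worker keeps printing alongside us */
        return 0;
    }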

Implementation

Threads can be implemented either in user-space or kernel-space. Support requires a thread table, similar to a process table.

  • In user space:
    • There needs to be a runtime system to determine how to schedule things.
    • Very fast to create and destroy, since we don’t need syscalls and thus avoid switching into the kernel.
    • If one thread blocks, it blocks the entire process (the kernel can’t tell the threads apart).
  • In kernel space:
    • The kernel manages everything for us, and we can treat threads specially.
    • Slower, since creation involves syscalls.
    • If one thread blocks, the kernel itself can schedule another one.

Most of the threading libraries we use run in user-mode. These libraries map user threads to kernel threads. There are a few mapping strategies:

  • In many-to-one, threads are completely implemented in user-space, and the kernel only sees one process.
    • This is fast and portable, since it doesn’t depend on the system.
    • The drawback is that one thread blocking causes all threads to block, and we can’t execute threads in parallel, since the kernel only schedules the single process.
  • In one-to-one, each user thread maps directly to one kernel thread and the kernel handles everything.
    • These libraries are just thin wrappers around syscalls that make them easier to use.
    • They allow us to exploit the full parallelism of our machine, and the kernel can schedule multiple threads simultaneously.
    • For Linux, this is typically the actual implementation used.
  • In many-to-many, many user-level threads map to many kernel-level threads, basically a hybrid of the previous two approaches.
    • For example, if we have 8 physical cores, we might create 8 kernel-level threads and distribute user-level tasks among them.
    • This allows us to get the most out of multiple CPUs while reducing the number of syscalls.
    • It leads to a very complicated thread library (like Java Virtual Threads).
    • We could use thread pools instead; a sketch follows this list.
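
As a rough sketch of a thread pool (the queue layout and sizes are just illustrative), a fixed set of workers pulls task ids from a mutex/condvar-protected queue. We pay for a handful of thread creations up front instead of one syscall-heavy creation per task:

    #include <stdio.h>
    #include <pthread.h>

    #define NUM_WORKERS 4
    #define NUM_TASKS   8

    static int queue[NUM_TASKS];  /* toy queue: holds pending task ids */
    static int head = 0, tail = 0;
    static int done = 0;          /* set once all tasks are submitted */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;

    void *worker(void *arg) {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            while (head == tail && !done)
                pthread_cond_wait(&nonempty, &lock);
            if (head == tail) { /* queue drained and shutting down */
                pthread_mutex_unlock(&lock);
                return NULL;
            }
            int task = queue[head++];
            pthread_mutex_unlock(&lock);
            printf("finished task %d\n", task); /* "do" the task */
        }
    }

    int main(void) {
        pthread_t workers[NUM_WORKERS];
        for (int i = 0; i < NUM_WORKERS; ++i)
            pthread_create(&workers[i], NULL, worker, NULL);

        for (int i = 0; i < NUM_TASKS; ++i) {
            pthread_mutex_lock(&lock);
            queue[tail++] = i; /* submit task i */
            pthread_cond_signal(&nonempty);
            pthread_mutex_unlock(&lock);
        }

        pthread_mutex_lock(&lock);
        done = 1; /* wake everyone so idle workers can exit */
        pthread_cond_broadcast(&nonempty);
        pthread_mutex_unlock(&lock);

        for (int i = 0; i < NUM_WORKERS; ++i)
            pthread_join(workers[i], NULL);
        return 0;
    }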

Footnotes

  1. One of the implications of this is that we can’t return a pointer to something that might be allocated on the thread’s call stack.