In operating system design, a process is a single running instance of a program. There are a handful of basic requirements for a process, namely that it must have a set of:

Process control blocks (PCBs) contain information about the process, including process state, CPU registers, scheduling information, memory management information, and I/O status information. Each process gets a unique process ID (pid) to keep track of it. In Linux, the PCB is implemented as struct task_struct.1

Process states

Process states can be represented as a DFA (deterministic finite automaton), and thus drawn as a state diagram.2 Tracking a state for each process helps the kernel manage many running processes. The “blocked” state stalls a process while it’s waiting for something (like a syscall result). Note that this means a process may not have finished executing when it’s moved to the blocked state. A CPU with few cores may juggle processes so that one runs a few instructions and then waits, then the next does the same.

On Linux systems, we have the following states:

  • R denotes running or runnable (on a CPU, or waiting for one).
  • S denotes interruptible sleep (blocked).
  • D denotes uninterruptible sleep (blocked).
  • T denotes stopped.
  • Z denotes zombie.

The Linux kernel allows us to explicitly stop a process to prevent it from running, but we (as the programmer) or another process must explicitly continue it.
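As a minimal sketch of stopping and continuing a process, the parent below forks a child, stops it with SIGSTOP, resumes it with SIGCONT, and then kills it. The helper name stop_and_continue_demo is hypothetical, and the sketch assumes a Linux system where waitpid supports the WUNTRACED and WCONTINUED flags.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child, stop it, continue it, then kill it.
 * Returns 0 if both the stop and the continue were observed, -1 otherwise. */
int stop_and_continue_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();                /* child: sleep until a signal arrives */
    }

    int status;
    int observed = 0;

    kill(pid, SIGSTOP);             /* child moves to the stopped state (T) */
    if (waitpid(pid, &status, WUNTRACED) == pid && WIFSTOPPED(status))
        observed++;

    kill(pid, SIGCONT);             /* child is allowed to run again */
    if (waitpid(pid, &status, WCONTINUED) == pid && WIFCONTINUED(status))
        observed++;

    kill(pid, SIGKILL);             /* clean up: terminate the child */
    waitpid(pid, &status, 0);       /* ...and read its exit status */

    return observed == 2 ? 0 : -1;
}
```

The same signals can be sent from a shell with kill -STOP and kill -CONT on a pid.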

After the kernel initialises, it creates a single process, init. This process is responsible for starting, directly or indirectly, every other process on the machine. It must always be active: if it exits, the kernel thinks we’re shutting down.

proc directory

The /proc directory exposes the kernel’s state as a filesystem; its entries are not real files on disk. Each directory in /proc whose name is a number (a pid) represents a process. Within the directory of a process n, the file status (/proc/n/status) contains the state of the process.
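To make this concrete, here is a small sketch that reads the state letter of a process from its status file. The helper name proc_state is hypothetical; the sketch assumes a Linux system with /proc mounted and a line in the status file that begins with "State:".

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the one-letter state code (R, S, D, T, Z, ...)
 * from the "State:" line of /proc/<pid>/status, or '?' on failure. */
char proc_state(pid_t pid)
{
    char path[64];
    char line[256];
    char state = '?';

    snprintf(path, sizeof path, "/proc/%ld/status", (long)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return state;               /* no such process, or /proc unavailable */

    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "State:", 6) == 0) {
            /* The line looks like "State:\tS (sleeping)"; take the letter. */
            sscanf(line + 6, " %c", &state);
            break;
        }
    }
    fclose(f);
    return state;
}
```

Calling proc_state(getpid()) should report R, since a process reading its own status is necessarily running at that moment.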

The process ID is unique among active processes. On many Linux systems, the default pid limit (pid_max) is 32768, and pid 0 is reserved. The kernel recycles the pids of processes that have exited.

Creating processes

There are two common ways to create processes. In Windows, a process is created from scratch: the program is loaded into memory and a new PCB is created.

In UNIX-like systems, process creation clones the currently running process’s PCB into a new one (modelled as a parent-child relation). This reuses all of the information from the process, including variables. After this, each process is functionally independent: the two can execute different parts of the program concurrently or create more processes.

The traditional way to create a new process in UNIX is the fork syscall, which does the above: it creates a new process as a copy of the current one.

  • pid_t fork(void) returns -1 on failure, 0 in the child process, and the pid of the newly created child (>0) in the parent process.
  • After fork returns, two processes are running with the same variables; each has its own copy, so later changes in one are not seen by the other. Note that the child continues running from the same point as the parent, the return from fork, because it’s an exact copy.
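The return values and the independent copies described above can be sketched as follows. The helper name fork_copy_demo and the values 10 and 5 are purely illustrative; the child reports its copy of the variable through its exit code, which the parent collects with waitpid.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork once and show that parent and child hold
 * separate copies of x. Returns 0 on success, -1 on failure. */
int fork_copy_demo(int *parent_value, int *child_value)
{
    int x = 10;

    pid_t pid = fork();
    if (pid < 0)
        return -1;                  /* fork failed: no child was created */

    if (pid == 0) {
        /* Child: fork returned 0. This changes only the child's copy. */
        x += 5;
        _exit(x);                   /* report the child's value as exit code */
    }

    /* Parent: fork returned the child's pid. */
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;

    *parent_value = x;                      /* unchanged by the child */
    *child_value = WEXITSTATUS(status);     /* the child's modified copy */
    return 0;
}
```

With these values the parent still sees 10 while the child reports 15.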

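The execve syscall replaces the program running in the current process with another program: the pid stays the same, but the old program stops executing and never resumes. The wait syscall is used by a parent on its child processes. It blocks the parent until a child exits, then reads the child’s exit status and lets the kernel clean up the child.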
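A common pattern combines the three syscalls: fork a child, have the child call execve, and have the parent wait for it. The sketch below is one way to write that; the helper name run_and_wait and the exit code 127 for a failed execve are choices made for this example, not something the syscalls require.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program argv[0] in a child process and
 * return its exit code, or -1 if it could not be run to completion. */
int run_and_wait(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        char *const envp[] = { NULL };      /* empty environment */
        execve(argv[0], argv, envp);
        /* Only reached if execve failed; on success the new program
         * has replaced this one and this line never runs. */
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)    /* block until the child exits */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, passing { "/bin/sh", "-c", "exit 3", NULL } should return 3 on a system where /bin/sh exists.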

A zombie process has terminated but is still waiting for its parent to read its exit status; that is, the child hasn’t been acknowledged yet. The parent may never read the child’s exit status, which is a bug in the parent. To prompt it, the OS can notify the parent when a child terminates (on UNIX-like systems, with the SIGCHLD signal).
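A zombie can be observed directly on Linux. In the sketch below, the parent uses waitid with WNOWAIT to learn that the child has exited without acknowledging it, reads the child’s state from /proc (expected to be Z), and only then reaps it. The names zombie_demo and read_state are hypothetical, and the sketch assumes /proc is mounted.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: state letter from /proc/<pid>/status, '?' on failure. */
static char read_state(pid_t pid)
{
    char path[64], line[256], state = '?';
    snprintf(path, sizeof path, "/proc/%ld/status", (long)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return state;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "State:", 6) == 0) {
            sscanf(line + 6, " %c", &state);
            break;
        }
    }
    fclose(f);
    return state;
}

/* Hypothetical helper: create a zombie, record its state, then reap it. */
int zombie_demo(char *state_before_reap)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);                   /* child terminates immediately */

    /* Wait until the child has exited, but leave it unacknowledged. */
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0)
        return -1;

    *state_before_reap = read_state(pid);   /* child is a zombie here */

    /* Acknowledge the child: this removes the zombie. */
    if (waitpid(pid, NULL, 0) != pid)
        return -1;
    return 0;
}
```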

Orphan processes need to be re-assigned to a new parent. If the parent exits before the child, then init (or another special process) adopts the child and takes care of it, including reading its exit status.
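Re-parenting can be observed with getppid. In this sketch, a child forks a grandchild and exits at once; the grandchild polls until its parent pid changes and sends the new value back through a pipe. The helper name orphan_demo is hypothetical, and which process adopts the orphan (init or another designated process) depends on the system.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: orphan a grandchild and return the pid of the
 * process that adopted it, or -1 on failure. *old_parent receives the
 * pid of the grandchild's original parent. */
pid_t orphan_demo(pid_t *old_parent)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        pid_t me = getpid();
        if (fork() == 0) {
            /* Grandchild: wait until the original parent is gone. */
            struct timespec ts = { 0, 1000000 };    /* 1 ms */
            while (getppid() == me)
                nanosleep(&ts, NULL);
            pid_t adopter = getppid();
            if (write(fds[1], &adopter, sizeof adopter) != (ssize_t)sizeof adopter)
                _exit(1);
            _exit(0);
        }
        _exit(0);                   /* child exits, orphaning the grandchild */
    }

    close(fds[1]);                  /* parent only reads from the pipe */
    *old_parent = child;
    waitpid(child, NULL, 0);        /* acknowledge the child */

    pid_t adopter = -1;
    if (read(fds[0], &adopter, sizeof adopter) != (ssize_t)sizeof adopter)
        adopter = -1;
    close(fds[0]);
    return adopter;
}
```

On many systems the returned pid is 1, but a different adopter is possible, so the sketch only reports whatever getppid returns.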

Footnotes

  1. See the Linux source code.

  2. From Prof Eyolfson’s lecture slides.