In operating system design, a process is a single running instance of a program. At minimum, each process needs its own set of:
- Virtual registers (including the program counter)
- Virtual memory (stack, heap, global variables)
- Open file descriptors
While a process is running on the CPU, the operating system itself is not, so the OS regains control through a periodic timer interrupt (every few milliseconds). Whether it then resumes the same process or switches to a different one is decided by the scheduler.
Modern operating systems can run multiple processes concurrently (interleaving them on one CPU core) and in parallel (on several cores at once).
Basics
Key question: if a process is running on the CPU, how does the operating system get to run at all? Solution: a periodic timer interrupt (every few milliseconds). Each interrupt pauses the running process and hands control to the OS. This also ensures that non-cooperative processes can’t keep control of the CPU forever.
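The real timer interrupt lives in the kernel, but a userspace analogy can make preemption concrete. This sketch assumes a POSIX system and swaps in a SIGALRM interval timer (setitimer) for the hardware timer, with a signal handler standing in for the OS; the names preemption_demo, on_tick and ticks are hypothetical. The loop never yields, yet it is still interrupted every 5 ms.

```c
#define _XOPEN_SOURCE 700   /* expose sigaction() and setitimer() */
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Number of timer "interrupts" this process has received. */
static volatile sig_atomic_t ticks = 0;

/* Signal handler standing in for the OS: it runs, then the loop resumes. */
static void on_tick(int sig) {
    (void)sig;
    ticks++;
}

/* Runs a loop that never yields voluntarily while a 5 ms interval timer
   keeps interrupting it. Returns the number of interrupts counted once at
   least `want` have arrived, or -1 if the timer couldn't be set up. */
int preemption_demo(int want) {
    ticks = 0;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_tick;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) == -1) return -1;

    struct itimerval every_5ms = {
        .it_interval = { .tv_sec = 0, .tv_usec = 5000 },
        .it_value    = { .tv_sec = 0, .tv_usec = 5000 },
    };
    if (setitimer(ITIMER_REAL, &every_5ms, NULL) == -1) return -1;

    /* "Non-cooperative" work: no sleep(), no yield; only the timer gets in. */
    while (ticks < want) {
    }

    struct itimerval off;
    memset(&off, 0, sizeof off);
    setitimer(ITIMER_REAL, &off, NULL);   /* disarm the timer */
    return (int)ticks;
}
```

The busy loop only makes progress past its condition because the handler keeps running in between, which is the point the analogy is meant to show.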
Metadata
The operating system maintains a process list that tracks which processes are running, waiting, or blocked.
Process control blocks (PCBs) contain information about a process, including its state, CPU registers, scheduling information, memory management information, and IO status information, among others. In Linux, the PCB is struct task_struct.[1] Each process also gets a unique identifier. In UNIX/Linux, this is a process ID (pid). In Windows, a process is referred to through a “handle”.
Process states
One property of a process is the state it’s in. The states and the transitions between them can be drawn as a state diagram (a DFA).[2] Tracking states lets the operating system juggle many processes that are all notionally “running”.
- Running: the process is currently executing instructions.
- Waiting (often called ready): the process is ready to run but isn’t currently executing.
  - A process may be moved from running back to waiting before it has finished executing, for example when the scheduler preempts it on a timer interrupt.
- Blocked: the process is in the middle of some operation that keeps it from running until another event occurs (for example, an IO request to disk or the result of a syscall).
- Some extra states depend on the operating system:
  - Initial: the process is still being created.
  - Final: the process has terminated but hasn’t been cleaned up yet (called a zombie process in UNIX/Linux).
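The states above can be sketched as a small state machine in C. This is a minimal sketch assuming the usual textbook transitions (in particular, a blocked process goes back to waiting rather than straight to running); the enum proc_state and the function can_transition are hypothetical names.

```c
#include <stdbool.h>

/* The generic states listed above. PROC_INITIAL and PROC_FINAL are the
   optional, OS-dependent ones. */
enum proc_state { PROC_INITIAL, PROC_WAITING, PROC_RUNNING, PROC_BLOCKED, PROC_FINAL };

/* Returns true if a typical state diagram allows moving from `from` to `to`. */
bool can_transition(enum proc_state from, enum proc_state to) {
    switch (from) {
    case PROC_INITIAL:
        return to == PROC_WAITING;          /* creation finished: ready to run */
    case PROC_WAITING:
        return to == PROC_RUNNING;          /* the scheduler picks this process */
    case PROC_RUNNING:
        return to == PROC_WAITING           /* preempted, e.g. by the timer */
            || to == PROC_BLOCKED           /* starts waiting on IO or another event */
            || to == PROC_FINAL;            /* exits */
    case PROC_BLOCKED:
        return to == PROC_WAITING;          /* the awaited event happened */
    case PROC_FINAL:
        return false;                       /* only cleanup remains */
    }
    return false;
}
```

For example, can_transition(PROC_BLOCKED, PROC_RUNNING) is false: a blocked process has to be scheduled again from waiting.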
On Linux systems, a process’s state is one of:
- R: running or runnable (running and waiting)
- S: interruptible sleep (blocked)
- D: uninterruptible sleep (blocked)
- T: stopped
- Z: zombie
The Linux kernel lets us explicitly stop a process (for example, by sending it SIGSTOP) so that it doesn’t run at all. It stays stopped until we, as the programmer, or another process explicitly continue it (for example, with SIGCONT).
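A minimal sketch of stopping and continuing a process, assuming a POSIX system where the stop is done with SIGSTOP and the resume with SIGCONT. The parent observes both changes through waitpid() (WUNTRACED reports the stop, WCONTINUED the resume); stop_continue_demo is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L   /* expose kill(), WCONTINUED, WIFCONTINUED */
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks an idle child, stops it with SIGSTOP, resumes it with SIGCONT,
   then kills and reaps it. Returns 0 if both the stop and the resume
   were observed through waitpid(), -1 otherwise. */
int stop_continue_demo(void) {
    pid_t child = fork();
    if (child == -1) return -1;
    if (child == 0) {
        for (;;) pause();                 /* child: idle until killed */
    }

    int status = 0;
    int stopped = 0, resumed = 0;

    kill(child, SIGSTOP);                 /* child moves to state T */
    if (waitpid(child, &status, WUNTRACED) == child)
        stopped = WIFSTOPPED(status);

    kill(child, SIGCONT);                 /* child is runnable again */
    if (waitpid(child, &status, WCONTINUED) == child)
        resumed = WIFCONTINUED(status);

    kill(child, SIGKILL);                 /* clean up: terminate ... */
    waitpid(child, &status, 0);           /* ... and reap the child */
    return (stopped && resumed) ? 0 : -1;
}
```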
After the kernel initialises, it creates a single process, init (pid 1). init is responsible for starting, directly or indirectly, every other process on the machine. It must always be running: if it exits, the kernel treats this as fatal (Linux panics) and the system goes down.
proc directory
The /proc directory exposes the kernel’s state as files; its entries are generated by the kernel, not stored on disk. Each directory in /proc whose name is a number (a pid) represents a process. Inside process n’s directory, the file status (/proc/n/status) contains the state of the process, along with other information.
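As a sketch, a C function can read a process’s state letter from its status file. It assumes Linux and that the file contains a line of the form "State:\tR (running)"; proc_state is a hypothetical name.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>   /* getpid(), for the usage below */

/* Returns the one-letter state (R, S, D, T, Z, ...) of process `pid`
   from the "State:" line of /proc/<pid>/status, or '?' if unavailable. */
char proc_state(pid_t pid) {
    char path[64];
    snprintf(path, sizeof path, "/proc/%d/status", (int)pid);

    FILE *f = fopen(path, "r");
    if (f == NULL) return '?';

    char line[256];
    char state = '?';
    while (fgets(line, sizeof line, f) != NULL) {
        /* Lines that don't start with "State:" fail to match and are skipped. */
        if (sscanf(line, "State: %c", &state) == 1) break;
    }
    fclose(f);
    return state;
}
```

For example, proc_state(getpid()) returns 'R': the process is running while it reads its own status file.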
The process ID is unique for every active process. The Linux kernel’s default pid limit (pid_max) is 32768, and pid 0 is reserved. Once a process has exited and been cleaned up, the kernel can recycle its pid.
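On Linux, the current limit can be read from /proc/sys/kernel/pid_max. A minimal sketch, with read_pid_max as a hypothetical name; the value in this file is one greater than the largest pid the kernel will hand out.

```c
#include <stdio.h>

/* Returns the kernel's pid limit from /proc/sys/kernel/pid_max, or -1
   if it can't be read (for example, when /proc isn't mounted). */
long read_pid_max(void) {
    FILE *f = fopen("/proc/sys/kernel/pid_max", "r");
    if (f == NULL) return -1;

    long limit = -1;
    if (fscanf(f, "%ld", &limit) != 1) limit = -1;
    fclose(f);
    return limit;   /* allocated pids are always below this value */
}
```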
Operations
Creation
Processes load pieces of code and data only as they’re needed during program execution. The OS also allocates the process’s run-time stack and heap.
TL;DR:
- Zombie process: the child exits first (and the parent hasn’t yet read its exit status)
- Orphan process: the parent exits first
A zombie process has terminated but is waiting for its parent to read its exit status. Until the parent acknowledges it, the child’s entry stays in the process list. A parent that never reads its child’s exit status is buggy. To prompt the parent, the OS may notify it with a signal (on UNIX/Linux, SIGCHLD is sent when a child terminates) so that it acknowledges the child.
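To see a zombie, a parent can let its child exit and inspect it before acknowledging it. This sketch assumes Linux (for /proc) and uses waitid() with WNOWAIT to wait for the exit without reaping the child; zombie_demo and state_of are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L   /* expose waitid() and WNOWAIT */
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reads the State letter of `pid` from /proc/<pid>/status ('?' on error). */
static char state_of(pid_t pid) {
    char path[64], line[256], state = '?';
    snprintf(path, sizeof path, "/proc/%d/status", (int)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL) return '?';
    while (fgets(line, sizeof line, f) != NULL)
        if (sscanf(line, "State: %c", &state) == 1) break;
    fclose(f);
    return state;
}

/* Forks a child that exits immediately and returns the state it is in
   after exiting but before the parent has read its exit status. */
char zombie_demo(void) {
    pid_t child = fork();
    if (child == -1) return '?';
    if (child == 0) _exit(0);             /* child: terminate right away */

    /* Wait until the child has exited, but leave it unreaped (WNOWAIT). */
    siginfo_t info;
    if (waitid(P_PID, (id_t)child, &info, WEXITED | WNOWAIT) == -1) return '?';
    char before = state_of(child);        /* 'Z': exited, not yet acknowledged */

    waitpid(child, NULL, 0);              /* parent reads the status: zombie is gone */
    return before;
}
```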
An orphan process needs to be reassigned to a new parent. If the parent exits before the child, init (or another special process, such as a subreaper on Linux) adopts the child and takes care of it, including reading its exit status when it terminates.
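A minimal sketch of an orphan, assuming a POSIX system: a process creates a child and exits immediately, and the child polls getppid() until it has been adopted, then reports its new parent through a pipe. The adopter is usually init (pid 1), but may be a subreaper or, inside a container, the container’s init. orphan_demo is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L   /* expose nanosleep() */
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Creates parent -> child, lets the parent exit first, and returns the pid
   that the orphaned child reports as its new parent (-1 on error). */
pid_t orphan_demo(void) {
    int fds[2];
    if (pipe(fds) == -1) return -1;

    pid_t parent = fork();
    if (parent == -1) return -1;
    if (parent == 0) {
        /* The soon-to-exit parent. */
        pid_t original = getpid();
        pid_t child = fork();
        if (child == 0) {
            /* The child: poll every 1 ms until the original parent has exited. */
            struct timespec ms = { .tv_sec = 0, .tv_nsec = 1000000 };
            while (getppid() == original) nanosleep(&ms, NULL);
            pid_t adopter = getppid();
            if (write(fds[1], &adopter, sizeof adopter) != (ssize_t)sizeof adopter)
                _exit(1);
            _exit(0);
        }
        _exit(child == -1 ? 1 : 0);       /* parent exits first: child is orphaned */
    }

    close(fds[1]);                        /* so read() sees EOF if nothing is written */
    waitpid(parent, NULL, 0);             /* reap the parent we created */

    pid_t adopter = -1;
    if (read(fds[0], &adopter, sizeof adopter) != (ssize_t)sizeof adopter) adopter = -1;
    close(fds[0]);
    return adopter;
}
```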
In Windows
In Windows, the program is loaded into memory and the process control block is created. The CreateProcess() call combines UNIX’s fork() and exec(): it creates a new process from scratch and runs a program in it.
The new process doesn’t automatically inherit its parent’s open handles; that has to be requested explicitly. Note also that Windows doesn’t maintain a UNIX-style parent/child hierarchy.
In UNIX/Linux
In UNIX-like systems, process creation clones the currently running process’s PCB into a new one; the original becomes the parent and the clone its child. The child starts with all of the parent’s information, including its variables. After this, the two are independent: they can execute different parts of the program concurrently or create more processes of their own.
The standard way to create a new process in UNIX is the fork syscall, which does exactly this: it creates a new process as a copy of the current one.
pid_t fork(void) returns:
- -1 on failure (no child is created),
- 0 in the child process,
- the child’s pid (> 0) in the parent process.
Now there are two processes running with the same variables; they are copies of each other and won’t stay in sync. When the child is spawned, it continues running from the same point as the parent, right after the fork call returns, because it’s an exact copy.
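A minimal sketch showing fork()’s return values and the copied variables. The child passes its copy of x back through its exit status so the parent can compare it with its own; fork_demo is a hypothetical name.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks once and shows that each process gets its own copy of `x`.
   Stores the child's final x in *child_x and returns the parent's x,
   or -1 on error. */
int fork_demo(int *child_x) {
    int x = 1;
    pid_t pid = fork();
    if (pid == -1) return -1;             /* failure: no child was created */

    if (pid == 0) {
        /* Child: fork() returned 0, and execution continues from here. */
        x = 42;                           /* changes only the child's copy */
        _exit(x);                         /* pass the child's x back as its exit status */
    }

    /* Parent: fork() returned the child's pid (> 0). */
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status)) return -1;
    *child_x = WEXITSTATUS(status);
    return x;                             /* the parent's x is still 1 */
}
```

Calling fork_demo(&cx) returns 1 while cx ends up as 42: the assignment in the child never reaches the parent.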
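The execve syscall replaces the program running in the current process with another program. The process keeps its pid, but its code, data, stack, and heap are replaced, so on success execve never returns.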
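fork() and execve() are typically used together: fork a child, then have the child execve() a new program while the parent waits. A minimal sketch, where run_program is a hypothetical name and the empty environment is a simplification; the _exit(127) line runs only if execve() fails.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs the program at `path` in a new child (fork, then execve) and returns
   its exit status, or -1 on error. */
int run_program(const char *path, char *const argv[]) {
    pid_t pid = fork();
    if (pid == -1) return -1;

    if (pid == 0) {
        char *const envp[] = { NULL };    /* empty environment, for simplicity */
        execve(path, argv, envp);
        /* Only reached if execve() failed; on success this code is gone. */
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

Assuming /bin/true exists, run_program("/bin/true", argv) returns 0, while a path that doesn’t exist gives 127 from the fallback _exit.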
The wait syscall is used by a parent on its child processes: it blocks the parent until a child exits, then returns that child’s exit status and lets the kernel clean up the child’s remaining resources.
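A minimal sketch of a parent reaping several children with wait(); each child exits with a different status so the parent can tell them apart. reap_children is a hypothetical name, and n should stay below 256 because only the low 8 bits of an exit status are kept.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks `n` children (child i exits with status i), then reaps them all
   with wait(). Returns the sum of their exit statuses, or -1 on error. */
int reap_children(int n) {
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == -1) return -1;
        if (pid == 0) _exit(i);           /* child i exits immediately */
    }

    int sum = 0, status;
    /* wait() blocks until some child exits; it returns -1 once none are left. */
    while (wait(&status) > 0) {
        if (WIFEXITED(status)) sum += WEXITSTATUS(status);
    }
    return sum;
}
```

With n = 4 the statuses are 0, 1, 2 and 3, so reap_children(4) returns 6 once every child has been reaped.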
Footnotes
[1] See the Linux source code.
[2] From Prof Eyolfson’s lecture slides.