Modern processors (CPUs, GPUs) are increasingly parallel, which requires a change in how we think about process scheduling. Assume symmetric multiprocessing (all CPUs connected to the same physical memory, individual private cache).

General approaches

Some approaches:

  • Same scheduling for all CPUs — one scheduler, we keep adding processes while there’s available CPUs. This leads to good CPU utilisation and is fair to all processes, but is not scalable (everything blocks on the global scheduler) and has poor cache locality.
    • Linux 2.4 used this.
  • Per-CPU schedulers — when there’s a new process, assign it to a CPU (like the one with the lowest number of processes). This is easy to implement, scalable, and has good cache locality. But, this results in a broad load imbalance and some CPUs might have less processes or less intensive ones.
  • A compromise is a balance between global and per-CPU — where a global scheduler can be used to rebalance per-CPU queues; if a CPU is idle, we can take a process from another CPU (work stealing).
    • Processor affinity is the preference of a process to be scheduled on the same core.
    • This is the scheduler used in Linux 2.6.
  • Gang scheduling (coscheduling) — allows us to run a set of processes simultaneously (mainly relevant for HPC) as a unit. This requires a global context-switch across all CPUs.
  • For real-time scheduling, we may have a different set of constraints. Hard real-time systems have a required task completion guarantee. Soft real-time systems meet this deadline in practice, but isn’t strict.
    • Linux can do soft real-time scheduling.
  • Fair scheduling

GPUs

GPUs have a substantially higher number of cores than CPUs.