Graphics processing units (GPUs) are specialised processors designed for computer graphics and image processing. They are optimised for highly parallel computation, i.e. running very many operations simultaneously.

The idea is that GPUs can perform many computations in parallel at the expense of per-thread speed. Compared to a CPU, each GPU thread is much slower, but we can run millions of operations in parallel.

The trade-off is: four operations in parallel, each really fast (CPU), or a million operations in parallel, each really slow (GPU).

Architecture

The best way to describe GPU architecture is by comparing it with CPU architecture.

  • Core count — GPUs have core counts on the order of tens of thousands (10k+), while consumer CPUs usually have fewer than 32.
  • Cache memory — CPUs have a layered cache hierarchy (L1/L2/L3) that aims to minimise cache misses. GPUs share cache between many cores and generally have far less cache memory than CPUs (a GPU might only have an L2 cache).

Some GPUs (notably Nvidia's, where they are called streaming multiprocessors) partition cores into larger processor blocks of a few dozen cores each. Each block has its own shared memory. All blocks also share a global pool of DRAM, which the host CPU can access as well.
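As a rough illustration of block-level shared memory, the sketch below is a minimal CUDA kernel in which each block stages a tile of the input in `__shared__` memory before computing on it. The block size, array names, and scaling operation are arbitrary choices for the example, not anything prescribed by the hardware.

```cuda
// Minimal sketch: each thread block stages a tile of the input into its
// block-local shared memory, then computes from the staged copy.
// The block size (256) and names are illustrative only.
__global__ void scale_with_shared(const float *in, float *out, int n, float factor)
{
    __shared__ float tile[256];               // shared by all threads in this block

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        tile[threadIdx.x] = in[i];            // global DRAM -> block shared memory
    __syncthreads();                          // wait until the whole tile is loaded

    if (i < n)
        out[i] = tile[threadIdx.x] * factor;  // compute from the shared copy
}
```

Reads served from a block's shared memory are much faster than repeated trips to global DRAM, which is why kernels that reuse data within a block typically stage it this way.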

Programming

We have a general workflow that holds regardless of GPU architecture or task (note how similar this looks to the data movement when training deep learning models); a CUDA sketch follows the list:

  • Allocate memory on the GPU.
  • Copy data from the CPU to the GPU.
  • Run the kernel, compute.
  • Copy data back to the CPU, and free GPU memory.
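To make the steps concrete, here is a minimal sketch of the workflow in CUDA. The kernel, array size, and names are made up for illustration, and error checking is omitted for brevity.

```cuda
// Minimal CUDA sketch of the four workflow steps above.
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

__global__ void add_one(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += 1.0f;                                      // the "compute" step
}

int main()
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *host = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) host[i] = (float)i;

    float *device;
    cudaMalloc(&device, bytes);                               // 1. allocate on the GPU
    cudaMemcpy(device, host, bytes, cudaMemcpyHostToDevice);  // 2. copy CPU -> GPU

    add_one<<<(n + 255) / 256, 256>>>(device, n);             // 3. run the kernel
    cudaDeviceSynchronize();

    cudaMemcpy(host, device, bytes, cudaMemcpyDeviceToHost);  // 4. copy back to the CPU
    cudaFree(device);                                         //    and free GPU memory

    printf("host[0] = %f\n", host[0]);                        // expect 1.0
    free(host);
    return 0;
}
```

The same allocate → copy → compute → copy-back shape appears in every GPU API, whether we write it ourselves or a deep learning framework does it on our behalf.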

We generally use a proprietary compiler to compile our programs into GPU-specific machine code. For Nvidia GPUs, this is done with NVCC.
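For example, a single-file CUDA program like the sketch above could be compiled with something along the lines of `nvcc add_one.cu -o add_one` (the file name here is hypothetical). NVCC splits the source, handing the host code to the system's C++ compiler and compiling the device code itself.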
