Graphics processing units (GPUs) are specialised processors designed for computer graphics and image processing. They're optimised for running huge numbers of computations in parallel.
The idea is that GPUs trade per-thread speed for parallelism: compared to a CPU, each individual GPU thread is much slower, but millions of them can run at once.
Trade-off is: 4 operations in parallel, each really fast (CPU), or 1 million operations in parallel, each really slow (GPU).
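To make that concrete, here is a minimal CUDA sketch (all names are illustrative, not from any real codebase) that launches roughly a million threads, each doing a single addition:

```cuda
// A minimal sketch: each GPU thread performs one addition,
// but ~1 million threads run the kernel at once.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];                  // one element per thread
}

int main() {
    const int n = 1 << 20;                    // ~1 million elements
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);             // unified memory, for brevity
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads; // enough blocks to cover n
    add<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();                  // wait for the GPU to finish

    printf("c[0] = %f\n", c[0]);              // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

A CPU would chew through the same array a few elements at a time; here the whole array is covered by one launch of 4,096 blocks of 256 threads each.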
Hardware architecture
The best way to describe GPU architecture is with a comparison with CPUs.
- Core count — GPUs have on the order of tens of thousands of cores (10k+ on high-end parts). Consumer CPUs usually have fewer than 32.
- Cache memory — CPUs have a layered cache hierarchy (L1/L2/L3) that aims to minimise cache misses. GPUs share cache between all cores and generally have much less of it than CPUs (often only an L2 cache).
Some GPUs (notably Nvidia's) partition cores into larger processor blocks of a few dozen cores each; Nvidia calls these streaming multiprocessors (SMs). Each block has its own fast on-chip shared memory. All blocks also share a global pool of DRAM, which can be read by the host CPU as well.
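Here is a sketch of how that block-level shared memory gets used (names assumed, not from any particular codebase): each block of 256 threads stages its slice of the input from global DRAM into shared memory, reduces it there, and writes back a single partial sum.

```cuda
// Block-level reduction using per-block shared memory.
#include <cuda_runtime.h>

__global__ void block_sum(const float *in, float *partial, int n) {
    __shared__ float buf[256];            // per-block shared memory
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    buf[tid] = (i < n) ? in[i] : 0.0f;    // stage from global DRAM
    __syncthreads();                      // whole block waits here

    // Tree reduction: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) buf[tid] += buf[tid + stride];
        __syncthreads();
    }
    if (tid == 0) partial[blockIdx.x] = buf[0];  // one result per block
}
// Launch with 256 threads per block to match buf, e.g.:
//   block_sum<<<(n + 255) / 256, 256>>>(in, partial, n);
```

The point of the shared-memory staging is that the repeated reads in the reduction loop hit fast on-chip memory instead of global DRAM.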
Software architecture
Shaders (and compute kernels) are compiled down to vendor-specific machine code for the GPU. The underlying instruction set architecture (ISA) is often proprietary.
- On Nvidia, PTX (Parallel Thread Execution) is a hardware-agnostic virtual ISA; SASS is the native ISA targeting a specific GPU architecture. Both can be inspected, as sketched after this list.
- On AMD, work is submitted to the GPU as PM4 command packets; shaders themselves compile to AMD's GCN/RDNA ISA.
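For the Nvidia side, both ISA levels can be inspected with standard CUDA toolkit tools; in this sketch the file name `k.cu` and architecture `sm_80` are illustrative, with the commands shown as comments above a trivial kernel.

```cuda
// Inspecting both ISA levels with standard CUDA toolkit tools:
//
//   nvcc -ptx k.cu -o k.ptx          # emit the virtual ISA (PTX)
//   nvcc -cubin -arch=sm_80 k.cu     # compile for one specific architecture
//   cuobjdump -sass k.cubin          # disassemble the native ISA (SASS)

__global__ void scale(float *x, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    x[i] *= s;  // a handful of PTX instructions, then SASS instructions
}
```

Because PTX is kept stable across hardware generations, the driver can JIT-compile shipped PTX into the SASS of whatever GPU is actually present.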
Resources
- *Programming Massively Parallel Processors* by David B. Kirk and Wen-mei W. Hwu
- *Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking*
- *GPU Programming* video series by Simon Oz