In computer architecture, vector architectures exploit data-level parallelism by operating on whole vectors of data with a pipelined ALU.

Compared with SIMD, which operates on all of the elements at once in parallel hardware, this is more economical in silicon at the expense of peak performance.

The basic idea (a concrete sketch follows this list):

  • Load data from memory and place the elements, in order, into a large set of registers.
    • A vector architecture might have 32 vector registers, each holding 64 elements of 64 bits.
    • This aids caching, because the elements are spatially local to one another.
  • Operate on them sequentially in registers using pipelined execution units.
    • This avoids the per-iteration instruction fetch/decode overhead of an equivalent scalar loop, and also the overhead of checking for pipeline hazards on every element.
  • Write back to memory.
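
As a rough illustration of the three steps, here is a minimal DAXPY sketch in C (a hypothetical example, not tied to any particular ISA). The comments describe how a vector machine with 64-element registers, as above, might map each strip of the loop onto a few vector instructions; the scalar inner loop stands in for what the vector hardware does.

    #include <stddef.h>

    /* DAXPY: y[i] = a * x[i] + y[i].
     * On a scalar machine this is ~n iterations, each paying instruction
     * fetch/decode and hazard-check costs.  A vector machine with
     * 64-element vector registers (as in the example above) instead
     * handles the loop in strips of 64 elements, roughly:
     *   1. vector-load 64 elements of x and 64 elements of y
     *   2. stream them through the pipelined multiply and add units
     *   3. vector-store the 64 results back to memory
     * i.e. a handful of instructions per 64 elements rather than per element.
     */
    void daxpy(size_t n, double a, const double *x, double *y)
    {
        /* Strip-mining: process the data in chunks no larger than the
         * vector register length (64 here, to match the example above). */
        const size_t VLEN = 64;
        for (size_t i = 0; i < n; i += VLEN) {
            size_t strip = (n - i < VLEN) ? (n - i) : VLEN;
            /* On a vector machine, this inner loop collapses into the
             * load / operate / store pattern described in the list. */
            for (size_t j = 0; j < strip; j++)
                y[i + j] = a * x[i + j] + y[i + j];
        }
    }

The strip size matches the vector register length, so each strip corresponds to one vector load per operand, one pass through the pipelined arithmetic units, and one vector store.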

See also