In computer architecture, vector architectures exploit data-level parallelism by streaming vectors of data through a pipelined ALU.
Compared with SIMD designs that replicate functional units to process elements in parallel, this spends less silicon at the cost of some peak throughput.
The basic idea (made concrete in the sketch after this list):
- Load data from memory and place it, in order, into a large set of vector registers.
  - A vector architecture might have 32 vector registers, each holding 64 elements of 64 bits.
  - Because the elements are adjacent in memory, the accesses have good spatial locality, and the latency of going to memory is amortized over the whole vector rather than paid per element.
- Operate on the elements sequentially in registers, streaming them through pipelined execution units.
  - A single vector instruction replaces an entire loop, so instruction fetch/decode overhead is paid once rather than per iteration. Hazard checking is also done once per vector instruction, since the elements within a vector are known to be independent.
- Write the results back to memory.
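
As a minimal sketch of the load/operate/store pattern, consider a DAXPY loop in C. On a scalar machine every element costs a fetch, decode, hazard check, and branch; the comments show how the same work could map onto a handful of vector instructions. The mnemonics are RISC-V V-style and purely illustrative of the idea, not a claim about any particular machine.

```c
#include <stddef.h>

/* DAXPY: y[i] = a * x[i] + y[i].
 *
 * A vector machine could express one 64-element chunk of this loop as a few
 * vector instructions, each streamed through a pipelined execution unit
 * (illustrative RISC-V V-style mnemonics):
 *
 *   vle64.v   v0, (x)        ; load 64 elements of x into a vector register
 *   vle64.v   v1, (y)        ; load 64 elements of y
 *   vfmacc.vf v1, fa0, v0    ; v1[i] += a * v0[i], one element per pipeline stage
 *   vse64.v   v1, (y)        ; store 64 results back to memory
 */
void daxpy(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

The point of the sketch is the instruction count: the scalar loop issues several instructions per element, while the vector version issues roughly four per 64 elements, with the per-element work hidden inside the pipeline.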