Apache Arrow is a framework for defining memory storage of column- and tabular-based data. It’s specially designed for modern CPU and GPU architectures to make better use of parallel computations (like SIMD instructions or vector processing/querying).

Arrow is primarily column-based instead of row-based. Each column is stored in the same buffer, as opposed to each element, which facilitates vector computations on column-wise operations. Because Arrow is inherently column-based, this also avoids an internal representation/data format specific to the library to implement column operations, which means Arrow libraries are interoperable with each other too.

Internally, Arrow is used as a memory back-end for cuDF and Polars.