Polars is a dataframe library for Python, intended to be a modern and performant replacement for pandas.

https://docs.pola.rs/user-guide/misc/multiprocessing/

Basics

What makes Polars better than pandas?

  • Multithreaded by default, with a Rust backend.
  • Modern, columnar memory model built on Apache Arrow.
  • Query optimisation through the lazy API (LazyFrames).
  • More readable and maintainable function calls. IMO these make more sense than pandas function calls.

LazyFrames

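One of Polars’ key features is “lazy evaluation”: operations on a LazyFrame are recorded as a query plan instead of being run immediately, so Polars can optimise the whole query under the hood before executing it. This should be the default way we interact with dataframes.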

To run in lazy mode, we should either use a lazy reader (scan_csv() instead of read_csv()) or call the .lazy() method on an existing DataFrame to convert it to a LazyFrame.

To convert back to a DataFrame (and evaluate all the built-up queries), we call the .collect() method. Use it sparingly and only when necessary, ideally once at the end of the query, because every call executes the whole plan.

See also