There are a few different interfaces we can use. The easiest (analogous to OpenMP) is Joblib.

Joblib

The simplest parallel interface in Python is Joblib. It provides a for-loop-style interface for embarrassingly parallel work, analogous to OpenMP’s ease of use in C/C++.[1] Better yet, it works in Jupyter notebooks and on Windows, so it is platform agnostic. For loops whose iterations are independent of each other, it gives a speed-up for very little extra code.

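from joblib import Parallel, delayed
from math import sqrt

# Compute sqrt(i ** 2) for i = 0..99, spreading the calls over all CPU cores.
# The results come back as a list, in the same order as the inputs.
Parallel(n_jobs=-1)(delayed(sqrt)(i ** 2) for i in range(100))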

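The pieces we use:

  • Parallel is the main interface. It takes the execution settings (such as n_jobs) and is then called with an iterable of tasks.
  • delayed wraps a function so that calling it records the function and its arguments as a task instead of running it immediately.
  • n_jobs is the maximum number of workers running concurrently (processes or threads, depending on the backend). If -1, we use all available CPU cores on our system.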
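As a minimal sketch of how these three pieces fit together with a function that takes several arguments: the function scaled_power below is hypothetical, made up for illustration, and n_jobs=2 is an arbitrary choice. Positional and keyword arguments are passed to the delayed wrapper exactly as they would be passed to the function itself.

```python
from joblib import Parallel, delayed


def scaled_power(x, exponent, scale=1.0):
    """Hypothetical example function: scale * x ** exponent."""
    return scale * x ** exponent


# delayed(scaled_power)(i, 2, scale=0.5) does not call scaled_power here;
# it only describes the call so that a worker can run it later.
tasks = (delayed(scaled_power)(i, 2, scale=0.5) for i in range(5))

# Run the tasks on at most 2 workers; results keep the input order.
results = Parallel(n_jobs=2)(tasks)
print(results)  # [0.0, 0.5, 2.0, 4.5, 8.0]
```

The returned list matches what a plain list comprehension over the same calls would give, which makes it easy to check a parallel loop against its serial version.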

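By default, Parallel uses the loky backend, which starts separate Python worker processes. If the function releases the Global Interpreter Lock (GIL) for most of its run time, as many NumPy routines and I/O calls do, it’s more efficient to use threads instead of Python processes as the workers, with prefer='threads'. Threads avoid the cost of starting processes and of copying data between them.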
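A short sketch of the thread-based variant, assuming a workload that spends its time outside the GIL. Here hashing large byte buffers with hashlib stands in for such a workload, since CPython releases the GIL while hashing large inputs. The function digest, the payload sizes, and n_jobs=4 are illustrative choices, not part of Joblib.

```python
import hashlib

from joblib import Parallel, delayed


def digest(payload: bytes) -> str:
    """Illustrative GIL-releasing task: SHA-256 of a large buffer."""
    return hashlib.sha256(payload).hexdigest()


# Eight buffers of 1 MB each, one distinct byte value per buffer.
payloads = [bytes([i]) * 1_000_000 for i in range(8)]

# prefer="threads" asks Joblib for thread workers instead of processes,
# so the payloads are shared in memory rather than copied to each worker.
threaded = Parallel(n_jobs=4, prefer="threads")(
    delayed(digest)(p) for p in payloads
)

# Serial reference, to check that both approaches agree.
serial = [digest(p) for p in payloads]
print(threaded == serial)  # True
```

Whether threads actually beat processes depends on how much of the function runs without the GIL, so it is worth timing both settings on the real workload.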

Footnotes

  1. See the Joblib documentation.