Weight decay (also called L2 regularisation) is a parametric regularisation technique that adds a parameter norm penalty to the training objective. It applies to traditional statistical models (like linear regression) as well as to neural networks.

Essentially: the model has a weight vector, and we penalise it more heavily for large components of that vector. This biases the learning algorithm towards distributing the weight more evenly across a larger number of features, rather than concentrating it in a few.
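
Concretely, in the common L2 form, the penalised objective adds the squared norm of the weights to the original loss $J$, with $\lambda$ controlling the penalty strength:

$$\tilde{J}(\mathbf{w}) = J(\mathbf{w}) + \frac{\lambda}{2} \lVert \mathbf{w} \rVert_2^2$$

A gradient descent step with learning rate $\eta$ then becomes

$$\mathbf{w} \leftarrow \mathbf{w} - \eta \big( \nabla J(\mathbf{w}) + \lambda \mathbf{w} \big) = (1 - \eta \lambda)\,\mathbf{w} - \eta \nabla J(\mathbf{w}),$$

which is where the name comes from: each step multiplicatively decays the weights towards zero before applying the usual gradient update.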

This drives the weights closer to the origin, which typically improves generalisation by lowering the model's variance (at the cost of a small increase in bias).
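
To make this concrete, here is a minimal numpy sketch of weight decay inside plain gradient descent on a linear regression loss. The function names, the toy data, and the values of `lam` and `lr` are all illustrative choices, not prescriptions:

```python
import numpy as np

def l2_penalised_loss(w, X, y, lam):
    """Mean squared error plus an L2 penalty on the weights."""
    residual = X @ w - y
    return np.mean(residual ** 2) + lam * np.sum(w ** 2)

def gradient_step(w, X, y, lam, lr):
    """One gradient descent step on the penalised loss."""
    grad_mse = 2 * X.T @ (X @ w - y) / len(y)
    grad_penalty = 2 * lam * w   # the decay term: pulls each weight toward 0
    return w - lr * (grad_mse + grad_penalty)

# Toy data: 5 features, only 3 of which actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, 2.0, 0.0, 0.0, -1.0]) + rng.normal(scale=0.1, size=100)

w = np.zeros(5)
for _ in range(500):
    w = gradient_step(w, X, y, lam=0.1, lr=0.05)
print(w)  # components are shrunk toward the origin relative to the unpenalised fit
```

Increasing `lam` strengthens the shrinkage (more bias, less variance); setting it to zero recovers ordinary unregularised gradient descent.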