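In convolutional neural networks, a pooling function replaces the output of a layer at each location with a summary statistic of the nearby outputs, computed over a small window (the kernel). This consolidates information and, when the kernel moves with a stride greater than one, also shrinks the spatial size of the representation.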

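Max pooling takes the maximum value within its kernel. For an input of width $n$, a kernel of width $k$, stride $s$ and padding $p$, it has an output width of:

$$n_{\text{out}} = \left\lfloor \frac{n + 2p - k}{s} \right\rfloor + 1$$

and the same formula applies along the height.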
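A minimal sketch of 2D max pooling in pure Python. It assumes a single-channel input stored as a list of lists, a square k×k window and no padding. The names `output_size` and `max_pool2d` are illustrative and do not come from any library; deep learning frameworks ship their own optimised versions.

```python
from typing import List

Matrix = List[List[float]]


def output_size(n: int, k: int, s: int, p: int = 0) -> int:
    """Output length along one axis: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1


def max_pool2d(x: Matrix, k: int, s: int) -> Matrix:
    """2D max pooling with a k x k window and stride s (no padding)."""
    h_out = output_size(len(x), k, s)
    w_out = output_size(len(x[0]), k, s)
    out = []
    for i in range(h_out):
        row = []
        for j in range(w_out):
            # Gather the k x k window whose top-left corner is (i*s, j*s)
            window = [x[i * s + di][j * s + dj] for di in range(k) for dj in range(k)]
            row.append(max(window))
        out.append(row)
    return out


x = [
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [7, 2, 9, 8],
    [1, 0, 3, 4],
]
print(max_pool2d(x, k=2, s=2))  # [[6, 5], [7, 9]]
```

With a 2×2 kernel and stride 2, the 4×4 input becomes 2×2, since (4 − 2) // 2 + 1 = 2.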

Average pooling takes the arithmetic mean of the values in its kernel, i.e., it sums them and divides by the number of values in the kernel (4 for a 2×2 kernel).
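A matching sketch for average pooling, under the same assumptions: a single-channel list-of-lists input, a square k×k window and no padding. Because there is no padding, every window holds exactly k·k values, which is the divisor. The name `avg_pool2d` is illustrative. With padding, libraries differ on whether padded zeros count towards the divisor, and this sketch does not model that choice.

```python
from typing import List

Matrix = List[List[float]]


def avg_pool2d(x: Matrix, k: int, s: int) -> Matrix:
    """2D average pooling with a k x k window and stride s (no padding)."""
    h_out = (len(x) - k) // s + 1
    w_out = (len(x[0]) - k) // s + 1
    out = []
    for i in range(h_out):
        row = []
        for j in range(w_out):
            total = sum(x[i * s + di][j * s + dj] for di in range(k) for dj in range(k))
            row.append(total / (k * k))  # sum, then divide by the number of values
        out.append(row)
    return out


x = [
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [7, 2, 9, 8],
    [1, 0, 3, 4],
]
print(avg_pool2d(x, k=2, s=2))  # [[3.5, 2.0], [2.5, 6.0]]
```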

Implications

Pooling helps make the output representation approximately invariant to small translations of the input, which can be useful if we care more about whether a feature is present than exactly where it is.
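The approximate invariance can be seen in a small 1D sketch (pure Python, no padding). The signal values are made up for illustration. Shifting the detector responses one step to the right changes every entry of the unpooled signal, yet the max-pooled output with window 4 and stride 4 stays the same.

```python
from typing import List


def max_pool1d(x: List[float], k: int, s: int) -> List[float]:
    """1D max pooling with window k and stride s (no padding)."""
    n_out = (len(x) - k) // s + 1
    return [max(x[i * s : i * s + k]) for i in range(n_out)]


def shift_right(x: List[float], by: int) -> List[float]:
    """Shift a signal right, filling the vacated positions with zeros."""
    return [0.0] * by + x[: len(x) - by]


# Illustrative detector responses: the feature fires strongly at position 1.
features = [0.1, 1.0, 0.2, 0.1, 0.0, 0.3, 0.1, 0.0]
shifted = shift_right(features, 1)

print(max_pool1d(features, k=4, s=4))  # [1.0, 0.3]
print(max_pool1d(shifted, k=4, s=4))   # [1.0, 0.3]
```

A larger shift that carries the peak across a window boundary, such as `shift_right(features, 3)`, pools to `[0.1, 1.0]` instead. This is why the invariance holds only for small translations.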

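If pooling is done over the outputs of separately parameterised convolutions (i.e., across channels rather than across spatial positions), the features can also learn which transformations to become invariant to. For example, if several filters detect the same pattern at different orientations, the maximum over their outputs responds to the pattern whatever its orientation.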
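The sketch below shows pooling across separately parameterised filters, in the style of maxout units. It uses pure Python and a 1D signal. The two filters are hand-set for illustration (an edge detector and its mirror image); in a trained network they would be learned. Each filter alone gives opposite responses to a falling and a rising edge. The cross-channel maximum responds identically to both, so the pooled unit is invariant to mirroring. As in most deep learning libraries, "convolution" here is computed as cross-correlation (the kernel is not flipped). The names `conv1d` and `cross_channel_max` are illustrative.

```python
from typing import List


def conv1d(x: List[float], w: List[float]) -> List[float]:
    """Valid 1D cross-correlation (what DL libraries call convolution)."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]


def cross_channel_max(x: List[float], filters: List[List[float]]) -> List[float]:
    """At each position, take the max over the outputs of several filters."""
    maps = [conv1d(x, w) for w in filters]
    return [max(vals) for vals in zip(*maps)]


# Hand-set filters for illustration: an edge detector and its mirror image.
filters = [[1.0, 0.0, -1.0], [-1.0, 0.0, 1.0]]

falling = [0.0, 1.0, 0.0, -1.0, 0.0]
rising = [0.0, -1.0, 0.0, 1.0, 0.0]

print(conv1d(falling, filters[0]))            # [0.0, 2.0, 0.0]
print(conv1d(rising, filters[0]))             # [0.0, -2.0, 0.0]
print(cross_channel_max(falling, filters))    # [0.0, 2.0, 0.0]
print(cross_channel_max(rising, filters))     # [0.0, 2.0, 0.0]
```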

Addendums

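As an alternative to pooling, strided convolutions can be used. The kernel is shifted by more than one position (the stride) between successive applications, so the convolution downsamples its output directly instead of relying on a separate pooling layer.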
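A minimal sketch of a strided 2D convolution in pure Python. It assumes a single-channel list-of-lists input, a square kernel and no padding, and it computes cross-correlation as deep learning libraries do. The summing kernel `w` and the name `conv2d_strided` are illustrative. The last lines check that a stride-2 convolution matches a stride-1 convolution followed by keeping every second output, which is why striding works as a built-in downsampling step.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_strided(x: Matrix, w: Matrix, s: int) -> Matrix:
    """Valid 2D cross-correlation with stride s (no padding).

    The kernel jumps s positions between applications, so the output
    is downsampled without a separate pooling layer.
    """
    k = len(w)  # assumes a square k x k kernel
    h_out = (len(x) - k) // s + 1
    w_out = (len(x[0]) - k) // s + 1
    return [
        [
            sum(x[i * s + di][j * s + dj] * w[di][dj] for di in range(k) for dj in range(k))
            for j in range(w_out)
        ]
        for i in range(h_out)
    ]


x = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15],
    [16, 17, 18, 19, 20],
    [21, 22, 23, 24, 25],
]
w = [[1, 1], [1, 1]]  # illustrative kernel: sums each 2 x 2 window

full = conv2d_strided(x, w, s=1)     # 4 x 4 output
strided = conv2d_strided(x, w, s=2)  # 2 x 2 output
print(strided)  # [[16, 24], [56, 64]]
# Same as a stride-1 convolution followed by keeping every 2nd row and column
print([row[::2] for row in full[::2]] == strided)  # True
```

With stride 2 on a 5×5 input, (5 − 2) // 2 + 1 = 2, so the last row and column are never the top-left corner of a window and are only partly used.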