Layer normalisation (LN) is a variant of batch normalisation that behaves like BN applied to a minibatch of size 1: the statistics are computed over the features of each example rather than over the batch. This works well for convolutional neural networks, because they operate on data with a grid topology, so LN can still average across all locations of the grid within a single example.
We define LN as:
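$$
\mathrm{LN}(x) = \frac{x - \mu}{\sigma}
$$

where $x \in \mathbb{R}^d$ is the vector of activations of a single example (the learnable gain and bias that are often added on top of this are omitted here).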
And we define the mean as:
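$$
\mu = \frac{1}{d} \sum_{i=1}^{d} x_i
$$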
And the standard deviation as (with an offset to prevent division by 0):
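$$
\sigma = \sqrt{\frac{1}{d} \sum_{i=1}^{d} (x_i - \mu)^2 + \epsilon}
$$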
We use LN because it helps prevent the model from diverging: the output of LN is independent of the scale of its input. It also does not depend on the minibatch size, or on whether we are training or testing.
It’s just a transformation that standardises the activations to a given scale.
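To make this concrete, here is a minimal NumPy sketch of the transformation above (the function name `layer_norm` and the `eps` value are illustrative choices, not from the text); note how scaling the input leaves the output essentially unchanged:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Standardise the activations of a single example (no learnable gain/bias)."""
    mu = x.mean()                                   # mean over this example's features
    sigma = np.sqrt(((x - mu) ** 2).mean() + eps)   # std with an epsilon offset
    return (x - mu) / sigma

x = np.array([1.0, 2.0, 4.0, 8.0])
print(layer_norm(x))         # standardised activations
print(layer_norm(10.0 * x))  # (almost) identical output: LN is scale-independent
```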