In deep learning, residual networks are an architecture used primarily to avoid the vanishing gradient problem in very deep networks. They do this with “skip connections”: a residual block computes y = x + F(x) rather than y = F(x), so during backpropagation the gradient has a direct path around F, and signals that might otherwise be lost deep in the network are preserved. One consideration: we need to make sure that the dimensions of x and F(x) are consistent. Oftentimes, if F simply applies a pooling layer, the two no longer match. To rectify this, we can apply a linear transformation W_s to the skip connection, giving y = F(x) + W_s x.
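
As a concrete sketch of such a block, the snippet below uses PyTorch (an assumption; the class name ResidualBlock and its arguments are illustrative, not the paper’s code). The 1x1 convolution in the shortcut plays the role of the linear transformation W_s whenever the stride or channel count changes.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: y = F(x) + shortcut(x)."""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # F(x): two 3x3 convolutions with batch norm
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

        if stride != 1 or in_channels != out_channels:
            # linear projection W_s: a 1x1 convolution matches the shortcut's
            # spatial size and channel count to those of F(x)
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1,
                          stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))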

Original paper

In the original paper, spatial dimensions were downsampled using stride-2 convolutions instead of max/average pooling. After the last convolutional layer, we apply global average pooling, so the embedding has no spatial dimensions and consists of just 512 floats.
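
To make that concrete, here is a minimal sketch of the pooling step (PyTorch assumed; the 7x7 feature map is just the size a 224x224 input would have after five stride-2 reductions):

import torch
import torch.nn as nn

# hypothetical tail of a ResNet-18/34-style backbone: whatever spatial size
# reaches this point, global average pooling averages each channel down to a
# single value, so the embedding is always a flat vector of 512 floats
features = torch.randn(1, 512, 7, 7)    # example: 224x224 input after five stride-2 reductions
gap = nn.AdaptiveAvgPool2d(1)           # global average pooling
embedding = gap(features).flatten(1)    # shape: (1, 512)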

Meaning: our CNN is invariant to the spatial dimensions of its input. Previously, we had to standardise the input dimensions because the dimension changes were hardcoded.
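
One way to see this invariance is to push inputs of different sizes through an off-the-shelf residual network (torchvision’s resnet18 is used here purely as a stand-in, an assumption rather than part of the original setup):

import torch
from torchvision.models import resnet18

model = resnet18(num_classes=10)  # untrained, only used to check output shapes
model.eval()
with torch.no_grad():
    for size in (224, 160, 96):
        x = torch.randn(1, 3, size, size)
        print(size, model(x).shape)  # torch.Size([1, 10]) for every input size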

At the end, all we need is a single fully-connected classification layer: the learned embeddings are good enough that we don’t need a complex classifier.
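
Continuing the pooling sketch above (the 512-dimensional embedding and the class count are illustrative assumptions), the whole classifier is a single linear layer:

import torch.nn as nn

# hypothetical head: one linear layer maps the 512-d embedding to class logits
classifier = nn.Linear(512, 1000)   # e.g. the 1000 ImageNet classes used in the paper
logits = classifier(embedding)      # `embedding` comes from the pooling sketch above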

In code

# normal layer: the output replaces the input
next_activation = layer(activation)
# residual layer: the layer learns only a correction (the residual), which is added back to the input
next_activation = activation + layer(activation)