In deep learning, the backpropagation algorithm (or just backprop) is a method for computing the gradient of the loss function with respect to the network’s weights. This gradient is then used to adjust the neuron weights so that the network makes more accurate predictions.

The idea is that when we compare the prediction with the ground truth and get an error, backpropagation “propagates” the error backwards through the neural network layer by layer. At each layer, the weights are adjusted based on how much they contributed to the overall error. The process then repeats over many iterations until the weights converge and the error is acceptably small.
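To make this concrete, here is a minimal sketch of that loop for a one-weight “network” trained on a single example; the numbers, the squared-error loss, and the learning rate are illustrative choices on our part rather than anything prescribed by the algorithm:

```python
x, y = 2.0, 10.0        # input and ground-truth target
w = 0.5                 # initial weight
learning_rate = 0.05    # illustrative value

for step in range(100):
    y_hat = w * x                # forward pass: the prediction
    error = y_hat - y            # compare prediction with ground truth
    grad_w = error * x           # dE/dw for E = 0.5 * error**2: how much w contributed
    w -= learning_rate * grad_w  # adjust the weight, then repeat

print(w, w * x)  # w approaches 5.0, so the prediction approaches the target 10.0
```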

We perform this optimisation with gradient descent: we iteratively adjust the network’s parameters in the direction of the negative gradient to minimise the error function.
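In symbols (the notation here is ours): writing $\theta$ for the network’s parameters, $E$ for the error function, and $\eta$ for the learning rate, each gradient descent step is

$$\theta \leftarrow \theta - \eta \, \nabla_{\theta} E(\theta).$$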

Mathematics

The backpropagation algorithm performs a Jacobian-gradient product for each operation in the NN’s layers. This comes from the idea of the generalised chain rule, which allows us to differentiate a composite function by breaking it down into its constituent functions and multiplying their derivatives together.
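As a sketch of what this Jacobian-gradient product looks like (again, the notation is ours): if one layer computes $\mathbf{h} = g(\mathbf{x})$ and the rest of the network maps $\mathbf{h}$ to the loss $E$, then

$$\nabla_{\mathbf{x}} E = J_g(\mathbf{x})^{\top} \, \nabla_{\mathbf{h}} E, \qquad \big(J_g(\mathbf{x})\big)_{kl} = \frac{\partial h_k}{\partial x_l},$$

so each layer only has to multiply the gradient arriving from the layer above by its own (transposed) Jacobian and pass the result backwards.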

Each contribution is given by the equation:

$$\frac{\partial E}{\partial w_{ij}} = a_i \, \delta_j, \qquad \delta_j = f'(z_j) \sum_k w_{jk} \, \delta_k$$

where we have the following parameters:

  • $E$ is the loss function between the ground truth and the network’s predicted values.
  • $a_i$ is the activation of the $i$-th neuron in a particular layer of the network.
  • $w_{ij}$ is the weight that connects the $i$-th neuron to the $j$-th neuron.
  • $\delta_j$ is the error associated with the $j$-th neuron.
  • $f'(z_j)$ is the derivative of the activation function for the $j$-th neuron, evaluated at its pre-activation $z_j$.

As before, this comes from the generalised chain rule. We can think of $E$ as being made up of a number of distinct constituent functions, which we differentiate and then multiply and add together.
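The snippet below is a minimal NumPy sketch of these formulas for one hidden layer with a sigmoid activation; the shapes, the squared-error loss, and the variable names are illustrative assumptions rather than part of the derivation above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)            # f'(z) for the sigmoid activation

rng = np.random.default_rng(0)
x = rng.normal(size=3)              # activations of the input layer (a_i)
W1 = rng.normal(size=(3, 4))        # W1[i, j] connects input neuron i to hidden neuron j
W2 = rng.normal(size=(4, 2))        # W2[j, k] connects hidden neuron j to output neuron k
y = np.array([0.0, 1.0])            # ground truth

# Forward pass
z1 = x @ W1                         # pre-activations of the hidden layer
a1 = sigmoid(z1)                    # activations of the hidden layer (a_j)
z2 = a1 @ W2
a2 = sigmoid(z2)                    # network predictions

# Backward pass for E = 0.5 * ||a2 - y||^2
delta2 = (a2 - y) * sigmoid_prime(z2)       # errors at the output layer
delta1 = sigmoid_prime(z1) * (W2 @ delta2)  # delta_j = f'(z_j) * sum_k w_jk * delta_k

# Each weight's contribution to the gradient: dE/dw_ij = a_i * delta_j
grad_W2 = np.outer(a1, delta2)
grad_W1 = np.outer(x, delta1)
```

Here `delta1` implements $\delta_j = f'(z_j)\sum_k w_{jk}\,\delta_k$, and each entry of `grad_W1` and `grad_W2` is the product $a_i\,\delta_j$ from the equation above.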