Neural networks are machine learning models loosely inspired by biological neural networks: they are built from many layers of neurons, connected by weights and passed through activation functions.

Basics

A neural network is essentially layers upon layers of adjustable nodes: millions of adjustable knobs that make up its parameters. The network adjusts itself through a process called gradient descent, which aims to minimise the error measured by a loss function. The overall function the network approximates is a composition of many intermediate activation functions.
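Gradient descent can be sketched in plain Python on a toy problem: fitting a single weight w so that w * x approximates y. The data and learning rate below are hypothetical choices for illustration.

```python
# Toy data (x, y) pairs generated with true weight w = 2.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0    # start from an arbitrary parameter value
lr = 0.05  # learning rate: the size of each update step

for epoch in range(100):
    # Gradient of the mean squared error:
    # d/dw mean((w*x - y)^2) = mean(2 * (w*x - y) * x)
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad  # step against the gradient to reduce the loss

print(round(w, 3))  # converges towards the true weight, 2.0
```

A real network does the same thing, just over millions of knobs at once, with the gradients computed automatically by backpropagation.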

The depth of the model (its number of layers) determines how complex a decision boundary it can represent. Many problems cannot be separated by a single boundary, which motivates multilayer NNs with hidden layers. Hidden layers introduce non-linearities that transform the data so that the final layer can separate it linearly.
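The classic illustration is XOR, which no single linear boundary can separate. Below is a hand-wired two-layer network (the weights are hypothetical fixed values, not learned) whose non-linear hidden activations bend the space so the output layer can finish with one linear threshold.

```python
def step(z):
    return 1 if z > 0 else 0  # a simple non-linear activation

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)    # fires when at least one input is 1
    h2 = step(x1 + x2 - 1.5)    # fires only when both inputs are 1
    return step(h1 - h2 - 0.5)  # linear separation in the hidden space

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))  # prints the XOR truth table
```

In the hidden space (h1, h2), the four XOR inputs become linearly separable, which the raw inputs never were.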

The architecture of a neural network determines how data flows through it, what the neurons in each layer do, and how they connect. Architecture greatly influences model performance.

Types

In code

The general training process has two broad phases: first train the model, then assess its performance.

We need to define the loss function and the optimiser before training. Then, for each iteration (epoch) of training, we make a prediction, calculate the loss, obtain gradients, update parameters, and do a clean-up step:

out = model(curr) # make prediction
loss = criterion(out, actual) # calculate loss
loss.backward() # obtain gradients
optimiser.step() # update parameters
optimiser.zero_grad() # clean-up step
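Putting it all together, here is a runnable sketch of the full loop, assuming PyTorch is available. The model, toy data, and hyperparameters below are illustrative choices, not prescriptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.linspace(0, 1, 20).unsqueeze(1)  # toy inputs
y = 3 * x + 1                              # toy targets: y = 3x + 1

model = nn.Linear(1, 1)       # a tiny model: one weight, one bias
criterion = nn.MSELoss()      # define the loss function
optimiser = torch.optim.SGD(model.parameters(), lr=0.5)  # and the optimiser

for epoch in range(200):
    out = model(x)             # make prediction
    loss = criterion(out, y)   # calculate loss
    loss.backward()            # obtain gradients
    optimiser.step()           # update parameters
    optimiser.zero_grad()      # clean-up: reset gradients for the next epoch

print(round(loss.item(), 4))  # the loss should be close to 0
```

The clean-up step matters because PyTorch accumulates gradients across calls to backward(); without zero_grad(), each epoch would add its gradients on top of the previous ones.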