The negative log-likelihood (NLL) is a loss function obtained by taking the negative logarithm of the likelihood; minimising it is equivalent to maximum likelihood estimation (MLE).

Motivation and benefits

A likelihood is a product of many probabilities, each at most 1, so it can quickly underflow regular machine precision. Taking the logarithm turns the product into a sum and keeps the value within machine precision. Taking the negative converts the maximisation problem (MLE) into a minimisation problem (NLL).
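A minimal sketch of the underflow problem, using an assumed toy dataset of 1,000 samples that each have probability 0.05:

```python
import math

# Product of 1,000 modest probabilities underflows float64,
# while the sum of their logarithms stays well within machine precision.
probs = [0.05] * 1000

product = 1.0
for p in probs:
    product *= p
print(product)  # 0.05 ** 1000 is about 1e-1301, far below float64's minimum, so this underflows to 0.0

log_likelihood = sum(math.log(p) for p in probs)
print(log_likelihood)  # about -2995.7, easily representable
```

Once the product has underflowed to 0.0 the likelihood carries no gradient information at all, whereas the log-likelihood remains a finite, differentiable number.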

Note also that differentiating the raw likelihood involves the product rule, which yields a sum of n terms that each contain a product of n − 1 factors. The NLL's derivative is simpler: the logarithm turns the product into a sum, so each sample's term can be differentiated independently and the results simply added.
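As a sketch of the two derivatives, writing p_i(θ) for the per-sample probabilities (an assumed notation):

```latex
% Derivative of the raw likelihood: the product rule yields n terms,
% each itself containing a product of n - 1 factors.
\frac{\partial}{\partial\theta} \prod_{i=1}^{n} p_i(\theta)
  = \sum_{i=1}^{n} \frac{\partial p_i}{\partial\theta} \prod_{j \neq i} p_j(\theta)

% Derivative of the log-likelihood: the product becomes a sum,
% so each sample contributes one independent term.
\frac{\partial}{\partial\theta} \sum_{i=1}^{n} \log p_i(\theta)
  = \sum_{i=1}^{n} \frac{1}{p_i(\theta)} \frac{\partial p_i}{\partial\theta}
```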

In code

In PyTorch, we can use torch.nn.NLLLoss(). Note that it expects log-probabilities as input, typically produced by torch.nn.LogSoftmax.
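A minimal usage sketch, with made-up logits for a batch of two samples over three classes:

```python
import torch
import torch.nn as nn

# NLLLoss expects log-probabilities, so pair it with LogSoftmax.
# (nn.CrossEntropyLoss fuses the two and takes raw logits directly.)
log_softmax = nn.LogSoftmax(dim=1)
loss_fn = nn.NLLLoss()

logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.3, 1.5, 0.2]])  # batch of 2, 3 classes
targets = torch.tensor([0, 1])            # correct class indices

loss = loss_fn(log_softmax(logits), targets)
print(loss.item())  # mean negative log-likelihood over the batch
```

Because LogSoftmax followed by NLLLoss is exactly cross-entropy, the same result can be obtained in one step with nn.CrossEntropyLoss applied to the raw logits.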