Maximum likelihood estimation (MLE) is a statistical approach used for fitting non-linear learning models. We try to estimate parameters $\theta$ such that a likelihood function is maximised:

$$\hat{\theta} = \arg\max_\theta \ell(\theta), \qquad \ell(\theta) = \prod_{i=1}^{n} p(y_i \mid x_i; \theta)$$

The intuition behind this is that we want estimates such that the predicted probability is as close to 1 as possible for one set of observations (e.g., people who defaulted on their debt) and as close to 0 as possible for the other set (those who didn't).

MLE is used for fitting logistic regression models.
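As a rough illustration, here is a minimal Python sketch; the toy data, variable names, and the brute-force grid search are all made up for illustration, not how a real fit works:

```python
import numpy as np

# Hypothetical toy data: x is a single predictor, y = 1 means "defaulted".
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = (x + 0.5 * rng.normal(size=200) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def likelihood(b0, b1):
    # Product over observations: p(x_i) where y_i = 1, 1 - p(x_i) where y_i = 0.
    p = sigmoid(b0 + b1 * x)
    return np.prod(np.where(y == 1, p, 1 - p))

# Crude grid search for the maximiser (a real fit uses a proper optimiser).
grid = np.linspace(-5.0, 5.0, 101)
b0_hat, b1_hat = max(((b0, b1) for b0 in grid for b1 in grid),
                     key=lambda ab: likelihood(*ab))
print(b0_hat, b1_hat)
```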

Log likelihood

We run into a problem when we have very many (potentially billions of) independent observations: independence makes the likelihood a product of that many probabilities, each between 0 and 1, so computing it directly is impractical because the product underflows floating-point precision. Instead, we take the log-likelihood, which turns the product into a sum:

$$\log \ell(\theta) = \sum_{i=1}^{n} \log p(y_i \mid x_i; \theta)$$
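To see the precision problem concretely, here is a small sketch (the per-observation probabilities are randomly made up): the raw product of a million probabilities underflows to zero, while the sum of their logs is an ordinary floating-point number:

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.uniform(0.4, 0.9, size=1_000_000)  # a million per-observation probabilities

print(np.prod(p))         # 0.0 -- the raw product underflows double precision
print(np.sum(np.log(p)))  # a large negative but perfectly representable number
```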

Since many learning problems are framed as minimising an error function, we can turn this computation into the minimisation of a loss by taking the negative log-likelihood, $-\log \ell(\theta)$. More on that over there.
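A sketch of that idea, reusing the made-up toy logistic model from above: minimise the negative log-likelihood with a generic optimiser:

```python
import numpy as np
from scipy.optimize import minimize

# Same hypothetical toy data as before.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = (x + 0.5 * rng.normal(size=200) > 0).astype(float)

def neg_log_likelihood(beta):
    b0, b1 = beta
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))
    p = np.clip(p, 1e-12, 1 - 1e-12)  # keep the logs finite
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

result = minimize(neg_log_likelihood, x0=np.zeros(2))
print(result.x)  # the fitted (b0, b1) that minimise the loss
```

Minimising this loss yields the same estimates as maximising the likelihood directly, but stays numerically stable.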