There are several common metrics for evaluating machine learning classification tasks.

Accuracy

Confusion matrices help us assess how accurate our model is. As a quantitative measure, we take the accuracy to be the fraction of predictions the model gets right:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Note that we should think about what is most important for our model to achieve. For instance, if we were predicting cancer and negatives far outweigh positives, a model that only ever predicts negative would achieve high accuracy yet still not suffice.
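As a minimal sketch of these ideas with scikit-learn, assuming hypothetical y_true and y_pred label arrays for a binary classifier:

```python
from sklearn.metrics import confusion_matrix, accuracy_score

y_true = [0, 0, 0, 0, 1, 1, 1, 0, 0, 1]  # actual labels (hypothetical)
y_pred = [0, 0, 0, 0, 1, 0, 1, 0, 0, 1]  # model predictions (hypothetical)

# Confusion matrix: rows are actual classes, columns are predicted classes.
print(confusion_matrix(y_true, y_pred))

# Accuracy = (TP + TN) / total predictions.
print(accuracy_score(y_true, y_pred))  # 0.9
```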

Precision and recall

Some other metrics include precision, which describes how many of the instances the model predicted as positive are actually positive:

Precision = TP / (TP + FP)

And recall, which describes how many of the actual positive instances the model correctly identifies:

Recall = TP / (TP + FN)
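A short sketch of both metrics, reusing the same hypothetical y_true and y_pred arrays as above:

```python
from sklearn.metrics import precision_score, recall_score

y_true = [0, 0, 0, 0, 1, 1, 1, 0, 0, 1]
y_pred = [0, 0, 0, 0, 1, 0, 1, 0, 0, 1]

# Precision = TP / (TP + FP): of the predicted positives, how many were right.
print(precision_score(y_true, y_pred))  # 1.0 (no false positives)

# Recall = TP / (TP + FN): of the actual positives, how many were found.
print(recall_score(y_true, y_pred))  # 0.75 (one positive was missed)
```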

Other metrics

The support is the number of actual occurrences of a given class in the dataset.

The F1-score (or F-measure) is the harmonic mean of the precision and recall of a classification model, with the aim of indicating the reliability of the model. Both precision and recall contribute equally to the score.
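A minimal sketch of the F1-score, again on the hypothetical labels used above:

```python
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 1, 1, 1, 0, 0, 1]
y_pred = [0, 0, 0, 0, 1, 0, 1, 0, 0, 1]

# F1 = 2 * (precision * recall) / (precision + recall)
print(f1_score(y_true, y_pred))  # ~0.857 for precision 1.0 and recall 0.75
```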

In code

For classification tasks, we can output the key metrics with classification_report(), imported via from sklearn.metrics import classification_report.

This outputs the precision, recall, F1-score, and support for each class. The first column lists the class each row of metrics belongs to (for a binary classifier, we see two: 0 and 1), followed by the overall accuracy and averaged metrics.
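For example, a minimal sketch of the report on the same hypothetical labels:

```python
from sklearn.metrics import classification_report

y_true = [0, 0, 0, 0, 1, 1, 1, 0, 0, 1]
y_pred = [0, 0, 0, 0, 1, 0, 1, 0, 0, 1]

# Prints precision, recall, f1-score, and support per class,
# plus overall accuracy and macro/weighted averages.
print(classification_report(y_true, y_pred))
```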