Machine learning (ML, also called statistical learning) is a branch of statistics and artificial intelligence with broad applications in software engineering, where machines can “learn” tasks from data without explicit programming. Languages like Python or MATLAB are common tools of the trade.

Terminology

Formally, a computer program learns from experience E with respect to a class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E (Tom Mitchell’s definition). Each piece of information included in the representation of the data is a feature.
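To make the definition concrete, here is a minimal sketch on a synthetic, hypothetical task (the data and model are assumptions for illustration): T is binary classification, P is test accuracy, and E is the number of training examples seen by a nearest-centroid classifier. P should improve as E grows.

```python
import numpy as np

# Minimal sketch (synthetic task, assumed for illustration): accuracy (P)
# on a binary classification task (T) improves as the program sees more
# training examples (E), matching the definition above.

rng = np.random.default_rng(1)

def sample(n):
    # n points per class, drawn from two overlapping Gaussian blobs
    X = np.vstack([rng.normal(0, 1, (n, 2)), rng.normal(2, 1, (n, 2))])
    y = np.array([0] * n + [1] * n)
    return X, y

X_test, y_test = sample(500)

for n in [5, 50, 500]:
    X_tr, y_tr = sample(n)
    centroids = np.array([X_tr[y_tr == c].mean(axis=0) for c in (0, 1)])
    # classify each test point by its nearest class centroid
    pred = np.argmin(((X_test[:, None, :] - centroids) ** 2).sum(-1), axis=1)
    print(f"E = {2 * n:4d} examples -> P = {np.mean(pred == y_test):.3f} accuracy")
```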

Core ideas

Classical machine learning consists of statistical approaches: rigorous ways to fit models to data. The reason we don’t use symbolic machine learning (i.e., strict hand-written rules) is that any rule we come up with will probably have a counter-example in the real world. It’s hard to formulate rules that cover all possible conditions because the input space is high-dimensional. We must instead use methods that learn from examples, hence the many statistical learning techniques.

In practice, conventional statistical learning techniques fall short of human-level performance on many perceptual tasks, such as vision and speech.

This is why deep learning (a modern subset of ML) is so dominant: deep models learn rich representations from data without strict hand-programming of rules, and can therefore reach human-level performance in many tasks.

Wait, what’s the relation between everything? Artificial intelligence is the broadest field; machine learning is the subset of AI that learns from data (and statistical learning is largely another name for the same toolbox); deep learning is the subset of ML built on deep neural networks.

Deep learning is also data-hungry, so a consequence of its dominance is that much of the time spent in ML work goes into data preparation: cleaning, sorting, and labelling datasets for meaningful use.

A way to quantify the performance of ML models is via an error (or loss) function: a measure of how much the model’s predictions deviate from the data. A big part of this is dividing the dataset into a training set, a validation set, and a test set. This guards against overfitting to the dataset and checks that the model generalises to unseen data. The validation set is used to tune the model’s hyperparameters. The test set is kept fully away from training and tuning (otherwise it just functions the same as training data).
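A minimal sketch of the split in practice, with synthetic data and ridge regression standing in for an arbitrary model (both are assumptions, not prescriptions): hyperparameters are compared on the validation set, and the test set is consulted exactly once at the end.

```python
import numpy as np

# Minimal sketch (synthetic data, assumed for illustration): split into
# train/validation/test, fit on train, tune on validation, and report
# the error function on the held-out test set exactly once.

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=1000)

# 70/15/15 split; the test set is never touched during fitting or tuning.
idx = rng.permutation(len(X))
train, val, test = idx[:700], idx[700:850], idx[850:]

def mse(y_true, y_pred):
    # error function: mean squared deviation between model output and data
    return np.mean((y_true - y_pred) ** 2)

# Fit ridge regression on the training set for each candidate hyperparameter,
# keeping whichever gives the lowest validation error.
best = None
for lam in [0.01, 0.1, 1.0, 10.0]:
    A = X[train].T @ X[train] + lam * np.eye(3)
    w = np.linalg.solve(A, X[train].T @ y[train])
    err = mse(y[val], X[val] @ w)
    if best is None or err < best[0]:
        best = (err, lam, w)

# The test error is computed once, as the final estimate of generalisation.
print(f"chosen lambda = {best[1]}, test MSE = {mse(y[test], X[test] @ best[2]):.4f}")
```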

Types of approaches

Supervised vs. unsupervised vs. reinforcement learning: supervised learning fits a model to labelled input–output pairs; unsupervised learning finds structure (clusters, densities, low-dimensional representations) in unlabelled data; reinforcement learning trains an agent through rewards received while interacting with an environment. An unsupervised sketch follows below.
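As a contrast with the supervised sketches above, a minimal unsupervised example (synthetic two-blob data, assumed for illustration): k-means recovers cluster structure without ever seeing a label.

```python
import numpy as np

# Minimal unsupervised sketch (synthetic data, assumed for illustration):
# k-means finds two clusters without ever being shown a label.

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.5, (100, 2)), rng.normal(3, 0.5, (100, 2))])

k = 2
centers = X[rng.choice(len(X), k, replace=False)]  # random initial centres
for _ in range(20):
    # assignment step: nearest centre for each point
    labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
    # update step: move each centre to the mean of its assigned points
    centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])

print(centers)  # should land near (0, 0) and (3, 3)
```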

Discriminative models learn the decision boundary between different classes. They’re specifically meant for classification tasks, and maximise the conditional probability of a label y given an input x: P(y | x). Generative models learn the input distribution and aim to maximise the joint probability P(x, y), typically by estimating the class-conditional P(x | y) and the prior P(y), then finding P(y | x) via Bayes’ theorem: P(y | x) = P(x | y) P(y) / P(x).
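A minimal sketch of the generative route, using Gaussian naive Bayes on synthetic data (the data and model choice are assumptions for illustration; a discriminative counterpart such as logistic regression would model P(y | x) directly): the priors P(y) and class-conditionals P(x | y) are estimated, then combined via Bayes’ theorem to score each class.

```python
import numpy as np

# Minimal generative sketch (synthetic data, assumed for illustration):
# Gaussian naive Bayes estimates P(x | y) and P(y), then applies
# Bayes' theorem to pick the class with the highest posterior P(y | x).

rng = np.random.default_rng(0)
# Two classes with different means; 100 points each, 2 features.
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

classes = np.unique(y)
priors = np.array([np.mean(y == c) for c in classes])        # P(y)
means = np.array([X[y == c].mean(axis=0) for c in classes])  # parameters of P(x | y)
stds = np.array([X[y == c].std(axis=0) for c in classes])

def predict(x):
    # log P(x | y) under independent Gaussians per feature, plus log prior
    log_lik = -0.5 * (((x - means) / stds) ** 2 + np.log(2 * np.pi * stds ** 2)).sum(axis=1)
    log_post = log_lik + np.log(priors)  # Bayes' theorem, up to a constant in x
    return classes[np.argmax(log_post)]

print(predict(np.array([0.1, -0.2])))  # expect class 0
print(predict(np.array([2.9, 3.2])))   # expect class 1
```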

Resources

  • Introduction to Statistical Learning, with Applications in Python, by James, Witten, Hastie, Tibshirani, and Taylor
  • The Elements of Statistical Learning, by Hastie, Tibshirani, and Friedman
  • Probabilistic Machine Learning: {An Introduction, Advanced Topics}, by Kevin P. Murphy
  • Pattern Recognition and Machine Learning, by Christopher M. Bishop
  • Information Theory, Inference, and Learning Algorithms, by David MacKay
  • Ilya Sutskever’s 30 crucial papers to get up to speed with modern ML

Key concepts

See also