In applied mathematics, information theory focuses on quantifying how much information is present in a signal. It finds broad applications in signal processing and in machine learning.
Basic idea: learning that an unlikely event has occurred is more informative than learning that a likely event has occurred.¹ Likely events carry little to no information; unlikely events carry more. Independent events should have additive information (i.e., observing an independent event twice conveys twice as much information as observing it once).
The self-information of an event $x$ (in units of nats) is defined as:

$$I(x) = -\ln P(x)$$
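A quick check of the additivity property mentioned above: for two independent events $x$ and $y$, $P(x, y) = P(x)\,P(y)$, so

$$I(x, y) = -\ln\bigl(P(x)\,P(y)\bigr) = -\ln P(x) - \ln P(y) = I(x) + I(y).$$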
We define one nat as the information gained by observing an event of probability $1/e$. If we instead use a base-2 logarithm, the units are called bits or shannons.
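A minimal sketch in Python of self-information in both units; the probabilities below (e.g., a 1/400 chance of a solar eclipse on a given morning) are illustrative assumptions, not real figures.

```python
import math

def self_information(p: float, base: float = math.e) -> float:
    """Self-information of an event with probability p.

    base=math.e gives nats; base=2 gives bits (shannons).
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -math.log(p, base)

# A fair coin flip carries exactly 1 bit of information.
print(self_information(0.5, base=2))        # 1.0

# A near-certain event ("the sun rose this morning") carries almost none.
print(self_information(0.999999, base=2))   # ~1.4e-06

# A rare event ("a solar eclipse this morning") carries much more.
print(self_information(1 / 400, base=2))    # ~8.64

# Independent events add: two fair coin flips give 2 bits.
print(self_information(0.5 * 0.5, base=2))  # 2.0
```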
Resources
- Information Theory, Inference, and Learning Algorithms, by David J.C. MacKay
- Information Theory: From Coding to Learning, by Yury Polyanskiy and Yihong Wu
Footnotes
- i.e., “the sun rose this morning” isn’t informative, but “there was a solar eclipse this morning” is very informative. From Deep Learning by Goodfellow, Bengio, Courville, and Bach.