In applied mathematics, information theory focuses on quantifying how much information is present in a signal. It finds broad applications in signal processing and in machine learning.

Basic idea: learning that an unlikely event occurred is more informative than learning that a likely event occurred.1 Likely events carry little to no information content; unlikely events carry more. Independent events should have additive information (e.g., learning that an independent event occurred twice conveys twice the information of learning it occurred once).

The self-information of an event x (in units of nats) is defined as:

I(x) = -ln P(x)

We define one nat as the amount of information gained by observing an event of probability 1/e. If we use a base-2 logarithm instead, the units are called bits or shannons.
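To make this concrete, here is a minimal Python sketch (the helper name `self_information` is just for illustration, not from the text) that computes self-information in nats or bits and checks the properties described above.

```python
import math

def self_information(p, base=math.e):
    """Self-information of an event with probability p.

    base=math.e gives nats; base=2 gives bits (shannons).
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -math.log(p, base)

# A likely event carries almost no information...
print(self_information(0.999))         # ~0.001 nats
# ...while an unlikely event carries much more.
print(self_information(0.001))         # ~6.9 nats

# An event of probability 1/e carries exactly 1 nat.
print(self_information(1 / math.e))    # 1.0

# Independent events have additive information: two fair coin flips
# (p = 0.25) carry twice the information of a single flip (p = 0.5).
print(self_information(0.5, base=2))   # 1.0 bit
print(self_information(0.25, base=2))  # 2.0 bits
```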

Footnotes

  1. i.e., “the sun rose this morning” isn’t informative, but “there was a solar eclipse this morning” is very informative. From Deep Learning by Goodfellow, Bengio, Courville, and Bach.