In information theory, we define the uncertainty (also called the self-information or surprisal) of an event as a measure of how surprised we are when it happens. An uncertainty of 0 means the event is certain to happen. An uncertainty of $\infty$ means it will never happen. Numerically, for an event with probability $p$, it's given by:

$$I = -\log_2 p$$

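A minimal sketch of this in Python (the function name `self_information` is illustrative, not standard):

```python
import math

def self_information(p: float) -> float:
    """Uncertainty (surprisal, in bits) of an event with probability p."""
    return -math.log2(p)

assert self_information(1.0) == 0.0   # certain event: no surprise
assert self_information(0.5) == 1.0   # fair coin flip: 1 bit
assert self_information(0.25) == 2.0  # quarter chance: 2 bits
```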
For a random variable $X$, we often want the average uncertainty, i.e., the expected value of the self-information. In the discrete case, this is given by:

$$H(X) = -\sum_{x} p(x) \log_2 p(x)$$

This is the Shannon entropy, which describes how much uncertainty there is in a random variable. It is also a theoretical lower bound on the average number of bits needed to losslessly encode the data (a bound that may not be exactly achievable in practice).
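As a sketch, the discrete formula above can be computed directly (the `entropy` helper is illustrative; terms with zero probability are skipped, since $p \log p \to 0$ as $p \to 0$):

```python
import math

def entropy(probs) -> float:
    """Shannon entropy (in bits) of a discrete distribution given as
    a list of probabilities; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

assert entropy([0.5, 0.5]) == 1.0  # fair coin: 1 bit of uncertainty
assert entropy([1.0, 0.0]) == 0.0  # certain outcome: no uncertainty
assert entropy([0.9, 0.1]) < 1.0   # biased coin: less than a fair one
```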

We use the base-2 logarithm to keep in line with other concepts in information theory, especially when problems involve numbers of bits. More generally, the base matches the number of values a symbol can take (for bits, 2). With the natural logarithm, the unit is the nat instead of the bit.
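The choice of base only rescales the result by a constant, as this sketch shows (the `entropy` helper with a `base` parameter is illustrative):

```python
import math

def entropy(probs, base: float = 2.0) -> float:
    """Entropy of a discrete distribution in the given log base."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

h_bits = entropy([0.5, 0.5])               # in bits (base 2)
h_nats = entropy([0.5, 0.5], base=math.e)  # in nats (natural log)
# the two units differ only by a constant factor of ln(2)
assert math.isclose(h_nats, h_bits * math.log(2))
```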

The differential entropy, which extends the entropy to continuous random variables with density $f$, is defined by:

$$h(X) = -\int f(x) \log_2 f(x) \, dx$$

Unlike the discrete entropy, it can be negative.
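As a numerical sanity check (a sketch, assuming a standard normal; the `normal_pdf` helper and the integration range are illustrative), we can approximate the integral with a Riemann sum and compare it with the known closed form for a Gaussian:

```python
import math

def normal_pdf(x: float) -> float:
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Left Riemann sum of -f(x) * log2(f(x)) over [-8, 8); the tails beyond
# this range are numerically negligible for the standard normal.
dx = 0.001
h = -sum(normal_pdf(x) * math.log2(normal_pdf(x)) * dx
         for x in (i * dx for i in range(-8000, 8000)))

# Closed form for a Gaussian: 0.5 * log2(2 * pi * e * sigma^2), sigma = 1.
closed_form = 0.5 * math.log2(2 * math.pi * math.e)
assert abs(h - closed_form) < 1e-3
```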

See also