The Zipf distribution is a discrete probability distribution used primarily to model word frequencies: in a large body of text, the frequency of a word is approximately inversely proportional to its frequency rank.

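The probability mass function is given by:

$$f(k; s, N) = \frac{1/k^{s}}{H_{N,s}}, \qquad k = 1, 2, \ldots, N,$$

where $N$ is the number of distinct words, $s > 0$ is the exponent of the distribution, $k$ is the frequency rank of the word, and $H_{N,s}$ is a normalisation constant (the $N$th generalised harmonic number), given by:

$$H_{N,s} = \sum_{n=1}^{N} \frac{1}{n^{s}}$$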
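
As a minimal sketch of the definition, the following Python computes the normalisation constant and the probability of a given rank. It assumes the usual parameterisation with `n` ranks and exponent `s`; the function names `harmonic` and `zipf_pmf` are illustrative and do not come from any particular library.

```python
from math import fsum


def harmonic(n: int, s: float) -> float:
    """Generalised harmonic number: sum of j**(-s) for j = 1..n."""
    return fsum(j ** -s for j in range(1, n + 1))


def zipf_pmf(k: int, n: int, s: float = 1.0) -> float:
    """Probability of rank k under a Zipf law with n ranks and exponent s."""
    if not 1 <= k <= n:
        return 0.0  # ranks outside 1..n have zero probability
    return k ** -s / harmonic(n, s)


# Probabilities for a toy vocabulary of five distinct words (s = 1).
probs = [zipf_pmf(k, 5) for k in range(1, 6)]
```

With `s = 1`, rank 2 receives half the probability of rank 1, rank 3 a third, and so on, and the five values sum to 1.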

A Zipf-distributed random variable has the property that a few outcomes (words) occur very frequently, while most outcomes occur rarely. The distribution is also used in studies of the Internet and of network interconnectivity.
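
The concentration of probability on a few outcomes can be illustrated numerically. The sketch below assumes a vocabulary of 1000 distinct words and an exponent of 1 (both values chosen only for illustration) and computes the share of total probability carried by the ten highest-ranked words; `top_share` is a hypothetical helper name.

```python
from math import fsum


def top_share(m: int, n: int, s: float = 1.0) -> float:
    """Fraction of total probability carried by the m highest-ranked outcomes."""
    weights = [k ** -s for k in range(1, n + 1)]  # unnormalised Zipf weights
    return fsum(weights[:m]) / fsum(weights)


share = top_share(10, 1000)
print(f"{share:.2f}")
```

Under these assumptions the top 1% of words accounts for roughly 39% of all occurrences.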

Computations

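Writing $H_{N,m} = \sum_{n=1}^{N} n^{-m}$ for the finite sum that defines the normalisation constant (with $m = s$), the expected value is given by:

$$E[X] = \sum_{k=1}^{N} k \, \frac{k^{-s}}{H_{N,s}} = \frac{H_{N,s-1}}{H_{N,s}}$$

The second moment is given by:

$$E[X^{2}] = \sum_{k=1}^{N} k^{2} \, \frac{k^{-s}}{H_{N,s}} = \frac{H_{N,s-2}}{H_{N,s}}$$

The variance then follows from $\operatorname{Var}(X) = E[X^{2}] - (E[X])^{2}$:

$$\operatorname{Var}(X) = \frac{H_{N,s-2}}{H_{N,s}} - \left( \frac{H_{N,s-1}}{H_{N,s}} \right)^{2}$$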
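
The mean, second moment, and variance can be checked numerically. This is a sketch under the usual parameterisation (`n` ranks, exponent `s`), comparing the harmonic-number ratios against direct summation over the probability mass function; all function names are illustrative.

```python
from math import fsum, isclose


def harmonic(n: int, s: float) -> float:
    """Finite sum of j**(-s) for j = 1..n (s may be zero or negative)."""
    return fsum(j ** -s for j in range(1, n + 1))


def zipf_moments(n: int, s: float = 1.0):
    """Mean, second moment, and variance via ratios of harmonic sums."""
    h = harmonic(n, s)
    mean = harmonic(n, s - 1) / h
    second = harmonic(n, s - 2) / h
    return mean, second, second - mean ** 2


def zipf_moments_direct(n: int, s: float = 1.0):
    """The same quantities by summing k * p(k) and k**2 * p(k) directly."""
    h = harmonic(n, s)
    pmf = [k ** -s / h for k in range(1, n + 1)]
    mean = fsum(k * p for k, p in enumerate(pmf, start=1))
    second = fsum(k * k * p for k, p in enumerate(pmf, start=1))
    return mean, second, second - mean ** 2


closed = zipf_moments(10, 1.0)
direct = zipf_moments_direct(10, 1.0)
agree = all(isclose(a, b, rel_tol=1e-12) for a, b in zip(closed, direct))
```

For `s = 1` the mean reduces to `n / harmonic(n, 1)`, since the numerator is a sum of `n` ones.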