Probability theory is the branch of mathematics concerned with chance. Probability models describe the real world; the data we collect and the models we build from that data are the subject of statistics and statistical learning.

Basic terminology

A random experiment is one whose outcome varies from trial to trial. We define the sample space (or $S$) as the set of all possible atomic outcomes. We count the number of times an event $A$ occurs in $n$ trials as $N_n(A)$, and define the relative frequency of $A$ in $n$ trials as $f_n(A) = N_n(A) / n$, such that the probability of an event is the limiting relative frequency:

$$P(A) = \lim_{n \to \infty} f_n(A) = \lim_{n \to \infty} \frac{N_n(A)}{n}$$

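As a quick illustration (a minimal sketch, assuming a fair six-sided die and Python's random module), the relative frequency of an event tends toward its probability as the number of trials grows:

```python
import random

# Sketch: the relative frequency of the event "the roll is even" on a fair
# six-sided die approaches its probability as the number of trials grows.
def relative_frequency(num_trials: int) -> float:
    occurrences = sum(1 for _ in range(num_trials) if random.randint(1, 6) % 2 == 0)
    return occurrences / num_trials

for n in (100, 10_000, 1_000_000):
    print(n, relative_frequency(n))    # tends toward P(even) = 3/6 = 0.5
```
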
In practice, an infinite number of trials is impossible, so we instead define several axioms to model probability:

  • The probability of an event $A$ satisfies $0 \le P(A) \le 1$.
  • Total probability, i.e., of the sample space, is $P(S) = 1$.
  • If events $A$ and $B$ are mutually exclusive, then $P(A \cup B) = P(A) + P(B)$.

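As a minimal sketch of these axioms (assuming a fair six-sided die, with events represented as Python sets):

```python
from fractions import Fraction

# Sketch: a fair six-sided die as a finite sample space, events as sets of outcomes.
S = {1, 2, 3, 4, 5, 6}
prob = {outcome: Fraction(1, 6) for outcome in S}

def P(event: set) -> Fraction:
    """Probability of an event (a subset of S): the sum of its outcomes' probabilities."""
    return sum(prob[outcome] for outcome in event)

even, odd = {2, 4, 6}, {1, 3, 5}

assert 0 <= P(even) <= 1                    # axiom 1: 0 <= P(A) <= 1
assert P(S) == 1                            # axiom 2: P(S) = 1
assert even & odd == set()                  # even and odd are mutually exclusive...
assert P(even | odd) == P(even) + P(odd)    # axiom 3: ...so their probabilities add
```
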
We have some handy formulas, but it’s nice to first define some terms.

  • An event is a condition on an outcome. For example, the event that a roulette result is even and non-zero.
    • If an event consists of exactly one outcome (instead of a larger subset of $S$), then we call this an elementary event.
    • More complicated events are set unions of elementary events.
  • The complement of an event $A$ is the event that $A$ does not occur. Complementary events are mutually exclusive and exhaustive events; i.e., their probabilities will together sum to 1.
    • We additionally denote the complement of $A$ as $A^c$ or $\bar{A}$ or $A'$. Then, $A \cup A^c = S$ and $A \cap A^c = \emptyset$.
  • Mutual exclusivity of two events means that they cannot occur at the same time, i.e., $A \cap B = \emptyset$.
  • Events are exhaustive if at least one of them must occur, e.g., the six elementary events when rolling a six-sided die.
  • Events are independent if one event occurring does not affect the probability of another event occurring.
  • Conditional probability is the probability of event $A$ occurring given that event $B$ has occurred, written $P(A \mid B)$.

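A small sketch of this vocabulary (the die and the events are illustrative assumptions), using plain Python sets:

```python
# Sketch of the vocabulary above, over the sample space of one six-sided die roll.
S = {1, 2, 3, 4, 5, 6}

even = {2, 4, 6}                 # an event: "the roll is even"
odd = S - even                   # its complement

assert even & odd == set()       # complementary events are mutually exclusive...
assert even | odd == S           # ...and exhaustive: together they cover S

elementary = [{outcome} for outcome in S]    # elementary events contain a single outcome
assert set().union(*elementary) == S         # more complicated events are unions of them
```
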
And some useful rules:

  • For complementary events, $P(A^c) = 1 - P(A)$.
  • For combined events, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
  • For mutually exclusive events, $P(A \cup B) = P(A) + P(B)$. In fact, if $A$ is the union of $n$ distinct mutually disjoint subsets $A_1, \dots, A_n$, then $P(A) = \sum_{i=1}^{n} P(A_i)$ is the sum of the probability of each subset.
  • For conditional events, $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$.
  • For independent events, $P(A \cap B) = P(A)\,P(B)$ (equivalently, $P(A \mid B) = P(A)$).

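A minimal sketch checking these rules numerically, assuming a sample space of two fair dice and the illustrative events "the sum is 7" and "the first die shows 3":

```python
from fractions import Fraction
from itertools import product

# Sketch: sample space of two fair dice; every outcome (an ordered pair) is equally likely.
S = set(product(range(1, 7), repeat=2))

def P(event: set) -> Fraction:
    """Probability of an event under equally likely outcomes: |event| / |S|."""
    return Fraction(len(event), len(S))

A = {(d1, d2) for (d1, d2) in S if d1 + d2 == 7}   # event: "the sum is 7"
B = {(d1, d2) for (d1, d2) in S if d1 == 3}        # event: "the first die shows 3"

assert P(S - A) == 1 - P(A)                        # complementary events
assert P(A | B) == P(A) + P(B) - P(A & B)          # combined events (A | B is set union)
assert P(A & B) / P(B) == P(A)                     # conditional probability; here it equals P(A)...
assert P(A & B) == P(A) * P(B)                     # ...because A and B happen to be independent
```
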
Extensions

It also helps to define a few formulas for certain situations. If every outcome is equally likely, then the probability of an event is the ratio of favorable outcomes to total outcomes:

$$P(A) = \frac{|A|}{|S|}$$

Computing the probability of particular events often takes combinatorial approaches, including permutations and combinations.

Some situations involve successively making decisions (like sampling from a set of items more than once). In these cases, we can refer to the rule of products, which holds regardless of how the preceding steps were performed:

If an operation consists of $k$ steps, and the first step can be performed in $n_1$ ways, the second in $n_2$ ways, ..., and the $k$-th step in $n_k$ ways, then the entire operation can be performed in: $n_1 \times n_2 \times \cdots \times n_k$ ways.

From here, if every step has the same number of choices, $n_1 = n_2 = \cdots = n_k = n$ (sampling with replacement), then the operation has $n^k$ possibilities. If we don't replace an element in the sample space (i.e., we can't repeat an element), then $n_i = n - i + 1$, and the operation has $n(n-1)\cdots(n-k+1) = \dfrac{n!}{(n-k)!}$ possibilities.

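A short sketch of these counts (the values $n = 5$ and $k = 3$ are arbitrary), verified by brute-force enumeration with itertools; the last lines also use the equally-likely formula from above:

```python
from itertools import permutations, product
from math import perm

# Sketch of the rule of products with arbitrary n = 5 items and k = 3 steps.
n, k = 5, 3
items = range(n)

with_replacement = list(product(items, repeat=k))      # every step has n choices
without_replacement = list(permutations(items, k))     # n, then n-1, then n-2 choices

assert len(with_replacement) == n ** k                 # n^k possibilities
assert len(without_replacement) == perm(n, k)          # n! / (n - k)! possibilities

# Equally likely outcomes: P(all k draws with replacement are distinct) = |A| / |S|.
all_distinct = [seq for seq in with_replacement if len(set(seq)) == k]
print(len(all_distinct) / len(with_replacement))       # perm(n, k) / n**k = 60 / 125 = 0.48
```
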
Over feature vectors

We have (like CSPs):

  • A set of variables $X_1, \dots, X_n$.
    • Each variable represents a different feature of the world that we might be interested in knowing.
    • Each different total assignment to these variables is an atomic event $e$.
  • A finite domain $D_i$ of values for each variable $X_i$.

For 3 variables each with a domain of size 3, we get $3^3 = 27$ atomic events. In general, for $n$ variables each with a domain of size $d$, the number of atomic events grows exponentially: $d^n$.

One way we can compactly express probabilities over feature vectors is to indicate a subset of the atomic events via a partial assignment; its probability is the sum over all atomic events consistent with that assignment:

$$P(X_i = x) = \sum_{e \,:\, e \text{ assigns } X_i = x} P(e)$$

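A minimal sketch of this, assuming three hypothetical binary variables (Rain, Sprinkler, WetGrass) and made-up probabilities for the $2^3$ atomic events:

```python
from itertools import product

# Sketch: three hypothetical binary variables and a full joint table over all
# 2^3 = 8 atomic events (the probabilities below are made up and sum to 1).
variables = ("Rain", "Sprinkler", "WetGrass")
atomic_events = list(product((True, False), repeat=3))

joint = dict(zip(atomic_events,
                 (0.20, 0.05, 0.08, 0.02, 0.10, 0.05, 0.05, 0.45)))

def P(partial: dict) -> float:
    """P(partial assignment) = sum of P(e) over the atomic events e consistent with it."""
    index = {name: i for i, name in enumerate(variables)}
    return sum(p for event, p in joint.items()
               if all(event[index[name]] == value for name, value in partial.items()))

print(P({"Rain": True}))                       # sums 4 of the 8 atomic events
print(P({"Rain": True, "WetGrass": True}))     # sums 2 of the 8 atomic events
```
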
There are a couple of problems with working over feature vectors:

  • There are an exponential number of atomic probabilities to specify, so we can't realistically gather all that data.
  • Computing $P(X_i = x)$ requires summing up an exponential number of terms, so even if we had the data, we wouldn't be able to compute with it efficiently.

The main way this is mitigated is by exploiting conditional independence to simplify the problem and reduce the data and computational requirements.

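As an illustrative sketch (the variables and numbers are assumptions, not from the source): if Sprinkler and WetGrass are conditionally independent given Rain, the joint factorizes into smaller tables, and far fewer numbers need to be specified:

```python
from itertools import product

# Sketch: P(R, S, W) = P(R) * P(S | R) * P(W | R) when Sprinkler and WetGrass
# are conditionally independent given Rain. Specifying the full joint over three
# binary variables takes 2^3 - 1 = 7 numbers; this factorization needs only
# 1 + 2 + 2 = 5, and the gap grows exponentially with more variables.
p_rain = {True: 0.2, False: 0.8}
p_sprinkler_given_rain = {True: {True: 0.01, False: 0.99},   # P(Sprinkler | Rain)
                          False: {True: 0.40, False: 0.60}}
p_wet_given_rain = {True: {True: 0.90, False: 0.10},         # P(WetGrass | Rain)
                    False: {True: 0.20, False: 0.80}}

def joint(rain: bool, sprinkler: bool, wet: bool) -> float:
    """Probability of one atomic event, computed from the factorization."""
    return p_rain[rain] * p_sprinkler_given_rain[rain][sprinkler] * p_wet_given_rain[rain][wet]

# The factorization still defines a valid distribution over all 8 atomic events.
assert abs(sum(joint(r, s, w) for r, s, w in product((True, False), repeat=3)) - 1.0) < 1e-9
```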

Resources

  • Probability, Statistics, and Random Processes for Electrical Engineering, by Alberto Leon-Garcia
  • Probability Theory: The Logic of Science, by E. T. Jaynes
