Bayes' Theorem

Consider Goldie Lockes' predicament. She already has a probability distribution describing her (and Cactus's) beliefs about market demand for ACME's next-generation roadrunner traps. She knows, however, that additional information can be obtained (by, say, market research) that could induce her to revise her prior beliefs. This means that the original or prior distribution, which was derived subjectively, would change. For instance, she believes that the probability of a medium market demand is 0.6. Were she to conduct a market survey that pointed to a medium demand, it would reinforce her beliefs and lead her to revise the probability upwards. The question is, by how much? Surely, that depends on the degree of reliability of the market survey. The more reliable the information, the greater her confidence in it and therefore the greater its effect in modifying her prior beliefs.

The gist of Bayes' theorem is easy to grasp. If one has encoded one's prior beliefs about a hypothesis with a probability distribution, and one has access to additional information that could support or contradict the hypothesis and whose degree of reliability is known or is estimable, then one can revise one's prior distribution to reflect the resulting beliefs one should have, given the additional information, about one's original hypothesis. The revised distribution is known as the **posterior** (or *a posteriori*) distribution, while the reliability data are called the **likelihoods** (or the **likelihood function**). The prior distribution is known as the **prior** (or *a priori*) distribution, naturally. Graphically:

Bayes' Theorem follows logically from the definition of conditional probability: P( A | B ) = P( A ^ B ) / P( B )

Note: The symbol ^ denotes "intersection": A ^ B means "A intersect B".
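The definition is easy to check numerically. The sketch below uses a small made-up joint distribution over two binary events (the four probabilities are assumptions for illustration, not from the text) and computes P( A | B ) directly from it:

```python
# Hedged sketch: an assumed joint distribution over two binary events A and B.
# The four joint probabilities must sum to 1.
joint = {
    (True, True): 0.12,    # P(A ^ B)
    (True, False): 0.28,   # P(A ^ not-B)
    (False, True): 0.18,   # P(not-A ^ B)
    (False, False): 0.42,  # P(not-A ^ not-B)
}

p_a_and_b = joint[(True, True)]
p_b = joint[(True, True)] + joint[(False, True)]  # marginal P(B)

# Definition of conditional probability: P(A|B) = P(A ^ B) / P(B)
p_a_given_b = p_a_and_b / p_b
print(p_a_given_b)  # 0.12 / 0.30 = 0.4
```

Conditioning on B shrinks the "background" from the whole universe to B alone, which is why the divisor is P( B ) rather than 1.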

Let's suppose there is an event **E** whose prior probability is known subjectively (works fine for objective probabilities, too). If you'd rather view this concretely, let **E** be the event «it will rain today when I go out to lunch». Let **I** be the indicator event (the additional information) «the weather forecast calls for rain at lunch». Clearly, a forecast of rain does not guarantee that it will rain. Moreover, **I** itself is an uncertain event because, a priori, you have no assurance what the forecast will be. It could call for sunny skies. This state of affairs can be depicted schematically with a Venn diagram:

Now, the prior of **E** is P(E), a marginal (not conditional) probability. Note that P(E) is simply the area of circle **E** divided by the total area **U** (the universe of possible events, defined to be 1). Thus, 0 ≤ P(E) ≤ 1. We would like to know the probability of **E** given a forecast **I**, that is, P( E | I ). By the definition of conditional probability:

P( E | I ) = P( E ^ I ) / P( I )

In other words, if forecast **I** is taken as given, then the probability of **E** changes to the area of the gray convex-lens shape **E ^ I** divided by the total area of circle **I**, for **I** is the given condition. That is, the "background" event space is no longer **U** but **I**.

Note that by rearranging terms, we can restate the conditional probability definition as:

P( E ^ I ) = P( E | I ) · P( I )

Everybody knew that. Bayes' great insight was to see this:

P( E ^ I ) = P( I | E ) · P( E )

which follows by symmetry if **E** and **I** are both uncertain events, which a priori they are. That is to say, since both **E** and **I** are uncertain events and the intersection of those events is one and the same thing (the joint event), the probability of that joint event must be the same given either event **E** or event **I**. Consequently, substituting Bayes' insight for the joint probability in the definition of conditional probability, we obtain:

P( E | I ) = P( I | E ) · P( E ) / P( I )

Now look at the depiction of **I** shown under the Venn diagram above: **I** = ( E ^ I ) + ( Ē ^ I ), where Ē is the complement of E. Taking probabilities and expanding the terms as before:

P( I ) = P( E ^ I ) + P( Ē ^ I ) = P( I | E ) · P( E ) + P( I | Ē ) · P( Ē )

Substituting this expansion for P( I ) in Bayes' formula gives:

P( E | I ) = P( I | E ) · P( E ) / [ P( I | E ) · P( E ) + P( I | Ē ) · P( Ē ) ]

Checking out the formula, we see that P( E ) and P( Ē ) are the **priors** and P( I | E ) and P( I | Ē ) are the **likelihoods**. If you've got the latter, you've got the **posteriors** and you're in business, kid.
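To make the two-hypothesis case concrete, here is a minimal sketch of the rain/forecast example. The numbers are assumptions for illustration (the text gives none): a prior P( E ) = 0.3 of rain at lunch, a forecast that calls for rain 80% of the time when it does rain, P( I | E ) = 0.8, and 20% of the time when it does not, P( I | Ē ) = 0.2:

```python
# Hedged sketch with assumed numbers (not from the text).
p_e = 0.3              # prior: P(E), rain at lunch
p_i_given_e = 0.8      # likelihood: P(I|E), forecast says rain and it rains
p_i_given_not_e = 0.2  # likelihood: P(I|Ē), forecast says rain but it doesn't

# Expand P(I) over the two ways I can occur: I = (E ^ I) + (Ē ^ I)
p_i = p_i_given_e * p_e + p_i_given_not_e * (1 - p_e)

# Bayes' theorem: posterior P(E|I)
p_e_given_i = p_i_given_e * p_e / p_i
print(round(p_e_given_i, 4))  # 0.24 / 0.38 ≈ 0.6316
```

A forecast of rain raises the probability of rain from 0.3 to about 0.63; a more reliable forecast (larger gap between the two likelihoods) would move it further.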

In general, if the event set **E** comprises several possible outcomes *E_j* (*j* = 1, 2, ..., *n*), as opposed to just **E** and its complement **Ē**, then the indicator **I** can be wrong in its prediction in (*n* − 1) ways. Bayes' formula is then:

P( E_x | I ) = P( I | E_x ) · P( E_x ) / [ P( I | E_1 ) · P( E_1 ) + ... + P( I | E_n ) · P( E_n ) ]

where *E_x* is the outcome of interest.
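The general formula translates directly into code. The sketch below applies it to Goldie's three demand levels; the 0.6 prior on medium demand comes from the text, while the other two priors and the survey likelihoods are assumptions for illustration:

```python
def posteriors(priors, likelihoods):
    """General Bayes' formula: posterior_j = L_j * prior_j / sum_k(L_k * prior_k)."""
    joint = [l * p for l, p in zip(likelihoods, priors)]
    total = sum(joint)  # P(I), by the expansion over all n outcomes
    return [j / total for j in joint]

# Demand outcomes: low, medium, high.
priors = [0.3, 0.6, 0.1]       # medium = 0.6 from the text; low/high assumed
likelihoods = [0.2, 0.7, 0.1]  # assumed: P(survey says "medium" | each demand)

post = posteriors(priors, likelihoods)
print([round(p, 3) for p in post])  # medium rises from 0.6 to ~0.857
```

A survey pointing to medium demand revises the 0.6 prior upward, by an amount governed entirely by the likelihoods, which is exactly the "by how much?" question posed at the start of the section.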