|
Two distributions
|
Consider the sample space of cars.
Imagine a universe with only three makes of cars and only
three colors. We are interested
in two separate distributions
over the sample space of cars.
The first is the COLOR distribution,
with three events:
being red, being yellow, and being green.
The second is the MAKE distribution,
with three events:
being a Jag, being a BMW, and being a VW.
Here are some statistics.
|   | Green | Red | Yellow | Total |
| VW |   |   |   | 50 |
| BMW |   |   |   | 30 |
| Jag |   |   |   | 20 |
| Total | 60 | 20 | 20 | 100 |
As yet we don't know the details of how
color and make correlate.
|
Random Variables
|
A random variable is just some function
that assigns numbers to the outcomes in a sample space
(here, to individual cars).
We have two distributions we're interested in,
MAKE and COLOR, and neither of them takes
numbers as values, but that doesn't really matter.
We can just DEFINE random variables
in terms of the things we're interested in:
| MAKE(x) = | 1 if x is a VW |
|   | 2 if x is a BMW |
|   | 3 if x is a Jag |
| COLOR(x) = | 1 if x is green |
|   | 2 if x is yellow |
|   | 3 if x is red |
So we have two
random variables, MAKE and COLOR, with respective ranges:
- Range(MAKE) = {1, 2, 3}
- Range(COLOR) = {1, 2, 3}
The probability mass function (pmf) p for a
random variable X gives the probability
that X takes each value in its range. Let
p be the pmf for COLOR. Let's take
1, the value signifying "green":
p(1) = p(COLOR=1) = P({ x in CARS | x is green})
Probability mass functions
are defined over the ranges of random variables.
Here are some things we know about the distributions
of MAKE and COLOR, using
frequentist estimates for the
probabilities:
- p(COLOR=1[green])=60/100=.6 [where there is no possibility of confusion, we write
p(green)=.6]
- p(COLOR=2[yellow])=20/100=.2
- p(MAKE=1[VW])=50/100=.5
Notice that it's kind of annoying having to go by way of
the 1, 2, 3 for MAKE and COLOR, given that we're interested
in make and color. Frequently we just
notate this directly, replacing the number with the
property the number picks out, and write:
p(COLOR=green)
In fact the two distributions
MAKE and COLOR are now completely
defined, but we don't know
the JOINT DISTRIBUTION, p(MAKE=x,COLOR=y) [read "the joint probability
that the make is x AND the color is y"].
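As a rough illustration of the definitions above, here is a minimal Python sketch that builds the two pmfs from the totals in the statistics table. The names `MAKE_CODE`, `COLOR_CODE`, and `pmf` are made up for this example; the counts and the 1, 2, 3 coding come from the text.

```python
from fractions import Fraction

# Numeric codes taken from the definitions of MAKE and COLOR in the text.
MAKE_CODE = {"VW": 1, "BMW": 2, "Jag": 3}
COLOR_CODE = {"green": 1, "yellow": 2, "red": 3}

# Totals from the statistics table (100 cars in all).
make_counts = {"VW": 50, "BMW": 30, "Jag": 20}
color_counts = {"green": 60, "yellow": 20, "red": 20}


def pmf(counts, code):
    """Relative-frequency pmf over the numeric range of a random variable."""
    total = sum(counts.values())
    return {code[name]: Fraction(n, total) for name, n in counts.items()}


p_make = pmf(make_counts, MAKE_CODE)
p_color = pmf(color_counts, COLOR_CODE)

print(p_color[1])  # p(COLOR=1), i.e. p(green): 3/5
print(p_make[1])   # p(MAKE=1), i.e. p(VW): 1/2
```

Exact fractions are used here only to avoid floating-point rounding; ordinary division would work just as well for this table.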
|
Joint Distribution
|
Here's one possible version of p(MAKE,COLOR):
Statistics determining COLOR and MAKE in which the two variables are independent
|   | Green | Red | Yellow | Total |
| VW | 30 | 10 | 10 | 50 |
| BMW | 18 | 6 | 6 | 30 |
| Jag | 12 | 4 | 4 | 20 |
| Total | 60 | 20 | 20 | 100 |
Note that we can turn this directly into a
probability table just by dividing
all the numbers by 100, the total number
of cars. The 9 probabilities in
the chart then add up to 1.
Probabilities for joint, independent p(MAKE,COLOR)
|   | Green | Red | Yellow |
| VW | .30 | .10 | .10 |
| BMW | .18 | .06 | .06 |
| Jag | .12 | .04 | .04 |
This is actually a very special kind of pmf
called a joint distribution of independent variables.
We'll get back to why below.
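The step from counts to probabilities can be sketched in a few lines of Python. This is only an illustration; the nested-dictionary layout and the names `counts` and `joint` are assumptions of the example, not part of the notes.

```python
from fractions import Fraction

# Counts from the independent table: outer keys are makes, inner keys are colors.
counts = {
    "VW":  {"green": 30, "red": 10, "yellow": 10},
    "BMW": {"green": 18, "red": 6,  "yellow": 6},
    "Jag": {"green": 12, "red": 4,  "yellow": 4},
}

# Total number of cars, summed over all nine cells.
total = sum(n for row in counts.values() for n in row.values())

# Joint pmf keyed by (make, color) pairs: each count divided by the total.
joint = {
    (make, color): Fraction(n, total)
    for make, row in counts.items()
    for color, n in row.items()
}

print(total)                         # 100
print(float(joint["VW", "green"]))   # 0.3
print(sum(joint.values()))           # 1
```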
Note that there is only one distribution
of colors and makes consistent with the two distributions
we started out with in which
the two variables are
independent. But there are
numerous joint distributions
consistent with the original
facts in which the variables
are not independent. Here are two others.
Distribution A
|   | Green | Red | Yellow | Total |
| VW | 30 | 5 | 15 | 50 |
| BMW | 20 | 9 | 1 | 30 |
| Jag | 10 | 6 | 4 | 20 |
| Total | 60 | 20 | 20 | 100 |
Distribution B
|   | Green | Red | Yellow | Total |
| VW | 30 | 10 | 10 | 50 |
| BMW | 20 | 5 | 5 | 30 |
| Jag | 10 | 5 | 5 | 20 |
| Total | 60 | 20 | 20 | 100 |
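To see that all three joint tables really are consistent with the same starting facts, one can recompute the row and column totals from the cells. The following Python sketch does that; the helper name `marginals` and the dictionary layout are hypothetical choices made for the example.

```python
def marginals(table):
    """Return (make totals, color totals) for a nested dict of counts."""
    make_totals = {make: sum(row.values()) for make, row in table.items()}
    color_totals = {}
    for row in table.values():
        for color, n in row.items():
            color_totals[color] = color_totals.get(color, 0) + n
    return make_totals, color_totals


INDEPENDENT = {
    "VW":  {"green": 30, "red": 10, "yellow": 10},
    "BMW": {"green": 18, "red": 6,  "yellow": 6},
    "Jag": {"green": 12, "red": 4,  "yellow": 4},
}
DIST_A = {
    "VW":  {"green": 30, "red": 5, "yellow": 15},
    "BMW": {"green": 20, "red": 9, "yellow": 1},
    "Jag": {"green": 10, "red": 6, "yellow": 4},
}
DIST_B = {
    "VW":  {"green": 30, "red": 10, "yellow": 10},
    "BMW": {"green": 20, "red": 5,  "yellow": 5},
    "Jag": {"green": 10, "red": 5,  "yellow": 5},
}

# All three tables share the same totals even though their cells differ.
same = marginals(INDEPENDENT) == marginals(DIST_A) == marginals(DIST_B)
print(same)  # True
```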
|
Conditional Probability
| VW INT green | the set of cars that are both VWs and green |
| |VW INT green| | the number of cars that are both VWs and green |
|
The above tables give several kinds of
probability information.
- JOINT distribution. p(MAKE,COLOR).
Example: p(MAKE=VW,COLOR=green). The joint probability
that a car is a VW and green.
Frequentist estimate
(relative frequency):
|VW INT Green| / |CARS| ( = .3 in Distribution A)
The number of cars that are both VWs and green
relative to the total number of cars.
- CONDITIONAL distributions given
a color. p(MAKE| COLOR=green). p(MAKE | COLOR=red).
p(MAKE | COLOR=yellow).
Example:
p(MAKE=VW| COLOR=green). The probability
that a car is a VW given that it's green.
Frequentist estimate:
|VW INT Green| / |Green| ( = .5 in Distribution A)
The number of cars that are both VWs and green
relative to the number of green cars.
- CONDITIONAL distributions given
a make. p(COLOR| MAKE=VW). p(COLOR | MAKE=BMW).
p(COLOR | MAKE=Jag).
Example:
p( COLOR=green | MAKE=VW). The probability
that a car is green given that it's a VW.
Frequentist estimate:
|VW INT Green| / |VW| ( = .6 in Distribution A)
The number of cars that are both VWs and green
relative to the number of VWs.
- MARGINAL DISTRIBUTIONS. p(COLOR).
p(MAKE). These are the distributions we started
with.
p(MAKE=x) = Sum_{y in Range(COLOR)} p(MAKE=x,COLOR=y)
We can actually talk about nine different
probability distributions now,
one joint distribution, two marginal
distributions,
and six conditional distributions.
- p(MAKE). Sample space: set of cars (size: 100)
- p(COLOR). Sample space: set of cars (size: 100)
- p(MAKE,COLOR). Sample space: set of cars (size: 100)
- p(MAKE|COLOR=green). Sample space: set of green cars (size: 60)
- p(MAKE|COLOR=yellow). Sample space: set of yellow cars (size: 20)
- p(MAKE|COLOR=red). Sample space: set of red cars (size: 20)
- p(COLOR|MAKE=VW). Sample space: set of VWs (size: 50)
- p(COLOR|MAKE=BMW). Sample space: set of BMWs (size: 30)
- p(COLOR|MAKE=Jag). Sample space: set of Jags (size: 20)
So, for example, we think of p(MAKE|COLOR=green) as the MAKE distribution
restricted to the sample space of green cars.
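The idea of restricting the sample space and then counting can be sketched in Python using Distribution A. The two helper functions below are hypothetical names introduced for this illustration only.

```python
from fractions import Fraction

# Counts from Distribution A: outer keys are makes, inner keys are colors.
DIST_A = {
    "VW":  {"green": 30, "red": 5, "yellow": 15},
    "BMW": {"green": 20, "red": 9, "yellow": 1},
    "Jag": {"green": 10, "red": 6, "yellow": 4},
}


def make_given_color(table, color):
    """p(MAKE | COLOR=color): keep only cars of that color, then normalize."""
    size = sum(row[color] for row in table.values())  # e.g. |green| = 60
    return {make: Fraction(row[color], size) for make, row in table.items()}


def color_given_make(table, make):
    """p(COLOR | MAKE=make): keep only cars of that make, then normalize."""
    row = table[make]
    size = sum(row.values())  # e.g. |VW| = 50
    return {color: Fraction(n, size) for color, n in row.items()}


p_make_given_green = make_given_color(DIST_A, "green")
p_color_given_vw = color_given_make(DIST_A, "VW")

print(p_make_given_green["VW"])   # 1/2
print(p_color_given_vw["green"])  # 3/5
```

Each returned dictionary is a pmf over its own smaller sample space, so its values add up to 1.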
|
p(X|Y) is not a pmf
|
Note a couple of missing elements from our list of distributions:
p(MAKE | COLOR)
p(COLOR | MAKE)
Although these look like pmfs, they are not.
Note that
for each color x,
- p(MAKE | COLOR=x)
defines a pmf that adds up to 1. For example:
p(MAKE=VW | COLOR=green) +
p(MAKE=BMW | COLOR=green) +
p(MAKE=Jag | COLOR=green) = 1
But unless we fix a color we don't have something
that adds up to 1.
So the notation
p(COLOR | MAKE)
doesn't tell you enough to
pick out a probability function that adds up to 1.
Note that, confusingly,
p(COLOR, MAKE)
is a pmf. Probabilities are being assigned to (color, make)
pairs, and these do add up to 1.
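A quick numeric check makes the point concrete. The sketch below, which assumes Distribution A and uses made-up variable names, tabulates p(MAKE=m | COLOR=c) for every pair and adds the numbers up two ways.

```python
from fractions import Fraction

# Counts from Distribution A: outer keys are makes, inner keys are colors.
DIST_A = {
    "VW":  {"green": 30, "red": 5, "yellow": 15},
    "BMW": {"green": 20, "red": 9, "yellow": 1},
    "Jag": {"green": 10, "red": 6, "yellow": 4},
}
colors = ["green", "red", "yellow"]

# cond[make, color] holds p(MAKE=make | COLOR=color).
cond = {}
for color in colors:
    size = sum(row[color] for row in DIST_A.values())
    for make, row in DIST_A.items():
        cond[make, color] = Fraction(row[color], size)

# With the color fixed, the values over makes add up to 1 ...
per_color = {c: sum(cond[m, c] for m in DIST_A) for c in colors}
print(all(total == 1 for total in per_color.values()))  # True

# ... but over all nine (make, color) pairs they add up to 3, not 1.
print(sum(cond.values()))  # 3
```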
|
|
Chain Rule
|
The following relationship is called the chain rule:
p(COLOR=x,MAKE=y) = p(COLOR=x|MAKE=y) * p(MAKE=y)
It's completely symmetric as to which variable is given:
p(COLOR=x,MAKE=y) = p(MAKE=y|COLOR=x) * p(COLOR=x)
Why should this be true? In a sense you can't prove it,
not without a lot of assumptions about what a probability
is (many textbooks simply take it as the definition of
conditional probability). But you can see why it might be a reasonable
axiom.
It's true if you look at probabilities in a purely
frequentist way:
- p(green, VW) = | green INT VW | / |CARS|
- p(green|VW) = | green INT VW | / |VW|
- p(VW) = |VW|/|CARS|
So, on the frequentist interpretation, it just works out to be true:
| green INT VW | / |CARS| = | green INT VW | / |VW| * |VW|/|CARS|
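The same frequentist bookkeeping can be run over every cell of a table. The Python sketch below checks both forms of the chain rule on Distribution A; the function name `chain_rule_holds` is hypothetical, and the check assumes every make and every color occurs at least once.

```python
from fractions import Fraction

# Counts from Distribution A: outer keys are makes, inner keys are colors.
DIST_A = {
    "VW":  {"green": 30, "red": 5, "yellow": 15},
    "BMW": {"green": 20, "red": 9, "yellow": 1},
    "Jag": {"green": 10, "red": 6, "yellow": 4},
}


def chain_rule_holds(table):
    """Check p(c, m) = p(c|m) * p(m) = p(m|c) * p(c) for every cell."""
    total = sum(sum(row.values()) for row in table.values())
    color_totals = {}
    for row in table.values():
        for color, n in row.items():
            color_totals[color] = color_totals.get(color, 0) + n
    for make, row in table.items():
        make_total = sum(row.values())
        for color, n in row.items():
            joint = Fraction(n, total)
            # p(color | make) * p(make)
            via_make = Fraction(n, make_total) * Fraction(make_total, total)
            # p(make | color) * p(color)
            via_color = (Fraction(n, color_totals[color])
                         * Fraction(color_totals[color], total))
            if not (joint == via_make == via_color):
                return False
    return True


print(chain_rule_holds(DIST_A))  # True
```

The result is True for any table of counts with nonzero totals, because the |VW| (or |green|) factor cancels exactly as in the equation above.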
|
Independent Distributions
|
Note an important special case of
the chain rule. We call two distributions (random variables) X and Y
independent if, for every value x of X and every value y of Y:
p(X=x | Y=y) = p(X=x)
In other words, learning the value of Y does not change
the probability of any value of X.
For independent distributions the chain rule
simplifies to:
p(x,y) = p(x|y) * p(y) = p(x) * p(y)
Exercise: Verify that the table labeled as independent
above really does describe independent variables.
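One way to test a table of counts mechanically is to compare each joint probability with the product of its two marginals. The Python sketch below applies that test to Distributions A and B; the function name `is_independent` is an assumption of the example, and the table from the exercise is deliberately left for the reader to try.

```python
from fractions import Fraction


def is_independent(table):
    """True if p(make, color) == p(make) * p(color) for every cell."""
    total = sum(sum(row.values()) for row in table.values())
    color_totals = {}
    for row in table.values():
        for color, n in row.items():
            color_totals[color] = color_totals.get(color, 0) + n
    for make, row in table.items():
        p_make = Fraction(sum(row.values()), total)
        for color, n in row.items():
            p_color = Fraction(color_totals[color], total)
            if Fraction(n, total) != p_make * p_color:
                return False
    return True


DIST_A = {
    "VW":  {"green": 30, "red": 5, "yellow": 15},
    "BMW": {"green": 20, "red": 9, "yellow": 1},
    "Jag": {"green": 10, "red": 6, "yellow": 4},
}
DIST_B = {
    "VW":  {"green": 30, "red": 10, "yellow": 10},
    "BMW": {"green": 20, "red": 5,  "yellow": 5},
    "Jag": {"green": 10, "red": 5,  "yellow": 5},
}

print(is_independent(DIST_A))  # False
print(is_independent(DIST_B))  # False
```

In Distribution B the VW row passes the test, but the BMW row does not: p(BMW, green) is .20 while p(BMW) * p(green) is .18.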
|
|
Bayes' Law
|
Recall the two versions of the chain rule:
- p(COLOR=x,MAKE=y) = p(COLOR=x|MAKE=y) * p(MAKE=y)
- p(COLOR=x,MAKE=y) = p(MAKE=y|COLOR=x) * p(COLOR=x)
From this we can immediately conclude Bayes' Law:
p(COLOR=x|MAKE=y) * p(MAKE=y) = p(MAKE=y|COLOR=x) * p(COLOR=x)
This is often written in the following form:
p(COLOR=x|MAKE=y) = p(MAKE=y|COLOR=x) * p(COLOR=x) / p(MAKE=y)
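As a worked instance, take x = green and y = VW in Distribution A. The Python sketch below computes p(green | VW) twice, once directly from the counts and once through Bayes' Law; all variable names are invented for this illustration.

```python
from fractions import Fraction

# Counts from Distribution A: outer keys are makes, inner keys are colors.
DIST_A = {
    "VW":  {"green": 30, "red": 5, "yellow": 15},
    "BMW": {"green": 20, "red": 9, "yellow": 1},
    "Jag": {"green": 10, "red": 6, "yellow": 4},
}

total = sum(sum(row.values()) for row in DIST_A.values())   # |CARS|
n_vw = sum(DIST_A["VW"].values())                           # |VW|
n_green = sum(row["green"] for row in DIST_A.values())      # |green|
n_vw_green = DIST_A["VW"]["green"]                          # |VW INT green|

p_green = Fraction(n_green, total)
p_vw = Fraction(n_vw, total)
p_vw_given_green = Fraction(n_vw_green, n_green)

# Direct frequentist estimate of p(green | VW).
p_green_given_vw_direct = Fraction(n_vw_green, n_vw)

# The same quantity via Bayes' Law.
p_green_given_vw_bayes = p_vw_given_green * p_green / p_vw

print(p_green_given_vw_direct)  # 3/5
print(p_green_given_vw_bayes)   # 3/5
```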
|