Stat 311 Probability
Probability (Ch 3)
Fritz Scholz
April 5, 2010
Frequentist Interpretation of Probability
In common language usage probability can take on several different meanings.
or: about 40% of the rain gauges in the forecast area showed rain.
The 40% area forecast may need some thinking in that regard to make it fit.
Subjectivist Interpretation of Probability
It is based on gut feeling (subjective) and the vague internal memory of that person.
It is my lucky day; I feel 70% certain that I will win something in the lottery.
Axiomatic Probability
Rather than deciding which interpretation is correct or which to adopt, we stay on neutral
ground and use the axiomatic probability model proposed by Kolmogorov (1933).
Examples
Flipping a coin twice we can distinguish the following four outcomes:
S = {HH, HT, TH, TT}
As observable events we can take the collection of all subsets of S, i.e.,
C = { S, ∅,
      {HH, HT, TH}, {HH, HT, TT}, {HH, TH, TT}, {TH, HT, TT},
      {HH, HT}, {HH, TH}, {HH, TT}, {TH, HT}, {TH, TT}, {HT, TT},
      {HH}, {HT}, {TH}, {TT} }
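For a sample space this small, the full collection of events can be enumerated directly; the following Python sketch (an illustration, not part of the original slides) lists all 2⁴ = 16 subsets:

```python
from itertools import combinations

S = ["HH", "HT", "TH", "TT"]
# all subsets of S: for each size k = 0, ..., 4 pick k outcomes
C = [set(sub) for k in range(len(S) + 1) for sub in combinations(S, k)]
print(len(C))  # 16 observable events, from the empty set up to S itself
```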
Measuring a person’s height: while we have some notion about such heights,
e.g., 1 foot ≤ height ≤ 9 feet, it is more convenient to use S = R, even though
most of these outcome values are not possible.
Observable events could be the collection of all intervals, plus whatever can be obtained
from them by the set operations ∪, ∩ and complementation.
Observable Events
Collection of Events: Required Properties
Any collection C of events should satisfy the following properties:
1. S ∈ C (the sample space itself is an event)
2. A ∈ C =⇒ Ac ∈ C (complements of events are events)
3. A1, A2, A3, . . . ∈ C =⇒ A1 ∪ A2 ∪ A3 ∪ · · · ∈ C (countable unions of events are events)
Such a collection C is called a sigma field.
The Probability Measure P
A probability measure on C assigns a number P(A) to each event A ∈ C
and satisfies the following properties (axioms):
1. P(A) ≥ 0 for all A ∈ C
2. P(S) = 1
3. For any sequence A1, A2, A3, . . . of pairwise disjoint events we have

P(A1 ∪ A2 ∪ A3 ∪ · · ·) = ∑_{i=1}^∞ P(Ai),   i.e.,   P( lim_{n→∞} ∪_{i=1}^n Ai ) = lim_{n→∞} ∑_{i=1}^n P(Ai)
P(∅) = 0: this obvious property could have been added to the axioms, but it follows from axioms 2-3.
Writing S = S ∪ ∅ ∪ ∅ ∪ ∅ ∪ · · · , properties 2 and 3 (axioms 2 and 3) give

1 = P(S) = P(S ∪ ∅ ∪ ∅ ∪ ∅ ∪ · · ·) = P(S) + ∑_{i=2}^∞ P(∅) = 1 + ∑_{i=2}^∞ P(∅)

It follows that ∑_{i=2}^∞ P(∅) = 0 and thus P(∅) = 0.
Countable Additivity =⇒ Finite Additivity
For pairwise disjoint events A1, . . . , An take An+1 = An+2 = · · · = ∅ in axiom 3;
since P(∅) = 0 this yields P(A1 ∪ · · · ∪ An) = P(A1) + · · · + P(An).
P(Ac) = 1 − P(A) & A ⊂ B =⇒ P(A) ≤ P(B)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
For any A, B ∈ C we have the pairwise disjoint decompositions (see Venn diagram)
A ∪ B = (A ∩ Bc) ∪ (A ∩ B) ∪ (B ∩ Ac), A = (A ∩ Bc) ∪ (A ∩ B), B = (A ∩ B) ∪ (B ∩ Ac)
Applying finite additivity to each decomposition and combining the equations gives the result.
Finite Sample Spaces
Suppose S = {s1, . . . , sN } is a finite sample space with outcomes s1, . . . , sN .
Assume that C is the collection of all subsets (events) of S
and that we have a probability measure P defined for all A ∈ C .
Writing pi = P({si }), additivity gives for any A ∈ C

(1)   P(A) = ∑_{si ∈ A} pi

Conversely, any choice of numbers p1, . . . , pN with pi ≥ 0 and ∑_{i=1}^N pi = 1,
together with (1), defines a probability measure on C , satisfying axioms 1-3.
In the equally likely case

pi = 1/N , i = 1, . . . , N    Note pi ≥ 0 and ∑_{i=1}^N pi = N/N = 1

For such models the above event probability (1) becomes

P(A) = ∑_{si ∈ A} pi = ∑_{si ∈ A} 1/N = #(A)/#(S) = (# of cases favorable to A)/(# of possible cases)
Dice Example
For a pair of symmetric dice it seems reasonable to assume that all 36 outcomes
in a proper roll of a pair of dice are equally likely.
S = { (1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6)
        ⋮      ⋮      ⋮      ⋮      ⋮      ⋮
      (6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6) }
For example, let A be the event that the two top faces sum to 7 or 11:
A = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1), (6, 5), (5, 6)}
with #(A) = 8 and thus P(A) = 8/36 = 2/9.
Counting by full enumeration can become tedious and shortcuts are desirable.
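For this small example the full enumeration is easy to sketch in Python (an illustration, not part of the original slides), with A taken to be the eight listed outcomes, i.e., those with face sum 7 or 11:

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))          # the 36 equally likely rolls
A = [(i, j) for (i, j) in S if i + j in (7, 11)]  # the eight listed outcomes
print(len(A), Fraction(len(A), len(S)))  # 8 2/9
```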
Combinatorial Counts
40 patients in a preliminary drug trial are equally split between men and women.
We randomly split these 40 patients so that half get the new drug
while the other half get a look alike placebo.
What is the chance that the “new drug” patients are equally split between genders?
P(A) = (20 choose 10) × (20 choose 10) / (40 choose 20) = choose(20, 10)^2/choose(40, 20)
     = dhyper(10, 20, 20, 20) = 0.2476289
See documentation in R on using choose and dhyper.
Enumerating all (40 choose 20) = 137,846,528,820 possible splits, of which
(20 choose 10)² = 184,756² are favorable, is impractical.
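The same numbers can be checked without R; a minimal Python equivalent using the standard-library `math.comb` (an illustration, not part of the slides):

```python
from math import comb

# P(the 20 "new drug" patients split 10 men / 10 women)
p = comb(20, 10) ** 2 / comb(40, 20)
print(round(p, 7))  # ≈ 0.2476289, matching R's dhyper(10, 20, 20, 20)
```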
Rolling 5 Fair Dice
a) What is the probability that all 5 top faces show the same number?
P(A) = 6/6⁵ = 1/6⁴ = 1/1296
b) What is the probability that the top faces show exactly 4 different numbers?
The duplicated number can occur on any one of the (5 choose 2) = 10 possible pairs of dice
(order does not matter) & these two identical numbers can be any one of 6 values.
For the remaining 3 numbers we could have 5 × 4 × 3 possibilities. Thus

P(A) = #(A)/#(S) = 6 · (5 choose 2) · 5 · 4 · 3 / 6⁵ = 25/54 = 0.462963
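Since 6⁵ = 7776 is small, the count for part b) can also be verified by brute force; a Python sketch (not from the slides):

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=5))      # all 6^5 = 7776 rolls
hits = sum(1 for r in rolls if len(set(r)) == 4)  # exactly 4 distinct faces
print(Fraction(hits, len(rolls)))  # 25/54
```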
Rolling 5 Fair Dice (continued)
c) What is the chance that the top faces show exactly three 6s or exactly two 5s?
Let A = event of seeing exactly three 6s and B = event of seeing exactly two 5s.

P(A) = (5 choose 3) · 5² / 6⁵ ,   P(B) = (5 choose 2) · 5³ / 6⁵ ,   P(A ∩ B) = (5 choose 3) / 6⁵

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = [ (5 choose 3) · 5² + (5 choose 2) · 5³ − (5 choose 3) ] / 6⁵
         = (250 + 1250 − 10) / 6⁵ = 1490/6⁵ ≈ 0.1916
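Part c) can likewise be checked by enumeration; a Python sketch (not from the slides):

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=5))
# A: exactly three 6s, B: exactly two 5s
hits = sum(1 for r in rolls if r.count(6) == 3 or r.count(5) == 2)
p = Fraction(hits, len(rolls))
print(hits, float(p))  # 1490 favorable rolls, probability ≈ 0.1916
```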
The Birthday Problem
The basic operating assumption is that all 365ᵏ birthday k-tuples (d1, . . . , dk )
are equally likely to have appeared in this class of k persons.
Matching or Adjacent Birthdays
What is the chance of having at least one matching or adjacent pair of birthdays?
Again, going to the complement is easier. View Dec. 31 and Jan. 1 as adjacent.
Let A be the event of getting n birthdays at least one day apart. Then we have
P(A) = (365 − n − 1 choose n − 1) · (n − 1)! · 365 / 365ⁿ
There are 365 ways to pick a birthday for person 1, and then 365 − n non-birthdays (NB).
Use the remaining n − 1 birthdays (BD) to each fill one of the 365 − n − 1
gaps between the non-birthdays, in (365 − n − 1 choose n − 1) ways;
the (n − 1)! factor orders the remaining n − 1 persons over those birthdays.
n = 14 gives the smallest n for which P(Ac) ≥ .5, in fact P(Ac) = .5375.
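The formula above translates directly into Python (a sketch using only the standard library; not part of the original slides):

```python
from math import comb, factorial

def p_all_apart(n, days=365):
    # P(A): all n birthdays at least one day apart on a circular calendar
    return comb(days - n - 1, n - 1) * factorial(n - 1) * days / days ** n

print(round(1 - p_all_apart(14), 4))  # ≈ 0.5375, the P(Ac) quoted above
```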
[Figure: probability vs. number of persons n (10 to 50), showing two increasing curves: P(at least one matching B-day) and P(at least one matching or adjacent B-day).]
Conditional Probability
Conditional Probability: Formal Definition
The previous example with equally likely outcomes can be rewritten as
P(A|B) = #(A ∩ B)/#(B) = [#(A ∩ B)/#(S)] / [#(B)/#(S)] = P(A ∩ B)/P(B)

which motivates the following definition in the general case,
not restricted to equally likely outcomes.
Definition: When A and B are any events with P(B) > 0, then we define the
conditional probability of A given B by
P(A|B) = P(A ∩ B) / P(B)
This can be converted to the multiplication or product rule: P(A ∩ B) = P(A|B) · P(B).
Two Headed/Tailed Coins
Ask Marilyn: Suppose that we have three coins, one with two heads on it (HH),
one with two tails on it (TT) and a fair coin with head and tail (HT).
One of the coins is selected at random and flipped. Suppose the face up is Heads.
What is the chance that the other side is Heads as well?
Tree Diagram
coin=HH (1/3):  up=H (1)    → down=H (1)    path probability 1/3
coin=HT (1/3):  up=H (1/2)  → down=T (1)    path probability 1/6
                up=T (1/2)  → down=H (1)    path probability 1/6
coin=TT (1/3):  up=T (1)    → down=T (1)    path probability 1/3
Applying the Multiplication Rule
P({up = H}) = 1 · (1/3) + (1/2) · (1/3) = 1/2

P({down = H}|{up = H}) = P({up = H} ∩ {down = H}) / P({up = H}) = (1/3)/(1/2) = 2/3
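The answer 2/3 can also be confirmed by enumerating the six equally weighted (coin, side-up) outcomes; a Python sketch (not from the slides):

```python
from fractions import Fraction

# pick one of HH, HT, TT with probability 1/3, then either side
# lands up with probability 1/2
p_up_H = Fraction(0)    # P(up = H)
p_both_H = Fraction(0)  # P(up = H and down = H)
for sides in ("HH", "HT", "TT"):
    for i in (0, 1):
        p = Fraction(1, 3) * Fraction(1, 2)
        up, down = sides[i], sides[1 - i]
        if up == "H":
            p_up_H += p
            if down == "H":
                p_both_H += p
print(p_both_H / p_up_H)  # 2/3
```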
The Venn diagram shows the possible outcomes for a randomly chosen person.
        T+           T−
D       D ∩ T+       D ∩ T−
Dc      Dc ∩ T+      Dc ∩ T−
P(D|T+)?
Typically something is known about the prevalence of HIV, say P(D) = 0.001.
We may also know P(T +|Dc) = 0.015 and P(T −|D) = .003,
the respective probabilities of a false positive and a false negative.
P(D|T+) = P(T+|D)P(D) / [ P(T+|D)P(D) + P(T+|Dc)P(Dc) ]
        = 0.997 · 0.001 / (0.997 · 0.001 + 0.015 · 0.999) = 0.06238
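The Bayes computation is easy to reproduce; a Python sketch with the slide's numbers (illustration only):

```python
# Bayes' rule for P(D | T+) with the slide's numbers:
# prevalence P(D) = 0.001, false positive P(T+|Dc) = 0.015,
# false negative P(T-|D) = 0.003, hence P(T+|D) = 0.997
p_D, p_Tpos_given_D, p_Tpos_given_Dc = 0.001, 0.997, 0.015
p_Tpos = p_Tpos_given_D * p_D + p_Tpos_given_Dc * (1 - p_D)
p_D_given_Tpos = p_Tpos_given_D * p_D / p_Tpos
print(round(p_D_given_Tpos, 5))  # ≈ 0.06238
```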
HIV Test Tree Diagram
D (0.001):   T+ (0.997) → 0.001 × 0.997
             T− (0.003) → 0.001 × 0.003
Dc (0.999):  T+ (0.015) → 0.999 × 0.015
             T− (0.985) → 0.999 × 0.985
Independence
The most natural way is to express this via conditional probabilities as follows:
P(A|B) = P(A) or P(B|A) = P(B), i.e.,
P(A ∩ B)/P(B) = P(A)   and   P(A ∩ B)/P(A) = P(B)
Definition: Two events A and B are independent if and only if P(A ∩ B) = P(A) · P(B).
Comments on Independence
If P(A) = 0 or P(B) = 0 then A and B are independent.
If A ∩ B = ∅, i.e., A and B are mutually exclusive, and P(A) > 0 and P(B) > 0, then
A and B cannot be independent.
The fact that A and B are spatially uncoupled in the Venn diagram does not mean
independence; on the contrary, there is strong dependence between A and B.
Implied Independence
If A and B are independent so are Ac and B and thus also Ac and Bc.
Proof: P(Ac ∩ B) = P(B) − P(A ∩ B) = P(B) − P(A)P(B) = (1 − P(A))P(B) = P(Ac)P(B).
Applying the same argument to the independent events B and Ac yields the independence of Ac and Bc.
Examples of Independence/Dependence
1. Given: P(A) = 0.4, P(B) = 0.5, and P([A ∪ B]c) = 0.3.
Are A and B independent?
2. Given: P(A ∩ Bc) = 0.3, P(Ac ∩ B) = 0.2, and P(Ac ∩ Bc) = 0.1.
Are A and B independent?
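A hedged worked check of both exercises in Python (the answers are not stated on the original slide):

```python
# 1. P(A) = 0.4, P(B) = 0.5, P((A ∪ B)^c) = 0.3
pA, pB = 0.4, 0.5
pAiB = pA + pB - (1 - 0.3)       # inclusion-exclusion gives P(A ∩ B) = 0.2
indep1 = abs(pAiB - pA * pB) < 1e-9
print(indep1)  # True: 0.2 = 0.4 * 0.5, so A and B are independent

# 2. P(A ∩ Bc) = 0.3, P(Ac ∩ B) = 0.2, P(Ac ∩ Bc) = 0.1
pAiB2 = 1 - 0.3 - 0.2 - 0.1      # the four pieces partition S, so P(A ∩ B) = 0.4
pA2, pB2 = pAiB2 + 0.3, pAiB2 + 0.2   # P(A) = 0.7, P(B) = 0.6
indep2 = abs(pAiB2 - pA2 * pB2) < 1e-9
print(indep2)  # False: 0.4 differs from 0.7 * 0.6 = 0.42
```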
Postulated Independence
In practical applications independence is usually based on our understanding of
physical independence, i.e., A relates to one aspect of an experiment while B
relates to another aspect that is physically independent from the former.
Example: First flip a penny, then spin it, with apparent physical independence.
The sample space is S = {HH, HT, TH, TT}, and by postulated independence each outcome
probability is the product of the corresponding flip and spin probabilities.
Common Dependence Situations
P(B|A) ≠ P(B),
i.e., A and B are not independent.
Mutual Independence of a Collection {Aα} of Events
A collection {Aα} of events is said to consist of mutually independent events
if for any finite choice of events Aα1 , . . . , Aαk in {Aα} we have
P(Aα1 ∩ · · · ∩ Aαk ) = P(Aα1 ) · · · P(Aαk )
For three events A, B, C this requires more than the pairwise conditions
P(A ∩ B) = P(A) · P(B), P(A ∩ C) = P(A) · P(C), P(B ∩ C) = P(B) · P(C)
Random Variables
The simplest nontrivial example is illustrated by a coin toss: S = {H, T},
where we assign the number 1 to the outcome {H} and 0 to {T}.
Such an assignment can be viewed as a function X : S → R
H −→ 1 and T −→ 0 under X, i.e., X(H) = 1 and X(T) = 0
Such a function is called a random variable. We use capital letters from the end of
the alphabet to denote such random variables (r.v.’s), e.g., U,V,W, X,Y, Z .
It seems that X only relabels the experimental outcomes, but there is more.
Random Variables for Two Coin Tosses
Y : {HH, HT, TH, TT} −→ {0, 1, 2} with Y (HH) = 2, Y (HT) = Y (TH) = 1, and Y (TT) = 0
We may also assign a pair of numbers (X1, X2) to each of the outcomes, e.g., with
Xi = 1 or 0 according to whether the i-th toss shows H or T; note that Y = X1 + X2.
Borel Sets
This sigma field, the Borel sets, is the smallest sigma field containing all intervals
(−∞, y] for y ∈ R, i.e., it contains all sets that can be obtained by complementation,
countable unions and intersections of such intervals, e.g., it contains intervals like
[a, b], [a, b), (a, b], (a, b) for any a, b ∈ R (why?)
Each r.v. X induces its own probability measure on the Borel sets B ∈ B .
Induced Events and Probabilities
Suppose we have a r.v. X : S −→ R with corresponding probability space (S, C , P).
For any Borel set B ∈ B we can determine the set X⁻¹(B) of all outcomes in S
which get mapped into B, i.e.,

X⁻¹(B) = {s ∈ S : X(s) ∈ B}

How do we know that X⁻¹(B) ∈ C , i.e., that it is an event? We don't.
Thus we require it in our definition of a random variable.
Cumulative Distribution Function (CDF)
F(y) = P(X ≤ y) = PX ((−∞, y]) = P( X⁻¹((−∞, y]) ) = P({s ∈ S : X(s) ∈ (−∞, y]})
CDF for Single Coin Toss or Coin Spin
The jump sizes at 0 and 1 represent 1 − P(H) = P(T) and P(H), respectively.
See CDF plots on next slide.
[Figure: CDF for Coin Toss / Coin Spin — two side-by-side step-function plots of F(y) = P(X ≤ y) for y from −2 to 3, each with jumps at y = 0 and y = 1.]
2 Fair Coin Tosses
For two fair coin tosses the number X of heads takes the values 0,1,2
            0.0  = P(∅)                               for y < 0
P(X ≤ y) =  0.25 = P(X = 0) = P(TT)                   for 0 ≤ y < 1
            0.75 = P(X = 0 ∪ X = 1) = P(TT ∪ HT ∪ TH) for 1 ≤ y < 2
            1.0  = P(X = 0 ∪ X = 1 ∪ X = 2) = P(S)    for 2 ≤ y
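The piecewise CDF above can be written as a small Python function (a sketch, not part of the slides):

```python
def cdf_two_tosses(y):
    # CDF of X = number of heads in two fair coin tosses
    if y < 0:
        return 0.0
    if y < 1:
        return 0.25
    if y < 2:
        return 0.75
    return 1.0

print([cdf_two_tosses(y) for y in (-1, 0, 0.5, 1, 1.7, 2, 3)])
# [0.0, 0.25, 0.25, 0.75, 0.75, 1.0, 1.0]
```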
CDF for 2 Fair Coin Tosses
[Figure: step-function plot of F(y) = P(X ≤ y) for y from −2 to 3: 0 for y < 0, jumping to 0.25 at y = 0, to 0.75 at y = 1, and to 1.0 at y = 2.]
General CDF Properties
1. 0 ≤ F(y) ≤ 1 for all y ∈ R (F(y) is a probability)
2. F is nondecreasing: y1 ≤ y2 =⇒ F(y1) ≤ F(y2)
3. F(y) → 0 as y → −∞ and F(y) → 1 as y → ∞
4. F is right-continuous: F(y + h) → F(y) as h ↘ 0
Two Independent Random Variables
Two random variables X1 and X2 are independent if any event defined in terms of
X1 is independent of any event defined in terms of X2, i.e.,
P(X1 ∈ B1, X2 ∈ B2) = P(X1 ∈ B1) · P(X2 ∈ B2) for all Borel sets B1, B2.
Independent Random Variables