
Elements of Statistical Methods

Probability (Ch 3)
Fritz Scholz

Spring Quarter 2010

April 5, 2010
Frequentist Interpretation of Probability
In common language usage probability can take on several different meanings.

E.g., there is the possibility, not necessarily certainty.

There is a 40% chance that it will rain tomorrow.


Given the current atmospheric conditions, under similar conditions in the past
it has rained the next day in about 40% of the cases.

or: about 40% of the rain gauges in the forecast area showed rain.

Both of these are examples of the frequentists’ notion of probability,


in the long run what proportion of instances will give the target result?

The 40% area forecast may need some thinking in that regard to make it fit.
1
Subjectivist Interpretation of Probability

You look out the window and say:


there is a 40% chance that it rains while I walk to the train station.

It is based on gut feeling (subjective) and on that person's vague internal memory.

Having received a high PSA value for a prostate check


a man is 95% certain that the biopsy will show no cancer.

A woman has to decide whether to undergo an operation.


Based on medical experience (frequentist) it will be successful in 40% of the cases.
But she feels she has a better than 80% chance of success.

It is my lucky day, I feel 70% certain that I will win something in the lottery.

2
Axiomatic Probability
Rather than deciding which interpretation is correct or which one to adopt, we stay on
neutral ground and use the axiomatic probability model proposed by Kolmogorov (1933).

Both frequentists and subjectivists appear to accept this model as a basis


and it has evolved into a rich and useful theory.

It consists of three entities:

S: A sample space, a universe of all possible outcomes for an experiment.

C: A designated collection of observable subsets (called events) of S.

P: A probability measure, a function that assigns numbers (called probabilities)


to the events in C .

3
Examples
Flipping a coin twice we can distinguish the following four outcomes:

S = {HH, HT, TH, TT}

As observable events we can take the collection of all subsets of S, i.e.,

C = { S, ∅,
      {HH}, {HT}, {TH}, {TT},
      {HH, HT}, {HH, TH}, {HH, TT}, {TH, HT}, {TH, TT}, {HT, TT},
      {HH, HT, TH}, {HH, HT, TT}, {HH, TH, TT}, {TH, HT, TT} }

Measuring a person’s height. While we have some notion about such heights,
e.g., 1 foot ≤ height ≤ 9 feet, it is more convenient to use S = R, even though
most of these outcome values are not possible.

Observable events could be the collection of all intervals, plus what can be obtained
by set operations ∪, ∩ and c.
4
Observable Events

We said that C should contain the observable events.

What does observable mean?

When S is finite we can take the collection C of all subsets of S.

The same route can be taken with denumerable sample spaces S.

When S = R, it is no longer so easy to take all possible subsets of R as C .

We need to impose some axiomatic assumptions about C and P.

5
Collection of Events: Required Properties
Any collection C of events should satisfy the following properties:

1. The sample space S is an event, i.e., S ∈ C

2. If A is an event, i.e., A ∈ C , so is Ac, i.e., Ac ∈ C

3. For any countable sequence of events A1, A2, A3, . . . ∈ C
   their union should also be an event, i.e., A1 ∪ A2 ∪ A3 ∪ · · · ∈ C.

Such a collection C with properties 1-3 is also called a sigma-field


or sigma-algebra.

By 1. and 2.: S ∈ C =⇒ Sc = ∅ ∈ C. C = {∅, S} is the simplest sigma-field.
6
Coin Flips Revisited
Suppose we cannot distinguish HT and TH. Then we have as sample space

S = {{H, H}, {H, T}, {T, T}}


We used set notation to describe the three elements,
order within a set is immaterial.

As sigma-field of all subsets we get


 
S, 0,/ { {H, H} }, { {H, T} }, { {TT} },
C=
{ {H, H}, {H, T} }, { {H, H}, {T, T} }, { {T, T}, {H, T} }

{ {H, H}, {H, T} }c = { {TT} }


The text treats this example within the context of the original sample space,
but introduces the indistinguishability of HT and TH by imposing the condition
that any event containing HT also contains TH. Compare the two models!

7
The Probability Measure P
A probability measure on C assigns a number P(A) to each event A ∈ C
and satisfies the following properties (axioms):

1. For any A ∈ C we have 0 ≤ P(A) ≤ 1.

2. P(S) = 1 The probability of some outcome in S happening is 1.

3. For any sequence A1, A2, A3, . . . of pairwise disjoint events we have

P(A1 ∪ A2 ∪ A3 ∪ · · ·) = P(A1) + P(A2) + P(A3) + · · ·

i.e., P( lim_{n→∞} A1 ∪ . . . ∪ An ) = lim_{n→∞} [ P(A1) + . . . + P(An) ]

The third property is referred to as countable additivity.


8
P(∅) = 0

This obvious property could have been added to the axioms, but it follows from 2-3.

Consider the specific sequence of pairwise disjoint events S, ∅, ∅, ∅, . . .
and note that their infinite union is just S, i.e.,

S ∪ ∅ ∪ ∅ ∪ ∅ ∪ · · · = S

From properties 2 and 3 (axioms 2 and 3) we have

1 = P(S) = P(S ∪ ∅ ∪ ∅ ∪ ∅ ∪ · · ·) = P(S) + P(∅) + P(∅) + · · · = 1 + P(∅) + P(∅) + · · ·

It follows that P(∅) + P(∅) + · · · = 0 and thus P(∅) = 0.

9
Countable Additivity =⇒ Finite Additivity

Let A1, . . . , An be a finite sequence of pairwise disjoint events. Then


P(A1 ∪ . . . ∪ An) = P(A1) + . . . + P(An)

Proof:
Augment the sequence A1, . . . , An with an infinite number of ∅'s; then the
infinite sequence A1, . . . , An, ∅, ∅, . . . is pairwise disjoint and their union is

A1 ∪ . . . ∪ An ∪ ∅ ∪ ∅ ∪ . . . = A1 ∪ . . . ∪ An

From axiom 3 together with P(∅) = 0 we get

P(A1 ∪ . . . ∪ An ∪ ∅ ∪ ∅ ∪ . . .) = P(A1) + . . . + P(An) + P(∅) + P(∅) + . . .

so P(A1 ∪ . . . ∪ An) = P(A1) + . . . + P(An)

10
P(Ac) = 1 − P(A)  &  A ⊂ B =⇒ P(A) ≤ P(B)

In both proofs below we use the established finite additivity property.

S = A ∪ Ac and 1 = P(S) = P(A) + P(Ac) =⇒ P(Ac) = 1 − P(A)

Assume A ⊂ B, then A and B ∩ Ac are mutually exclusive and their union is B

B = A ∪ (B ∩ Ac) =⇒ P(B) = P(A) + P(B ∩ Ac) ≥ P(A)


since P(B ∩ Ac) ≥ 0 by axiom 1.

11
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

For any A, B ∈ C we have the pairwise disjoint decompositions (see Venn diagram)

A ∪ B = (A ∩ Bc) ∪ (A ∩ B) ∪ (B ∩ Ac)

=⇒ P(A ∪ B) = P(A ∩ Bc) + P(A ∩ B) + P(B ∩ Ac)


and

A = (A ∩ Bc) ∪ (A ∩ B) and B = (A ∩ B) ∪ (B ∩ Ac)

=⇒ P(A) = P(A ∩ Bc) + P(A ∩ B) and P(B) = P(A ∩ B) + P(B ∩ Ac)

=⇒ P(A) + P(B) = P(A ∩ Bc) + 2P(A ∩ B) + P(B ∩ Ac)


= P(A ∪ B) + P(A ∩ B)
=⇒ P(A) + P(B) − P(A ∩ B) = P(A ∪ B)

12
Finite Sample Spaces
Suppose S = {s1, . . . , sN } is a finite sample space with outcomes s1, . . . , sN .
Assume that C is the collection of all subsets (events) of S.
and that we have a probability measure P defined for all A ∈ C .

Denote by pi = P({si}) the probability of the event {si}.


Then for any event A consisting of outcomes si1 , . . . , sik we have

P(A) = P( {si1} ∪ . . . ∪ {sik} ) = ∑_{si ∈ A} P({si}) = ∑_{si ∈ A} pi     (1)
The probabilities of the individual outcomes determine the probability of any event.

To specify P on C , we only need to specify p1, . . . , pN with 0 ≤ pi, i = 1, . . . , N


and p1 + . . . + pN = 1.

This together with (1) defines a probability measure on C , satisfying axioms 1-3.

This also works for denumerable S with 0 ≤ pi, i = 1, 2, . . . and p1 + p2 + . . . = 1.
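
For instance, here is a small R sketch of (1); the outcomes, probabilities and event are made up purely for illustration:

    # hypothetical outcomes and probabilities: p_i >= 0 and sum(p) = 1
    s <- c("s1", "s2", "s3", "s4")
    p <- c(0.1, 0.2, 0.3, 0.4)
    A <- c("s2", "s4")        # an event consisting of the outcomes s2 and s4
    sum(p[s %in% A])          # P(A) = p_2 + p_4 = 0.6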
13
Equally Likely Outcomes

Many useful probability models concern N equally likely outcomes, i.e.,

pi = 1/N, i = 1, . . . , N.   Note pi ≥ 0 and ∑ pi = N/N = 1.

For such models the above event probability (1) becomes

P(A) = ∑_{si ∈ A} pi = ∑_{si ∈ A} 1/N = #(A)/#(S) = (# of cases favorable to A)/(# of possible cases)

Thus the calculation of probabilities is simply a matter of counting.

14
Dice Example

For a pair of symmetric dice it seems reasonable to assume that all 36 outcomes
in a proper roll of a pair of dice are equally likely.
 
 (1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6) 
 .

.. . ... .
.. ... . .. ..


S= ... ... ... ... ... ...

 

(6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6)
 

What is the chance of coming up with a 7 or 11?

A = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1), (6, 5), (5, 6)}
with #(A) = 8 and thus P(A) = 8/36 = 2/9.

Counting by full enumeration can become tedious and shortcuts are desirable.
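
For this small case a brute-force check in R (just an illustrative sketch) confirms the count of 8:

    # enumerate all 36 equally likely outcomes of a pair of dice
    S <- expand.grid(die1 = 1:6, die2 = 1:6)
    A <- with(S, die1 + die2 == 7 | die1 + die2 == 11)
    sum(A)    # 8 favorable cases
    mean(A)   # 8/36 = 2/9 = 0.2222...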

15
Combinatorial Counts
40 patients in a preliminary drug trial are equally split between men and women.

We randomly split these 40 patients so that half get the new drug
while the other half get a look alike placebo.

What is the chance that the “new drug” patients are equally split between genders?

“randomly split” means: any group of 20 out of 40 is equally likely to be selected.

P(A) = (20 choose 10) × (20 choose 10) / (40 choose 20) = choose(20, 10)^2 / choose(40, 20)
     = dhyper(10, 20, 20, 20) = 0.2476289

See documentation in R on using choose and dhyper.

Enumerating (40 choose 20) = 137,846,528,820 and (20 choose 10)^2 = 184,756^2 cases is impractical.
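
In R the two expressions above evaluate as follows (a quick check):

    choose(20, 10)^2 / choose(40, 20)   # 0.2476289
    dhyper(10, 20, 20, 20)              # same value from the hypergeometric pmf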
16
Rolling 5 Fair Dice

a) What is the probability that all 5 top faces show the same number?
P(A) = 6/6^5 = 1/6^4 = 1/1296

b) What is the probability that the top faces show exactly 4 different numbers?

The duplicated number can occur on any one of the (5 choose 2) possible pairs of dice
(order does not matter) & these two identical numbers can be any one of 6 values.
For the remaining 3 numbers we could have 5 × 4 × 3 possibilities. Thus

P(A) = #(A)/#(S) = 6 · (5 choose 2) · 5 · 4 · 3 / 6^5 = 25/54 = 0.462963

We repeatedly made use of the multiplication principle in counting the combinations


of the various choices with each other.
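
Since 6^5 = 7776 outcomes is a small number, the counts in a) and b) can also be verified by full enumeration; a sketch in R:

    # all 7776 equally likely outcomes of rolling 5 fair dice
    S <- expand.grid(rep(list(1:6), 5))
    k <- apply(S, 1, function(x) length(unique(x)))   # number of distinct faces shown
    mean(k == 1)   # all five the same: 1/1296
    mean(k == 4)   # exactly four different numbers: 25/54 = 0.462963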

17
Rolling 5 Fair Dice (continued)
c) What is the chance that the top faces show exactly three 6s or exactly two 5s?

Let A = event of seeing exactly three 6s and B = event of seeing exactly two 5s.

P(A) = (5 choose 3) · 5^2 / 6^5,   P(B) = (5 choose 2) · 5^3 / 6^5,   P(A ∩ B) = (5 choose 3) / 6^5

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = [ (5 choose 3) · 5^2 + (5 choose 2) · 5^3 − (5 choose 3) ] / 6^5
         = (250 + 1250 − 10) / 6^5 = 1490/6^5 ≈ 0.1916

Sometimes you have to organize your counting in manageable chunks.
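
The same enumeration idea verifies this union probability; an optional R sketch:

    S <- expand.grid(rep(list(1:6), 5))   # all 6^5 outcomes
    A <- rowSums(S == 6) == 3             # exactly three 6s
    B <- rowSums(S == 5) == 2             # exactly two 5s
    c(sum(A), sum(B), sum(A & B))         # 250, 1250, 10
    mean(A | B)                           # 1490/7776 = 0.1916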


18
The Birthday Problem

Assuming a 365 day year and dealing with a group of k students,


what is the probability of having at least one birthday match among them?

The basic operating assumption is that all 365^k birthday k-tuples (d1, . . . , dk)
are equally likely to have appeared in this class of k.

We will employ a useful trick. Sometimes it is easier to get your counting


arms around Ac and then employ P(A) = 1 − P(Ac).
Ac means that all k birthdays are different.

P(Ac) = 365 · 364 · · · (365 − (k − 1)) / 365^k   &   P(A) = 1 − 365 · 364 · · · (366 − k) / 365^k

It takes just 23 students to get P(A) > .5.
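
A short R function (names are illustrative) computes P(A) and shows where it crosses .5:

    # P(at least one shared birthday) among k people
    p_match <- function(k) 1 - prod((365 - (0:(k - 1))) / 365)
    round(sapply(20:24, p_match), 4)   # crosses 0.5 between k = 22 and k = 23
    p_match(23)                        # 0.5073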

19
Matching or Adjacent Birthdays
What is the chance of having at least one matching or adjacent pair of birthdays?
Again, going to the complement is easier. View Dec. 31 and Jan. 1 as adjacent.

Let A be the event of getting n birthdays at least one day apart. Then we have
 
P(A) = (365 − n − 1 choose n − 1) · (n − 1)! · 365 / 365^n

     = (365 − 2n + 1)(365 − 2n + 2) · · · (365 − 2n + (n − 1)) / 365^(n−1)

365 ways to pick a birthday for person 1. There are 365 − n non-birthdays (NB).

Use the remaining n − 1 birthdays (BD) to each fill one of the remaining 365 − n − 1
gaps between the non-birthdays, (365 − n − 1 choose n − 1) ways.

That fixes the circular NB–BD pattern, anchored on the BD of person 1.

(n − 1)! ways to assign these birthdays to the remaining (n − 1) persons.


20
P(M) and P(Ac)

n = 14 gives the smallest n for which P(Ac) ≥ .5, in fact P(Ac) = .5375.
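
An R sketch of the product formula from the previous slide (the function name is just illustrative) reproduces this value:

    # P(no matching or adjacent birthdays) among n people, circular 365-day year
    p_apart <- function(n) prod((365 - 2 * n + 1:(n - 1)) / 365)
    1 - p_apart(13)   # still below 0.5
    1 - p_apart(14)   # 0.5375, the smallest n with probability >= 0.5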
[Plot: probability vs. number of persons n (10 to 50), showing the two increasing curves
P(at least one matching B−day) and P(at least one matching or adjacent B−day).]

21
Conditional Probability

Conditional probabilities are a useful tool for breaking down probability calculations
into manageable segments.

[Venn diagram: 10 equally likely outcomes and two events A and B; A contains 3 outcomes,
B contains 5, and A ∩ B contains 1.]

P(A) = #(A)/#(S) = 3/10 = 0.3

Suppose that we can restrict attention to the outcomes in B as our new sample space, then

P(A|B) = #(A ∩ B)/#(S ∩ B) = 1/5 = 0.2

the conditional probability of A given B.

22
Conditional Probability: Formal Definition
The previous example with equally likely outcomes can be rewritten as
P(A|B) = #(A ∩ B)/#(S ∩ B) = [#(A ∩ B)/#(S)] / [#(B)/#(S)] = P(A ∩ B)/P(B)
which motivates the following definition in the general case,
not restricted to equally likely outcomes

Definition: When A and B are any events with P(B) > 0, then we define the
conditional probability of A given B by

P(A|B) = P(A ∩ B)/P(B)
This can be converted to the multiplication or product rule

P(A ∩ B) = P(A|B)P(B) and P(A ∩ B) = P(B ∩ A) = P(B|A)P(A)


provided that in the latter case we have P(A) > 0.

23
Two Headed/Tailed Coins
Ask Marilyn: Suppose that we have three coins, one with two heads on it (HH),
one with two tails on it (TT) and a fair coin with head and tail (HT).

One of the coins is selected at random and flipped. Suppose the face up is Heads.
What is the chance that the other side is Heads as well?

We could reason as follows:

1. Given the provided information, it can’t be the TT coin. It must be HH or HT.

2. If HH was selected, the face down is Heads; if HT, the face down is Tails.

3. Thus the chance of having Heads as face down is 1/2. Or is it?

24
Tree Diagram

coin = HH (prob 1/3) −→ up = H (prob 1)   −→ down = H     path probability 1/3
coin = HT (prob 1/3) −→ up = H (prob 1/2) −→ down = T     path probability 1/6
coin = HT (prob 1/3) −→ up = T (prob 1/2) −→ down = H     path probability 1/6
coin = TT (prob 1/3) −→ up = T (prob 1)   −→ down = T     path probability 1/3

25
Applying the Multiplication Rule

P({up = H}) = P({{up = H} ∩ {coin = HH}} ∪ {{up = H} ∩ {coin = HT}})

= P({up = H} ∩ {coin = HH}) + P({up = H} ∩ {coin = HT})

= P({up = H}|{coin = HH}) · P({coin = HH})

+ P({up = H}|{coin = HT}) · P({coin = HT})

= 1 · (1/3) + (1/2) · (1/3) = 1/2

P({down = H}|{up = H}) = P({up = H} ∩ {down = H}) / P({up = H})
                       = P({coin = HH}) / (1/2) = (1/3) / (1/2) = 2/3 ≠ 1/2
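
A quick Monte Carlo sketch in R (variable names are illustrative) agrees with the 2/3 answer:

    set.seed(1)
    n <- 1e5
    coin <- sample(c("HH", "HT", "TT"), n, replace = TRUE)   # pick a coin at random
    spin <- sample(c("H", "T"), n, replace = TRUE)           # which side of HT lands up
    up   <- ifelse(coin == "HH", "H", ifelse(coin == "TT", "T", spin))
    down <- ifelse(coin == "HH", "H",
                   ifelse(coin == "TT", "T", ifelse(spin == "H", "T", "H")))
    mean(down[up == "H"] == "H")   # approximately 2/3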
26
HIV Screening
A population can be divided into those that have HIV (denoted by D)
and those who do not (denoted by Dc).

A test either results in a positive, denoted by T +, or in a negative, denoted by T −.

The Venn diagram shows the possible outcomes for a randomly chosen person.

T+ T−

D D∩T+ D∩T−

Dc Dc ∩ T + Dc ∩ T −

Test is correct: D ∩ T + or Dc ∩ T −; false positive: Dc ∩ T +; false negative: D ∩ T −.

27
P(D|T+)?
Typically something is known about the prevalence of HIV, say P(D) = 0.001.

We may also know P(T +|Dc) = 0.015 and P(T −|D) = .003,
the respective probabilities of a false positive and a false negative.

P(D|T+) = P(D ∩ T+) / P(T+)
        = P(D ∩ T+) / P({T+ ∩ D} ∪ {T+ ∩ Dc})
        = P(D ∩ T+) / [P(T+ ∩ D) + P(T+ ∩ Dc)]
        = P(T+|D)P(D) / [P(T+|D)P(D) + P(T+|Dc)P(Dc)]
        = 0.997 · 0.001 / (0.997 · 0.001 + 0.015 · 0.999) = 0.06238
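
The same Bayes computation in R (variable names are just illustrative):

    prev <- 0.001          # P(D)
    fpos <- 0.015          # P(T+ | Dc)
    fneg <- 0.003          # P(T- | D), so P(T+ | D) = 0.997
    sens <- 1 - fneg
    sens * prev / (sens * prev + fpos * (1 - prev))   # 0.06238268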

28
HIV Test Tree Diagram
D  (prob 0.001) −→ T+ (prob 0.997): 0.001 × 0.997
                −→ T− (prob 0.003): 0.001 × 0.003
Dc (prob 0.999) −→ T+ (prob 0.015): 0.999 × 0.015
                −→ T− (prob 0.985): 0.999 × 0.985

P(D | T+) = P(D ∩ T+)/P(T+) = 0.001 × 0.997 / (0.001 × 0.997 + 0.999 × 0.015) = 0.06238268

29
Independence

The concept of independence is of great importance in probability and statistics.

Informally: Two events are independent if the probability of occurrence of either


is unaffected by the occurrence of the other.

The most natural way is to express this via conditional probabilities as follows:

P(A|B) = P(A) and P(B|A) = P(B)

or P(A ∩ B)/P(B) = P(A) and P(A ∩ B)/P(A) = P(B)
Definition: Two events A and B are independent if and only if

P(A ∩ B) = P(A) · P(B)


Note that P(A) > 0 and P(B) > 0 are not required (as in P(B|A) and P(A|B)).

30
Comments on Independence
If P(A) = 0 or P(B) = 0 then A and B are independent.

Since A ∩ B ⊂ A and A ∩ B ⊂ B =⇒ 0 ≤ P(A ∩ B) ≤ min(P(A), P(B)) = 0, thus

0 = P(A ∩ B) = P(A) · P(B) = 0

If A ∩ B = ∅, i.e., A and B are mutually exclusive, and P(A) > 0 and P(B) > 0, then
A and B cannot be independent.
The fact that A and B are spatially uncoupled in the Venn diagram does not mean
independence; on the contrary, there is strong dependence between A and B

because P(A ∩ B) = 0 < P(A) · P(B)


or, knowing that A occurred, leaves no chance for B to occur (strong impact).
Thus A and Ac are not independent as long as 0 < P(A) < 1.

31
Implied Independence

If A and B are independent so are Ac and B and thus also Ac and Bc.

Proof:

P(B) = P(B ∩ A) + P(B ∩ Ac) = P(B) · P(A) + P(B ∩ Ac)

=⇒ P(B) · (1 − P(A)) = P(B ∩ Ac) =⇒ P(B ∩ Ac) = P(B)P(Ac)

32
Examples of Independence/Dependence
1. Given: P(A) = 0.4, P(B) = 0.5, and P([A ∪ B]c) = 0.3.
Are A and B independent?

P(A ∪ B) = 0.7 = P(A) + P(B) − P(A ∩ B) = 0.4 + 0.5 − P(A ∩ B)

=⇒ P(A ∩ B) = 0.2 = P(A) · P(B) =⇒ A and B are independent!

2. Given: P(A ∩ Bc) = 0.3, P(Ac ∩ B) = 0.2, and P(Ac ∩ Bc) = 0.1.
Are A and B independent?

0.1 = P(Ac ∩ Bc) = P([A ∪ B]c) = 1 − P(A ∪ B) =⇒ P(A ∪ B) = 0.9

0.9 = P(A ∪ B) = P(A ∩ Bc) + P(Ac ∩ B) + P(A ∩ B) = 0.3 + 0.2 + P(A ∩ B)

=⇒ P(A ∩ B) = 0.4,   P(A) = 0.3 + 0.4 = 0.7,   P(B) = 0.2 + 0.4 = 0.6

and P(A ∩ B) = 0.4 ≠ P(A) · P(B) = 0.42, i.e., A and B are dependent.

33
Postulated Independence
In practical applications independence is usually based on our understanding of
physical independence, i.e., A relates to one aspect of an experiment while B
relates to another aspect that is physically independent from the former.

In such cases we postulate probability models which reflect this independence.

Example: First flip a penny, then spin it, with apparent physical independence.
The sample space is S = {HH, HT, TH, TT}, with respective probabilities

p1 · p2, p1 · (1 − p2), (1 − p1) · p2, (1 − p1) · (1 − p2)

where P(H on flip) = P({HT} ∪ {HH}) = p1 · p2 + p1 · (1 − p2) = p1


and P(H on spin) = P({TH} ∪ {HH}) = (1 − p1) · p2 + p1 · p2 = p2

and P({H on flip} ∩ {H on spin}) = P({HH}) = p1 p2 = P(H on flip) · P(H on spin)

34
Common Dependence Situations

1. Consider the population of undergraduates at William & Mary, from which a


student is selected at random. Let A be the event that the student is female,
B be the event that the student is heading for elementary education.

Being told P(A) ≈ .6 and P(A|B) ≥ .9 =⇒ A and B are not independent.

2. Select a person at random from a population of registered voters.


Let A be the event that the person belongs to a country club,
B be the event that the person is a Republican. We probably would expect

P(B|A) ≫ P(B)
i.e., A and B are not independent.

35
Mutual Independence of a Collection {Aα} of Events
A collection {Aα} of events is said to consist of mutually independent events
if for any finite choice of events Aα1 , . . . , Aαk in {Aα} we have

P(Aα1 ∩ . . . ∩ Aαk ) = P(Aα1 ) · . . . · P(Aαk )

For example, for 3 events A, B, C, we not only require

P(A ∩ B) = P(A) · P(B), P(A ∩C) = P(A) · P(C), P(B ∩C) = P(B) · P(C)

but also P(A ∩ B ∩C) = P(A) · P(B) · P(C) (2)

Pairwise independence does not necessarily imply (2).

Counterexample: Flip 2 fair coins. Let A = {H on 1st flip}, B = {H on 2nd flip},


C = {same result on both flips} with P(A) = P(B) = P(C) = 1/2 and
P(A ∩ C) = P(HH) = 1/4, etc., but P(A ∩ B ∩ C) = P(HH) = 1/4 ≠ 1/8.
−→ text example on “independence” of 3 blood markers (O.J. Simpson trial).
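
The counterexample can be checked exactly by enumerating the 4 equally likely outcomes; a small R sketch:

    S <- expand.grid(flip1 = c("H", "T"), flip2 = c("H", "T"))   # 4 equally likely outcomes
    A <- S$flip1 == "H"; B <- S$flip2 == "H"; C <- S$flip1 == S$flip2
    c(mean(A & B), mean(A) * mean(B))                 # 0.25 and 0.25: pairwise independent
    c(mean(A & B & C), mean(A) * mean(B) * mean(C))   # 0.25 vs 0.125: not mutually independent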
36
Random Variables
In many experiments the focus is on numbers assigned to the various outcomes.
Numbers → arithmetic and common arena for understanding experimental results.

The simplest nontrivial example is illustrated by a coin toss: S = {H, T},
where we assign the number 1 to the outcome {H} and 0 to {T}.
Such an assignment can be viewed as a function X : S → R

X: H −→ 1 and T −→ 0, i.e., X(H) = 1 and X(T) = 0
Such a function is called a random variable. We use capital letters from the end of
the alphabet to denote such random variables (r.v.’s), e.g., U,V,W, X,Y, Z .

Using the word “variable” to denote a function is somewhat unfortunate,


but it is customary. It emphasizes the varying values that X can take on
as a result of the (random) experiment.

It seems that X only relabels the experimental outcomes, but there is more.
37
Random Variables for Two Coin Tosses

Toss a coin twice.


Assign the number of heads to each outcome in S = {HH, HT, TH, TT}. Y : S → R

Y: HH −→ 2, HT −→ 1, TH −→ 1, TT −→ 0, i.e., Y(HH) = 2, Y(HT) = Y(TH) = 1, and Y(TT) = 0
We may also assign a pair of numbers (X1, X2) to each of the outcomes as follows

X1(HH) = 1, X1(HT) = 1, X1(TH) = 0, X1(TT) = 0

X2(HH) = 1, X2(HT) = 0, X2(TH) = 1, X2(TT) = 0


X1 = # of heads on the first toss and X2 = # of heads on the second toss.

X = (X1, X2) is called a random vector (of length 2).

We can express Y also as Y = X1 + X2 = g(X1, X2) with g(x1, x2) = x1 + x2.
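
In R this assignment can be written out as a small table (purely illustrative):

    S  <- c("HH", "HT", "TH", "TT")
    X1 <- c(1, 1, 0, 0)    # heads on first toss
    X2 <- c(1, 0, 1, 0)    # heads on second toss
    Y  <- X1 + X2          # total number of heads
    data.frame(outcome = S, X1, X2, Y)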

38
Borel Sets

A random variable X induces a probability measure on a sigma field B of certain


subsets of R = (−∞, ∞).

This sigma field, the Borel sets, is the smallest sigma field containing all intervals
(−∞, y] for y ∈ R, i.e., it contains all sets that can be obtained by complementation,
countable unions and intersections of such intervals, e.g., it contains intervals like

[a, b], [a, b), (a, b], (a, b) for any a, b ∈ R (why?)

It takes a lot of gyrations to construct a set that is not a Borel set.


We won’t see any in this course.

How do we assign probabilities to such Borel sets B ∈ B ?

Each r.v. X induces its own probability measure on the Borel sets B ∈ B .
39
Induced Events and Probabilities
Suppose we have a r.v. X : S −→ R with corresponding probability space (S, C , P).

For any Borel set B ∈ B we can determine the set X −1(B) of all outcomes in S
which get mapped into B, i.e.,

X −1(B) = {s ∈ S : X(s) ∈ B}
How do we know that X −1(B) ∈ C is an event? We don’t.
Thus we require it in our definition of a random variable.

Definition: A function X : S −→ R is a random variable if and only if the

induced event X −1((−∞, y]) ∈ C for any y ∈ R

and thus the induced probability PX (induced by P and X )

PX ((−∞, y]) = P({s ∈ S : X(s) ≤ y}) exists for all y ∈ R

40
Cumulative Distribution Function (CDF)

A variety of ways of expressing the same probability (relaxed and fastidious):

 
PX((−∞, y]) = P( X−1((−∞, y]) ) = P({s ∈ S : X(s) ∈ (−∞, y]})
            = P(−∞ < X ≤ y) = P(X ≤ y)   (most relaxed)

Definition: The cumulative distribution function (cdf) of a random variable X


is the function F : R −→ [0, 1] defined by

F(y) = P(X ≤ y)

41
CDF for Single Coin Toss or Coin Spin

Example (coin toss P(H) = 0.5):


Since X takes only the values 1 and 0 for H and T we have

 0.0 = P(0)
/ for y < 0
P(X ≤ y) = 0.5 = P(X = 0) = P(T) for 0 ≤ y < 1
1.0 = P(X = 0 ∪ X = 1) = P(T ∪ H) for 1 ≤ y

Example (coin spin P(H) = 0.3):



 0.0 = P(0)
/ for y < 0
P(X ≤ y) = 0.7 = P(X = 0) = P(T) for 0 ≤ y < 1
1.0 = P(X = 0 ∪ X = 1) = P(T ∪ H) for 1 ≤ y

The jump sizes at 0 and 1 represent 1 − P(H) = P(T) and P(H), respectively.
See CDF plots on next slide.
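
These step functions can also be sketched in R with stepfun (an optional illustration):

    F_toss <- stepfun(c(0, 1), c(0, 0.5, 1), right = FALSE)   # CDF for the toss, P(H) = 0.5
    F_spin <- stepfun(c(0, 1), c(0, 0.7, 1), right = FALSE)   # CDF for the spin, P(H) = 0.3
    F_toss(c(-1, 0, 0.5, 1, 2))                               # 0.0 0.5 0.5 1.0 1.0
    plot(F_toss, verticals = FALSE, xlab = "y", ylab = "F(y)", main = "CDF for Coin Toss")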

42
[Two plots, CDF for Coin Toss/Coin Spin: F(y) = P(X ≤ y) over y from −2 to 3;
jumps of 0.5 at y = 0 and y = 1 for the toss, jumps of 0.7 at y = 0 and 0.3 at y = 1 for the spin.]

43
2 Fair Coin Tosses

For two fair coin tosses the number X of heads takes the values 0,1,2

for s = TT, s = HT or s = TH, and s = HH with probabilities 1/4, 1/2, 1/4, respectively.

P(X ≤ y) = 0.0  = P(∅)                                  for y < 0
         = 0.25 = P(X = 0) = P(TT)                      for 0 ≤ y < 1
         = 0.75 = P(X = 0 ∪ X = 1) = P(TT ∪ HT ∪ TH)    for 1 ≤ y < 2
         = 1.0  = P(X = 0 ∪ X = 1 ∪ X = 2) = P(S)       for 2 ≤ y

See CDF plot on next slide.

44
CDF for 2 Fair Coin Tosses
[Plot: F(y) = P(X ≤ y) for two fair coin tosses over y from −2 to 3, a step function
with jumps of 0.25, 0.5 and 0.25 at y = 0, 1 and 2.]

45
General CDF Properties
1. 0 ≤ F(y) ≤ 1 for all y ∈ R (F(y) is a probability)

2. y1 ≤ y2 =⇒ F(y1) ≤ F(y2) (monotonicity property)


This follows since {X ≤ y1} ⊂ {X ≤ y2} =⇒ P(X ≤ y1) ≤ P(X ≤ y2).

3. Limiting behavior as we approach ±∞:

lim_{y→−∞} F(y) = 0 and lim_{y→∞} F(y) = 1

This follows (with some more attention to technical detail) since

lim_{y→−∞} {X ≤ y} = ∩_y {X ≤ y} = ∅ and lim_{y→∞} {X ≤ y} = ∪_y {X ≤ y} = S
Note that in our examples we had F(y) = 0 for sufficiently low y (y < 0) and
F(y) = 1 for sufficiently high y. X had a finite and thus bounded value set.

46
Two Independent Random Variables
Two random variables X1 and X2 are independent if any event defined in terms of
X1 is independent of any event defined in terms of X2.

The following weaker but more practical definition is equivalent.

Definition: Let X1 : S −→ R and X2 : S −→ R be random variables defined on the


same sample space S. X1 and X2 are independent if and only if for each y1 ∈ R
and y2 ∈ R

P(X1 ≤ y1, X2 ≤ y2) = P(X1 ≤ y1) · P(X2 ≤ y2)


Note the shorthand notation

P(X1 ≤ y1, X2 ≤ y2) = P({X1 ≤ y1} ∩ {X2 ≤ y2})


You also often see P(AB) for P(A ∩ B).

47
Independent Random Variables

A collection of random variables {Xα} is mutually independent if the above product
property holds for any finite subset of these random variables, i.e., for any integer
k ≥ 2 and finite index subset α1, . . . , αk we have for all y1, . . . , yk ∈ R

P(Xα1 ≤ y1, . . . , Xαk ≤ yk ) = P(Xα1 ≤ y1) · . . . · P(Xαk ≤ yk )

Whether the independence assumption is appropriate in a given application


is mainly a matter of judgment or common sense.

With independence we have access to many powerful and useful theorems


in probability and statistics.

48
