
Elements of Statistical Methods

Probability (Ch 3)
Fritz Scholz

Spring Quarter 2010

April 5, 2010
Frequentist Interpretation of Probability
In common language usage probability can take on several different meanings.

E.g., there is the possibility, not necessarily certainty.

There is a 40% chance that it will rain tomorrow.


Given the current atmospheric conditions, under similar conditions in the past
it has rained the next day in about 40% of the cases.

or: about 40% of the rain gauges in the forecast area showed rain.

Both of these are examples of the frequentists’ notion of probability,


in the long run what proportion of instances will give the target result?

The 40% area forecast may need some thinking in that regard to make it fit.
1
Subjectivist Interpretation of Probability

You look out the window and say:


there is a 40% chance that it rains while I walk to the train station.

It is based on gut feeling (subjective) and on that person's vague internal memory.

Having received a high PSA value for a prostate check


a man is 95% certain that the biopsy will show no cancer.

A woman has to decide whether to undergo an operation.


Based on medical experience (frequentist) it will be successful in 40% of the cases.
But she feels she has a better than 80% chance of success.

It is my lucky day, I feel 70% certain that I will win something in the lottery.

2
Axiomatic Probability
Rather than deciding which interpretation is correct or which one to adopt, we stay on
neutral ground and use the axiomatic probability model proposed by Kolmogorov (1933).

Both frequentists and subjectivists appear to accept this model as a basis


and it has evolved into a rich and useful theory.

It consists of three entities:

S: A sample space, a universe of all possible outcomes for an experiment.

C: A designated collection of observable subsets (called events) of S.

P: A probability measure, a function that assigns numbers (called probabilities)


to the events in C .

3
Examples
Flipping a coin twice we can distinguish the following four outcomes:

S = {HH, HT, TH, TT}

As observable events we can take the collection of all subsets of S, i.e.,

C = { S, ∅,
      {HH}, {HT}, {TH}, {TT},
      {HH, HT}, {HH, TH}, {HH, TT}, {TH, HT}, {TH, TT}, {HT, TT},
      {HH, HT, TH}, {HH, HT, TT}, {HH, TH, TT}, {TH, HT, TT} }

Measuring a person’s height. While we have some notion about such heights,
e.g., 1 foot ≤ height ≤ 9 feet, it is more convenient to use S = R, even though
most of these outcome values are not possible.

Observable events could be the collection of all intervals, plus what can be obtained
by set operations ∪, ∩ and c.
4
Observable Events

We said that C should contain the observable events.

What does observable mean?

When S is finite we can take the collection C of all subsets of S.

The same route can be taken with denumerable sample spaces S.

When S = R, it is no longer so easy to take all possible subsets of R as C .

We need to impose some axiomatic assumptions about C and P.

5
Collection of Events: Required Properties
Any collection C of events should satisfy the following properties:

1. The sample space S is an event, i.e., S ∈ C

2. If A is an event, i.e., A ∈ C , so is Ac, i.e., Ac ∈ C

3. For any countable sequence of events A1, A2, A3, . . . ∈ C
   their union should also be an event, i.e., A1 ∪ A2 ∪ A3 ∪ · · · ∈ C.

Such a collection C with properties 1-3 is also called a sigma-field


or sigma-algebra.

By 1. and 2.: S ∈ C =⇒ Sc = ∅ ∈ C. C = {∅, S} is the simplest sigma-field.
6
Coin Flips Revisited
Suppose we cannot distinguish HT and TH. Then we have as sample space

S = {{H, H}, {H, T}, {T, T}}


We used set notation to describe the three elements,
order within a set is immaterial.

As sigma-field of all subsets we get


 
S, 0,/ { {H, H} }, { {H, T} }, { {TT} },
C=
{ {H, H}, {H, T} }, { {H, H}, {T, T} }, { {T, T}, {H, T} }

{ {H, H}, {H, T} }c = { {TT} }


The text treats this example within the context of the original sample space,
but introduces the indistinguishability of HT and TH by imposing the condition
that any event containing HT also contains TH. Compare the two models!

7
The Probability Measure P
A probability measure on C assigns a number P(A) to each event A ∈ C
and satisfies the following properties (axioms):

1. For any A ∈ C we have 0 ≤ P(A) ≤ 1.

2. P(S) = 1 The probability of some outcome in S happening is 1.

3. For any sequence A1, A2, A3, . . . of pairwise disjoint events we have

P(A1 ∪ A2 ∪ A3 ∪ · · ·) = P(A1) + P(A2) + P(A3) + · · ·

i.e., P( lim_{n→∞} A1 ∪ . . . ∪ An ) = lim_{n→∞} [ P(A1) + . . . + P(An) ]

The third property is referred to as countable additivity.


8
P(∅) = 0

This obvious property could have been added to the axioms, but it follows from 2-3.

Consider the specific sequence of pairwise disjoint events S, ∅, ∅, ∅, . . .
and note that their infinite union is just S, i.e.,

S ∪ ∅ ∪ ∅ ∪ ∅ ∪ · · · = S

From properties 2 and 3 (axioms 2 and 3) we have

1 = P(S) = P(S ∪ ∅ ∪ ∅ ∪ ∅ ∪ · · ·) = P(S) + P(∅) + P(∅) + · · · = 1 + P(∅) + P(∅) + · · ·

It follows that P(∅) + P(∅) + · · · = 0 and thus P(∅) = 0.

9
Countable Additivity =⇒ Finite Additivity

Let A1, . . . , An be a finite sequence of pairwise disjoint events. Then


P(A1 ∪ . . . ∪ An) = P(A1) + . . . + P(An)

Proof:
Augment the sequence A1, . . . , An with an infinite number of ∅'s; then the
infinite sequence A1, . . . , An, ∅, ∅, . . . is pairwise disjoint and their union is

A1 ∪ . . . ∪ An ∪ ∅ ∪ ∅ ∪ . . . = A1 ∪ . . . ∪ An

From axiom 3 together with P(∅) = 0 we get

P(A1 ∪ . . . ∪ An ∪ ∅ ∪ ∅ ∪ . . .) = P(A1) + . . . + P(An) + P(∅) + P(∅) + . . .

so P(A1 ∪ . . . ∪ An) = P(A1) + . . . + P(An)

10
P(Ac) = 1 − P(A)  &  A ⊂ B =⇒ P(A) ≤ P(B)

In both proofs below we use the established finite additivity property.

S = A ∪ Ac and 1 = P(S) = P(A) + P(Ac) =⇒ P(Ac) = 1 − P(A)

Assume A ⊂ B, then A and B ∩ Ac are mutually exclusive and their union is B

B = A ∪ (B ∩ Ac) =⇒ P(B) = P(A) + P(B ∩ Ac) ≥ P(A)


since P(B ∩ Ac) ≥ 0 by axiom 1.

11
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

For any A, B ∈ C we have the pairwise disjoint decompositions (see Venn diagram)

A ∪ B = (A ∩ Bc) ∪ (A ∩ B) ∪ (B ∩ Ac)

=⇒ P(A ∪ B) = P(A ∩ Bc) + P(A ∩ B) + P(B ∩ Ac)


and

A = (A ∩ Bc) ∪ (A ∩ B) and B = (A ∩ B) ∪ (B ∩ Ac)

=⇒ P(A) = P(A ∩ Bc) + P(A ∩ B) and P(B) = P(A ∩ B) + P(B ∩ Ac)

=⇒ P(A) + P(B) = P(A ∩ Bc) + 2P(A ∩ B) + P(B ∩ Ac)


= P(A ∪ B) + P(A ∩ B)
=⇒ P(A) + P(B) − P(A ∩ B) = P(A ∪ B)

12
Finite Sample Spaces
Suppose S = {s1, . . . , sN } is a finite sample space with outcomes s1, . . . , sN .
Assume that C is the collection of all subsets (events) of S.
and that we have a probability measure P defined for all A ∈ C .

Denote by pi = P({si}) the probability of the event {si}.


Then for any event A consisting of outcomes si1 , . . . , sik we have

P(A) = P( {si1} ∪ . . . ∪ {sik} ) = ∑_{si ∈ A} P({si}) = ∑_{si ∈ A} pi     (1)
The probabilities of the individual outcomes determine the probability of any event.

To specify P on C , we only need to specify p1, . . . , pN with 0 ≤ pi, i = 1, . . . , N


and p1 + . . . + pN = 1.

This together with (1) defines a probability measure on C , satisfying axioms 1-3.

This also works for denumerable S with 0 ≤ pi, i = 1, 2, . . . and p1 + p2 + . . . = 1.
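
For instance, here is a small R sketch of (1); the outcomes, probabilities and event are made up purely for illustration:

    # hypothetical outcomes and probabilities: p_i >= 0 and sum(p) = 1
    s <- c("s1", "s2", "s3", "s4")
    p <- c(0.1, 0.2, 0.3, 0.4)
    A <- c("s2", "s4")        # an event consisting of the outcomes s2 and s4
    sum(p[s %in% A])          # P(A) = p_2 + p_4 = 0.6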
13
Equally Likely Outcomes

Many useful probability models concern N equally likely outcomes, i.e.,

pi = 1/N, i = 1, . . . , N.   Note pi ≥ 0 and ∑ pi = N/N = 1.

For such models the above event probability (1) becomes

P(A) = ∑_{si ∈ A} pi = ∑_{si ∈ A} 1/N = #(A)/#(S) = (# of cases favorable to A)/(# of possible cases)

Thus the calculation of probabilities is simply a matter of counting.

14
Dice Example

For a pair of symmetric dice it seems reasonable to assume that all 36 outcomes
in a proper roll of a pair of dice are equally likely.
 
 (1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6) 
 .

.. . ... .
.. ... . .. ..


S= ... ... ... ... ... ...

 

(6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6)
 

What is the chance of coming up with a 7 or 11?

A = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1), (6, 5), (5, 6)}
with #(A) = 8 and thus P(A) = 8/36 = 2/9.

Counting by full enumeration can become tedious and shortcuts are desirable.
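
For this small case a brute-force check in R (just an illustrative sketch) confirms the count of 8:

    # enumerate all 36 equally likely outcomes of a pair of dice
    S <- expand.grid(die1 = 1:6, die2 = 1:6)
    A <- with(S, die1 + die2 == 7 | die1 + die2 == 11)
    sum(A)    # 8 favorable cases
    mean(A)   # 8/36 = 2/9 = 0.2222...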

15
Combinatorial Counts
40 patients in a preliminary drug trial are equally split between men and women.

We randomly split these 40 patients so that half get the new drug
while the other half get a look alike placebo.

What is the chance that the “new drug” patients are equally split between genders?

“randomly split” means: any group of 20 out of 40 is equally likely to be selected.

P(A) = (20 choose 10) × (20 choose 10) / (40 choose 20) = choose(20, 10)^2 / choose(40, 20)
     = dhyper(10, 20, 20, 20) = 0.2476289

See documentation in R on using choose and dhyper.

Enumerating (40 choose 20) = 137,846,528,820 and (20 choose 10)^2 = 184,756^2 cases is impractical.
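
In R the two expressions above evaluate as follows (a quick check):

    choose(20, 10)^2 / choose(40, 20)   # 0.2476289
    dhyper(10, 20, 20, 20)              # same value from the hypergeometric pmf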
16
Rolling 5 Fair Dice

a) What is the probability that all 5 top faces show the same number?
P(A) = 6/6^5 = 1/6^4 = 1/1296

b) What is the probability that the top faces show exactly 4 different numbers?

The duplicated number can occur on any one of the (5 choose 2) possible pairs of dice
(order does not matter) & these two identical numbers can be any one of 6 values.
For the remaining 3 numbers we could have 5 × 4 × 3 possibilities. Thus

P(A) = #(A)/#(S) = 6 · (5 choose 2) · 5 · 4 · 3 / 6^5 = 25/54 = 0.462963

We repeatedly made use of the multiplication principle in counting the combinations


of the various choices with each other.
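
Since 6^5 = 7776 outcomes is a small number, the counts in a) and b) can also be verified by full enumeration; a sketch in R:

    # all 7776 equally likely outcomes of rolling 5 fair dice
    S <- expand.grid(rep(list(1:6), 5))
    k <- apply(S, 1, function(x) length(unique(x)))   # number of distinct faces shown
    mean(k == 1)   # all five the same: 1/1296
    mean(k == 4)   # exactly four different numbers: 25/54 = 0.462963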

17
Rolling 5 Fair Dice (continued)
c) What is the chance that the top faces show exactly three 6s or exactly two 5s?

Let A = event of seeing exactly three 6s and B = event of seeing exactly two 5s.

P(A) = (5 choose 3) · 5^2 / 6^5,   P(B) = (5 choose 2) · 5^3 / 6^5,   P(A ∩ B) = (5 choose 3) / 6^5

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = [ (5 choose 3) · 5^2 + (5 choose 2) · 5^3 − (5 choose 3) ] / 6^5
         = (250 + 1250 − 10) / 6^5 = 1490/6^5 ≈ 0.1916

Sometimes you have to organize your counting in manageable chunks.
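
The same enumeration idea verifies this union probability; an optional R sketch:

    S <- expand.grid(rep(list(1:6), 5))   # all 6^5 outcomes
    A <- rowSums(S == 6) == 3             # exactly three 6s
    B <- rowSums(S == 5) == 2             # exactly two 5s
    c(sum(A), sum(B), sum(A & B))         # 250, 1250, 10
    mean(A | B)                           # 1490/7776 = 0.1916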


18
The Birthday Problem

Assuming a 365 day year and dealing with a group of k students,


what is the probability of having at least one birthday match among them?

The basic operating assumption is that all 365^k birthday k-tuples (d1, . . . , dk)
are equally likely to have appeared in this class of k.

We will employ a useful trick. Sometimes it is easier to get your counting


arms around Ac and then employ P(A) = 1 − P(Ac).
Ac means that all k birthdays are different.

P(Ac) = 365 · 364 · · · (365 − (k − 1)) / 365^k   &   P(A) = 1 − 365 · 364 · · · (366 − k) / 365^k

It takes just 23 students to get P(A) > .5.
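
A short R function (names are illustrative) computes P(A) and shows where it crosses .5:

    # P(at least one shared birthday) among k people
    p_match <- function(k) 1 - prod((365 - (0:(k - 1))) / 365)
    round(sapply(20:24, p_match), 4)   # crosses 0.5 between k = 22 and k = 23
    p_match(23)                        # 0.5073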

19
Matching or Adjacent Birthdays
What is the chance of having at least one matching or adjacent pair of birthdays?
Again, going to the complement is easier. View Dec. 31 and Jan. 1 as adjacent.

Let A be the event of getting n birthdays at least one day apart. Then we have
 
P(A) = (365 − n − 1 choose n − 1) · (n − 1)! · 365 / 365^n

     = (365 − 2n + 1)(365 − 2n + 2) · · · (365 − 2n + (n − 1)) / 365^(n−1)

365 ways to pick a birthday for person 1. There are 365 − n non-birthdays (NB).

Use the remaining n − 1 birthdays (BD) to each fill one of the remaining 365 − n − 1
gaps between the non-birthdays, (365 − n − 1 choose n − 1) ways.

That fixes the circular NB–BD pattern, anchored on the BD of person 1.

(n − 1)! ways to assign these birthdays to the remaining (n − 1) persons.


20
P(M) and P(Ac)

n = 14 gives the smallest n for which P(Ac) ≥ .5, in fact P(Ac) = .5375.
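
An R sketch of the product formula from the previous slide (the function name is just illustrative) reproduces this value:

    # P(no matching or adjacent birthdays) among n people, circular 365-day year
    p_apart <- function(n) prod((365 - 2 * n + 1:(n - 1)) / 365)
    1 - p_apart(13)   # still below 0.5
    1 - p_apart(14)   # 0.5375, the smallest n with probability >= 0.5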
[Plot: probability vs. number of persons n (10 to 50), showing the two increasing curves
P(at least one matching B−day) and P(at least one matching or adjacent B−day).]

21
Conditional Probability

Conditional probabilities are a useful tool for breaking down probability calculations
into manageable segments.

[Venn diagram: 10 equally likely outcomes and two events A and B; A contains 3 outcomes,
B contains 5, and A ∩ B contains 1.]

P(A) = #(A)/#(S) = 3/10 = 0.3

Suppose that we can restrict attention to the outcomes in B as our new sample space, then

P(A|B) = #(A ∩ B)/#(S ∩ B) = 1/5 = 0.2

the conditional probability of A given B.

22
Conditional Probability: Formal Definition
The previous example with equally likely outcomes can be rewritten as
P(A|B) = #(A ∩ B)/#(S ∩ B) = [#(A ∩ B)/#(S)] / [#(B)/#(S)] = P(A ∩ B)/P(B)
which motivates the following definition in the general case,
not restricted to equally likely outcomes

Definition: When A and B are any events with P(B) > 0, then we define the
conditional probability of A given B by

P(A|B) = P(A ∩ B)/P(B)
This can be converted to the multiplication or product rule

P(A ∩ B) = P(A|B)P(B) and P(A ∩ B) = P(B ∩ A) = P(B|A)P(A)


provided that in the latter case we have P(A) > 0.

23
Two Headed/Tailed Coins
Ask Marilyn: Suppose that we have three coins, one with two heads on it (HH),
one with two tails on it (TT) and a fair coin with head and tail (HT).

One of the coins is selected at random and flipped. Suppose the face up is Heads.
What is the chance that the other side is Heads as well?

We could reason as follows:

1. Given the provided information, it can’t be the TT coin. It must be HH or HT.

2. If HH was selected, the face down is Heads; if HT, the face down is Tails.

3. Thus the chance of having Heads as face down is 1/2. Or is it?

24
Tree Diagram

coin = HH (prob 1/3) −→ up = H (prob 1)   −→ down = H     path probability 1/3
coin = HT (prob 1/3) −→ up = H (prob 1/2) −→ down = T     path probability 1/6
coin = HT (prob 1/3) −→ up = T (prob 1/2) −→ down = H     path probability 1/6
coin = TT (prob 1/3) −→ up = T (prob 1)   −→ down = T     path probability 1/3

25
Applying the Multiplication Rule

P({up = H}) = P({{up = H} ∩ {coin = HH}} ∪ {{up = H} ∩ {coin = HT}})

= P({up = H} ∩ {coin = HH}) + P({up = H} ∩ {coin = HT})

= P({up = H}|{coin = HH}) · P({coin = HH})

+ P({up = H}|{coin = HT}) · P({coin = HT})

= 1 · (1/3) + (1/2) · (1/3) = 1/2

P({down = H}|{up = H}) = P({up = H} ∩ {down = H}) / P({up = H})
                       = P({coin = HH}) / (1/2) = (1/3) / (1/2) = 2/3 ≠ 1/2
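
A quick Monte Carlo sketch in R (variable names are illustrative) agrees with the 2/3 answer:

    set.seed(1)
    n <- 1e5
    coin <- sample(c("HH", "HT", "TT"), n, replace = TRUE)   # pick a coin at random
    spin <- sample(c("H", "T"), n, replace = TRUE)           # which side of HT lands up
    up   <- ifelse(coin == "HH", "H", ifelse(coin == "TT", "T", spin))
    down <- ifelse(coin == "HH", "H",
                   ifelse(coin == "TT", "T", ifelse(spin == "H", "T", "H")))
    mean(down[up == "H"] == "H")   # approximately 2/3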
26
HIV Screening
A population can be divided into those that have HIV (denoted by D)
and those who do not (denoted by Dc).

A test either results in a positive, denoted by T +, or in a negative, denoted by T −.

The Venn diagram shows the possible outcomes for a randomly chosen person.

T+ T−

D D∩T+ D∩T−

Dc Dc ∩ T + Dc ∩ T −

Test is correct: D ∩ T + or Dc ∩ T −; false positive: Dc ∩ T +; false negative: D ∩ T −.

27
P(D|T+)?
Typically something is known about the prevalence of HIV, say P(D) = 0.001.

We may also know P(T +|Dc) = 0.015 and P(T −|D) = .003,
the respective probabilities of a false positive and a false negative.

P(D|T+) = P(D ∩ T+) / P(T+)
        = P(D ∩ T+) / P({T+ ∩ D} ∪ {T+ ∩ Dc})
        = P(D ∩ T+) / [P(T+ ∩ D) + P(T+ ∩ Dc)]
        = P(T+|D)P(D) / [P(T+|D)P(D) + P(T+|Dc)P(Dc)]
        = 0.997 · 0.001 / (0.997 · 0.001 + 0.015 · 0.999) = 0.06238
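
The same Bayes computation in R (variable names are just illustrative):

    prev <- 0.001          # P(D)
    fpos <- 0.015          # P(T+ | Dc)
    fneg <- 0.003          # P(T- | D), so P(T+ | D) = 0.997
    sens <- 1 - fneg
    sens * prev / (sens * prev + fpos * (1 - prev))   # 0.06238268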

28
HIV Test Tree Diagram
D  (prob 0.001) −→ T+ (prob 0.997): 0.001 × 0.997
                −→ T− (prob 0.003): 0.001 × 0.003
Dc (prob 0.999) −→ T+ (prob 0.015): 0.999 × 0.015
                −→ T− (prob 0.985): 0.999 × 0.985

P(D | T+) = P(D ∩ T+)/P(T+) = 0.001 × 0.997 / (0.001 × 0.997 + 0.999 × 0.015) = 0.06238268

29
Independence

The concept of independence is of great importance in probability and statistics.

Informally: Two events are independent if the probability of occurrence of either


is unaffected by the occurrence of the other.

The most natural way is to express this via conditional probabilities as follows:

P(A|B) = P(A) and P(B|A) = P(B)

or P(A ∩ B)/P(B) = P(A) and P(A ∩ B)/P(A) = P(B)
Definition: Two events A and B are independent if and only if

P(A ∩ B) = P(A) · P(B)


Note that P(A) > 0 and P(B) > 0 are not required (as in P(B|A) and P(A|B)).

30
Comments on Independence
If P(A) = 0 or P(B) = 0 then A and B are independent.

Since A ∩ B ⊂ A and A ∩ B ⊂ B =⇒ 0 ≤ P(A ∩ B) ≤ min(P(A), P(B)) = 0, thus

0 = P(A ∩ B) = P(A) · P(B) = 0

If A ∩ B = ∅, i.e., A and B are mutually exclusive, and P(A) > 0 and P(B) > 0, then
A and B cannot be independent.
The fact that A and B are spatially uncoupled in the Venn diagram does not mean
independence; on the contrary, there is strong dependence between A and B

because P(A ∩ B) = 0 < P(A) · P(B)


or, knowing that A occurred, leaves no chance for B to occur (strong impact).
Thus A and Ac are not independent as long as 0 < P(A) < 1.

31
Implied Independence

If A and B are independent so are Ac and B and thus also Ac and Bc.

Proof:

P(B) = P(B ∩ A) + P(B ∩ Ac) = P(B) · P(A) + P(B ∩ Ac)

=⇒ P(B) · (1 − P(A)) = P(B ∩ Ac) =⇒ P(B ∩ Ac) = P(B)P(Ac)

32
Examples of Independence/Dependence
1. Given: P(A) = 0.4, P(B) = 0.5, and P([A ∪ B]c) = 0.3.
Are A and B independent?

P(A ∪ B) = 0.7 = P(A) + P(B) − P(A ∩ B) = 0.4 + 0.5 − P(A ∩ B)

=⇒ P(A ∩ B) = 0.2 = P(A) · P(B) =⇒ A and B are independent!

2. Given: P(A ∩ Bc) = 0.3, P(Ac ∩ B) = 0.2, and P(Ac ∩ Bc) = 0.1.
Are A and B independent?

0.1 = P(Ac ∩ Bc) = P([A ∪ B]c) = 1 − P(A ∪ B) =⇒ P(A ∪ B) = 0.9

0.9 = P(A ∪ B) = P(A ∩ Bc) + P(Ac ∩ B) + P(A ∩ B) = 0.3 + 0.2 + P(A ∩ B)

=⇒ P(A ∩ B) = 0.4,   P(A) = 0.3 + 0.4 = 0.7,   P(B) = 0.2 + 0.4 = 0.6

and P(A ∩ B) = 0.4 ≠ P(A) · P(B) = 0.42, i.e., A and B are dependent.

33
Postulated Independence
In practical applications independence is usually based on our understanding of
physical independence, i.e., A relates to one aspect of an experiment while B
relates to another aspect that is physically independent from the former.

In such cases we postulate probability models which reflect this independence.

Example: First flip a penny, then spin it, with apparent physical independence.
The sample space is S = {HH, HT, TH, TT}, with respective probabilities

p1 · p2, p1 · (1 − p2), (1 − p1) · p2, (1 − p1) · (1 − p2)

where P(H on flip) = P({HT} ∪ {HH}) = p1 · p2 + p1 · (1 − p2) = p1


and P(H on spin) = P({TH} ∪ {HH}) = (1 − p1) · p2 + p1 · p2 = p2

and P({H on flip} ∩ {H on spin}) = P({HH}) = p1 p2 = P(H on flip) · P(H on spin)

34
Common Dependence Situations

1. Consider the population of undergraduates at William & Mary, from which a


student is selected at random. Let A be the event that the student is female,
B be the event that the student is heading for elementary education.

Being told P(A) ≈ .6 and P(A|B) ≥ .9 =⇒ A and B are not independent.

2. Select a person at random from a population of registered voters.


Let A be the event that the person belongs to a country club,
B be the event that the person is a Republican. We probably would expect

P(B|A) ≫ P(B)
i.e., A and B are not independent.

35
Mutual Independence of a Collection {Aα} of Events
A collection {Aα} of events is said to consist of mutually independent events
if for any finite choice of events Aα1 , . . . , Aαk in {Aα} we have

P(Aα1 ∩ . . . ∩ Aαk ) = P(Aα1 ) · . . . · P(Aαk )

For example, for 3 events A, B, C, we not only require

P(A ∩ B) = P(A) · P(B), P(A ∩C) = P(A) · P(C), P(B ∩C) = P(B) · P(C)

but also P(A ∩ B ∩C) = P(A) · P(B) · P(C) (2)

Pairwise independence does not necessarily imply (2).

Counterexample: Flip 2 fair coins. Let A = {H on 1st flip}, B = {H on 2nd flip},


C = {same result on both flips} with P(A) = P(B) = P(C) = 1/2 and
P(A ∩ C) = P(HH) = 1/4, etc., but P(A ∩ B ∩ C) = P(HH) = 1/4 ≠ 1/8.
−→ text example on “independence” of 3 blood markers (O.J. Simpson trial).
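
The counterexample can be checked exactly by enumerating the 4 equally likely outcomes; a small R sketch:

    S <- expand.grid(flip1 = c("H", "T"), flip2 = c("H", "T"))   # 4 equally likely outcomes
    A <- S$flip1 == "H"; B <- S$flip2 == "H"; C <- S$flip1 == S$flip2
    c(mean(A & B), mean(A) * mean(B))                 # 0.25 and 0.25: pairwise independent
    c(mean(A & B & C), mean(A) * mean(B) * mean(C))   # 0.25 vs 0.125: not mutually independent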
36
Random Variables
In many experiments the focus is on numbers assigned to the various outcomes.
Numbers → arithmetic and common arena for understanding experimental results.

The simplest nontrivial example is illustrated by a coin toss: S = {H, T},
where we assign the number 1 to the outcome {H} and 0 to {T}.
Such an assignment can be viewed as a function X : S → R

X: H −→ 1 and T −→ 0, i.e., X(H) = 1 and X(T) = 0
Such a function is called a random variable. We use capital letters from the end of
the alphabet to denote such random variables (r.v.’s), e.g., U,V,W, X,Y, Z .

Using the word “variable” to denote a function is somewhat unfortunate,


but it is customary. It emphasizes the varying values that X can take on
as a result of the (random) experiment.

It seems that X only relabels the experimental outcomes, but there is more.
37
Random Variables for Two Coin Tosses

Toss a coin twice.


Assign the number of heads to each outcome in S = {HH, HT, TH, TT}. Y : S → R

Y: HH −→ 2, HT −→ 1, TH −→ 1, TT −→ 0, i.e., Y(HH) = 2, Y(HT) = Y(TH) = 1, and Y(TT) = 0
We may also assign a pair of numbers (X1, X2) to each of the outcomes as follows

X1(HH) = 1, X1(HT) = 1, X1(TH) = 0, X1(TT) = 0

X2(HH) = 1, X2(HT) = 0, X2(TH) = 1, X2(TT) = 0


X1 = # of heads on the first toss and X2 = # of heads on the second toss.

X = (X1, X2) is called a random vector (of length 2).

We can express Y also as Y = X1 + X2 = g(X1, X2) with g(x1, x2) = x1 + x2.
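
In R this assignment can be written out as a small table (purely illustrative):

    S  <- c("HH", "HT", "TH", "TT")
    X1 <- c(1, 1, 0, 0)    # heads on first toss
    X2 <- c(1, 0, 1, 0)    # heads on second toss
    Y  <- X1 + X2          # total number of heads
    data.frame(outcome = S, X1, X2, Y)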

38
Borel Sets

A random variable X induces a probability measure on a sigma field B of certain


subsets of R = (−∞, ∞).

This sigma field, the Borel sets, is the smallest sigma field containing all intervals
(−∞, y] for y ∈ R, i.e., it contains all sets that can be obtained by complementation,
countable unions and intersections of such intervals, e.g., it contains intervals like

[a, b], [a, b), (a, b], (a, b) for any a, b ∈ R (why?)

It takes a lot of gyrations to construct a set that is not a Borel set.


We won’t see any in this course.

How do we assign probabilities to such Borel sets B ∈ B ?

Each r.v. X induces its own probability measure on the Borel sets B ∈ B .
39
Induced Events and Probabilities
Suppose we have a r.v. X : S −→ R with corresponding probability space (S, C , P).

For any Borel set B ∈ B we can determine the set X −1(B) of all outcomes in S
which get mapped into B, i.e.,

X −1(B) = {s ∈ S : X(s) ∈ B}
How do we know that X −1(B) ∈ C is an event? We don’t.
Thus we require it in our definition of a random variable.

Definition: A function X : S −→ R is a random variable if and only if the

induced event X −1((−∞, y]) ∈ C for any y ∈ R

and thus the induced probability PX (induced by P and X )

PX ((−∞, y]) = P({s ∈ S : X(s) ≤ y}) exists for all y ∈ R

40
Cumulative Distribution Function (CDF)

A variety of ways of expressing the same probability (relaxed and fastidious):

 
PX((−∞, y]) = P( X−1((−∞, y]) ) = P({s ∈ S : X(s) ∈ (−∞, y]})
            = P(−∞ < X ≤ y) = P(X ≤ y)   (most relaxed)

Definition: The cumulative distribution function (cdf) of a random variable X


is the function F : R −→ [0, 1] defined by

F(y) = P(X ≤ y)

41
CDF for Single Coin Toss or Coin Spin

Example (coin toss P(H) = 0.5):


Since X takes only the values 1 and 0 for H and T we have

 0.0 = P(0)
/ for y < 0
P(X ≤ y) = 0.5 = P(X = 0) = P(T) for 0 ≤ y < 1
1.0 = P(X = 0 ∪ X = 1) = P(T ∪ H) for 1 ≤ y

Example (coin spin P(H) = 0.3):



 0.0 = P(0)
/ for y < 0
P(X ≤ y) = 0.7 = P(X = 0) = P(T) for 0 ≤ y < 1
1.0 = P(X = 0 ∪ X = 1) = P(T ∪ H) for 1 ≤ y

The jump sizes at 0 and 1 represent 1 − P(H) = P(T) and P(H), respectively.
See CDF plots on next slide.
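
These step functions can also be sketched in R with stepfun (an optional illustration):

    F_toss <- stepfun(c(0, 1), c(0, 0.5, 1), right = FALSE)   # CDF for the toss, P(H) = 0.5
    F_spin <- stepfun(c(0, 1), c(0, 0.7, 1), right = FALSE)   # CDF for the spin, P(H) = 0.3
    F_toss(c(-1, 0, 0.5, 1, 2))                               # 0.0 0.5 0.5 1.0 1.0
    plot(F_toss, verticals = FALSE, xlab = "y", ylab = "F(y)", main = "CDF for Coin Toss")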

42
[Two plots, CDF for Coin Toss/Coin Spin: F(y) = P(X ≤ y) over y from −2 to 3;
jumps of 0.5 at y = 0 and y = 1 for the toss, jumps of 0.7 at y = 0 and 0.3 at y = 1 for the spin.]

43
2 Fair Coin Tosses

For two fair coin tosses the number X of heads takes the values 0,1,2

for s = TT, s = HT or s = TH, and s = HH with probabilities 1/4, 1/2, 1/4, respectively.

P(X ≤ y) = 0.0  = P(∅)                                  for y < 0
         = 0.25 = P(X = 0) = P(TT)                      for 0 ≤ y < 1
         = 0.75 = P(X = 0 ∪ X = 1) = P(TT ∪ HT ∪ TH)    for 1 ≤ y < 2
         = 1.0  = P(X = 0 ∪ X = 1 ∪ X = 2) = P(S)       for 2 ≤ y

See CDF plot on next slide.

44
CDF for 2 Fair Coin Tosses
[Plot: F(y) = P(X ≤ y) for two fair coin tosses over y from −2 to 3, a step function
with jumps of 0.25, 0.5 and 0.25 at y = 0, 1 and 2.]

45
General CDF Properties
1. 0 ≤ F(y) ≤ 1 for all y ∈ R (F(y) is a probability)

2. y1 ≤ y2 =⇒ F(y1) ≤ F(y2) (monotonicity property)


This follows since {X ≤ y1} ⊂ {X ≤ y2} =⇒ P(X ≤ y1) ≤ P(X ≤ y2).

3. Limiting behavior as we approach ±∞:

lim_{y→−∞} F(y) = 0 and lim_{y→∞} F(y) = 1

This follows (with some more attention to technical detail) since

lim_{y→−∞} {X ≤ y} = ∩_y {X ≤ y} = ∅ and lim_{y→∞} {X ≤ y} = ∪_y {X ≤ y} = S
Note that in our examples we had F(y) = 0 for sufficiently low y (y < 0) and
F(y) = 1 for sufficiently high y. X had a finite and thus bounded value set.

46
Two Independent Random Variables
Two random variables X1 and X2 are independent if any event defined in terms of
X1 is independent of any event defined in terms of X2.

The following weaker but more practical definition is equivalent.

Definition: Let X1 : S −→ R and X2 : S −→ R be random variables defined on the


same sample space S. X1 and X2 are independent if and only if for each y1 ∈ R
and y2 ∈ R

P(X1 ≤ y1, X2 ≤ y2) = P(X1 ≤ y1) · P(X2 ≤ y2)


Note the shorthand notation

P(X1 ≤ y1, X2 ≤ y2) = P({X1 ≤ y1} ∩ {X2 ≤ y2})


You also often see P(AB) for P(A ∩ B).

47
Independent Random Variables

A collection of random variables {Xα} is mutually independent if the above product
property holds for any finite subset of these random variables, i.e., for any integer
k ≥ 2 and finite index subset α1, . . . , αk we have for all y1, . . . , yk ∈ R

P(Xα1 ≤ y1, . . . , Xαk ≤ yk ) = P(Xα1 ≤ y1) · . . . · P(Xαk ≤ yk )

Whether the independence assumption is appropriate in a given application


is mainly a matter of judgment or common sense.

With independence we have access to many powerful and useful theorems


in probability and statistics.

48
