Probability Theory Lecture Note

This document introduces the concept of experiments, differentiating between physical, chemical, and social experiments, and discusses the necessity of mathematical models, which can be deterministic or non-deterministic. It explains the characteristics of these models, the definition and operations of sets, and the concepts of random experiments, outcomes, and events. Additionally, it covers finite sample spaces, equally likely outcomes, and counting techniques such as the multiplication and addition rules.


CHAPTER ONE

1. Introduction
Experiment: - Any process of observation or measurement that generates a well-defined
outcome.

Experiment can be:

- Physical experiment

- Chemical experiment

- Social experiment

In order to analyze an experiment we need models. These are mathematical models, which can be
either deterministic or non-deterministic.

1.1 Deterministic and non-deterministic models


A model is an approximate representation of a physical situation. Mathematical models are used when the
observational phenomenon has measurable properties. A mathematical model consists of a set of assumptions
about how a system or physical process works. These assumptions are stated in the form of mathematical
relations involving the important parameters and variables of the system. The conditions under which an
experiment involving the system is carried out determine the “givens” in the mathematical relations, and the
solution of these relations allows us to predict the measurements that would be obtained if the experiment
were performed.

There are two basic types of mathematical models, deterministic and non-deterministic (probability) models.

1.1.1 Deterministic models


A deterministic model stipulates that the conditions under which an experiment is performed determine the
outcome of the experiment.

In deterministic models, the conditions under which an experiment is carried out determine the exact outcome
of the experiment. In deterministic mathematical models, the solution of a set of mathematical equations
specifies the exact outcome of the experiment.

Example 1: Ohm's law states that the voltage-current characteristic of a resistor is I = V/R. The voltages and
currents in any circuit consisting of an interconnection of batteries and resistors can be found by solving a
system of simultaneous linear equations obtained by applying Kirchhoff's laws and Ohm's law.
If an experiment involving the measurement of a set of voltages is repeated a number of times under the same
conditions, circuit theory predicts that the observations will always be the same. In practice, there will be
some variation in the observations due to measurement errors and uncontrolled factors. Nevertheless, this
deterministic model will be adequate as long as the deviation about the predicted values remains small.

Example2: gravitational law (F = mg)

1.1.2 Non-deterministic (probabilistic or stochastic ) models


A model in which the outcome of the experiment cannot be determined beforehand; hence many possible
outcomes exist.

Many systems of interest involve phenomena that exhibit unpredictable variation and randomness. We define
a random experiment to be an experiment in which the outcome varies in an unpredictable fashion when the
experiment is repeated under the same conditions. Deterministic models are not appropriate for random
experiments since they predict the same outcome for each repetition of an experiment. In this section, we
introduce probability models that are intended for random experiments.

Example: - Toss a die and observe the number that shows on top.

- Toss a coin four times and observe the total number of heads obtained.

- From an urn containing red and black balls, a ball is chosen and its color is noted.

1.2 Introduction to Sets

Set: - It is a collection of well-defined objects from the specified universe.


-A set is a collection of unique objects.

Example: A = {a, o, f, g, h}

Sets are denoted by capital letters and small letters denote elements.

Note: If x belongs to set A, we write x∊A. If x does not belong to A, we write x∉A.

Two special sets


1. Universal set (U): Is the set of all objects under consideration.

2. Empty set (Ø or {}): Is a set without elements.

1.2.1 Set operations
1. A⊆B means x∊A ⇒ x∊B, ∀x.

2. Ø⊆A, for any set A.

3. A⊆A, for any set A.

4. If A has n elements, then A has 2ⁿ subsets.

5. A=B ⟺ A⊆B and B⊆A.

6. Complement: Aᶜ = A′ = {x : x∊U and x∉A}

7. Set union: A∪B = {x : x∊A or x∊B (or both)}

8. Set intersection: A∩B = {x : x∊A and x∊B}

9. In general, if we have a sequence of sets A₁, A₂, A₃, …, Aₙ, then

   i. A₁∪A₂∪A₃∪…∪Aₙ = ⋃ᵢ₌₁ⁿ Aᵢ

   ii. A₁∩A₂∩A₃∩…∩Aₙ = ⋂ᵢ₌₁ⁿ Aᵢ

Set properties
1. A∪B = B∪A, A∩B = B∩A

2. A∪(B∪C) = (A∪B)∪C, A∩(B∩C) = (A∩B)∩C

3. A∪(B∩C) = (A∪B)∩(A∪C), A∩(B∪C) = (A∩B)∪(A∩C)

4. If A and B are disjoint sets, then A∩B = Ø

5. De Morgan's Laws:

   i. (A∪B)ᶜ = Aᶜ∩Bᶜ   ii. (A∩B)ᶜ = Aᶜ∪Bᶜ
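The operations and properties above map directly onto Python's built-in set type. The following sketch, using a hypothetical universe U = {1, …, 8}, checks union, intersection, complement, the subset count of property 4, and both De Morgan laws:

```python
# Checking the set operations and De Morgan's laws above with Python sets.
# U is a hypothetical universal set chosen for illustration.
from itertools import chain, combinations

U = set(range(1, 9))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

union = A | B            # A ∪ B
intersection = A & B     # A ∩ B
A_complement = U - A     # Aᶜ relative to U

# De Morgan's laws: (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ and (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ
assert U - (A | B) == (U - A) & (U - B)
assert U - (A & B) == (U - A) | (U - B)

# A set with n elements has 2**n subsets (property 4).
subsets = list(chain.from_iterable(combinations(A, r) for r in range(len(A) + 1)))
assert len(subsets) == 2 ** len(A)
```

Running the block raises no assertion errors, confirming the identities for this universe.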

1.3 Sample Spaces and events


Random Experiment: It is an experiment that can be repeated any number of times under similar conditions,
where it is possible to enumerate the total number of outcomes but not to predict an individual outcome.
- A random experiment is one which can be repeated under the same conditions but whose outcome cannot
be predicted with certainty.
Outcomes: The results of a random experiment
Event: It is a subset of sample space. It is a statement about one or more outcomes of a random experiment
.They are denoted by capital letters.
A set S that consists of all possible outcomes of a random experiment is called a sample space, and each
outcome is called a sample point.

Simple Event: If an event has one element of the sample space then it is called a simple or elementary event.
Let S = {1, 2, 3, 4, 5, 6}.

If the event is the set of elements less than 2, then E = {1} is a simple event

Compound Event: If an event has more than one sample point, the event is called a compound event. In the
above example of throwing a die, {1, 4} is a compound event.

Example: If a fair die is rolled once, it is possible to list all the possible outcomes i.e.1, 2, 3, 4, 5, 6, but it is
not possible to predict which outcome will occur. Let A be the event of odd numbers, B be the event of even
numbers, and C be the event of number 8.

A = {1, 3, 5}
B = {2, 4, 6}
C = Ø (the empty set, an impossible event)

Features of random experiment


- Each experiment is capable of being repeated indefinitely under essentially unchanged conditions.

- Although we are in general not able to state what a particular outcome will be, we are able to describe
the set of all possible outcomes of the experiment.

- As the experiment is repeated a large number of times, a definite pattern or regularity appears. It is this
regularity which makes it possible to construct a precise mathematical model with which to analyze the
experiment.

Combination of sets (events)

a. If A and B are events, A∪B is the event which occurs if and only if A or B or both occur.

b. If A and B are events, A∩B is the event which occurs if and only if A and B both occur.

c. If A is an event, Aᶜ is the event which occurs if and only if A does not occur.

d. If A₁, A₂, A₃, …, Aₙ is any finite collection of events, then ⋃ᵢ₌₁ⁿ Aᵢ is the event which occurs
if and only if at least one of the events Aᵢ occurs.

e. If A₁, A₂, A₃, …, Aₙ is any finite collection of events, then ⋂ᵢ₌₁ⁿ Aᵢ is the event which occurs
if and only if all the events Aᵢ occur.

f. If A₁, A₂, A₃, … is any (countably) infinite collection of events, then ⋃ᵢ₌₁^∞ Aᵢ is the
event which occurs if and only if at least one of the events Aᵢ occurs.

g. Aᶜ∩Bᶜ = (A∪B)ᶜ is the event which occurs if and only if neither A nor B occurs.

Example: if we toss a coin twice, identify

a. Experiment

b. Outcomes

c. Sample space

Solution:

a. Tossing a coin

b. HH, HT , TH, TT

c. S = { HH, HT, TH, TT}

Definition: Two events A and B are said to be mutually exclusive (disjoint) if they cannot occur together.
We express this by writing A∩B = Ø; that is, the intersection of A and B is the empty set.

Two events A and B are said to be independent if and only if the occurrence of one does not affect the
occurrence or non-occurrence of the other event.

1.4 Finite Sample Spaces

A finite sample space is a sample space which consists of a finite number of elements,

i.e. S = {x₁, x₂, …, xₙ}, where each xᵢ is a possible outcome.

The sample space for the experiment of a toss of a coin is a finite sample space. It has only two sample points.
But the sample space for the experiment where the coin is tossed until a head shows up is not a finite sample
space -- it is theoretically possible that you could keep tossing the coin indefinitely.

Tossing a coin. The experiment is tossing a coin (or any other object with two distinct sides.) The coin may
land and stay on the edge, but this event is so enormously unlikely as to be considered impossible and be
disregarded. So the coin lands on either one or the other of its two sides. One is usually called head, the
other tail. These are two possible outcomes of a toss of a coin. In the case of a single toss, the sample space
has two elements that interchangeably, may be denoted as, say, {Head, Tail}, or {H, T}, or {0, 1}, ...

Rolling a die. The experiment is rolling a die. A common die is a small cube whose faces show the numbers 1,
2, 3, 4, 5, 6 in one way or another. There are six possible outcomes and the sample space consists of six
elements: {1, 2, 3, 4, 5, 6}.

1.5 Equally likely outcomes


The notion of equally likely outcomes is that any outcome in S has an equal chance of occurring, so that

pᵢ = 1/n for every i, i.e. p₁ = p₂ = … = pₙ = 1/n

and

p₁ + p₂ + … + pₙ = 1/n + 1/n + … + 1/n = n·(1/n) = 1

In this case, we define the probability P(A) of the event A occurring to be:

P(A) = n(A)/n(S)
Example: A fair coin is tossed three times. Let A be the event that only one tail appears. Find P(A).
Solution: S = {HHH, HHT, HTT, TTH, THH, THT, HTH, TTT}

P(A) = P(HHT) + P(THH) + P(HTH) = 1/8 + 1/8 + 1/8 = 3/8
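The same answer can be checked by brute-force enumeration of the 2³ equally likely outcomes, a minimal sketch:

```python
# Enumerate the sample space for three tosses of a fair coin and count
# the outcomes with exactly one tail, as in the example above.
from itertools import product
from fractions import Fraction

S = list(product("HT", repeat=3))          # all 8 equally likely outcomes
A = [w for w in S if w.count("T") == 1]    # exactly one tail: HHT, HTH, THH
P_A = Fraction(len(A), len(S))             # n(A) / n(S)
assert P_A == Fraction(3, 8)
```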
Example 2: Four equally qualified applicants (a, b, c, d) are up for two positions. Applicant a is a minority.
Positions are chosen at random. What is the probability that the minority is hired? Here, the sample space is
S = {ab, ac, ad, bc, bd, cd}.
We are assuming that the order of the positions is not important. If the positions are assigned at random, each
of the six sample points is equally likely and has probability 1/6. Let E denote the event that a minority is
hired. Then E = {ab, ac, ad} and

P(E) = (number of outcomes in E)/6 = 3/6 = 1/2
1.6 Counting techniques
In order to calculate probabilities, we have to know

 The number of elements of an event


 The number of elements of the sample space.
That is in order to judge what is probable, we have to know what is possible.

 In order to determine the number of outcomes, one can use several rules of counting
- Addition rule.
- The multiplication rule
- Permutation rule
- Combination rule
1.6.1 The Multiplication Rule
If a choice consists of k stages of which the first can be made in n1 ways, the second can be made in n2 ways…

the kth can be made in nk ways, then the whole choice can be made in (n1 * n2 * ........ * nk ) ways.

Example 1: An experiment consists of rolling two dice. Envision stage 1 as rolling the first and stage 2 as
rolling the second. Here, n1 = 6 and n2 = 6. By the multiplication rule, there are n1* n2 = 6*6 = 36 different
outcomes.

Example 2: An airline has 6 flights from A to B, and 7 flights from B to C per day. If the flights are to be
made on separate days, in how many different ways can the airline offer a trip from A to C?

Solution: In stage 1 there are 6 flights from A to B, and in stage 2 there are 7 flights available from B to C.
Altogether there are 6*7 = 42 possible flights from A to C.
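The multiplication rule can be verified by explicit enumeration, e.g. with itertools.product; a small sketch covering both examples:

```python
# The multiplication rule in code: counting composite choices by
# enumerating every combination of per-stage options.
from itertools import product

# Example 1: rolling two dice, 6 options per stage.
dice = list(product(range(1, 7), range(1, 7)))
assert len(dice) == 6 * 6        # 36 outcomes

# Example 2: 6 flights A->B and 7 flights B->C.
flights_AB, flights_BC = 6, 7
routes = list(product(range(flights_AB), range(flights_BC)))
assert len(routes) == flights_AB * flights_BC   # 42 trips
```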

1.6.2 The addition rule
Suppose procedure 1 can be performed in n1 ways, and procedure 2 can be performed in n2 ways. Suppose
furthermore that it is not possible to perform both procedures together. Then the number of ways in which we
can perform procedure 1 or procedure 2 is n1 + n2 ways. More generally, if there are k such mutually
exclusive procedures, the kth performable in nk ways, we can conclude that there are n1 + n2 + … + nk
possible ways in total.

Example: Suppose we are planning a trip and deciding between bus and train transportation. If there are 3 bus
routes and 2 train routes from A to B, find the number of available routes for the trip.

Solution:
There are 3+2 =5 routes for someone to go from A to B.
1.6.3 Permutation
A permutation is an arrangement of distinct objects in a particular order. Order is important

Permutation Rules:
1. The number of permutations of n distinct objects taken all together is n!,

where n! = n(n − 1)(n − 2)·…·3·2·1

nPn = n!/(n − n)! = n!/0! = n!. By definition, 0! = 1! = 1.

2. The arrangement of n objects in a specified order using r objects at a time is called the permutation of n

objects taken r objects at a time. It is written as nPr and the formula is

nPr = n!/(n − r)!

3. The number of distinct permutations of n objects in which k1 are alike, k2 are alike, etc. is

n!/(k1!·k2!·…·kn!)

4. Circular Permutation

The number of ways to arrange n distinct objects along a fixed circle (i.e., one that cannot be picked up out
of the plane and turned over) is Pn = (n − 1)!

The number is (n − 1)! instead of the usual factorial n! since all cyclic permutations of objects are
equivalent because the circle can be rotated.

For example, of the 3! = 6 permutations of three objects, there are (3 − 1)! = 2 distinct circular
permutations. Similarly, of the 4! = 24 permutations of four objects, there are (4 − 1)! = 6 distinct circular
permutations.

Example 1: My bookshelf has 10 books on it. How many ways can I permute the 10 books on the shelf?
Answer: 10! = 3, 628,800.

Example 2: Suppose we have the letters A, B, C, D.

How many permutations are there taking two letters at a time?

Here n = 4, r = 2.

There are 4P2 = 4!/(4 − 2)! = 24/2 = 12 permutations.
Example 3: In how many different ways can 3 red, 4 yellow and 2 blue bulbs be arranged in a string
of Christmas tree lights with 9 sockets?

Solution: n1 = 3 red, n2 = 4 yellow, n3 = 2 blue

n = n1 + n2 + n3 = 3 + 4 + 2 = 9

Number of arrangements = 9!/(3!·4!·2!) = (9·8·7·6·5·4!)/(3!·4!·2!) = 1260 ways
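Both permutation formulas above can be cross-checked against explicit enumeration with itertools.permutations, a sketch:

```python
# Permutation formulas checked against explicit enumeration.
from itertools import permutations
from math import factorial

# nPr = n!/(n - r)!  -- 4 letters taken 2 at a time.
n, r = 4, 2
nPr = factorial(n) // factorial(n - r)
assert nPr == len(list(permutations("ABCD", 2))) == 12

# Permutations with repeated objects: 3 red, 4 yellow, 2 blue bulbs.
bulbs = "RRRYYYYBB"
distinct = {p for p in permutations(bulbs)}   # deduplicate equal strings
formula = factorial(9) // (factorial(3) * factorial(4) * factorial(2))
assert formula == len(distinct) == 1260

# Circular permutations of n distinct objects: (n - 1)!
assert factorial(10 - 1) == 362880
```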

1.6.4 Combination

A selection of objects without regard to order is called combination.

Example: Given the letters A, B, C, and D list the permutation and combination for selecting two letters.

Solutions:
Permutation Combination

AB BA CA DA AB BC
AC BC CB DB AC BD
AD BD CD DC AD DC

Combination Rule

The number of combinations of r objects selected from n objects is denoted by nCr, or (n over r), and is
given by the formula:

nCr = n!/((n − r)!·r!)
Example 1: In how many ways can a committee of 5 people be chosen out of 9 people?

Solutions:

n = 9, r = 5

9C5 = 9!/(4!·5!) = 126 ways

1. Among 15 clocks there are two defectives .In how many ways can an inspector chose three of the clocks
for inspection so that:
a) There is no restriction.
b) None of the defective clock is included.
c) Only one of the defective clocks is included.
d) Two of the defective clock is included.
Solutions:

n = 15, of which 2 are defective and 13 are non-defective; r = 3.

a) If there is no restriction, select three clocks from the 15 clocks, which can be done in:

15C3 = 15!/(12!·3!) = 455 ways

b) None of the defective clocks is included.

This is equivalent to zero defective and three non-defective, which can be done in:

2C0 · 13C3 = 286 ways.
c) Only one of the defective clocks is included.
This is equivalent to one defective and two non-defective, which can be done in:

2C1 · 13C2 = 156 ways.

d) Two of the defective clocks are included.

This is equivalent to two defective and one non-defective, which can be done in:

2C2 · 13C1 = 13 ways.

Note: the number of ways of choosing r things out of n without replacement is given by nCr, and the
number of ordered ways of choosing r things out of n with replacement is given by nʳ.

1.7 Basic notations of Probability


Definitions: Probability is a measure of one's belief in the occurrence of a future (random) event.

Probability is also known as “the mathematics of uncertainty.”

The probability of an event is a measure (number) of the chance with which we can expect the event to occur.
We assign a number between 0 and 1 inclusive to the probability of an event.

A probability of 1 means that we are 100% sure of the occurrence of an event, and a probability of 0 means
that we are 100% sure of the nonoccurrence of the event. P (A) denotes the probability of any event A in the
sample space S.

CLASSICAL DEFINITION OF PROBABILITY: If there are n equally likely possibilities, of which one
must occur, and m of these are regarded as favorable to an event, or as "success," then the probability of the
event or a "success" is given by m/n.

FREQUENCY DEFINITION OF PROBABILITY: The probability of an outcome (event) is the proportion


of times the outcome (event) would occur in a long run of repeated experiments.

In any random experiment there is always uncertainty as to whether a particular event will or will not occur. As
a measure of the chance, or probability, with which we can expect the event to occur, it is convenient to assign
a number between 0 and 1. If we are sure or certain that an event will occur, we say that its probability is 100%
or 1. If we are sure that the event will not occur, we say that its probability is zero. If, for example, the
probability is 1/4, we would say that there is a 25% chance it will occur and a 75% chance that it
will not occur. Equivalently, we can say that the odds against occurrence are 75% to 25%, or 3 to 1.

We can estimate the probability of an event by means of three important procedures.

1. CLASSICAL APPROACH: If an event can occur in h different ways out of a total of n possible ways,
all of which are equally likely, then the probability of the event is h/n.

Examples:
1. A fair die is tossed once. What is the probability of getting
a) Number 4?
b) An odd number?
c) An even number?
d) Number 8?
Solutions:
First identify the sample space, say S:
S = {1, 2, 3, 4, 5, 6}
⟹ N = n(S) = 6

a) Let A be the event of number 4

A = {4}
⟹ N_A = n(A) = 1
P(A) = n(A)/n(S) = 1/6

b) Let A be the event of odd numbers

A = {1, 3, 5}
⟹ N_A = n(A) = 3
P(A) = n(A)/n(S) = 3/6 = 0.5

c) Let A be the event of even numbers

A = {2, 4, 6}
⟹ N_A = n(A) = 3
P(A) = n(A)/n(S) = 3/6 = 0.5

d) Let A be the event of number 8

A = Ø
⟹ N_A = n(A) = 0
P(A) = n(A)/n(S) = 0/6 = 0

Shortcoming of the classical approach:


This approach is not applicable when:

- The total number of outcomes is infinite.


- Outcomes are not equally likely.
2. FREQUENCY APPROACH: This is based on the relative frequencies of outcomes
belonging to an event.

Definition: The probability of an event A is the proportion of outcomes favorable to A in the
long run when the experiment is repeated under the same conditions:

P(A) = lim_{N→∞} N_A/N

Example: Records show that 60 out of 100,000 bulbs produced are defective. What is the
probability of a newly produced bulb being defective?

Solution:
Let A be the event that the newly produced bulb is defective.

P(A) ≈ N_A/N = 60/100,000 = 0.0006
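The frequency definition suggests estimating a probability by simulation: repeat the experiment many times and take the proportion of favourable outcomes. A minimal sketch, with a die roll standing in for the experiment and a hypothetical seed for reproducibility:

```python
# Frequency-approach sketch: estimate P(rolling a 4) by simulating N rolls
# of a fair die; the true probability is 1/6.
import random

random.seed(0)                 # hypothetical seed, for a reproducible run
N = 100_000
N_A = sum(1 for _ in range(N) if random.randint(1, 6) == 4)
estimate = N_A / N             # proportion of favourable outcomes

# For large N the estimate settles near 1/6 ≈ 0.1667.
assert abs(estimate - 1 / 6) < 0.01
```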

3. The Axioms of Probability: Suppose we have a sample space S. If S is discrete, all subsets
correspond to events and conversely; if S is non-discrete, only special subsets (called measurable)
correspond to events. To each event A in the class C of events, we associate a real number P(A).
Then P is called a probability function, and P(A) the probability of the event A, if the following
axioms are satisfied.

1. P(A) ≥ 0
2. P(S) = 1, where S is the sure event.
3. If A and B are mutually exclusive events, the probability that one or the other occurs equals
the sum of the two probabilities, i.e. P(A∪B) = P(A) + P(B)
4. P(Aᶜ) = 1 − P(A)
5. 0 ≤ P(A) ≤ 1
6. P(Ø) = 0, where Ø is the impossible event.
7. If A₁, A₂, A₃, …, Aₙ are pairwise mutually exclusive events, then

P(⋃ᵢ₌₁ⁿ Aᵢ) = P(A₁) + P(A₂) + … + P(Aₙ) = Σᵢ₌₁ⁿ P(Aᵢ)

Remark: Venn-diagrams can be used to solve probability problems.

[Venn diagrams: A∪B, A∩B, Aᶜ]

In general, P(A∪B) = P(A) + P(B) − P(A∩B)

Some basic properties of probability


For two events A and B in S, we have the following:
1. P(Aᶜ) = 1 − P(A), where Aᶜ is the complement of the set A in S.
2. If A ⊂ B, then P(A) ≤ P(B).
3. P(A∪B) = P(A) + P(B) − P(A∩B).
4. In particular, if A∩B = Ø, then P(A∪B) = P(A) + P(B).

1.8 Derived theorems of probability

Theorem 1: If Ø is the impossible event, then P(Ø) = 0

Proof: S∪Ø = S and S∩Ø = Ø

⟹ P(S∪Ø) = P(S) + P(Ø)

⟹ P(S) = P(S) + P(Ø)

⟹ P(S) − P(S) = P(Ø)

⟹ P(Ø) = 0

Theorem 2: P(Aᶜ) = 1 − P(A)

Proof: A∩Aᶜ = Ø, because A and Aᶜ are mutually exclusive events.

A∪Aᶜ = S ⟹ P(A∪Aᶜ) = P(A) + P(Aᶜ)

⟹ P(S) = P(A) + P(Aᶜ) = 1

⟹ P(A) + P(Aᶜ) = 1

⟹ P(Aᶜ) = 1 − P(A)

Theorem 3: For any two events A and B, P(A∩Bᶜ) = P(A) − P(A∩B)

Proof: A can be decomposed into the mutually exclusive events A∩B and A∩Bᶜ:

A = (A∩B)∪(A∩Bᶜ) ⟹ P(A) = P(A∩B) + P(A∩Bᶜ) ⟹ P(A∩Bᶜ) = P(A) − P(A∩B)

Theorem 4: For any events A and B, P(A∪B) = P(A) + P(B) − P(A∩B)

Proof:

A∪B can be decomposed into the mutually exclusive events A∩Bᶜ and B: A∪B = (A∩Bᶜ)∪B

⟹ P(A∪B) = P(A∩Bᶜ) + P(B) = P(A) − P(A∩B) + P(B)

= P(A) + P(B) − P(A∩B)

Theorem 5: For any three events A, B and C,
P(A∪B∪C) = P(A) + P(B) + P(C) − P(A∩B) − P(A∩C) − P(B∩C) + P(A∩B∩C)

Proof: A∪B∪C = (A∪B)∪C

⟹ P(A∪B∪C) = P{(A∪B)∪C} = P(A∪B) + P(C) − P{(A∪B)∩C}

= P(A) + P(B) − P(A∩B) + P(C) − P{(A∩C)∪(B∩C)}

= P(A) + P(B) + P(C) − P(A∩B) − {P(A∩C) + P(B∩C) − P(A∩B∩C)}

= P(A) + P(B) + P(C) − P(A∩B) − P(A∩C) − P(B∩C) + P(A∩B∩C)

An obvious extension of the above theorem suggests itself. Let A₁, A₂, …, Aₖ be any k events. Then

P(A₁∪A₂∪…∪Aₖ) = Σᵢ P(Aᵢ) − Σᵢ<ⱼ P(Aᵢ∩Aⱼ) + Σᵢ<ⱼ<ᵣ P(Aᵢ∩Aⱼ∩Aᵣ) − … + (−1)ᵏ⁺¹ P(A₁∩A₂∩…∩Aₖ)

Theorem 6: If A⊆B, then P(A) ≤ P(B)

Proof (using a Venn diagram): B can be decomposed into the mutually exclusive events A and Aᶜ∩B.

B = A∪(Aᶜ∩B) ⟹ P(B) = P(A∪(Aᶜ∩B)) = P(A) + P(Aᶜ∩B)

⟹ P(B) − P(A) = P(Aᶜ∩B) ≥ 0

⟹ P(B) − P(A) ≥ 0 ⟹ P(B) ≥ P(A)

Theorem 7: For any two events A and B, P(A∪B) ≤ P(A) + P(B)

Proof: P(A∪B) = P(A) + P(B) − P(A∩B), and 0 ≤ P(A∩B) ≤ 1.

If P(A∩B) = 0, then P(A∪B) = P(A) + P(B). If P(A∩B) > 0, then P(A∪B) < P(A) + P(B).

This implies P(A∪B) ≤ P(A) + P(B)

Theorem 8: If A₁, A₂, …, Aₙ are any n events, then P(⋃ᵢ₌₁ⁿ Aᵢ) ≤ Σᵢ₌₁ⁿ P(Aᵢ)

Proof: apply Theorem 7 repeatedly; for example, P(A₁∪A₂∪A₃) ≤ P(A₁∪A₂) + P(A₃) ≤ P(A₁) + P(A₂) + P(A₃),
and so on by induction on n.

Example 1

Three items are selected at random from a manufacturing process. Each item is inspected and
classified defective (D) or non-defective (N).

Its sample space is S = {DDD, DDN, DND, NDD, DNN, NDN, NND, NNN}

Example 2

Consider the event that the number of defectives in the above example is greater than 1.

The event is E = {DDD, DDN, DND, NDD}

The probability of the event is 4/8 or 1/2.

Example 3

Suppose a licence plate contains two letters followed by three digits, with the first digit not
zero. How many different licence plates can be printed?

                      1st Letter   2nd Letter   1st Digit   2nd Digit   3rd Digit
Number of choices     A-Z (26)     A-Z (26)     1-9 (9)     0-9 (10)    0-9 (10)

Number of different licence plates that can be printed is

Example 4

How many 7-letter words can be formed using the letters of the word 'BENZENE'?

(There are 1 B, 3 E, 2 N and 1 Z)

The number of 7-letter words that can be formed is 7!/(1!·3!·2!·1!) = 420
(1!)(3!)(2!)(1!)

Example 5

A box contains 8 eggs, 3 of which are rotten. Three eggs are picked at random. Find the
probabilities of the following events.

(a) Exactly two eggs are rotten.

(b) All eggs are rotten.

(c) No egg is rotten.

Solution:

(a) The 8 eggs can be divided into 2 groups, namely, 3 rotten eggs as the first group and 5
good eggs as the second group.

Getting 2 rotten eggs among the 3 randomly selected eggs occurs if we randomly select 2
eggs from the first group and 1 egg from the second group.

The number of such outcomes is 3C2 · 5C1 = 15.

The total number of possible outcomes of selecting 3 eggs randomly from the 8 eggs is
8C3 = 56.

Thus the probability of having exactly two rotten eggs among the 3 randomly selected eggs is

(3C2 · 5C1)/8C3 = 15/56

(b) Similarly, the probability of having all 3 eggs rotten is

(3C3 · 5C0)/8C3 = 1/56

(c) The probability of having no rotten egg is

(3C0 · 5C3)/8C3 = 10/56 = 5/28
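The three egg probabilities follow one pattern: choose k from the rotten group and the rest from the good group, then divide by C(8, 3). A sketch that also checks the cases exhaust the sample space:

```python
# Hypergeometric-style counting for the rotten-egg example above.
from fractions import Fraction
from math import comb

rotten, good, drawn = 3, 5, 3
total = comb(rotten + good, drawn)              # C(8, 3) = 56

def p_rotten(k):
    # probability of exactly k rotten eggs among the 3 drawn
    return Fraction(comb(rotten, k) * comb(good, drawn - k), total)

assert p_rotten(2) == Fraction(15, 56)          # part (a)
assert p_rotten(3) == Fraction(1, 56)           # part (b)
assert p_rotten(0) == Fraction(5, 28)           # part (c)
assert sum(p_rotten(k) for k in range(4)) == 1  # all cases together
```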

Example 6: What is the probability that 3 men and 4 women are selected from a group of 7 men and 10
women?

The answer is (7C3 · 10C4)/17C7 = (35 · 210)/19448 = 7350/19448 ≈ 0.3779

Example 7

180 students took examinations in English and Mathematics. Their results were as follows:

Number of students passing English = 80

Number of students passing Mathematics = 120

Number of students passing at least one subject = 144

Then we can rewrite the above results as:

Probability that a randomly selected student passed English = 80/180 = 4/9

Probability that a randomly selected student passed Mathematics = 120/180 = 2/3

Probability that a randomly selected student passed at least one subject = 144/180 = 4/5

Find the probability that a randomly selected student passed both subjects.

Solution
Let E be the event of passing English, and M be the event of passing Mathematics.

It is given that: P(E) = 4/9, P(M) = 2/3, P(M∪E) = 4/5

Since P(M∪E) = P(E) + P(M) − P(M∩E),

P(M∩E) = P(E) + P(M) − P(M∪E) = 4/9 + 2/3 − 4/5 = 14/45 ≈ 0.31

Example 8
A card is drawn from a complete deck of playing cards. What is the probability that the card is a
heart or an ace?

Solution

Let A be the event of getting a heart, and B be the event of getting an ace.

The probability that the card is a heart or an ace is P(A∪B).

P(A∪B) = P(A) + P(B) − P(A∩B)

= 13/52 + 4/52 − 1/52 = 16/52 = 4/13

For mutually exclusive events, P(A∪B) = P(A) + P(B)

What is the probability of getting a total of '7' or '11' when a pair of dice is tossed?

Solution

Total number of possible outcomes = (6)(6) = 36

Possible outcomes of getting a total of '7' :{1,6; 2,5; 3,4; 4,3; 5,2; 6,1}

Possible outcomes of getting a total of '11' : {5,6; 6,5}

Let A be the event of getting a total of '7', and B be the event of getting a total of '11'.

The probability of getting a total of '7' or '11' is P(A∪B).

P(A∪B) = P(A) + P(B) − P(A∩B) = P(A) + P(B)   ...A and B are mutually exclusive

= 6/36 + 2/36 = 8/36 = 2/9
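Enumerating the 36 equally likely outcomes confirms both the counts and the mutual exclusivity used above:

```python
# Enumerate the 36 outcomes for a pair of dice to confirm
# P(7 or 11) = 6/36 + 2/36 = 2/9.
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))
A = [o for o in outcomes if sum(o) == 7]    # totals of 7
B = [o for o in outcomes if sum(o) == 11]   # totals of 11

assert len(A) == 6 and len(B) == 2
assert not set(A) & set(B)                  # mutually exclusive events
P = Fraction(len(A) + len(B), len(outcomes))
assert P == Fraction(2, 9)
```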

CHAPTER TWO
2. Conditional Probability and Independence
2.1 Conditional Probability
Conditional Events: If the occurrence of one event has an effect on the occurrence of the other
event, then the two events are conditional or dependent events.

Let A and B be any two events associated with a random experiment. The probability of occurrence of
event A when the event B has already occurred is called the conditional probability of A given B
and is denoted P(A|B).

P A  B 
It is defined as P A B  
P B 
, where P(B)>0

Similarly the conditional probability of even B when event A already occurred is given by
P A  B 
P  B A 
P  A
, where P (A)>0. This implies that

p( A  B)  P( A / B)* P( B)
=P( B / A)* P( A)

Remark: (1) p( A' B)  1  p( A B)

(2) p( B ' A)  1  p( B A)

(a) P  A B  for fixed A , satisfies the following various postulates of probability.

(1) 0  P( B / A)  1
(2) P(S / A)  1
(3) P( B1  B2 / A)  P( B1 / A)  P( B2 / A), if B1  B2  

(4) P( B1  B2  ....... / A)  P( B1 / A)  P( B2 / A)  ......, if Bi  B j  

(b) If A= S, P( B / S )  P( B  S ) / P(S )  P( B)

21
Remark: we can compute conditional probability in two ways:
1. Directly, by considering the probability of an event with respect to the reduced sample space.
2. Using the formula of conditional probability.
Example 1: The probability that a policy will be correctly formulated is 0.6, and the probability that it
will be correctly formulated and executed is 0.54. Find the probability that

a. It will be correctly executed given that it is correctly formulated.

b. It will not be correctly executed given that it is correctly formulated.

Solution: Let C_F be the event that the policy is correctly formulated and NC_F that it is not;
let C_E be the event that it is correctly executed and NC_E that it is not.

P(C_F) = 0.6, P(NC_F) = 0.4, P(C_E∩C_F) = 0.54

a. P(C_E|C_F) = P(C_E∩C_F)/P(C_F) = 0.54/0.6 = 0.9

b. P(NC_E|C_F) = 1 − P(C_E|C_F) = 1 − 0.9 = 0.1

Example 2: If a couple has planned to have two children,


(a) What is the probability that both are girls?

(b) What is the probability that both are girls, if the eldest is a girl?

Solution: (a) The sample space is given by

S = {(M,M), (M,F), (F,M), (F,F)}
and N = 4, the number of sample points in S. Define
A1 = {1st born child is a girl},
A2 = {2nd born child is a girl}.

Clearly, A1∩A2 = {(F,F)} and P(A1∩A2) = 1/4, assuming that the four outcomes in S are equally
likely.

Solution: (b) Now we want P(A2∩A1|A1) = P(A2|A1). Applying the definition of conditional
probability, we get

P(A2∩A1|A1) = P(A1∩A2∩A1)/P(A1) = P(A1∩A2)/P(A1) = (1/4)/(2/4) = 1/2
Example 3: In a certain community, 36 percent of the families own a dog, 22 percent of the families
that own a dog also own a cat, and 30 percent of the families own a cat.

A family is selected at random.

(a) Compute the probability that the family owns both a cat and dog.

(b) Compute the probability that the family owns a dog, given that it owns a cat.

Solution: Let C = {family owns a cat} and D = {family owns a dog}. From the problem, we are
given that P (D) = 0.36, P(C / D)  0.22 and P (C) = 0.30.

(a) P(C|D) = P(C∩D)/P(D) ⟹ 0.22 = P(C∩D)/0.36 ⟹ P(C∩D) = 0.22·0.36 = 0.0792

(b) P(D|C) = P(C∩D)/P(C) = 0.0792/0.30 = 0.264

Example 4: Suppose that an office has 100 calculating machines. Some of these machines are electric
(E) while others are manual (M), and some of the machines are new (N) while others are used (U). The
table below gives the number of machines in each category. A person enters the office, picks a machine
at random, and discovers that it is new. What is the probability that it is electric?

          E     M     Total
N         40    30    70
U         20    10    30
Total     60    40    100

Solution:
Simply considering the reduced sample space N (i.e., the 70 new machines), we have P(E|N) = 40/70.

Using the definition of conditional probability, we have

P(E|N) = P(E∩N)/P(N) = (40/100)/(70/100) = 4/7

Example 5: Assume 100 students were asked, "Do you smoke?" Responses are shown below in the
contingency table which gives the cross tabulations.
         Yes   No   Total

Male 19 41 60

Female 12 28 40

Total 31 69 100

What is the probability of a randomly selected individual being a male and who smokes? This is
just a joint probability. The number of "Male and Smoke" divided by the total = 19/100 = 0.19

What is the probability of a randomly selected individual being a male? This is the total for male
divided by the total = 60/100 = 0.60. Since no mention is made of smoking or not smoking, it includes
all males.

What is the probability of a randomly selected individual smoking? Again, since no mention is
made of gender, this is a marginal probability, the total who smoke divided by the total = 31/100 = 0.31.

What is the probability of a randomly selected male smoking? This time, you are told that you have
a male, which is a conditional probability. You are being asked: given that you select a male, what is
the probability of a smoker? Thus, 19 males smoke out of 60 males, so 19/60 = 0.31666.

What is the probability that a randomly selected smoker is male? This time, you are told that you
have a smoker and asked to find the probability that the smoker is also male. There are 19 male smokers
out of 31 total smokers, so 19/31 = 0.6129.
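All five probabilities above come straight from the table counts; the sketch below (plain Python, counts copied from the contingency table) recomputes them.

```python
# Recomputing the contingency-table probabilities from the survey counts.
male_yes, male_no = 19, 41
female_yes, female_no = 12, 28
total = male_yes + male_no + female_yes + female_no      # 100 respondents

p_male_and_smoke = male_yes / total                      # joint: P(Male and Yes)
p_male = (male_yes + male_no) / total                    # marginal: P(Male)
p_smoke = (male_yes + female_yes) / total                # marginal: P(Yes)
p_smoke_given_male = male_yes / (male_yes + male_no)     # conditional: P(Yes | Male)
p_male_given_smoke = male_yes / (male_yes + female_yes)  # conditional: P(Male | Yes)

print(p_male_and_smoke, p_male, p_smoke)  # 0.19 0.6 0.31
```

Note how the conditional probabilities divide by a row or column total (the reduced sample space) rather than by 100.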
2.2 Multiplication theorems of probability
P  A  B P  A  B
P  A B  , P  B A   P  A  B  P  A B* P B
P  B P  A
= P  B A * P  A
Example: a lot consists 20 defective and 80 non-defective items. If we choose two items at
random without replacement, what is the probability that both items are defective?

Let A = {the first item is defective} and B = {the second item is defective}. We want P(A ∩ B).

P(A ∩ B) = P(B / A) · P(A), with P(A) = 20/100 = 1/5 and P(B / A) = 19/99, so P(A ∩ B) = (1/5) × (19/99) = 19/495.
Let A1, A2, …, An be a sequence of events. Then

P(∩_{i=1}^{n} Ai) = P(A1) · P(A2 / A1) · P(A3 / A1, A2) · … · P(An / A1, …, An−1)
Example 1: I am dealt a hand of 5 cards. What is the probability that they are all spades?

Solution: Define Ai to be the event that card i is a spade (i = 1, 2, 3, 4, 5). Then,

P(A1) = 13/52
P(A2 / A1) = 12/51
P(A3 / A2, A1) = 11/50
P(A4 / A3, A2, A1) = 10/49
P(A5 / A4, A3, A2, A1) = 9/48

⇒ P(A1 ∩ A2 ∩ A3 ∩ A4 ∩ A5) = P(A1) · P(A2 / A1) · P(A3 / A2, A1) · P(A4 / A3, A2, A1) · P(A5 / A4, A3, A2, A1)
= (13/52)(12/51)(11/50)(10/49)(9/48) ≈ 0.0005

Note: As another way to solve this problem, a student recently pointed out that we could simply regard the cards as belonging to two groups: spades and non-spades. There are 13C5 ways to draw 5 spades from the 13 spades, and 52C5 possible hands. Thus, the probability of drawing 5 spades (assuming that each hand is equally likely) is 13C5 / 52C5 ≈ 0.0005.

Example 2: The probability that a married man watches a certain TV show is 0.4 and the probability
that a married woman watches the show is 0.5. The probability that a man watches the show given that
his wife does is 0.7. Find the probability that

(i) A married couple both watch the show


(ii) A wife watches if her husband does

(iii) At least one person of a married couple watches the show
Answer Let H = {husband watches show}
W = {wife watches show}

(i) P(H  W)  P(H | W)  P(W)

 0.5  0.7
 0.35

(ii) P( W | H)  P(H  W)
P( H)

0.35
=
0 .4
= 0.875

(iii) P(H  W)  P(H)  P(W)  P(H  W) = 0.4 + 0.5 – 0.35 = 0.55

Example 3: In a certain college, 25% of the students failed in mathematics, 15% of the students failed in
chemistry, and 10% of the students failed both in mathematics and chemistry. A student is selected at
random.

(i) If he failed in chemistry, what is the probability that he failed in mathematics?

(ii) If he failed in mathematics, what is the probability that he failed in chemistry?

(iii) What is the probability that he failed in mathematics or chemistry?

Let M = {students who failed in mathematics} and C = {students who failed in chemistry}; then

P(M )  25%  0.25, P(C)  15%  0.15, P(M  C)  10%  0.10

(i) The probability that a student failed mathematics, given that he has failed in chemistry is

P( M  C ) 0.10 2
P( M / C )   
P(C ) 0.15 3

(ii) The probability that a student failed in chemistry, given that he has failed in mathematics is

P(C  M ) 0.10 2
P(C / M )   
P( M ) 0.25 5

(iii) P(MC) = P(M) + P(C) - P(MC) =0 .25 + 0.15 – 0.10 = 0.30

2.3 Law of Total Probability and Bayes’ Rule
A sequence of sets B1, B2 , ……………….,Bk is said to form a partition of the sample space S if

(a) B1B2 ………Bk = S (exhaustive condition), and

(b) Bi Bj = Ø for all i≠ j (disjoint condition).


(c) P(Bi) > 0

2.3.1 Total Probability Theorem


Definition (total probability theorem): Let B1, …, Bn be n mutually exclusive events whose union gives the sample space S; hence, the events Bi constitute a partition of S. For any event A, a subset of S, we have

A = (B1 ∩ A) ∪ (B2 ∩ A) ∪ … ∪ (Bn ∩ A) ⇒ P(A) = P(B1 ∩ A) + P(B2 ∩ A) + … + P(Bn ∩ A)

⇒ P(A) = P(A / B1) · P(B1) + P(A / B2) · P(B2) + … + P(A / Bn) · P(Bn) = Σ_{i=1}^{n} P(A / Bi) · P(Bi)

Example 1: Suppose that a manufacturer buys approximately 60 percent of a raw material (in boxes)
from Supplier 1, 30 percent from Supplier 2, and 10 percent from Supplier 3. For each supplier,
defective rates are as follows: Supplier 1: 0.01, Supplier 2: 0.02, and Supplier 3: 0.03. The manufacturer
observes a defective box of raw material. What is the probability of observing a defective box?

Solution: Let A = {observe defective box}; then P(A) = Σ_{i=1}^{n} P(A / Bi) · P(Bi).

Let B1, B2, and B3, respectively, denote the events that the box comes from Supplier 1, 2, and 3. The
prior probabilities (ignoring the status of the box) are

P (B1) = 0.6
P (B2) = 0.3
P (B3) = 0.1

P(A) = Σ_{i=1}^{3} P(A / Bi) · P(Bi) = P(A / B1) P(B1) + P(A / B2) P(B2) + P(A / B3) P(B3)

= (0.01)(0.6) + (0.02)(0.3) + (0.03)(0.1) = 0.015
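The total-probability sum is just a weighted average of the defective rates; a minimal sketch (numbers taken from the example, variable names ours):

```python
# Law of total probability for the supplier example: P(A) = sum of P(A|Bi) * P(Bi).
priors = [0.6, 0.3, 0.1]           # P(B1), P(B2), P(B3)
defect_rates = [0.01, 0.02, 0.03]  # P(A | B1), P(A | B2), P(A | B3)

p_defective = sum(p_b * p_a_given_b for p_b, p_a_given_b in zip(priors, defect_rates))
print(round(p_defective, 3))  # 0.015
```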

Example 2: A certain item is manufactured by three factories, say 1, 2, and 3. It is known that factory 1 turns out twice as many items as factory 2, and that factories 2 and 3 turn out the same number of items (during a specified production period). It is also known that 2 percent of the items produced by factories 1 and 2 are defective, while 4 percent of those manufactured by factory 3 are defective. All the items produced are put into one stockpile and then one item is chosen at random. What is the probability that this item is defective?

Solution: let us introduce the following events:

A= {the item is defective} B1= {the item came from 1}

B2 = {the item came from 2} B3= {the item came from 3}

P( A)  P( A / B1 ) P ( B1 )  P( A / B2 ) P ( B2 )  P( A / B3 ) P( B3 )

1 1
P( B1 )  , P( B2 )  P( B3 )  , P( A / B1 )  P( A / B2 )  0.02, P( A / B3 )  0.04
2 4
1 1 1
P( A)  P( A / B1 ) P ( B1 )  P( A / B2 ) P ( B2 )  P( A / B3 ) P( B3 )  0.02*  0.02*  0.04*  0.025
2 4 4

2.3.2 Bayes’ theorem


Suppose that B1, B2,……….,Bk form a partition of S , and suppose that P (A) > 0 and P (Bi) > 0
for all i = 1, 2,…..,k . Then,

P(Bi / A) = P(A / Bi) · P(Bi) / Σ_{i=1}^{k} P(A / Bi) · P(Bi). This is called Bayes' formula.

REMARK: Bayesians call P (Bj) the prior probability for the event Bj; they call P (Bj/A) the posterior
probability of Bj, given the information in A.

Example: Suppose that a manufacturer buys approximately 60 percent of a raw material (in boxes)
from Supplier 1, 30 percent from Supplier 2, and 10 percent from Supplier 3. For each supplier,
defective rates are as follows: Supplier 1: 0.01, Supplier 2: 0.02, and Supplier 3: 0.03. The manufacturer
observes a defective box of raw material.

(a) What is the probability that it came from Supplier 2?

(b) What is the probability that the defective box did not come from Supplier 3?

Solution:
(a) Let B1, B2, and B3, respectively, denote the events that the box comes from Supplier 1, 2, and 3.
the prior probabilities (ignoring the status of the box) are

P (B1) = 0.6

P (B2) = 0.3

P (B3) = 0.1

Note that {B1, B2, B3} partitions the space of possible suppliers. Thus, by Bayes Rule,

P(B2 / A) = P(A / B2) P(B2) / P(A) = P(A / B2) P(B2) / [P(A / B1) P(B1) + P(A / B2) P(B2) + P(A / B3) P(B3)]

= (0.02)(0.3) / [(0.01)(0.6) + (0.02)(0.3) + (0.03)(0.1)] = 0.40

This is the updated (posterior) probability that the box came from Supplier 2 (updated to include
the information that the box was defective).

(b) First, compute the posterior probability P (B3/A). By Bayes Rule,

P(B3 / A) = P(A / B3) P(B3) / P(A) = (0.03)(0.1) / [(0.01)(0.6) + (0.02)(0.3) + (0.03)(0.1)] = 0.20

Thus, the probability that the defective box did not come from Supplier 3 is 1 − P(B3 / A) = 1 − 0.20 = 0.80, by the complement rule.
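All three posteriors follow the same pattern (prior times likelihood, divided by the total probability); a small sketch with the numbers from this example:

```python
# Bayes' rule for the supplier example: P(Bi | A) = P(A | Bi) * P(Bi) / P(A).
priors = [0.6, 0.3, 0.1]
defect_rates = [0.01, 0.02, 0.03]

p_a = sum(p_b * d for p_b, d in zip(priors, defect_rates))        # total probability
posteriors = [p_b * d / p_a for p_b, d in zip(priors, defect_rates)]
print([round(p, 2) for p in posteriors])  # [0.4, 0.4, 0.2]
```

Observing a defective box shifts probability away from Supplier 1 (low defect rate) toward Suppliers 2 and 3, and the posteriors always sum to 1.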

2.4 Independence
When the occurrence or non-occurrence of A has no effect on whether or not B occurs, and vice versa,
we say that the events A and B are independent. Mathematically, we define A and B to be independent if
and only if P( A  B)  P( A)* P( B) , otherwise, A and B are called dependent events. Note that if A and
B are independent,

P ( A  B) P( A) P ( B)
P( A / B)    P( A)
P( B) P( B)
P ( A  B) P( A) P ( B)
P( B / A)    P( B)
P( A) P( A)

Example: A red die and a white die are rolled. Let A = {4 on red die} and B = {sum is odd}. Of the 36 outcomes in S, 6 are favorable to A, 18 are favorable to B, and 3 are favorable to A ∩ B. Assuming the outcomes are equally likely:

red die 1 2 3 4 5 6
white die

1 (1,1) (1,2) (1,3) (1,4) (1,5) (1,6)

2 (2,1) (2,2) (2,3) (2,4) (2,5) (2,6)

3 (3,1) (3,2) (3,3) (3,4) (3,5) (3,6)

4 (4,1) (4,2) (4,3) (4,4) (4,5) (4,6)

5 (5,1) (5,2) (5,3) (5,4) (5,5) (5,6)

6 (6,1) (6,2) (6,3) (6,4) (6,5) (6,6)

P(A ∩ B) = 3/36 and P(A) · P(B) = (6/36) × (18/36) = 3/36, so P(A ∩ B) = P(A) · P(B); the events A and B are independent.
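The counts 6, 18, and 3 can be verified by brute-force enumeration of the 36 equally likely outcomes; a sketch with `itertools.product`:

```python
from itertools import product

# Enumerate the 36 equally likely (red, white) outcomes to verify independence.
outcomes = list(product(range(1, 7), repeat=2))
A = {o for o in outcomes if o[0] == 4}               # 4 on the red die
B = {o for o in outcomes if (o[0] + o[1]) % 2 == 1}  # sum is odd

p_a, p_b = len(A) / 36, len(B) / 36
p_ab = len(A & B) / 36
print(len(A), len(B), len(A & B))        # 6 18 3
print(abs(p_ab - p_a * p_b) < 1e-12)     # True: P(A ∩ B) = P(A) * P(B)
```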
Many experiments consist of a sequence of n trials that are viewed as independent (e.g., flipping a coin
10 times). If Ai denotes the event associated with the ith trial, and the trials are independent, then

P(∩_{i=1}^{n} Ai) = Π_{i=1}^{n} P(Ai)

Example: An unbiased die is rolled six times. Let Ai = {i appears on roll i}, for i = 1, 2, …, 6. Then P(Ai) = 1/6 and, assuming independence,

P(A1 ∩ A2 ∩ A3 ∩ A4 ∩ A5 ∩ A6) = Π_{i=1}^{6} P(Ai) = (1/6)^6

Suppose that if Ai occurs, we will call it "a match." What is the probability of at least one match in the six rolls?

Solution: Let B denote the event that there is at least one match, and let Bc denote the event that there are no matches. Now,

P(Bc) = P(A1c ∩ A2c ∩ … ∩ A6c) = Π_{i=1}^{6} P(Aic) = (5/6)^6 ≈ 0.335

P(B) = 1 − P(Bc) = 1 − 0.335 = 0.665, by the complement rule.
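The complement-rule computation is one line of arithmetic:

```python
# P(at least one match in six rolls) = 1 - P(no match) = 1 - (5/6)**6.
p_no_match = (5 / 6) ** 6
p_at_least_one = 1 - p_no_match
print(round(p_no_match, 3), round(p_at_least_one, 3))  # 0.335 0.665
```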

Note: pairwise independence does not necessarily imply the independence of several events; i.e., it is possible that

P(A ∩ B) = P(A) · P(B), P(A ∩ C) = P(A) · P(C), P(B ∩ C) = P(B) · P(C), but P(A ∩ B ∩ C) ≠ P(A) · P(B) · P(C)

Definition: we say that the three events A, B and C are mutually independent if and only if all the
following conditions hold:

P (AB) = P (A).P (B), P (AC) = P (A).P(C)

P (BC) = P (B).P(C), P (ABC) = P (A).P (B).P(C)


Definition: The events A1, A2, …, An are mutually independent if and only if the probability of the joint occurrence of every finite subcollection of them equals the product of the corresponding probabilities. That is,

P(Ai ∩ Aj) = P(Ai) · P(Aj), ∀ i ≠ j

P(Ai ∩ Aj ∩ Ak) = P(Ai) · P(Aj) · P(Ak), ∀ i ≠ j ≠ k

⋮

P(A1 ∩ A2 ∩ … ∩ An) = P(A1) · P(A2) · … · P(An)

This implies that 2^n − n − 1 conditions must be satisfied.

Example: Suppose that we toss two dice. Define the events A, B, C as follows:
A = {the first die shows an even number}
B = {the second die shows an odd number}
C = {the two dice show both odd or both even numbers}

die 1 1 2 3 4 5 6
die 2

1 (1,1) (1,2) (1,3) (1,4) (1,5) (1,6)

2 (2,1) (2,2) (2,3) (2,4) (2,5) (2,6)

3 (3,1) (3,2) (3,3) (3,4) (3,5) (3,6)

4 (4,1) (4,2) (4,3) (4,4) (4,5) (4,6)

5 (5,1) (5,2) (5,3) (5,4) (5,5) (5,6)

6 (6,1) (6,2) (6,3) (6,4) (6,5) (6,6)

We have P(A) = P(B) = P(C) = 0.5. Furthermore, P(A ∩ B) = P(A ∩ C) = P(B ∩ C) = 0.25. Hence, the three events are pairwise independent. However, P(A ∩ B ∩ C) = 0 ≠ P(A) · P(B) · P(C).
Example: The probability that a man will live 10 more years is 1/4, and the probability that his wife will live 10 more years is 1/3. Find the probability that (i) both will be alive in 10 years, (ii) at least one will be alive in 10 years, (iii) neither will be alive in 10 years, (iv) only the wife will be alive in 10 years.
Let A = event that the man is alive in 10 years, and B = event that his wife is alive in 10 years; then

P(A) = 1/4, P(B) = 1/3

i. We seek P(A ∩ B). Since A and B are independent, P(A ∩ B) = P(A) · P(B) = (1/4) × (1/3) = 1/12.

ii. We seek P(A ∪ B). P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 1/4 + 1/3 − 1/12 = 1/2.

iii. We seek P(Ac ∩ Bc). Now P(Ac) = 1 − P(A) = 1 − 1/4 = 3/4 and P(Bc) = 1 − P(B) = 1 − 1/3 = 2/3. Furthermore, since Ac and Bc are independent, P(Ac ∩ Bc) = P(Ac) · P(Bc) = (3/4) × (2/3) = 6/12 = 1/2.

Alternately, since (A ∪ B)c = Ac ∩ Bc, P(Ac ∩ Bc) = P((A ∪ B)c) = 1 − P(A ∪ B) = 1 − 1/2 = 1/2.

iv. We seek P(Ac ∩ B). Since P(Ac) = 1 − P(A) = 1 − 1/4 = 3/4 and Ac and B are independent, P(Ac ∩ B) = P(Ac) · P(B) = (3/4) × (1/3) = 1/4.

CHAPTER THREE

3. One-dimensional Random Variables

Definition 1: Let  be an experiment and S a sample space associated with the experiment. A function
X assigning to every to every element s∊S a real number X(s) is called random variable.

Example: suppose that we toss two coins and consider the sample space associated with this
experiment. That is S = {HH, HT, TH, TT}

Let X be the number of heads obtained in the two tosses. Hence X(HH) = 2, X(HT) = X(TH) = 1, and X(TT) = 0, so Rx = {0, 1, 2}: S is the sample space of the experiment and Rx is the set of all possible values of X.

Let ε be an experiment and S be its sample space. Let X be a random variable defined on S and let Rx be its range space. Let B be an event with respect to Rx, i.e. B ⊆ Rx. Suppose that A is defined as A = {s ∈ S / X(s) ∈ B}; then A and B are called equivalent events, and this implies that P(A) = P(B). A random variable can be discrete or continuous.

Example: Consider the tossing of two coins, so S = {HH, HT, TH, TT}. Let X be the number of heads obtained in the two tosses; hence Rx = {0, 1, 2}. Let B = {1}. Since X(HT) = X(TH) = 1, we have that A = {HT, TH} is equivalent to B. We have P(HT) = P(TH) = 1/4, hence P({HT, TH}) = 1/4 + 1/4 = 1/2. Since the event {X = 1} is equivalent to the event {HT, TH}, we have P(X = 1) = P({HT, TH}) = 1/2, which shows that A and B are equivalent events.

3.1 Discrete random variable


Definition: Let X be a random variable. If the number of possible values of X (that is, Rx, the range space) is finite or countably infinite, we call X a discrete random variable. The possible values of X may be listed as x1, x2, …, xn, …. In the finite case the list terminates, and in the countably infinite case the list continues indefinitely.

Definition: Let X be a discrete random variable. Hence Rx, the range space of X, consists of at most a countably infinite number of values x1, x2, …. With each possible outcome xi we associate a number P(xi) = P(X = xi), called the probability of xi. The numbers P(xi), i = 1, 2, 3, …, must satisfy the following conditions:

a. P(xi) ≥ 0, for all i

b. Σ_{i=1}^{∞} P(xi) = 1

The function p defined above is called the probability function (point probability function) of the
random variable X. The collection of pairs (xi, P(xi)),i = 1, 2, 3,……….., is sometimes called the
probability distribution of X.

Example: Suppose that we toss two coins and consider the sample space associated with this experiment, S = {HH, HT, TH, TT}. Let X be the number of heads obtained in the two tosses. Hence X(HH) = 2, X(HT) = X(TH) = 1, and X(TT) = 0, so Rx = {0, 1, 2}, with P(X = 0) = 1/4, P(X = 1) = 1/2, P(X = 2) = 1/4, and Σ P(xi) = P(X = 0) + P(X = 1) + P(X = 2) = 1.

X          0     1     2
P(X = xi)  1/4   1/2   1/4

Example: Suppose that a radio tube is inserted into a socket and tested. Assume that the probability that it tests positive equals 3/4; hence the probability that it tests negative is 1/4. Assume further that we are testing a large supply of such tubes. The testing continues until the first positive tube appears. Define the random variable X as follows: X is the number of tests required to terminate the experiment. The sample space associated with this experiment is

S = {+, −+, −−+, −−−+, …}

To determine the probability distribution of X we reason as follows. The possible values of X are 1, 2, …, n, … (we are obviously dealing with an idealized sample space), and X = n if and only if the first (n − 1) tubes are negative and the nth tube is positive. If we suppose that the condition of one tube does not affect the condition of another, we may write
n 1
1 3
P ( n)  P ( X  n)      , n  1, 2,.........
4 4

To check that these values of P(n) satisfy the required condition, we note that

Σ_{n=1}^{∞} P(n) = (3/4) · [1 + 1/4 + 1/16 + …] = (3/4) · 1 / (1 − 1/4) = 1

3.1.1 Binomial random variables


Definition: 1. A Bernoulli trial is a trial (experiment) with any two possible outcomes, success, and
failure.

Example: Toss a coin once, so S = {H, T}, and let X = number of heads; then X follows a Bernoulli distribution (X ~ B(1, 0.5)). In general, X is a Bernoulli random variable if and only if the possible values of X are 0 and 1, with P(X = 1) = p and P(X = 0) = 1 − p = q, i.e.

P(X = x) = p^x q^(1−x), x = 0, 1

Consider n independent Bernoulli random variables, all with the same success probability p. The sum X = X1 + … + Xn is called a binomial random variable with parameters n and p, written X ~ Bin(n, p).

A discrete random variable x is said to have a binomial distribution if x satisfies the following
conditions:

 An experiment is repeated for a fixed number of trials n.


 All trials of the experiment are independent from one another.
 All possible outcomes for each trial of the experiment can be divided into two
complementary events one S called “success” and one F called “failure”.
 The probability of success P( S ) has a constant value of p for every trial and the

probability of failure P( F ) has a constant value of q for every trial.


Note: q  1  p

Examples:

 Consider the experiment of flipping a coin 5 times. If we let the event of getting tails on a flip be considered "success", and if the random variable T represents the number of tails obtained, then T will be binomially distributed with n = 5, p = 1/2, and q = 1/2.

 A student takes a 10 question multiple-choice quiz and guesses each answer. For each question, there are 4 possible answers, only one of which is correct. If we consider "success" to be getting a question right and consider the 10 questions as 10 independent trials, then the random variable X representing the number of correct answers will be binomially distributed with n = 10, p = 1/4, and q = 3/4.
 Fourteen percent of flights from Tampa International Airport are delayed. If 20 flights are chosen
at random, then we can consider each flight to be an independent trial. If we define a successful
trial to be that a flight takes off on time, then the random variable z representing the number of
on-time flights will be binomially distributed with n  20 , p  .86 , and q  .14 .
 Suppose that items coming off a production line are classified as defective (D) or non-defective
(N). Suppose that three items are chosen at random from a day’s production and are classified
according to this scheme. The sample space for this experiment ,say S may be described as:
S= {DDD, DDN, DND, NDD, NND, NDN, DNN, NNN}
3.1.1.1 Calculating Probabilities for a Binomial Random Variable
If X is a binomial random variable with n trials, probability of success p, and probability of failure q,
then by the Fundamental Counting Principle, the probability of any outcome in which there are x
successes (and therefore n  x failures) is:

( p  p  ...  p)  (q  q  ...  q)  p x q n  x
x successes n x failures
To count the number of outcomes with x successes and n  x failures, we observe that the x successes
could occur on any x of the n trials. The number of ways of choosing x trials out of n is n Cx , so the
probability of x successes becomes:

P( x )  n Cx p x q n  x

Example 1: If a coin is flipped 10 times, what is the probability that it will fall heads 3 times?

Solution: Let S denote the probability of obtaining a head, and F the probability of obtaining a
tail.
Clearly, n = 10, k = 3, p = 1/2, and q = 1/2.

Therefore, Bin(10, 3; 1/2) = 10C3 (1/2)^3 (1/2)^7 = .1172

Example 2: If a basketball player makes 3 out of every 4 free throws, what is the probability that he will
make 6 out of 10 free throws in a game?

Solution: The probability of making a free throw is 3/4. Therefore, p = 3/4, q = 1/4, n = 10,
and k = 6.
Therefore,

Bin(10, 6; 3/4) = 10C6 (3/4)^6 (1/4)^4 = .1460

Example 3: If a medicine cures 80% of the people who take it, what is the probability that of the eight
people who take the medicine, 5 will be cured?

Solution: Here p =.80, q = .20, n = 8, and k = 5.


Bin(8, 5; .80) = 8C5 (.80)^5 (.20)^3 = .1468

Example 4: If a microchip manufacturer claims that only 4% of his chips are defective, what is the
probability that among the 60 chips chosen, exactly three are defective?

Solution: If S denotes the probability that the chip is defective, and F the probability that the
chip is not defective, then p = .04, q = .96, n = 60, and k = 3.
Bin(60, 3; .04) = 60C3 (.04)^3 (.96)^57 = .2138

Example 5: If a telemarketing executive has determined that 15% of the people contacted will purchase
the product, what is the probability that among the 12 people who are contacted, 2
will buy the product?

Solution: If S denoted the probability that a person will buy the product, and F the probability
that the person will not buy the product, then p = .15, q = .85, n = 12, and k = 2.
Bin(12, 2; .15) = 12C2 (.15)^2 (.85)^10 = .2924.
Example 6: A new medication gives 5% of users’ on undesirable reaction. If a sample of 13 users
receive the medication, find the probability of (assume independent from user to user)

a. Zero undesirable reaction.


b. At most 12 undesirable reaction.
c. At least 1 undesirable reaction.

37
Solution: n  13, p  0.05, q  0.95

X: the number of users on undesirable reaction ⇒ X ~ Bin 13,0.05

a. p( X  0)  13 C0 (0.05)0 (0.95)13  (0.95)13  0.5133421

P( X  12)  P( X  0)  P( X  1)  ...............  P( X  12)  1  P( X  13)


b.
=1  P( X  13)  1  13 C13 (0.05)13 (0.95) 0  1  (0.05)13  1

c. P( X  1)  1  P( X  0)  1  0.5133421  0.4866579
Example 7: Suppose that a radio tube inserted into a certain type of set has a probability of 0.2 of
functioning more than 500 hours. If we test 20 tubes, what is the probability that exactly k of these
function more than 500 hours, k = 1, 2, …………….., 20?

If X is the number of tubes functioning more than 500 hours, we shall assume that X has a binomial distribution. Thus

P(X = k) = 20Ck (0.2)^k (0.8)^(20−k)

3.2 Continuous random variables


Definition: X is said to be a continuous random variable if there exists a function f, called the probability density function (pdf) of X, satisfying the following conditions:

a. f(x) ≥ 0, for all x

b. ∫_{−∞}^{∞} f(x) dx = 1

c. For any a, b with −∞ < a < b < ∞, we have P(a ≤ X ≤ b) = ∫_a^b f(x) dx

Note: 1. P(X = a) = ∫_a^a f(x) dx = 0

2. For a continuous random variable X, if X may assume all values in some interval [c, d], with the associated probability density function f(x), the following probabilities are all the same:

P(c ≤ X ≤ d) = P(c ≤ X < d) = P(c < X ≤ d) = P(c < X < d) = ∫_c^d f(x) dx

2 x, 0  x  1
Example 1: Let X be a continuous with pdf f ( x)  
0, otherwise
a. Verify that f ( x) is proper pdf

b. Find the 1 3
P  X  
2 4

c. Evaluate  1 1 2
P X  |  X  
 2 3 3

Solution:

a. i. f(x) ≥ 0 for all x ∈ (0, 1)

ii. ∫_{−∞}^{∞} f(x) dx = ∫_0^1 2x dx = [x^2]_0^1 = 1 − 0 = 1

b. P(1/2 ≤ X ≤ 3/4) = ∫_{1/2}^{3/4} 2x dx = [x^2]_{1/2}^{3/4} = 9/16 − 1/4 = (9 − 4)/16 = 5/16

c. P(X ≤ 1/2 | 1/3 ≤ X ≤ 2/3) = P(1/3 ≤ X ≤ 1/2) / P(1/3 ≤ X ≤ 2/3) = [∫_{1/3}^{1/2} 2x dx] / [∫_{1/3}^{2/3} 2x dx] = (1/4 − 1/9) / (4/9 − 1/9) = (5/36) / (1/3) = 5/12

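All three answers can be double-checked numerically; a minimal sketch using a midpoint Riemann sum (the helper `integrate` is ours, and the midpoint rule is exact for the linear integrand f(x) = 2x up to rounding):

```python
# Numerically checking Example 1 for f(x) = 2x on (0, 1).
def integrate(f, a, b, n=100_000):
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h  # midpoint rule

def f(x):
    return 2 * x

total = integrate(f, 0, 1)                             # part a: should be 1
p_b = integrate(f, 0.5, 0.75)                          # part b: 5/16
p_c = integrate(f, 1/3, 0.5) / integrate(f, 1/3, 2/3)  # part c: 5/12

print(round(total, 6), round(p_b, 6), round(p_c, 6))
```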
Example 2: Let X be a continuous random variable with the following pdf:

f_X(x) = ax, 0 ≤ x ≤ 1; a, 1 ≤ x ≤ 2; −ax + 3a, 2 ≤ x ≤ 3; 0, otherwise

a. Determine the value of a.

b. Evaluate P(X ≤ 3/2).
Solution:

a. ∫_{−∞}^{∞} f(x) dx = ∫_0^1 ax dx + ∫_1^2 a dx + ∫_2^3 (−ax + 3a) dx = 1

⇒ [ax^2/2]_0^1 + [ax]_1^2 + [−ax^2/2 + 3ax]_2^3 = a/2 + a + a/2 = 2a = 1, so a = 1/2

b. P(X ≤ 3/2) = ∫_{−∞}^{3/2} f(x) dx = ∫_0^1 (1/2)x dx + ∫_1^{3/2} (1/2) dx = 1/4 + 1/4 = 1/2

Example 3: Let X be the life length of a certain type of light bulb (in hours). Assuming X to be a continuous random variable, we suppose that the pdf of X is given by

f(x) = a/x^3, 1500 ≤ x ≤ 2500; 0, elsewhere

That is, we are assigning probability zero to the events {X < 1500} and {X > 2500}. To evaluate the constant a we invoke the condition ∫_{−∞}^{∞} f(x) dx = 1, which in this case becomes

∫_{1500}^{2500} a/x^3 dx = 1. From this we obtain a = 7,031,250.
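The value of a follows from the antiderivative of a/x^3, which is −a/(2x^2); a one-line check:

```python
# Solving for a: 1 = a * (1/(2*1500**2) - 1/(2*2500**2)), from the antiderivative
# -a/(2*x**2) of a/x**3 evaluated between 1500 and 2500.
a = 1 / (1 / (2 * 1500**2) - 1 / (2 * 2500**2))
print(round(a))  # 7031250
```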

Exercise: Let X be a continuous random variable with pdf given by

f_X(x) = x, 0 ≤ x ≤ 1; 2 − x, 1 ≤ x ≤ 2; 0, otherwise. Evaluate P(1/2 ≤ X ≤ 3/2 | 1/4 ≤ X ≤ 2).

3.3 Cumulative distribution function


Definition: Let X be a random variable, discrete or continuous. We define F to be the cumulative distribution function of the random variable X (abbreviated cdf), where F(x) = P(X ≤ x).

Theorem: (a) If X is a discrete random variable, F(x) = Σ_j P(xj), where the sum is taken over all indices j satisfying xj ≤ x.

(b) If X is a continuous random variable with pdf f, F(x) = ∫_{−∞}^{x} f(t) dt.

Example 1: Suppose the random variable X assumes the three values 0, 1, and 2 with probabilities 1/3, 1/6, and 1/2, respectively. Then

F(x) = 0, if x < 0; 1/3, if 0 ≤ x < 1; 1/2, if 1 ≤ x < 2; 1, if x ≥ 2

Example 2: Suppose that X is a continuous random variable with pdf

f(x) = 2x, if 0 < x < 1; 0, elsewhere

Hence the cdf F is given by

F(x) = 0, if x ≤ 0; ∫_0^x 2t dt = x^2, if 0 < x ≤ 1; 1, if x > 1 (draw the graph of this function)

Note: (a) If X is a discrete random variable with a finite number of possible values, the graph of the cdf F will be made up of horizontal line segments (it is a step function). The function F is continuous except at the possible values of X, namely x1, x2, …, xn. At each value xj the graph has a "jump" of magnitude P(xj) = P(X = xj).

(b) If X is a continuous random variable, F will be a continuous function for all x.

(c) The cdf F is defined for all values of x, which is an important reason for considering it.

There are two important properties of cdf

1. The function F is non-decreasing. That is, if x1 ≤ x2, we have F(x1) ≤ F(x2).

2. lim_{x→−∞} F(x) = 0 and lim_{x→+∞} F(x) = 1. [We often write this as F(−∞) = 0, F(+∞) = 1.]

Theorem: (a) Let F be the cdf of a continuous random variable with pdf f. Then f(x) = (d/dx) F(x), for all x at which F is differentiable.

(b) Let X be a discrete random variable with possible values x1, x2, …, and suppose that it is possible to label these values so that x1 < x2 < …. Let F be the cdf of X. Then P(xj) = P(X = xj) = F(xj) − F(xj−1).

Remark: P(a < x ≤ b) = F(b) − F(a)


Example: Suppose that the continuous random variable X has cdf F given by

F(x) = 0, x ≤ 0; 1 − e^(−x), x > 0

Then f(x) = (d/dx) F(x) = e^(−x) for x > 0, and thus the pdf f is given by

f(x) = e^(−x), x > 0; 0, elsewhere

P(−3 < x ≤ 2) = F(2) − F(−3) = (1 − e^(−2)) − 0 = 1 − e^(−2)

3.4 Mixed distributions

The random variable X may assume certain distinct values, say x1, x2, …, xn, with positive probability and also assume all values in some interval, say a ≤ x ≤ b. The probability distribution of such a random variable is obtained by combining the ideas considered above for the description of discrete and continuous random variables, as follows. To each value xi assign a number P(xi) such that P(xi) ≥ 0 for all i and such that Σ_{i=1}^{n} P(xi) = p ≤ 1. Then define a function f satisfying f(x) ≥ 0 and ∫_a^b f(x) dx = 1 − p. For all a and b with −∞ < a < b < +∞,

P(a ≤ X ≤ b) = ∫_a^b f(x) dx + Σ_{[i: a ≤ xi ≤ b]} P(xi), and P(−∞ < X < ∞) = 1

3.5 Uniformly distributed random variables
Definition: Suppose that X is a continuous random variable assuming all values in the interval [a, b], where both a and b are finite. If the pdf of X is given by

f(x) = 1/(b − a), a ≤ x ≤ b; 0, elsewhere

We say that X is uniformly distributed over the interval [a, b]

Notes: (a) A uniformly distributed random variable has a pdf which is constant over the interval of definition. In order to satisfy the condition ∫_{−∞}^{∞} f(x) dx = 1, this constant must be equal to the reciprocal of the length of the interval.


(b) A uniformly distributed random variable represents the continuous analog to equally likely outcomes in the following sense: for any subinterval [c, d], where a ≤ c < d ≤ b, P(c ≤ X ≤ d) is the same for all subintervals having the same length. That is, P(c ≤ X ≤ d) = (d − c)/(b − a), which depends only on the length of the interval and not on its location.

(c)We can now make precise the intuitive notion of choosing a point P at random on an interval
say [a, b] .By this we shall simply mean that the x-coordinate of the chosen point, say X, is

uniformly distributed over [a, b] .

The cumulative distribution function of a uniformly distributed random variable X over the interval [a, b] is computed as follows:

F(x) = ∫_{−∞}^{x} f(t) dt = ∫_a^x 1/(b − a) dt = (x − a)/(b − a), for a ≤ x < b, so that

F(x) = 0, x < a; (x − a)/(b − a), a ≤ x < b; 1, x ≥ b

Example 1: A point is chosen at random on the line segment [0, 2] (assume a uniform distribution). What is the probability that the chosen point lies between 1 and 3/2?

Let X represent the coordinate of the chosen point; then X ~ U(0, 2), with

f(x) = 1/2, 0 ≤ x ≤ 2; 0, otherwise

P(1 ≤ X ≤ 3/2) = ∫_1^{3/2} (1/2) dx = [x/2]_1^{3/2} = 3/4 − 1/2 = 1/4
Example 2: The hardness, say H, of a specimen of steel is assumed to be a continuous random variable which is uniformly distributed over [50, 70].

i) Find its cdf.

ii) Find P(20 ≤ H ≤ 70).

1
Solution: i)  ,50  h  70
f ( x )   20

0, otherwise
0, x  50

1
F ( x)   x,50  x  70
 20
1, x  70

ii) P(20  h  70)  F (70)  F (50)  1 (70  20)  50  5


20 20 2
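The general uniform cdf derived above is easy to code and re-use (the helper name `uniform_cdf` is ours); here it re-checks Example 2:

```python
# cdf of a uniform random variable on [a, b]: 0 below a, (x - a)/(b - a) inside, 1 above b.
def uniform_cdf(x, a, b):
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

# Example 2: H ~ U(50, 70), so P(20 <= H <= 70) = F(70) - F(20) = 1 - 0 = 1.
p = uniform_cdf(70, 50, 70) - uniform_cdf(20, 50, 70)
print(p)  # 1.0
```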

CHAPTER FOUR
4. FUNCTIONS OF RANDOM VARIABLES
4.1 Equivalent events
Suppose we know the distribution of X , we may have an interest on determining the distribution
of Y  H ( x ) .
s ∈ S → X(s) ∈ RX → y = H(X(s)) ∈ RY

Let  be an experiment and S sample space associated with  . Let X be a random variable
defined on S. Suppose that y  H ( x ) is a real valued function of x .Then Y  H ( x ) is a random
variable since for every s  S a value of Y is determined say y  H [(x )] . RX is the range space
of X and RY is the range space of Y.

Definition: Let C be an event (subset) associated with the range space of Y, i.e. C ⊆ RY. Let B ⊆ RX be defined as B = {x ∈ RX | H(x) ∈ C}; then B and C are equivalent events, and P(B) = P(C).

Note: (a) B and C are equivalent events if and only if B and C occur together. That is, when B occurs C
also occurs and conversely.

(b) Suppose that A is an event associated S which is equivalent to an event B associated with R X. Then,
if C is an event associated with RY which is equivalent to B, we have that A is equivalent to C.

2
Example : Suppose that H ( x )  x . Then the events B : {X  2} and C : {Y  4 } are equivalent .
2
For if Y  X ,then { X  2} occurs if and only if {Y  4 } occurs, since X cannot assume
negative values in the present context.

Definition: Let X be a random variable defined on sample space S. Let RX be the range space of X. Let

H be a real valued function and consider the random variable Y  H ( X ) with range space R . For any
Y

event C  R , we dine P (C ) as follows P (C )  P[( x  R : H ( X )  C )] .


Y X

45
Example: Let X be a continuous random variable with pdf

f(x) = e^(-x), x > 0. (A simple integration reveals that ∫ from 0 to ∞ of e^(-x) dx = 1.)

Suppose that H(x) = 2x + 1. Hence R_X = {x | x > 0}, while R_Y = {y | y > 1}.

Suppose that the event C is defined as C = {Y >= 5}. Now y >= 5 if and only if 2x + 1 >= 5, which in turn
yields x >= 2. Hence C is equivalent to B = {X >= 2}. Now

P(X >= 2) = ∫ from 2 to ∞ of e^(-x) dx = e^(-2), and therefore P(Y >= 5) = e^(-2).

Thus B and C are equivalent events.
Example: Let X be a continuous random variable with probability density function (pdf)

f(x) = 1 for 0 < x < 1, and 0 otherwise.

Let Y = e^X. Then X = g(Y) = ln Y, so dx/dy = 1/y and |dx/dy| = 1/y. Hence the pdf of Y is

h(y) = f(ln y) · |dx/dy| = 1 · (1/y) = 1/y for 1 < y < e, and 0 otherwise.
4.2 FUNCTIONS OF DISCRETE RANDOM VARIABLES
If X is a discrete random variable and Y = H(X), then Y is also discrete. For if the possible
values of X are x₁, x₂, …, xₙ, then the possible values of Y are y₁ = H(x₁), y₂ = H(x₂), …, yₙ = H(xₙ).

Some of the yᵢ may be the same, i.e. H need not be a one-to-one function.

Example: Suppose the probability function of X is

x        -2     0     2
P(x)    1/3   1/2   1/6

Find the probability distribution of

a) Y = X²

b) Y = 2X + 1

Solution:

a) R_Y = {0, 4}

P(Y = 0) = P(H(X) = 0) = P(X = 0) = 1/2
P(Y = 4) = P(H(X) = 4) = P(X = -2 or X = 2) = P(X = -2) + P(X = 2) = 1/3 + 1/6 = 1/2

y        0     4
P(y)    1/2   1/2

b) R_Y = {-3, 1, 5}

P(Y = -3) = P(H(X) = -3) = P(X = -2) = 1/3
P(Y = 1) = P(H(X) = 1) = P(X = 0) = 1/2
P(Y = 5) = P(H(X) = 5) = P(X = 2) = 1/6
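The bookkeeping in this example, grouping the probabilities of all x-values that map to the same y, can be sketched in a few lines (the helper name `pushforward` is my own, not from the text):

```python
from collections import defaultdict
from fractions import Fraction

def pushforward(pmf, H):
    """Distribution of Y = H(X) from the pmf of a discrete X.

    Probabilities of x-values with the same image H(x) are accumulated.
    """
    out = defaultdict(Fraction)
    for x, p in pmf.items():
        out[H(x)] += p
    return dict(out)

pmf_X = {-2: Fraction(1, 3), 0: Fraction(1, 2), 2: Fraction(1, 6)}
pmf_Y = pushforward(pmf_X, lambda x: x * x)      # part (a): Y = X^2
pmf_Z = pushforward(pmf_X, lambda x: 2 * x + 1)  # part (b): Y = 2X + 1
```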
4.3 Functions of continuous random variables
The most important (and most frequently encountered) case arises when X is a continuous
random variable with pdf f and H is a continuous function. Hence Y  H ( X ) is continuous
random variable and it will be our task to obtain its pdf say g .

The general procedure will be as follows:

a. Obtain G, the cdf of Y, where G(y) = P(Y <= y), by finding the event A (in the range
space of X) which is equivalent to {Y <= y}.
b. Differentiate G(y) with respect to y in order to obtain g(y).
c. Determine those values of y in the range space of Y for which g(y) > 0.

Example 1: Suppose that a continuous random variable has pdf given by

f(x) = 2x for 0 < x < 1, and 0 elsewhere.

Let H(x) = e^(-x). To find the pdf of Y = H(X) we proceed as follows:

G(y) = P(Y <= y) = P(e^(-X) <= y) = P(X >= -ln y) = ∫ from -ln y to 1 of 2x dx = 1 - (ln y)².

Hence g(y) = dG(y)/dy = -2 ln y / y.

Since f(x) > 0 for 0 < x < 1, we find that g(y) > 0 for 1/e < y < 1. Therefore

g(y) = -2 ln y / y for 1/e < y < 1, and 0 elsewhere.

Theorem: Let X be a continuous random variable with pdf f, where f(x) > 0 for a < x < b.
Suppose that y = H(x) is a strictly monotone (increasing or decreasing) function of x. Assume
that this function is differentiable (and hence continuous) for all x. Then the random variable Y
defined as Y = H(X) has pdf g given by

g(y) = f(x) |dx/dy| = f(H⁻¹(y)) |dx/dy|, where x is expressed in terms of y.

If H is increasing, then g is nonzero for those values of y satisfying H(a) < y < H(b). If H is
decreasing, then g is nonzero for those values of y satisfying H(b) < y < H(a).
Proof: (a) Assume that H is a strictly increasing function. Hence

G(y) = P(Y <= y) = P(H(X) <= y) = P(X <= H⁻¹(y)) = F(H⁻¹(y)).

Differentiating G(y) with respect to y, we obtain, using the chain rule for derivatives,

dG(y)/dy = [dG(y)/dx] · [dx/dy], where x = H⁻¹(y), so G′(y) = [dF(x)/dx] · [dx/dy] = f(x) dx/dy.

(b) Assume that H is a strictly decreasing function. Therefore

G(y) = P(Y <= y) = P(H(X) <= y) = P(X >= H⁻¹(y)) = 1 - P(X <= H⁻¹(y)) = 1 - F(H⁻¹(y)).

dG(y)/dy = [dG(y)/dx] · [dx/dy] = [d(1 - F(x))/dx] · [dx/dy] = -f(x) dx/dy.

Thus, using the absolute value sign and combining (a) and (b):

g(y) = f(x) |dx/dy| = f(H⁻¹(y)) |dx/dy|.

Example 2: Suppose X has the probability density function

f(x) = α / x^(α+1) for x > 1, and 0 otherwise,

where α is a positive parameter. This is an example of a Pareto distribution. We want to find the
density of Y = ln X. As the support of X, i.e. the range on which the density is non-zero, is
x > 1, the support of Y is y > 0.

The inverse transformation is x = e^y and dx/dy = e^y. Therefore

f_Y(y) = f(H⁻¹(y)) |dx/dy| = [α / (e^y)^(α+1)] · e^y = α e^(-αy)

so that f_Y(y) = α e^(-αy) for y > 0, and 0 otherwise. That is, Y follows an exponential
distribution with parameter α.
Example 3: Suppose that Y ~ U(0, 1). Find the distribution of U = g(Y) = -ln Y.

Solution: The cdf of Y ~ U(0, 1) is given as

F_Y(y) = 0 for y < 0; y for 0 <= y <= 1; 1 for y > 1.

The support for Y ~ U(0, 1) is R_Y = {y : 0 < y < 1}; thus, because u = -ln y > 0, it follows that the
support for U is R_U = {u : u > 0}. Using the method of distribution functions, we have

F_U(u) = P(U <= u) = P(-ln Y <= u) = P(ln Y >= -u) = P(Y >= e^(-u)) = 1 - P(Y <= e^(-u)) = 1 - F_Y(e^(-u)).

Notice how we have written the cdf of U as a function of the cdf of Y. Because F_Y(y) = y for
0 < y < 1, i.e., for u > 0, we have F_U(u) = P(U <= u) = 1 - e^(-u). Taking
derivatives, we get, for u > 0, f_U(u) = dF_U(u)/du = d(1 - e^(-u))/du = e^(-u).

Summarizing: f_U(u) = e^(-u) for u > 0, and 0 elsewhere.
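This fact, that -ln Y ~ Exp(1) when Y ~ U(0, 1), is the basis of inverse-transform sampling. A quick empirical sketch (the sample size and check point are my choices):

```python
import math
import random

random.seed(42)

# Inverse-transform sampling: if Y ~ U(0,1), then U = -ln(Y) ~ Exp(1).
n = 200_000
sample = [-math.log(random.random()) for _ in range(n)]
mean = sum(sample) / n  # Exp(1) has mean 1

# Compare the empirical cdf with F_U(u) = 1 - e^{-u} at u = 1
emp = sum(u <= 1 for u in sample) / n
theo = 1 - math.exp(-1)
```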

Example 4: Suppose the random variable Y has pdf

f_Y(y) = 3y² for 0 < y < 1, and 0 otherwise.

Suppose we want to find the pdf of U = 2Y + 3. The range of U is 3 < u < 5. Now

F_U(u) = P(U <= u) = P(2Y + 3 <= u) = P(Y <= (u - 3)/2) = ∫ from 0 to (u-3)/2 of 3y² dy = [(u - 3)/2]³.

So F_U(u) = 0 for u <= 3; [(u - 3)/2]³ for 3 < u < 5; 1 for u >= 5.

And f_U(u) = dF_U(u)/du = (3/8)(u - 3)² for 3 < u < 5, and 0 otherwise.
CHAPTER FIVE
5. Two Dimensional Random Variables
Definition: Let  be an experiment and S the sample space associated with  .Let X=X(s) and
Y=Y(s) be two functions each assigning a real number to each outcomes s  S. Then (X, Y) is called a
two dimensional random variable (sometimes called a random vector). This implies a pair of random
variables defined over a joint sample space.

If X₁ = X₁(s), X₂ = X₂(s), …, Xₙ = Xₙ(s) are n functions each assigning a real number to every
outcome s ∈ S, we call (X₁, X₂, …, Xₙ) an n-dimensional random variable (or an n-dimensional
random vector).

Note: As in the one-dimensional case, our concern will be not with the functional nature of X(s) and Y(s),
but rather with the values which X and Y assume. We shall again speak of the range space of (X, Y), say
R_{X,Y}, as the set of all possible values of (X, Y). In the two-dimensional case, for instance, the range space of
(X, Y) will be a subset of the Euclidean plane. Each outcome (X(s), Y(s)) may be represented as a point
(x, y) in the plane. We will again suppress the functional nature of X and Y by writing, for example,
P(X = a, Y = b) instead of P[X(s) = a, Y(s) = b].

Definition: If the possible values of (X, Y) are finite or countably infinite, (X, Y) is called a two-
dimensional discrete random variable. That is, the possible values of (X, Y) may be represented as
(xᵢ, yⱼ), i = 1, 2, …, n, …; j = 1, 2, …, m, …

If (X, Y) can assume all values in a specified region R in the xy-plane (a non-countable subset of the
Euclidean plane), (X, Y) is called a two-dimensional continuous random variable. For example, (X, Y)
may assume all values in the rectangle {(x, y) | a <= x <= b, c <= y <= d} or all values in the circle
{(x, y) | x² + y² <= 1}.
52
Definition: (a) Let (X, Y) be a two-dimensional discrete random variable. With each possible outcome
(xᵢ, yⱼ) we associate a number P(xᵢ, yⱼ) representing P(X = xᵢ, Y = yⱼ) and satisfying the following
conditions:

1. P(xᵢ, yⱼ) >= 0 for all (xᵢ, yⱼ)

2. Σᵢ Σⱼ P(xᵢ, yⱼ) = 1

The function P defined for all (xᵢ, yⱼ) in the range space of (X, Y) is called the probability function of
(X, Y). The set of triples (xᵢ, yⱼ, P(xᵢ, yⱼ)), i, j = 1, 2, …, is sometimes called the probability
distribution of (X, Y).

(b) Let (X, Y) be a continuous random variable assuming all values in some region R of the Euclidean
plane. The joint probability density function f is a function satisfying the following conditions:

1. f(x, y) >= 0 for all (x, y) ∈ R

2. ∫∫ over R of f(x, y) dx dy = 1

Note: If B is in the range space of (X, Y), we have P(B) = P[(X(s), Y(s)) ∈ B] = P[s | (X(s), Y(s)) ∈ B].
Hence P(B) = Σ Σ over B of P(xᵢ, yⱼ) if (X, Y) is discrete, where the sum is taken over all indices (i, j)
for which (xᵢ, yⱼ) ∈ B, and P(B) = ∫∫ over B of f(x, y) dx dy if (X, Y) is continuous.

Example 1: Suppose that a machine is used for a particular task in the morning and for a different task in
the afternoon. Let X and Y represent the number of times the machine breaks down in the morning and in
the afternoon, respectively. The table below gives the joint probability distribution of (X, Y).

Y \ X      0       1       2

0        0.25    0.15    0.10

1        0.10    0.08    0.07

2        0.05    0.07    0.13

a. What is the probability that the machine breaks down an equal number of times in the morning and
in the afternoon?

b. What is the probability that the number of breakdowns in the morning is greater than in
the afternoon?

53
Solution:
a. P(X = 0, Y = 0) + P(X = 1, Y = 1) + P(X = 2, Y = 2) = 0.25 + 0.08 + 0.13 = 0.46

b. P(X = 1, Y = 0) + P(X = 2, Y = 0) + P(X = 2, Y = 1) = 0.15 + 0.10 + 0.07 = 0.32
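Both probabilities are sums over subsets of the joint table. A small sketch (the dictionary encoding is my own):

```python
# P(X = x, Y = y): X = morning breakdowns, Y = afternoon breakdowns,
# entries transcribed from the table above.
joint = {
    (0, 0): 0.25, (1, 0): 0.15, (2, 0): 0.10,
    (0, 1): 0.10, (1, 1): 0.08, (2, 1): 0.07,
    (0, 2): 0.05, (1, 2): 0.07, (2, 2): 0.13,
}

p_equal = sum(p for (x, y), p in joint.items() if x == y)         # part (a)
p_morning_more = sum(p for (x, y), p in joint.items() if x > y)   # part (b)
```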

Example 2: Determine the value of k for which the joint probability function is given as

f(x, y) = kxy for x = 1, 2, 3 and y = 1, 2, 3.

Solution: Summing f(x, y) = kxy over the nine possible pairs
(1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3) gives

k(1 + 2 + 3 + 2 + 4 + 6 + 3 + 6 + 9) = 36k = 1, so k = 1/36.
Example 3: Two production lines manufacture a certain type of item. Suppose that the capacity
(on any given day) is 5 items for line I and 3 items for line II. Assume that the number of items
actually produced by either production line is a random variable. Let ( X , Y ) represents the two
dimensional random variable yielding the number of items produced by line I and line II
respectively. The table given below gives the joint probability distribution of ( X , Y ) . Each entry

represents P ( xi , y j )  P ( X  xi , Y  y j )

54
Y \ X      0       1       2       3       4       5

0        0       0.01    0.03    0.05    0.07    0.09

1        0.01    0.02    0.04    0.05    0.06    0.08

2        0.01    0.03    0.05    0.05    0.05    0.06

3        0.01    0.02    0.04    0.06    0.06    0.05

Thus P(2, 3) = P(X = 2, Y = 3) = 0.04.

Hence if B is defined as B = {more items are produced by line I than by line II}, i.e. B = {X > Y}, then
summing all table entries with x > y row by row gives P(B) = 0.25 + 0.23 + 0.16 + 0.11 = 0.75.
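Summing the table entries with x > y is mechanical and easy to check in code (the list-of-rows encoding is my own):

```python
# Joint pmf from the table: rows index Y (line II), columns index X (line I).
table = [
    [0.00, 0.01, 0.03, 0.05, 0.07, 0.09],  # Y = 0
    [0.01, 0.02, 0.04, 0.05, 0.06, 0.08],  # Y = 1
    [0.01, 0.03, 0.05, 0.05, 0.05, 0.06],  # Y = 2
    [0.01, 0.02, 0.04, 0.06, 0.06, 0.05],  # Y = 3
]

# B = {X > Y}: line I produces more items than line II
p_B = sum(table[y][x] for y in range(4) for x in range(6) if x > y)
```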

Example 4: Suppose (X, Y) is a continuous two-dimensional random variable with joint probability
density function given as

f(x, y) = c if 5000 <= x <= 10000 and 4000 <= y <= 9000, and 0 elsewhere.

a) Determine the value of c.

b) Let B = {X >= Y}; find P(B).

Solution: a) To determine c we use the fact that ∫∫ f(x, y) dx dy = 1. Therefore

∫ from 4000 to 9000 ∫ from 5000 to 10000 of c dx dy = c(5000)² = 1, so c = (5000)^(-2).

b) P(B) = 1 - P(X < Y) = 1 - (1/5000²) ∫ from 5000 to 9000 ∫ from 5000 to y of dx dy
= 1 - (1/5000²) ∫ from 5000 to 9000 of (y - 5000) dy = 1 - (4000²/2)/5000² = 1 - 8/25 = 17/25.
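Because (X, Y) is uniform on the rectangle, the answer can be checked by seeded Monte Carlo (a sketch of my own, not from the text):

```python
import random

random.seed(0)

# (X, Y) uniform on the rectangle [5000, 10000] x [4000, 9000]
n = 200_000
hits = 0
for _ in range(n):
    x = random.uniform(5000, 10000)
    y = random.uniform(4000, 9000)
    if x >= y:
        hits += 1
p_hat = hits / n  # should be close to 17/25 = 0.68
```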

Example 5: Suppose that the two-dimensional continuous random variable (X, Y) has joint
pdf given by

f(x, y) = x² + xy/3 for 0 < x < 1, 0 < y < 2, and 0 otherwise.

Let B = {X + Y >= 1}; find P(B).

Solution: With the complementary event B' = {X + Y < 1},

P(B) = 1 - P(B') = 1 - ∫ from 0 to 1 ∫ from 0 to 1-x of (x² + xy/3) dy dx
= 1 - ∫ from 0 to 1 of [x²(1 - x) + x(1 - x)²/6] dx = 1 - 7/72 = 65/72.

Definition: Let (X, Y) be a two-dimensional random variable. The cumulative distribution
function (cdf) F of the two-dimensional random variable (X, Y) is defined by

F(x, y) = P(X <= x, Y <= y).

If F is the cdf of a two-dimensional random variable with joint pdf f, then

∂²F(x, y)/∂x∂y = f(x, y) wherever F is differentiable.

Example: If the joint probability density function of X and Y is given by

f(x, y) = x + y for 0 < x < 1, 0 < y < 1, and 0 otherwise,

find the distribution function (cumulative distribution function) of these two random variables.

Solution:

Case 1: x < 0 or y < 0, so F(x, y) = 0.

Case 2: 0 <= x < 1, 0 <= y < 1, so F(x, y) = ∫ from 0 to y ∫ from 0 to x of (s + t) ds dt = (1/2)xy(x + y).

Case 3: x >= 1 and 0 <= y < 1, so F(x, y) = ∫ from 0 to y ∫ from 0 to 1 of (s + t) ds dt = (1/2)y(y + 1).

Case 4: 0 <= x < 1 and y >= 1, so F(x, y) = ∫ from 0 to 1 ∫ from 0 to x of (s + t) ds dt = (1/2)x(x + 1).

Case 5: x >= 1 and y >= 1, so F(x, y) = ∫ from 0 to 1 ∫ from 0 to 1 of (s + t) ds dt = 1.

This implies that

F(x, y) = 0 for x < 0 or y < 0;
          (1/2)xy(x + y) for 0 <= x < 1, 0 <= y < 1;
          (1/2)y(y + 1) for x >= 1 and 0 <= y < 1;
          (1/2)x(x + 1) for 0 <= x < 1 and y >= 1;
          1 for x >= 1 and y >= 1.
5.2 Marginal and conditional probability

For a discrete (X, Y),

P(X = xᵢ) = P{(X = xᵢ and Y = y₁) or (X = xᵢ and Y = y₂) or …} = pᵢ₁ + pᵢ₂ + … = Σⱼ pᵢⱼ.

pᵢ* = P(X = xᵢ) = Σⱼ pᵢⱼ is called the marginal probability function of X. It is defined for
xᵢ = x₁, x₂, …, and the collection of pairs (xᵢ, pᵢ*), i = 1, 2, …, is called the marginal probability
distribution of X. Similarly, the collection of pairs (yⱼ, p*ⱼ), j = 1, 2, …, is called the marginal
probability distribution of Y, where p*ⱼ = Σᵢ pᵢⱼ = P(Y = yⱼ).

In the continuous case, g(x) = ∫ from -∞ to ∞ of f(x, y) dy is called the marginal density of X.

Similarly, h(y) = ∫ from -∞ to ∞ of f(x, y) dx is called the marginal density of Y.

Thus, for example, P(c <= Y <= d) = P(-∞ < X < ∞, c <= Y <= d) = ∫ from c to d of h(y) dy.
Example 1: The following table represents the joint probability distribution of the discrete
random variable ( X , Y )

Find the marginal distribution of X and Y.

Example 2: Two characteristics of a rocket engine's performance are thrust X and mixture ratio Y.
Suppose that (X, Y) is a two-dimensional continuous random variable with pdf

f(x, y) = 2(x + y - 2xy) for 0 <= x <= 1, 0 <= y <= 1, and 0 otherwise.

(The units have been adjusted in order to use values between 0 and 1.)

Find the marginal distributions of X and Y.

Solution: Let g(x) and h(y) be the marginal densities of X and Y respectively. Then

g(x) = ∫ from 0 to 1 of 2(x + y - 2xy) dy = 2[xy + y²/2 - xy²] from 0 to 1 = 2(x + 1/2 - x) = 1 for 0 <= x <= 1.

This implies that X is uniformly distributed over [0, 1].

h(y) = ∫ from 0 to 1 of 2(x + y - 2xy) dx = 2[x²/2 + xy - x²y] from 0 to 1 = 2(1/2 + y - y) = 1 for 0 <= y <= 1.

Y is also uniformly distributed over [0, 1].
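The marginal integrals can be approximated numerically as a check; a sketch of my own using a midpoint rule (exact here, since f is linear in y for fixed x):

```python
def f(x, y):
    """Joint pdf of (X, Y) from the rocket engine example."""
    if 0 <= x <= 1 and 0 <= y <= 1:
        return 2 * (x + y - 2 * x * y)
    return 0.0

def marginal_x(x, n=1000):
    """Midpoint-rule approximation of g(x) = integral of f(x, y) dy over [0, 1]."""
    h = 1.0 / n
    return sum(f(x, (j + 0.5) * h) for j in range(n)) * h

# g(x) should equal 1 for every x in (0, 1): X is uniform on [0, 1]
vals = [marginal_x(x) for x in (0.1, 0.5, 0.9)]
```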


Definition: We say that the two-dimensional continuous random variable (X, Y) is uniformly distributed
over the region R in the Euclidean plane if

f(x, y) = constant for (x, y) ∈ R, and 0 elsewhere.

Because of the requirement ∫∫ f(x, y) dx dy = 1, the above implies that the constant equals
1/area(R). We are assuming that R is a region with finite, nonzero area.

Note: This definition represents the two-dimensional analog of the one-dimensional uniformly
distributed random variable.

Example 3: Suppose that the two-dimensional random variable (X, Y) is uniformly distributed over the
region R bounded by the curves y = x and y = x², which intersect at (0, 0) and (1, 1).

Since f(x, y) = 1/area(R) for (x, y) ∈ R, and area(R) = ∫ from 0 to 1 of (x - x²) dx = 1/6, the pdf is
given by

f(x, y) = 6 for (x, y) ∈ R, and 0 for (x, y) ∉ R.

Therefore the marginal pdf of X will be

g(x) = ∫ f(x, y) dy = ∫ from x² to x of 6 dy = 6(x - x²) for 0 <= x <= 1, and 0 otherwise.

The marginal pdf of Y will be

h(y) = ∫ f(x, y) dx = ∫ from y to √y of 6 dx = 6(√y - y) for 0 <= y <= 1, and 0 otherwise.
Definition: Let X and Y be two random variables, discrete or continuous, with joint probability
function f(x, y) and marginals g(x) and h(y). The conditional distribution of the random variable Y
given that X = x is

f(y | x) = f(x, y)/g(x), provided g(x) > 0,

and similarly the conditional distribution of X given that Y = y is f(x | y) = f(x, y)/h(y), provided h(y) > 0.

The above conditional pdf's satisfy all the requirements for a one-dimensional probability
function. Thus, for fixed y, we have f(x | y) >= 0 and

∫ from -∞ to ∞ of f(x | y) dx = ∫ from -∞ to ∞ of [f(x, y)/h(y)] dx = [1/h(y)] ∫ from -∞ to ∞ of f(x, y) dx = h(y)/h(y) = 1.

If we wish to find the probability that the discrete random variable X falls between a and b when it is
known that the discrete variable Y = y, we evaluate P(a < X < b | Y = y) = Σ f(x | y), where the
summation extends over all values of X between a and b. When X and Y are continuous, we evaluate
P(a < X < b | Y = y) = ∫ from a to b of f(x | y) dx.
Example 1: The following table represents the joint probability distribution of the discrete
random variable ( X , Y )

a. Find the conditional distribution of X given Y = 2.


b. Find the conditional distribution of Y given X= 3.

Solution: Conditional distribution of X given Y = 2 is given as


a. The conditional distribution of X given Y = 2.

b. The conditional distribution of Y given X= 3

Example 2: The joint density for the random variables (X, Y), where X is the unit temperature change
and Y is the proportion of spectrum shift that a certain atomic particle produces, is

Example 3: Given the joint density function

Example 4: Suppose that the joint density of X and Y is given by

5.3 Independent random variables

Example 1: Suppose that the pmf for the discrete random vector (Y 1, Y 2 ) is given by

Thus, the random variables Y1 and Y2 are dependent.

Example 2: Let Y1 and Y2 denote the proportions of time (out of one workday) during which
employees I and II, respectively, perform their assigned tasks. Suppose that the random vector
(Y 1, Y 2 ) has joint pdf

Example 3: The joint pdf of the random variable (X, Y) is given by

Find the value of K and show that X and Y are independent.

5.4 Functions of two dimensional random variables

Let the joint pdf of (X, Y) be f(x, y). Let U = g₁(X, Y) and V = g₂(X, Y). The mapping from (X, Y) to
(U, V) is assumed to be one-to-one and onto. Hence there are functions h₁ and h₂ such that
X = h₁(U, V) and Y = h₂(U, V).
Example 1: Let X and Y be independent random variables uniformly distributed on [0, 1]. Find
the distribution of X+Y.

Because V is the variable we introduced, to get the pdf of U we just need to find the marginal pdf
from the joint pdf, integrating over the appropriate regions of integration.
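For Example 1, the resulting density of U = X + Y is the triangular density f_U(u) = u on [0, 1] and 2 - u on [1, 2]. A seeded simulation sketch of my own (the check points 0.5 and 1 are arbitrary choices):

```python
import random

random.seed(7)

# U = X + Y for independent X, Y ~ U(0, 1): triangular density,
# so F_U(0.5) = 0.5^2 / 2 = 0.125 and F_U(1) = 1/2.
n = 200_000
s = [random.random() + random.random() for _ in range(n)]
F_half = sum(u <= 0.5 for u in s) / n
F_one = sum(u <= 1.0 for u in s) / n
```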
Example 2: Let X and Y be independent random variables with common pdf

f(x) = e^(-x) for x > 0, and 0 otherwise.

Find the joint pdf of U = X/(X + Y), V = X + Y.

Example 3: If X and Y each follow exponential distribution with parameter 1 and are independent, find
the pdf of U = X – Y.

Solution
Since X and Y are independent random variables following exponential distribution with
parameter 1,
When the joint density function of the n random variables X₁, X₂, …, Xₙ is given and we want
to compute the joint density function of Y₁, Y₂, …, Yₙ, where

Y₁ = g₁(X₁, …, Xₙ), Y₂ = g₂(X₁, …, Xₙ), …, Yₙ = gₙ(X₁, …, Xₙ),

we proceed analogously.
Note: Suppose that (X₁, X₂, …, Xₙ) may assume all values in some region of n-dimensional space. That
is, the value is the n-dimensional vector (X₁(s), X₂(s), …, Xₙ(s)). We characterize the probability
distribution of (X₁, X₂, …, Xₙ) as follows.

There exists a joint probability density function f satisfying the following conditions:

a) f(x₁, x₂, …, xₙ) >= 0 for all (x₁, x₂, …, xₙ).

b) ∫ from -∞ to ∞ … ∫ from -∞ to ∞ of f(x₁, x₂, …, xₙ) dx₁ … dxₙ = 1.

With the aid of this pdf we define

P[(x₁, x₂, …, xₙ) ∈ C] = ∫ … ∫ over C of f(x₁, x₂, …, xₙ) dx₁ … dxₙ.
Example 1: Let X, Y, Z be independent and uniformly distributed over (0, 1). Compute P(X >= YZ).

Example 2: If the joint probability mass function of the three discrete random variables X, Y and Z is
given by f(x, y, z) = (x + y)z/63 for x = 1, 2; y = 1, 2, 3; z = 1, 2, find P(X = 2, Y + Z <= 3).

Solution: The possible combinations for {X = 2, Y + Z <= 3} are (2, 1, 1), (2, 1, 2), (2, 2, 1), which
implies that

P(X = 2, Y + Z <= 3) = f(2, 1, 1) + f(2, 1, 2) + f(2, 2, 1) = [(2+1)(1) + (2+1)(2) + (2+2)(1)]/63 = 13/63.

Example 3: If the joint probability density function of the three continuous random variables
X₁, X₂ and X₃ is given by

f(x₁, x₂, x₃) = (x₁ + x₂)e^(-x₃) for 0 < x₁ < 1, 0 < x₂ < 1, x₃ > 0, and 0 otherwise,

find P[(X₁, X₂, X₃) ∈ A], where A is the region

{(x₁, x₂, x₃) | 0 < x₁ < 1/2, 1/2 < x₂ < 1, x₃ < 1}.

Solution:

P(0 < X₁ < 1/2, 1/2 < X₂ < 1, X₃ < 1) = ∫ from 0 to 1 ∫ from 1/2 to 1 ∫ from 0 to 1/2 of (x₁ + x₂)e^(-x₃) dx₁ dx₂ dx₃
= (1/4)(1 - e^(-1)) ≈ 0.158.
CHAPTER SIX
6. Mathematical expectations
6.1 Mean of a Random Variable
Definition: Let X be a random variable with probability distribution f(x). The mean, expected value, or
expectation of X is denoted by E(X) or µ_X, where:

E(X) = Σₓ x f(x) if X is discrete

E(X) = ∫ from -∞ to ∞ of x f(x) dx if X is continuous

Example 1:
Experiment: Tossing a coin twice.

1. What is probability to get one head?


2. Repeat the experiment 10 times. On average, how many number of heads would there be in each
experiment?

Example 2: A lot containing 7 components is sampled by a quality inspector; the lot contains 4 good
component and 3 defective components. A sample of 3 is taken by the inspector. Find the expected
value of the number of good components in this sample.

Solution: Let X represent the number of good components in the sample. The probability
distribution of X is

f(x) = C(4, x) C(3, 3 - x) / C(7, 3) for x = 0, 1, 2, 3,

so that f(0) = 1/35, f(1) = 12/35, f(2) = 18/35, f(3) = 4/35, and

E(X) = 0(1/35) + 1(12/35) + 2(18/35) + 3(4/35) = 12/7 ≈ 1.7.

Thus, if a sample of size 3 is selected at random over and over again from a lot of 4 good components
and 3 defective components, it would contain, on average, 1.7 good components.
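The hypergeometric computation above is easy to reproduce with binomial coefficients (a sketch of my own):

```python
from math import comb

# Sample of 3 from a lot of 4 good and 3 defective components:
# P(X = x) = C(4, x) C(3, 3 - x) / C(7, 3) for x = 0, 1, 2, 3
total = comb(7, 3)
pmf = {x: comb(4, x) * comb(3, 3 - x) / total for x in range(4)}
mean = sum(x * p for x, p in pmf.items())  # 12/7 ≈ 1.71
```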

Example 3: Let X be the random variable that denotes the life in hours of a certain electronic device.
The probability density function is

Solution:

Theorem: Let X be a random variable with probability function f(x). The expected value of the
random variable g(X) is

E[g(X)] = Σₓ g(x) f(x) if X is discrete

E[g(X)] = ∫ from -∞ to ∞ of g(x) f(x) dx if X is continuous

Example 4: Suppose that the number of cars X that pass through a car wash between 4:00 P.M. and
5:00 P.M. on any sunny Friday has the following probability:

Let g ( X )= 2 X  1 represent the amount of money in dollars, paid to the attendant by the manager.
Find the attendant’s expected earnings for this particular time period.

Solution: By theorem, the attendant can expect to receive

Example 5: Let X be a random variable with density function

Find the expected value of g ( x)  4 x  3

Solution:

Definition: Let X and Y be random variables with joint probability distribution f(x, y). The mean or
expected value of g(X, Y) is:

E[g(X, Y)] = Σₓ Σ_y g(x, y) f(x, y) if both X and Y are discrete

E[g(X, Y)] = ∫ from -∞ to ∞ ∫ from -∞ to ∞ of g(x, y) f(x, y) dx dy if both X and Y are continuous

Example 1: Let X and Y be the random variables with joint probability distribution indicated in the
following Table.

Find the expected value of g ( X, Y )= XY .

Solution:

Example 2: Find E(Y/X) for the density function

Solution:

6.2 Variance and Covariance of Random Variables


Definition: Let X be a random variable with probability function f(x) and mean µ. The
variance of the random variable X is

σ² = Var(X) = E[(X - µ)²] = Σₓ (x - µ)² f(x) if X is discrete

σ² = Var(X) = E[(X - µ)²] = ∫ from -∞ to ∞ of (x - µ)² f(x) dx if X is continuous

The positive square root of the variance, σ = √Var(X), is called the standard deviation of X.
Example 1: Let the random variable X represent the number of automobiles that are used for
official business purposes on any given workday. The probability distribution for company A is

And for company B

Show that the variance of the probability distribution for company B is greater than that of
company A.

Theorem: The variance of a random variable X is σ² = E(X²) - µ², where µ = E(X).

Proof: For the discrete case we can write

σ² = Σₓ (x - µ)² f(x) = Σₓ (x² - 2µx + µ²) f(x) = E(X²) - 2µ E(X) + µ² = E(X²) - 2µ² + µ² = E(X²) - µ².

(The continuous case is analogous, with sums replaced by integrals.)
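The identity σ² = E(X²) - µ² can be verified numerically on any pmf; a sketch (the pmf used here is a hypothetical example of my own):

```python
def var_two_ways(pmf):
    """Compute Var(X) both as E[(X - mu)^2] and as E[X^2] - mu^2."""
    mu = sum(x * p for x, p in pmf.items())
    v_def = sum((x - mu) ** 2 * p for x, p in pmf.items())
    v_thm = sum(x * x * p for x, p in pmf.items()) - mu ** 2
    return v_def, v_thm

# hypothetical pmf: mu = 1.1, E[X^2] = 1.7, so Var(X) = 1.7 - 1.21 = 0.49
v_def, v_thm = var_two_ways({0: 0.2, 1: 0.5, 2: 0.3})
```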
Example 2: Let the random variable X represent the number of defective parts for a machine when 3
parts are sampled from a production line and tested. The following is the probability
distribution of X.

Solution:

Example 3: The weekly demand for Pepsi, in thousands of liters, from a local chain of efficiency stores,
is a continuous random variable X having the probability density

Find the mean and variance of X.

Solution:

Theorem: Let X be a random variable with probability function f(x). The variance of the random
variable g(X) is

σ²_g(X) = E{[g(X) - µ_g(X)]²} = Σₓ [g(x) - µ_g(X)]² f(x) if X is discrete

σ²_g(X) = E{[g(X) - µ_g(X)]²} = ∫ from -∞ to ∞ of [g(x) - µ_g(X)]² f(x) dx if X is continuous
Example 1: Calculate the variance of g ( X )= 2 X  3 , where X is a random variable with probability
distribution

Solution:

Example 2: Let X be a random variable with density function

Find the variance of the random variable g ( X ) = 4 X + 3

Definition: Let X and Y be random variables with joint probability distribution f(x, y). The
covariance of the random variables X and Y is

σ_XY = E[(X - µ_X)(Y - µ_Y)] = Σₓ Σ_y (x - µ_X)(y - µ_Y) f(x, y) if both X and Y are discrete

σ_XY = E[(X - µ_X)(Y - µ_Y)] = ∫∫ (x - µ_X)(y - µ_Y) f(x, y) dx dy if both X and Y are continuous

The covariance between two random variables is a measurement of the nature of the association between
the two. If large values of X often result in large values of Y or small values of X result in small values
of Y , positive X − µx will often result in positive Y − µy and negative X − µx will often result in
negative Y − µy. Thus the product (X − µx) (Y − µy) will tend to be positive. On the other hand, if large
X values often result in small Y values, the product (X − µx) (Y − µy) will tend to be negative. Thus the
sign of the covariance indicates whether the relationship between two dependent random variables is
positive or negative. When X and Y are statistically independent, it can be shown that the covariance is
zero. The converse, however, is not generally true. Two variables may have zero covariance and still not
be statistically independent. Note that the covariance only describes the linear relationship between two
random variables. Therefore, if a covariance between X and Y is zero, X and Y may have nonlinear
relationship, which means that they are not necessarily independent.

Example 1: Let X be a random variable and the probability distribution of X is

Let Y = g(X) = X². Then Cov(X, Y) = 0, but X and Y have a quadratic relationship.
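A minimal numerical illustration of this point (the symmetric pmf on {-1, 0, 1} is a hypothetical choice of mine, since the example's table did not survive):

```python
# X symmetric on {-1, 0, 1}; Y = X^2 is completely determined by X,
# yet Cov(X, Y) = 0 because the positive and negative terms cancel.
pmf = {-1: 0.25, 0: 0.5, 1: 0.25}
mu_x = sum(x * p for x, p in pmf.items())          # 0
mu_y = sum(x * x * p for x, p in pmf.items())      # E[X^2] = 0.5
cov = sum((x - mu_x) * (x * x - mu_y) * p for x, p in pmf.items())
```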

Theorem: The covariance of two random variables X and Y with means µ_X and µ_Y, respectively, is
given by σ_XY = E(XY) - µ_X µ_Y.

Proof: For the discrete case we can write

σ_XY = Σₓ Σ_y (x - µ_X)(y - µ_Y) f(x, y) = E(XY) - µ_Y E(X) - µ_X E(Y) + µ_X µ_Y = E(XY) - µ_X µ_Y.
Example 2: The fraction X of male runners and the fraction Y of female runners who compete in
marathon races are described by the joint probability density function

Find the covariance of X and Y


Solution: We first compute the marginal density functions:

Definition: Let X and Y be random variables with covariance σXY and standard deviations σX and σY,
respectively. The correlation coefficient of X and Y is

ρ_XY = σ_XY / (σ_X σ_Y) = σ_XY / √(σ_X² σ_Y²)

It should be clear that ρ_XY is free of the units of X and Y. The correlation coefficient satisfies the
inequality -1 <= ρ_XY <= 1. It assumes a value of zero when σ_XY = 0. Where there is an exact linear
dependency, say Y = a + bX, ρ_XY = 1 if b > 0 and ρ_XY = -1 if b < 0.

Example 1: Two ballpoint pens are selected at random from a box that contains 3 blue pens, 2 red pens,
and 3 green pens. If X is the number of blue pens selected and Y is the number of red pens selected, and
the following is the joint probability distribution:

Find the correlation coefficient between X and Y.

Solution:

Example 2: The fraction X of male runners and the fraction Y of female runners who compete in
marathon races are described by the joint probability density function

Find the correlation coefficient of X and Y.


Solution: We first compute the marginal density functions:

6.3 Chebyshev's Inequality
MARKOV'S INEQUALITY: Suppose that X is a nonnegative random variable with pdf (pmf) f(x), and
let c be a positive constant. Markov's Inequality puts a bound on the upper tail probability P(X >= c); that
is,

P(X >= c) <= E(X)/c.
REMARK: The beauty of Chebyshev's result is that it applies to any random variable Y. In words,
P(|Y - µ| >= kσ) is the probability that the random variable Y will differ from the mean µ by more than
k standard deviations. If we do not know how Y is distributed, we cannot compute P(|Y - µ| >= kσ)
exactly, but, at least, we can put an upper bound on this probability; this is what Chebyshev's result
allows us to do:

P(|Y - µ| >= kσ) <= 1/k².
Example 1: Suppose that Y represents the amount of precipitation (in inches) observed annually in
Barrow, AK. The exact probability distribution for Y is unknown, but, from historical information, it is
posited that µ = 4.5 and σ = 1. What is a lower bound on the probability that there will be between 2.5
and 6.5 inches of precipitation during the next year? Since 2.5 = µ - 2σ and 6.5 = µ + 2σ, Chebyshev's
inequality with k = 2 gives P(2.5 < Y < 6.5) = P(|Y - µ| < 2σ) >= 1 - 1/2² = 3/4.
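Chebyshev's bound can also be checked empirically; a sketch of my own using an exponential sample (the distribution is an arbitrary choice, precisely because the bound holds for any distribution):

```python
import random

random.seed(3)

# Chebyshev: P(|Y - mu| >= k*sigma) <= 1/k^2, for any distribution.
# Empirical check with Exp(1) draws, mu and sigma estimated from the sample.
n = 100_000
ys = [random.expovariate(1.0) for _ in range(n)]
mu = sum(ys) / n
sigma = (sum((y - mu) ** 2 for y in ys) / n) ** 0.5

k = 2
tail = sum(abs(y - mu) >= k * sigma for y in ys) / n
bound = 1 / k ** 2  # 0.25; the true tail here is about e^{-3} ≈ 0.05
```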

Example 2: A random variable X has density function given by

Solution:

(a) First, we need to find μ:

(b) If we take ε = k = 1 in Chebyshev’s theorem we obtain that

One can show that σ² <= 0.25, so an upper bound for the given probability is 0.25, which is quite far
from the real value.

6.4 Moments and moment generating functions
6.4.1 Moments

The rth moment about the origin of a random variable X, denoted by µ′_r, is the expected value of X^r;
symbolically,

µ′_r = E(X^r) = Σₓ x^r f(x) in the discrete case, and µ′_r = E(X^r) = ∫ from -∞ to ∞ of x^r f(x) dx in the
continuous case.
Definition:

Definition:

For any random variable for which μ exists. The second moment about the mean is of special
importance in statistics because it is indicative of the spread or dispersion of the distribution of a
random variable; thus, it is given a special symbol and a special name.

Definition:

Theorem:

Example:

Example:

Definition:

Example:

Theorem:

Example:

Theorem:

6.5 Conditional Expectation

Definition: If we let u(X) = X, we obtain the conditional mean of the random variable X given Y=y,
which we denote by

Definition: Correspondingly, the conditional variance of X given Y = y is

Var(X | y) = E(X² | y) - [E(X | y)]²,

where E(X² | y) is obtained from the definition above with u(X) = X².

Example: Suppose that we have the following joint probability distribution
