Probability Theory Lecture Note
1. Introduction
Experiment: any process of observation or measurement that generates a well-defined outcome. Examples:
- Physical experiment
- Chemical experiment
- Social experiment
To analyze an experiment we need a mathematical model. There are two basic types of mathematical models: deterministic and non-deterministic (probability) models.
In deterministic models, the conditions under which an experiment is carried out determine the exact outcome of the experiment: the solution of a set of mathematical equations specifies the exact outcome.
Example 1: Ohm's law states that the voltage-current characteristic of a resistor is I = V/R. The voltages and currents in any circuit consisting of an interconnection of batteries and resistors can be found by solving a system of simultaneous linear equations obtained by applying Kirchhoff's laws and Ohm's law.
If an experiment involving the measurement of a set of voltages is repeated a number of times under the same
conditions, circuit theory predicts that the observations will always be the same. In practice, there will be
some variation in the observations due to measurement errors and uncontrolled factors. Nevertheless, this
deterministic model will be adequate as long as the deviation about the predicted values remains small.
Many systems of interest involve phenomena that exhibit unpredictable variation and randomness. We define
a random experiment to be an experiment in which the outcome varies in an unpredictable fashion when the
experiment is repeated under the same conditions. Deterministic models are not appropriate for random
experiments since they predict the same outcome for each repetition of an experiment. In this section, we
introduce probability models that are intended for random experiments.
Example: - Toss a die and observe the number that shows on top.
- Toss a coin four times and observe the total number of heads obtained.
- From an urn containing red and black balls, a ball is chosen and its color noted.
Example: A = {a, o, f, g, h}
Sets are denoted by capital letters and their elements by small letters.
1. Universal set (U): the set of all objects under consideration.
1.2.1 Set operations
1. A is a subset of B, written A⊆B, if x∊A ⇒ x∊B, ∀x.
ii. A1 ∪ A2 ∪ A3 ∪ … ∪ An = ⋃ᵢ₌₁ⁿ Ai
Set properties
1. Commutative laws: A ∪ B = B ∪ A, A ∩ B = B ∩ A
5. De Morgan's Laws:
i. (A ∪ B)' = A' ∩ B'    ii. (A ∩ B)' = A' ∪ B'
Simple Event: If an event contains exactly one element of the sample space, it is called a simple or elementary event.
Let S = {1, 2, 3, 4, 5, 6}.
If the event is the set of elements less than 2, then E = {1} is a simple event.
Compound Event: If an event contains more than one sample point, it is called a compound event. In the above example of throwing a die, {1, 4} is a compound event.
Example: If a fair die is rolled once, it is possible to list all the possible outcomes i.e.1, 2, 3, 4, 5, 6, but it is
not possible to predict which outcome will occur. Let A be the event of odd numbers, B be the event of even
numbers, and C be the event of number 8.
A = {1, 3, 5}
B = {2, 4, 6}
C = Ø, the empty set (an impossible event)
- Although we are in general not able to state what a particular outcome will be, we are able to describe
the set of all possible outcomes of the experiment.
- As the experiment is repeated a large number of times, a definite pattern or regularity appears. It is this regularity which makes it possible to construct a precise mathematical model with which to analyze the experiment.
a. If A and B are events, A ∪ B is the event which occurs if and only if A or B or both occur.
b. If A and B are events, A ∩ B is the event which occurs if and only if both A and B occur.
c. If A is an event, A' is the event which occurs if and only if A does not occur.
d. If A1, A2, A3, …, An is any finite collection of events, then A1 ∪ A2 ∪ … ∪ An is the event which occurs if and only if at least one of the events Ai occurs.
e. If A1, A2, A3, …, An is any finite collection of events, then A1 ∩ A2 ∩ … ∩ An is the event which occurs if and only if all of the events Ai occur.
f. If A1, A2, A3, …, An, … is any (countably) infinite collection of events, then A1 ∪ A2 ∪ … is the event which occurs if and only if at least one of the events Ai occurs.
g. A − B = A ∩ B' is the event which occurs if and only if A occurs and B does not occur.
Example: A coin is tossed twice. Identify the:
a. Experiment
b. Outcomes
c. Sample space
Solution:
a. Tossing a coin twice
b. HH, HT, TH, TT
c. S = {HH, HT, TH, TT}
Definition: Two events A and B are said to be mutually exclusive (disjoint) if they cannot occur together.
We express this by writing AB = Ø; that is the intersection of A and B is the empty set.
Two events A and B are said to be independent if and only if the occurrence of one does not affect the occurrence or non-occurrence of the other event.
1.4 Finite Sample Spaces
A finite sample space is a sample space, which consists of a finite number of elements.
The sample space for the experiment of a toss of a coin is a finite sample space. It has only two sample points.
But the sample space for the experiment where the coin is tossed until a head shows up is not a finite sample space -- it is theoretically possible to keep tossing the coin indefinitely.
Tossing a coin. The experiment is tossing a coin (or any other object with two distinct sides.) The coin may
land and stay on the edge, but this event is so enormously unlikely as to be considered impossible and be
disregarded. So the coin lands on either one or the other of its two sides. One is usually called head, the
other tail. These are two possible outcomes of a toss of a coin. In the case of a single toss, the sample space
has two elements that interchangeably, may be denoted as, say, {Head, Tail}, or {H, T}, or {0, 1}, ...
Rolling a die. The experiment is rolling a die. A common die is a small cube whose faces show the numbers 1, 2, 3, 4, 5, 6 in one way or another. There are six possible outcomes and the sample space consists of six elements: {1, 2, 3, 4, 5, 6}.
If the sample space S consists of n equally likely outcomes, each outcome has probability 1/n. In this case, we define the probability P(A) of the event A occurring to be:
P(A) = n(A) / n(S)
where n(A) is the number of outcomes in A and n(S) = n is the number of outcomes in S.
Example: A fair coin tossed three times. Let A be event that only one tail appears. Find P (A).
Solution: S = {HHH, HHT, HTT, TTH, THH, THT, HTH, TTT}
P(A) = P(HHT) + P(THH) + P(HTH) = 1/8 + 1/8 + 1/8 = 3/8
Example2: Four equally qualified applicants (a, b, c, d) are up for two positions. Applicant a is a minority.
Positions are chosen at random. What is the probability that the minority is hired? Here, the sample space is
S = {ab, ac, ad, bc, bd, cd}.
We are assuming that the order of the positions is not important. If the positions are assigned at random, each of the six sample points is equally likely and has probability 1/6. Let E denote the event that a minority is hired. Then, E = {ab, ac, ad} and
P(E) = (number of outcomes in E) / 6 = 3/6 = 1/2
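The two-position example above can be checked by brute-force enumeration. The sketch below (using only the standard library) lists every equally likely pair of hires and counts those that include applicant a:

```python
from itertools import combinations
from fractions import Fraction

# All equally likely ways to fill the two positions (order ignored).
sample_space = list(combinations("abcd", 2))

# Event E: the minority applicant 'a' is hired.
favorable = [pair for pair in sample_space if "a" in pair]

p_e = Fraction(len(favorable), len(sample_space))
print(len(sample_space), len(favorable), p_e)  # 6 3 1/2
```

Note that enumeration gives E = {ab, ac, ad}, in agreement with the text, so P(E) = 3/6 = 1/2.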
1.6 Counting techniques
In order to calculate probabilities, we have to know the number of outcomes in the sample space and the number of outcomes favorable to the event. To determine the number of outcomes, one can use several rules of counting:
- Addition rule.
- The multiplication rule
- Permutation rule
- Combination rule
1.6.1 The Multiplication Rule
If a choice consists of k stages of which the first can be made in n1 ways, the second can be made in n2 ways…
the kth can be made in nk ways, then the whole choice can be made in (n1 * n2 * ........ * nk ) ways.
Example 1: An experiment consists of rolling two dice. Envision stage 1 as rolling the first and stage 2 as
rolling the second. Here, n1 = 6 and n2 = 6. By the multiplication rule, there are n1* n2 = 6*6 = 36 different
outcomes.
Example 2: An airline has 6 flights from A to B, and 7 flights from B to C per day. If the flights are to be made on separate days, in how many different ways can the airline offer a trip from A to C?
Solution: In stage 1 there are 6 flights from A to B; for each of these, 7 flights are available from B to C. Altogether there are 6*7 = 42 possible flights from A to C.
1.6.2 The addition rule
Suppose that procedure 1 can be performed in n1 ways and procedure 2 can be performed in n2 ways. Suppose furthermore that it is not possible to perform both procedures together. Then the number of ways in which we can perform procedure 1 or procedure 2 is n1 + n2. More generally, for k such mutually exclusive procedures with n1, n2, …, nk possible ways respectively, there are n1 + n2 + … + nk possible ways in all.
Example: Suppose we are planning a trip and are deciding between bus and train transportation. If there are 3 bus routes and 2 train routes from A to B, find the number of available routes for the trip.
Solution:
There are 3 + 2 = 5 routes for someone to go from A to B.
1.6.3 Permutation
A permutation is an arrangement of distinct objects in a particular order. Order is important
Permutation Rules:
1. The number of permutations of n distinct objects taken all together is n!:
nPn = n!/(n − n)! = n!/0! = n!. By definition, 0! = 1! = 1.
2. The arrangement of n objects in a specified order using r objects at a time is called the permutation of n objects taken r at a time:
nPr = n!/(n − r)!
3. The number of distinct permutations of n objects in which k1 are alike, k2 are alike, etc. is
n!/(k1! * k2! * … * km!)
4. Circular Permutation
The number of ways to arrange n distinct objects along a fixed circle (i.e., one that cannot be picked up out of the plane and turned over) is
Pn = (n − 1)!
The number is (n-1)! instead of the usual factorial n! since all cyclic permutations of objects are
equivalent because the circle can be rotated.
For example, of the 3! = 6 permutations of three objects, only (3 − 1)! = 2 circular permutations are distinct. Similarly, of the 4! = 24 permutations of four objects, only (4 − 1)! = 6 circular permutations are distinct.
Example 1: My bookshelf has 10 books on it. How many ways can I permute the 10 books on the shelf?
Answer: 10! = 3, 628,800.
Example 2: In how many ways can 2 of 4 distinct objects be arranged? Here n = 4, r = 2.
There are 4P2 = 4!/(4 − 2)! = 24/2 = 12 permutations.
Example 3: In how many different ways can 3 red, 4 yellow and 2 blue bulbs be arranged in a string of Christmas tree lights with 9 sockets?
Solution: Here n = 3 + 4 + 2 = 9, so the number of arrangements is
9!/(3! 4! 2!) = (9*8*7*6*5*4!)/(3!*4!*2!) = 1260 ways
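The permutations-with-repetition count above is a multinomial coefficient, and it can be computed directly. A small helper sketch (the function name is my own, not from the notes):

```python
from math import factorial

def arrangements(*group_sizes):
    """Distinct linear arrangements when each group consists of identical objects."""
    n = sum(group_sizes)          # total number of objects
    count = factorial(n)
    for k in group_sizes:
        count //= factorial(k)    # divide out rearrangements within each group
    return count

print(arrangements(3, 4, 2))  # 9!/(3!4!2!) = 1260
```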
1.6.4 Combination
Example: Given the letters A, B, C, and D list the permutation and combination for selecting two letters.
Solutions:
Permutation: AB, BA, CA, DA, AC, BC, CB, DB, AD, BD, CD, DC (12 arrangements)
Combination: AB, AC, AD, BC, BD, CD (6 selections)
Combination Rule
The number of combinations of r objects selected from n objects is denoted by nCr and is given by the formula:
nCr = n! / ((n − r)! * r!)
Example 1: In how many ways can a committee of 5 people be chosen out of 9 people?
Solutions:
n = 9, r = 5
9C5 = 9!/((9 − 5)! * 5!) = 9!/(4! * 5!) = 126 ways
1. Among 15 clocks there are two defectives. In how many ways can an inspector choose three of the clocks for inspection so that:
a) There is no restriction.
b) None of the defective clocks is included.
c) Only one of the defective clocks is included.
d) Two of the defective clocks are included.
Solutions:
a) If there is no restriction, select three clocks from 15 clocks; this can be done in:
n = 15, r = 3: 15C3 = 15!/(12! * 3!) = 455 ways
b) None of the defective clocks is included.
This is equivalent to zero defective and three non-defective, which can be done in:
2C0 * 13C3 = 1 * 286 = 286 ways
c) Only one of the defective clocks is included.
This is equivalent to one defective and two non-defective, which can be done in:
2C1 * 13C2 = 2 * 78 = 156 ways
d) Two of the defective clocks are included.
This is equivalent to two defective and one non-defective, which can be done in:
2C2 * 13C1 = 1 * 13 = 13 ways
Note: the number of ways of choosing r things out of n without replacement is given by nCr, and the number of ways of choosing r things out of n with replacement (order taken into account) is given by n^r.
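The clock-inspection counts can be verified with the standard-library binomial coefficient. A quick check sketch:

```python
from math import comb

total = comb(15, 3)                    # a) no restriction
none_def = comb(2, 0) * comb(13, 3)    # b) no defective included
one_def = comb(2, 1) * comb(13, 2)     # c) exactly one defective
two_def = comb(2, 2) * comb(13, 1)     # d) both defectives included

print(total, none_def, one_def, two_def)  # 455 286 156 13

# The three restricted cases partition all possible selections:
print(none_def + one_def + two_def == total)  # True
```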
The probability of an event is a measure (number) of the chance with which we can expect the event to occur.
We assign a number between 0 and 1 inclusive to the probability of an event.
A probability of 1 means that we are 100% sure of the occurrence of an event, and a probability of 0 means
that we are 100% sure of the nonoccurrence of the event. P (A) denotes the probability of any event A in the
sample space S.
CLASSICAL DEFINITION OF PROBABILITY: If there are n equally likely possibilities, of which one must occur, and m of these are regarded as favorable to an event, or as "success," then the probability of the event or a "success" is given by m/n.
In any random experiment there is always uncertainty as to whether a particular event will or will not occur. As a measure of the chance, or probability, with which we can expect the event to occur, it is convenient to assign a number between 0 and 1. If we are sure or certain that an event will occur, we say that its probability is 100% or 1. If we are sure that the event will not occur, we say that its probability is zero. If, for example, the probability is 1/4, we would say that there is a 25% chance it will occur and a 75% chance that it will not occur. Equivalently, we can say that the odds against occurrence are 75% to 25%, or 3 to 1.
1. CLASSICAL APPROACH: If an event can occur in h different ways out of a total of n possible ways,
all of which are equally likely, then the probability of the event is h/n.
Examples:
1. A fair die is tossed once. What is the probability of getting
a) Number 4?
b) An odd number?
c) An even number?
d) Number 8?
Solutions:
First identify the sample space, say S:
S = {1, 2, 3, 4, 5, 6}
N = n(S) = 6
a) Let A be the event of number 4:
A = {4}, N_A = n(A) = 1
P(A) = n(A)/n(S) = 1/6
b) Let B be the event of an odd number:
B = {1, 3, 5}, so P(B) = 3/6 = 1/2
c) Let C be the event of an even number:
C = {2, 4, 6}, so P(C) = 3/6 = 1/2
d) Let D be the event of number 8:
D = Ø, so P(D) = 0/6 = 0
2. RELATIVE FREQUENCY (FREQUENTIST) APPROACH: If an experiment is repeated N times and the event A occurs N_A of those times, then the probability of A is the limit of the relative frequency:
P(A) = lim (N→∞) N_A / N
Example: If records show that 60 out of 100,000 bulbs produced are defective. What is the
probability of a newly produced bulb to be defective?
Solution:
Let A be the event that the newly produced bulb is defective.
P(A) = lim (N→∞) N_A / N ≈ 60 / 100,000 = 0.0006
3. The Axioms of Probability: Suppose we have a sample space S. If S is discrete, all subsets
correspond to events and conversely; if S is non-discrete, only special subsets (called measurable)
correspond to events. To each event A in the class C of events, we associate a real number P (A).
The P is called a probability function, and P (A) the probability of the event, if the following
axioms are satisfied.
1. P(A) ≥ 0
2. P(S) = 1; S is the sure event.
3. If A and B are mutually exclusive events, the probability that one or the other occurs equals the sum of the two probabilities, i.e. P(A ∪ B) = P(A) + P(B)
4. P(Aᶜ) = 1 − P(A)
5. 0 ≤ P(A) ≤ 1
6. P(Ø) = 0; Ø is the impossible event.
7. If A1, A2, A3, …, An are pairwise mutually exclusive events, then
P(A1 ∪ A2 ∪ … ∪ An) = P(A1) + P(A2) + … + P(An) = Σᵢ₌₁ⁿ P(Ai)
In general, P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
1.8 Derived theorems of probability
Theorem 1: P(Ø) = 0.
Proof: S ∪ Ø = S and S ∩ Ø = Ø, so S and Ø are mutually exclusive.
⟹ P(S ∪ Ø) = P(S) + P(Ø)
⟹ P(S) = P(S) + P(Ø)
⟹ P(Ø) = 0
Theorem 2: P(Aᶜ) = 1 − P(A)
Proof: A ∪ Aᶜ = S, and A and Aᶜ are mutually exclusive.
⟹ P(A ∪ Aᶜ) = P(A) + P(Aᶜ)
⟹ P(S) = P(A) + P(Aᶜ) = 1
⟹ P(Aᶜ) = 1 − P(A)
Theorem 3: For any two events A and B, P(A ∩ Bᶜ) = P(A) − P(A ∩ B).
Proof: A can be decomposed into the mutually exclusive events A ∩ B and A ∩ Bᶜ:
A = (A ∩ B) ∪ (A ∩ Bᶜ) ⟹ P(A) = P(A ∩ B) + P(A ∩ Bᶜ) ⟹ P(A ∩ Bᶜ) = P(A) − P(A ∩ B)
Theorem 4: For any two events A and B, P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Proof: A ∪ B = (A ∩ Bᶜ) ∪ B, and these events are mutually exclusive, so
P(A ∪ B) = P(A ∩ Bᶜ) + P(B) = P(A) − P(A ∩ B) + P(B) = P(A) + P(B) − P(A ∩ B)
Theorem 5: For any three events A, B and C, P (A∪B∪C) = P (A) +P (B) +P(C)-P(A∩B)-
P(A∩C)-P(B∩C)+P(A∩B∩C)
Proof: P(A∪B∪C) = P((A∪B)∪C) = P(A∪B) + P(C) − P((A∩C) ∪ (B∩C)); expanding P(A∪B) and the last term by the addition law for two events gives the result.
An obvious extension to the above theorem suggests. Let A1 , A2 ,........., AK be any k events. Then
P(A1 ∪ A2 ∪ … ∪ Ak) = Σᵢ P(Ai) − Σᵢ<ⱼ P(Ai ∩ Aj) + Σᵢ<ⱼ<ᵣ P(Ai ∩ Aj ∩ Ar) − … + (−1)^(k+1) P(A1 ∩ A2 ∩ … ∩ Ak)
Theorem 6: If A ⊆ B, then P(A) ≤ P(B).
Proof: B = A ∪ (Aᶜ ∩ B), and these events are mutually exclusive, so P(B) = P(A) + P(Aᶜ ∩ B). Hence
P(B) − P(A) = P(Aᶜ ∩ B) ≥ 0.
Theorem 7: P(A1 ∪ A2 ∪ … ∪ An) ≤ Σᵢ₌₁ⁿ P(Ai).
In particular, for two events: if P(A∩B) = 0, then P(A∪B) = P(A) + P(B); if P(A∩B) > 0, then P(A∪B) < P(A) + P(B).
Proof: the general case follows by induction from the two-event addition law.
Example 1
Three items are selected at random from a manufacturing process. Each item is inspected and classified defective (D) or non-defective (N). The sample space is
S = {DDD, DDN, DND, NDD, DNN, NDN, NND, NNN}
Example 2
The event that the number of defectives in the above example is greater than 1 is
{DDD, DDN, DND, NDD}
Example 3
Suppose a licence plate contains two letters followed by three digits, with the first digit not zero. How many different licence plates can be printed?
(26)(26)(9)(10)(10) = 608,400
Example 4
How many 7-letter words can be formed using the letters of the word 'BENZENE'?
The number of 7-letter words that can be formed is 7!/((1!)(3!)(2!)(1!)) = 420
Example 5
A box contains 8 eggs, 3 of which are rotten. Three eggs are picked at random. Find the probabilities of the following events: (a) exactly two of the picked eggs are rotten; (b) all three are rotten; (c) none of the picked eggs is rotten.
Solution:
(a) The 8 eggs can be divided into 2 groups, namely, 3 rotten eggs as the first group and 5
good eggs as the second group.
Getting 2 rotten eggs among the 3 randomly selected eggs occurs if we select 2 eggs from the first group and 1 egg from the second group.
Thus the probability of having exactly two rotten eggs among the 3 randomly selected eggs is
(3C2 * 5C1) / 8C3 = 15/56
(b) Similarly, the probability that all three eggs are rotten is
(3C3 * 5C0) / 8C3 = 1/56
(c) The probability that none of the eggs is rotten is
(3C0 * 5C3) / 8C3 = 10/56 = 5/28
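The three egg probabilities all follow the same counting pattern, so they can be wrapped in one small function. A sketch (the function name and default parameters are mine, chosen to match the example):

```python
from fractions import Fraction
from math import comb

def p_rotten(k, rotten=3, good=5, draw=3):
    """P(exactly k rotten eggs among `draw` eggs picked from rotten+good)."""
    return Fraction(comb(rotten, k) * comb(good, draw - k),
                    comb(rotten + good, draw))

print(p_rotten(2))  # 15/56
print(p_rotten(3))  # 1/56
print(p_rotten(0))  # 5/28
```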
Example 6: What is the probability that 3 men and 4 women are selected when 7 people are chosen at random from a group of 7 men and 10 women?
Solution: P = (7C3 * 10C4) / 17C7 = (35 * 210) / 19448 = 7350/19448 ≈ 0.378
Example 7
180 students took examinations in English and Mathematics. Their results were as follows:
Probability that a randomly selected student passed English = 80/180 = 4/9
Probability that a randomly selected student passed Mathematics = 120/180 = 2/3
Probability that a randomly selected student passed at least one subject = 144/180 = 4/5
Find the probability that a randomly selected student passed both subject.
Solution
Let E be the event of passing English, and M be the event of passing Mathematics.
It is given that: P(E) = 4/9, P(M) = 2/3, P(M ∪ E) = 4/5
As P(M ∪ E) = P(E) + P(M) − P(M ∩ E),
P(M ∩ E) = P(E) + P(M) − P(M ∪ E) = 4/9 + 2/3 − 4/5 = 14/45 ≈ 0.31
Example 8
A card is drawn from a complete deck of playing cards. What is the probability that the card is a
heart or an ace?
Solution
Let A be the event of getting a heart, and B be the event of getting an ace.
P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 13/52 + 4/52 − 1/52 = 16/52 = 4/13
What is the probability of getting a total of '7' or '11' when a pair of dice is tossed?
Solution
Possible outcomes of getting a total of '7': {1,6; 2,5; 3,4; 4,3; 5,2; 6,1}
Possible outcomes of getting a total of '11': {5,6; 6,5}
Let A be the event of getting a total of '7', and B be the event of getting a total of '11'. Since A and B are mutually exclusive,
P(A ∪ B) = P(A) + P(B) = 6/36 + 2/36 = 8/36 = 2/9
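The favorable outcomes for totals of 7 and 11 can be found by enumerating all 36 equally likely rolls. A minimal verification sketch:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))     # 36 equally likely pairs
seven = [o for o in outcomes if sum(o) == 7]
eleven = [o for o in outcomes if sum(o) == 11]

# The two events are mutually exclusive, so the probabilities add.
p = Fraction(len(seven) + len(eleven), len(outcomes))
print(len(seven), len(eleven), p)  # 6 2 2/9
```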
CHAPTER TWO
2. Conditional Probability and Independence
2.1 Conditional Probability
Conditional Events: If the occurrence of one event has an effect on the occurrence of the other event, then the two events are conditional or dependent events.
Let A and B be any two events associated with a random experiment. The probability of occurrence of
event A when the event B has already occurred is called the conditional probability of A when B is
given and is denoted as P (A/B).
It is defined as P(A/B) = P(A ∩ B) / P(B), where P(B) > 0.
Similarly, the conditional probability of event B when event A has already occurred is given by
P(B/A) = P(A ∩ B) / P(A), where P(A) > 0. This implies that
P(A ∩ B) = P(A/B) * P(B) = P(B/A) * P(A)
Properties of conditional probability:
(1) 0 ≤ P(B/A) ≤ 1
(2) P(S/A) = 1
(3) P(B1 ∪ B2 / A) = P(B1/A) + P(B2/A), if B1 ∩ B2 = Ø
(4) P(B'/A) = 1 − P(B/A)
(5) If A = S, P(B/S) = P(B ∩ S)/P(S) = P(B)
Remark: we can compute conditional probability using two ways
1. Directly considering the probability of an event with respect to reduced sample space.
2. Using the formula of conditional probability.
Example 1: The probability that a policy will be correctly formulated is 0.6, and the probability that it will be correctly formulated and correctly executed is 0.54. Find the probability that the policy will be correctly executed given that it is correctly formulated, and the probability that it will not.
Let C_F = {correctly formulated} and C_E = {correctly executed}.
a. P(C_E / C_F) = P(C_E ∩ C_F) / P(C_F) = 0.54 / 0.6 = 0.9
b. P(NC_E / C_F) = 1 − P(C_E / C_F) = 1 − 0.9 = 0.1
E F E F
Example 2: A family has two children, and the four outcomes in S = {(F, F), (F, M), (M, F), (M, M)} (eldest child listed first) are assumed equally likely.
(b) What is the probability that both are girls, if the eldest is a girl?
Let A1 = {the eldest is a girl} and A2 = {both are girls}.
Clearly, A1 ∩ A2 = {(F, F)} and P(A1 ∩ A2) = 1/4.
Using the definition of conditional probability, we get
P(A2 / A1) = P(A1 ∩ A2) / P(A1) = (1/4) / (2/4) = 1/2
Example 3: In a certain community, 36 percent of the families own a dog, 22 percent of the families
that own a dog also own a cat, and 30 percent of the families own a cat.
(a) Compute the probability that a randomly selected family owns both a cat and a dog.
(b) Compute the probability that a randomly selected family owns a dog, given that it owns a cat.
Solution: Let C = {family owns a cat} and D = {family owns a dog}. From the problem, we are
given that P (D) = 0.36, P(C / D) 0.22 and P (C) = 0.30.
(a) P(C/D) = P(C ∩ D)/P(D) ⟹ 0.22 = P(C ∩ D)/0.36 ⟹ P(C ∩ D) = 0.22 * 0.36 = 0.0792
(b) P(D/C) = P(C ∩ D)/P(C) = 0.0792/0.30 = 0.264
Example 4: Suppose that an office has 100 calculating machines. Some of these machines are electric (E) while others are manual (M), and some of the machines are new (N) while others are used (U). The table below gives the number of machines in each category. A person enters the office, picks a machine at random, and discovers that it is new. What is the probability that it is electric?
E M Total
N 40 30 70
U 20 10 30
Total 60 40 100
Solution:
Simply considering the reduced sample space N (i.e., the 70 new machines), we have P(E/N) = 40/70 = 4/7.
Using the definition of conditional probability, we have
P(E/N) = P(E ∩ N)/P(N) = (40/100)/(70/100) = 4/7
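Both routes to P(E/N) can be reproduced from the counts in the table. A sketch using exact fractions:

```python
from fractions import Fraction

# Counts from the table: keys are (condition, type) pairs.
counts = {("N", "E"): 40, ("N", "M"): 30, ("U", "E"): 20, ("U", "M"): 10}
total = sum(counts.values())  # 100 machines

p_n = Fraction(counts[("N", "E")] + counts[("N", "M")], total)  # P(N) = 70/100
p_e_and_n = Fraction(counts[("N", "E")], total)                 # P(E ∩ N) = 40/100

# Definition of conditional probability: P(E/N) = P(E ∩ N) / P(N)
p_e_given_n = p_e_and_n / p_n
print(p_e_given_n)  # 4/7
```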
Example 5: Assume 100 students were asked, "Do you smoke?" Responses are shown below in the
contingency table which gives the cross tabulations.
. Yes No Total
Male 19 41 60
Female 12 28 40
Total 31 69 100
What is the probability of a randomly selected individual being a male and who smokes? This is
just a joint probability. The number of "Male and Smoke" divided by the total = 19/100 = 0.19
What is the probability of a randomly selected individual being a male? This is the total for male
divided by the total = 60/100 = 0.60. Since no mention is made of smoking or not smoking, it includes
all males.
What is the probability of a randomly selected individual smoking? Again, since no mention is
made of gender, this is a marginal probability, the total who smoke divided by the total = 31/100 = 0.31.
What is the probability of a randomly selected male smoking? This time, you are told that you have
a male, which is a conditional probability. You are being asked: given that you select a male, what is
the probability of a smoker? Thus, 19 males smoke out of 60 males, so 19/60 = 0.31666.
What is the probability that a randomly selected smoker is male? This time, you are told that you
have a smoker and asked to find the probability that the smoker is also male. There are 19 male smokers
out of 31 total smokers, so 19/31 = 0.6129.
2.2 Multiplication theorems of probability
From P(A/B) = P(A ∩ B)/P(B) and P(B/A) = P(A ∩ B)/P(A), it follows that
P(A ∩ B) = P(A/B) * P(B) = P(B/A) * P(A)
Example: A lot consists of 20 defective and 80 non-defective items. If we choose two items at random without replacement, what is the probability that both items are defective?
Let A = {first item defective} and B = {second item defective}; we want P(A ∩ B).
P(A) = 20/100 = 1/5, P(B/A) = 19/99
P(A ∩ B) = P(A) * P(B/A) = (1/5) * (19/99) = 19/495
More generally, let A1, A2, …, An be a sequence of events. Then
P(A1 ∩ A2 ∩ … ∩ An) = P(A1) * P(A2/A1) * P(A3/A1, A2) * … * P(An/A1, …, An−1)
Example 1: I am dealt a hand of 5 cards. What is the probability that they are all spades?
Let Ai = {the i-th card dealt is a spade}. Then
P(A1) = 13/52
P(A2/A1) = 12/51
P(A3/A2, A1) = 11/50
P(A4/A3, A2, A1) = 10/49
P(A5/A4, A3, A2, A1) = 9/48
P(A1 ∩ A2 ∩ A3 ∩ A4 ∩ A5) = P(A1) * P(A2/A1) * P(A3/A2, A1) * P(A4/A3, A2, A1) * P(A5/A4, A3, A2, A1)
= (13/52) * (12/51) * (11/50) * (10/49) * (9/48) ≈ 0.0005
Note: As another way to solve this problem, a student recently pointed out that we could simply regard the cards as belonging to two groups: spades and non-spades. There are 13C5 ways to draw 5 spades from the 13 spades, and there are 52C5 possible hands. Thus, the probability of drawing 5 spades (assuming that each hand is equally likely) is 13C5 / 52C5 ≈ 0.0005.
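The chain-rule product and the counting argument should agree exactly, which is easy to confirm with exact arithmetic. A verification sketch:

```python
from fractions import Fraction
from math import comb

# Sequential multiplication rule: P(A1) P(A2/A1) ... P(A5/A1..A4)
p_chain = Fraction(1)
for k in range(5):
    p_chain *= Fraction(13 - k, 52 - k)

# Counting argument: 5 of the 13 spades, out of all 5-card hands.
p_count = Fraction(comb(13, 5), comb(52, 5))

print(p_chain == p_count)        # True: the two methods agree exactly
print(round(float(p_chain), 4))  # 0.0005
```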
Example 2: The probability that a married man watches a certain TV show is 0.4 and the probability that a married woman watches the show is 0.5. The probability that a man watches the show given that his wife does is 0.7. Find the probability that
(i) A married couple both watch the show
(ii) A wife watches the show given that her husband does
(iii) At least one person of a married couple watches the show
Answer: Let H = {husband watches show} and W = {wife watches show}.
(i) P(H ∩ W) = P(H/W) * P(W) = 0.7 * 0.5 = 0.35
(ii) P(W/H) = P(H ∩ W)/P(H) = 0.35/0.4 = 0.875
(iii) P(H ∪ W) = P(H) + P(W) − P(H ∩ W) = 0.4 + 0.5 − 0.35 = 0.55
Example 3: In a certain college, 25% of the students failed in mathematics, 15% of the students failed in
chemistry, and 10% of the students failed both in mathematics and chemistry. A student is selected at
random.
Let M = {students who failed in mathematics} and C = {students who failed in chemistry}; then
(i) The probability that a student failed mathematics, given that he has failed chemistry, is
P(M/C) = P(M ∩ C)/P(C) = 0.10/0.15 = 2/3
(ii) The probability that a student failed chemistry, given that he has failed mathematics, is
P(C/M) = P(C ∩ M)/P(M) = 0.10/0.25 = 2/5
2.3 Law of Total Probability and Bayes’ Rule
A sequence of sets B1, B2, …, Bn is said to form a partition of the sample space S if Bi ∩ Bj = Ø for all i ≠ j and B1 ∪ B2 ∪ … ∪ Bn = S. For any event A,
A = (B1 ∩ A) ∪ (B2 ∩ A) ∪ … ∪ (Bn ∩ A), and these pieces are mutually exclusive, so
P(A) = P(B1 ∩ A) + P(B2 ∩ A) + … + P(Bn ∩ A)
= P(A/B1) * P(B1) + P(A/B2) * P(B2) + … + P(A/Bn) * P(Bn) = Σᵢ₌₁ⁿ P(A/Bi) * P(Bi)
This is the law of total probability.
Example 1: Suppose that a manufacturer buys approximately 60 percent of a raw material (in boxes)
from Supplier 1, 30 percent from Supplier 2, and 10 percent from Supplier 3. For each supplier,
defective rates are as follows: Supplier 1: 0.01, Supplier 2: 0.02, and Supplier 3: 0.03. The manufacturer
observes a defective box of raw material. What is the probability that a box of raw material is defective?
Solution: Let A = {observe defective box}; then, by the law of total probability, P(A) = Σᵢ P(A/Bi) * P(Bi).
Let B1, B2, and B3, respectively, denote the events that the box comes from Supplier 1, 2, and 3. The
prior probabilities (ignoring the status of the box) are
P (B1) = 0.6
P (B2) = 0.3
P (B3) = 0.1
P(A) = Σᵢ₌₁³ P(A/Bi) * P(Bi) = P(A/B1)P(B1) + P(A/B2)P(B2) + P(A/B3)P(B3)
= (0.01)(0.6) + (0.02)(0.3) + (0.03)(0.1) = 0.006 + 0.006 + 0.003 = 0.015
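The supplier example is a direct application of the total-probability sum. A sketch (the supplier labels are mine):

```python
priors = {"S1": 0.6, "S2": 0.3, "S3": 0.1}           # P(Bi): share of each supplier
defect_rates = {"S1": 0.01, "S2": 0.02, "S3": 0.03}  # P(A | Bi): defective rate

# Law of total probability: P(A) = sum_i P(A|Bi) P(Bi)
p_defective = sum(defect_rates[s] * priors[s] for s in priors)
print(round(p_defective, 6))  # 0.015
```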
Example 2: A certain item is manufactured by three factories say 1, 2 and 3. It is known that 1 turn
out twice as many as items as 2 and that 2 and 3 turn out the same number of items (during a specified
production period). It is also known that 2 percent of the items produced by 1 and 2 while 4 percent of
those manufactured by 3 are defective. All the items produced are put into one stockpile and then one
item is chosen at random. What is the probability that this item is defective?
Solution: Let A = {the item is defective} and Bi = {the item comes from factory i}.
P(B1) = 1/2, P(B2) = P(B3) = 1/4, P(A/B1) = P(A/B2) = 0.02, P(A/B3) = 0.04
P(A) = P(A/B1)P(B1) + P(A/B2)P(B2) + P(A/B3)P(B3) = 0.02 * 1/2 + 0.02 * 1/4 + 0.04 * 1/4 = 0.025
Bayes' Rule: If B1, B2, …, Bn form a partition of S and A is any event with P(A) > 0, then
P(Bj/A) = P(A/Bj) * P(Bj) / Σᵢ₌₁ⁿ P(A/Bi) * P(Bi)
REMARK: Bayesians call P (Bj) the prior probability for the event Bj; they call P (Bj/A) the posterior
probability of Bj, given the information in A.
Example: Suppose that a manufacturer buys approximately 60 percent of a raw material (in boxes)
from Supplier 1, 30 percent from Supplier 2, and 10 percent from Supplier 3. For each supplier,
defective rates are as follows: Supplier 1: 0.01, Supplier 2: 0.02, and Supplier 3: 0.03. The manufacturer
observes a defective box of raw material.
(b) What is the probability that it came from Supplier 2?
(c) What is the probability that the defective did not come from Supplier 3?
Solution:
(a) Let B1, B2, and B3, respectively, denote the events that the box comes from Supplier 1, 2, and 3.
the prior probabilities (ignoring the status of the box) are
P (B1) = 0.6
P (B2) = 0.3
P (B3) = 0.1
Note that {B1, B2, B3} partitions the space of possible suppliers. Thus, by Bayes Rule,
P(B2/A) = P(A/B2)P(B2) / P(A) = P(A/B2)P(B2) / [P(A/B1)P(B1) + P(A/B2)P(B2) + P(A/B3)P(B3)]
= (0.02)(0.3) / [(0.01)(0.6) + (0.02)(0.3) + (0.03)(0.1)] = 0.006/0.015 = 0.40
This is the updated (posterior) probability that the box came from Supplier 2 (updated to include
the information that the box was defective).
(c) P(B3/A) = P(A/B3)P(B3) / P(A) = (0.03)(0.1) / [(0.01)(0.6) + (0.02)(0.3) + (0.03)(0.1)] = 0.003/0.015 = 0.20
Thus, P(B3ᶜ/A) = 1 − P(B3/A) = 1 − 0.20 = 0.80, by the complement rule.
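The full set of posteriors for the supplier example can be computed in one pass; each posterior is a prior-weighted likelihood divided by the total-probability normalizer. A sketch (supplier labels are mine):

```python
priors = {"S1": 0.6, "S2": 0.3, "S3": 0.1}           # P(Bi)
defect_rates = {"S1": 0.01, "S2": 0.02, "S3": 0.03}  # P(A | Bi)

p_a = sum(defect_rates[s] * priors[s] for s in priors)  # P(A) = 0.015

# Bayes' rule: posterior P(Bj | A) for each supplier.
posteriors = {s: defect_rates[s] * priors[s] / p_a for s in priors}
print({s: round(p, 2) for s, p in posteriors.items()})
# {'S1': 0.4, 'S2': 0.4, 'S3': 0.2}
```

Note that the posteriors sum to 1, as a probability distribution over the partition must.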
2.4 Independence
When the occurrence or non-occurrence of A has no effect on whether or not B occurs, and vice versa,
we say that the events A and B are independent. Mathematically, we define A and B to be independent if
and only if P( A B) P( A)* P( B) , otherwise, A and B are called dependent events. Note that if A and
B are independent,
P(A/B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A)
P(B/A) = P(A ∩ B)/P(A) = P(A)P(B)/P(A) = P(B)
Example: A red die and a white die are rolled. Let A = {4 on red die} and B = {sum is odd}. Of the 36
outcomes in S, 6 are favorable to A, 18 are favorable to B, and 3 are favorable to AB. Assuming the
outcomes are equally likely,
P(A ∩ B) = 3/36, and P(A) * P(B) = (6/36) * (18/36) = 3/36. Since P(A ∩ B) = P(A) * P(B), the events A and B are independent.
Many experiments consist of a sequence of n trials that are viewed as independent (e.g., flipping a coin
10 times). If Ai denotes the event associated with the ith trial, and the trials are independent, then
P(A1 ∩ A2 ∩ … ∩ An) = Πᵢ₌₁ⁿ P(Ai)
Example: An unbiased die is rolled six times. Let Ai = {i appears on roll i}, for i = 1, 2, …, 6. Then P(Ai) = 1/6, and assuming independence, P(A1 ∩ A2 ∩ A3 ∩ A4 ∩ A5 ∩ A6) = (1/6)^6.
Suppose that if Ai occurs, we will call it "a match." What is the probability of at least one match in the six rolls?
Solution: Let B denote the event that there is at least one match. Then Bᶜ denotes the event that there are no matches. Now,
P(Bᶜ) = P(A1ᶜ ∩ A2ᶜ ∩ A3ᶜ ∩ A4ᶜ ∩ A5ᶜ ∩ A6ᶜ) = Πᵢ₌₁⁶ P(Aiᶜ) = (5/6)^6 ≈ 0.335
P(B) = 1 − P(Bᶜ) = 1 − 0.335 = 0.665, by the complement rule.
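The matching calculation is a one-liner once independence is assumed; exact fractions avoid any rounding in the intermediate step:

```python
from fractions import Fraction

# P(no match on a single roll) = 5/6; the six rolls are independent.
p_no_match = Fraction(5, 6) ** 6          # (5/6)^6 = 15625/46656
p_at_least_one = 1 - p_no_match           # complement rule

print(round(float(p_no_match), 3))        # 0.335
print(round(float(p_at_least_one), 3))    # 0.665
```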
Note: pairwise independence does not necessarily imply the independence of several events; i.e., it is possible that
P(A ∩ B) = P(A)P(B), P(A ∩ C) = P(A)P(C), P(B ∩ C) = P(B)P(C), but P(A ∩ B ∩ C) ≠ P(A)P(B)P(C)
Definition: We say that the three events A, B and C are mutually independent if and only if all of the following conditions hold: P(A ∩ B) = P(A)P(B), P(A ∩ C) = P(A)P(C), P(B ∩ C) = P(B)P(C), and P(A ∩ B ∩ C) = P(A)P(B)P(C). In general, n events are mutually independent if and only if every subcollection of two or more of them satisfies the product rule; this implies that there are 2ⁿ − n − 1 conditions that should be satisfied.
Example: Suppose that we toss two dice. Define the events A, B, C as follows:
A = {the first die shows an even number}
B = {the second die shows an odd number}
C = {the two dice show both odd or both even numbers}
We have P(A) = P(B) = P(C) = 0.5. Furthermore, P(A ∩ B) = P(A ∩ C) = P(B ∩ C) = 0.25. Hence, the three events are all pairwise independent. However, P(A ∩ B ∩ C) = 0 ≠ P(A)P(B)P(C).
Example: The probability that a man will live 10 more years is 1/4, and the probability that his wife will live 10 more years is 1/3. Find the probability that (i) both will be alive in 10 years; (ii) at least one will be alive in 10 years; (iii) neither will be alive in 10 years; (iv) only the wife will be alive in 10 years.
Let A = event that the man is alive in 10 years, and B = event that his wife is alive in 10 years; then
P(A) = 1/4, P(B) = 1/3
i. We seek P(A ∩ B). Since A and B are independent, P(A ∩ B) = P(A) * P(B) = (1/4) * (1/3) = 1/12.
ii. We seek P(A ∪ B). P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 1/4 + 1/3 − 1/12 = 1/2.
iii. We seek P(Aᶜ ∩ Bᶜ). Now P(Aᶜ) = 1 − P(A) = 1 − 1/4 = 3/4 and P(Bᶜ) = 1 − P(B) = 1 − 1/3 = 2/3. Furthermore, since Aᶜ and Bᶜ are independent, P(Aᶜ ∩ Bᶜ) = P(Aᶜ) * P(Bᶜ) = (3/4) * (2/3) = 6/12 = 1/2.
Alternately, since (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ, P(Aᶜ ∩ Bᶜ) = P((A ∪ B)ᶜ) = 1 − P(A ∪ B) = 1 − 1/2 = 1/2.
iv. We seek P(Aᶜ ∩ B). Since P(Aᶜ) = 1 − P(A) = 3/4 and Aᶜ and B are independent, P(Aᶜ ∩ B) = P(Aᶜ) * P(B) = (3/4) * (1/3) = 1/4.
CHAPTER THREE
3. Random Variables
Definition 1: Let ε be an experiment and S a sample space associated with the experiment. A function X assigning to every element s∊S a real number X(s) is called a random variable.
Example: suppose that we toss two coins and consider the sample space associated with this
experiment. That is S = {HH, HT, TH, TT}
Let X be the number of heads obtained in the two tosses. Hence X(HH) = 2, X(HT) = X(TH) = 1 and X(TT) = 0.
Let ε be an experiment and S be its sample space. Let X be a random variable defined on S and let Rx be its range space, and let B be an event with respect to Rx, i.e. B ⊆ Rx. Suppose that A is defined as A = {s∊S : X(s)∊B}; then A and B are called equivalent events. This implies that P(A) = P(B). A random variable can be discrete or continuous.
Example: Consider the tossing of two coins. Hence S = {HH, HT, TH, TT}. Let X be the number of heads obtained in the two tosses. Hence Rx = {0, 1, 2}. Let B = {1}. Since X(HT) = X(TH) = 1, we have that A = {HT, TH} is equivalent to B. We have P(HT) = P(TH) = 1/4, hence P({HT, TH}) = 1/4 + 1/4 = 1/2. Since the event {X = 1} is equivalent to the event {HT, TH}, we have that P(X = 1) = P({HT, TH}) = 1/2; this implies that A and B are equivalent events.
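The correspondence between events in S and events in Rx can be made concrete by tabulating X over the sample space. A sketch (the dictionary names the outcomes; each has probability 1/4 for two fair coins):

```python
from fractions import Fraction

# X = number of heads; each outcome of two fair coins has probability 1/4.
S = {"HH": 2, "HT": 1, "TH": 1, "TT": 0}
p_outcome = Fraction(1, 4)

# P(X = 1) is the probability of the equivalent event A = {HT, TH} in S.
p_x1 = sum(p_outcome for s, x in S.items() if x == 1)
print(p_x1)  # 1/2
```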
Definition: Let X be a discrete random variable. Hence Rx, the range space of X, consists of at most a
countable infinite number of values x1, x2,…….., with each possible outcome xi we associate a number
P(xi) = P(X = xi) , called the probability of xi. The numbers P (xi), i=1, 2, 3,………., must satisfy the
following conditions.
a. P(xi) ≥ 0, for all i
b. Σi P(xi) = 1, where the sum is taken over i = 1, 2, 3, ...
The function p defined above is called the probability function (point probability function) of the
random variable X. The collection of pairs (xi, P(xi)),i = 1, 2, 3,……….., is sometimes called the
probability distribution of X.
Example: Suppose that we toss two coins and consider the sample space associated with this experiment. That is, S = {HH, HT, TH, TT}.
Let X be the number of heads obtained in the two tosses. Hence X(HH) = 2, X(HT) = X(TH) = 1 and X(TT) = 0, so Rx = {0, 1, 2}, with P(X = 0) = 1/4, P(X = 1) = 1/2, P(X = 2) = 1/4, and Σi P(X = i) = P(X = 0) + P(X = 1) + P(X = 2) = 1.
X          0     1     2
P(X = xi)  1/4   1/2   1/4
Example: Suppose that a radio tube is inserted into a socket and tested. Assume that the probability that it tests positive equals 3/4; hence the probability that it tests negative is 1/4. Assume further that we are testing a large supply of such tubes. The testing continues until the first positive tube appears. Define the random variable X as follows: X is the number of tests required to terminate the experiment. The sample space associated with this experiment is S = {+, -+, --+, ---+, ...}, so that P(X = n) = P(n) = (1/4)^(n-1) · (3/4), n = 1, 2, 3, ...
To check that these values of P(n) satisfy the required condition we note that
Σ P(n) = (3/4)[1 + 1/4 + 1/16 + ...] = (3/4) · 1/(1 - 1/4) = 1
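A quick numerical check of this geometric series: the partial sums of P(n) approach 1 (a sketch; 200 terms are far more than needed for machine precision).

```python
# Partial-sum check that P(X = n) = (1/4)**(n - 1) * (3/4) defines a proper
# pmf: the geometric series sums to (3/4) / (1 - 1/4) = 1.
def p(n):
    return (1 / 4) ** (n - 1) * (3 / 4)

total = sum(p(n) for n in range(1, 200))
print(total)  # ≈ 1.0
```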
Example: Toss a coin once. S = {H, T}; let X = number of heads. Then X follows a Bernoulli distribution with parameter p if and only if the possible values of X are 0 and 1 with P(X = 1) = p and P(X = 0) = 1 - p = q:
⇒ P(X = x) = p^x · q^(1-x), x = 0, 1
For a fair coin p = 0.5, so here X ~ Bernoulli(0.5).
Consider n independent Bernoulli random variables, all with the same success probability p. The sum X = X1 + X2 + ... + Xn is called a binomial random variable with parameters n and p; we write X ~ Bin(n, p).
A discrete random variable X is said to have a binomial distribution if X satisfies the following conditions:
1. The experiment consists of a fixed number n of trials.
2. The trials are independent of one another.
3. Each trial results in one of two outcomes, "success" or "failure".
4. The probability of success p is the same on every trial, and X counts the number of successes in the n trials.
Examples:
Consider the experiment of flipping a coin 5 times. If we let the event of getting tails on a flip be considered "success", and if the random variable T represents the number of tails obtained, then T will be binomially distributed with n = 5, p = 1/2, and q = 1/2.
A student takes a 10 question multiple-choice quiz and guesses each answer. For each question, there are 4 possible answers, only one of which is correct. If we consider "success" to be getting a question right and consider the 10 questions as 10 independent trials, then the random variable X representing the number of correct answers will be binomially distributed with n = 10, p = 1/4, and q = 3/4.
Fourteen percent of flights from Tampa International Airport are delayed. If 20 flights are chosen at random, then we can consider each flight to be an independent trial. If we define a successful trial to be that a flight takes off on time, then the random variable Z representing the number of on-time flights will be binomially distributed with n = 20, p = .86, and q = .14.
Suppose that items coming off a production line are classified as defective (D) or non-defective
(N). Suppose that three items are chosen at random from a day’s production and are classified
according to this scheme. The sample space for this experiment, say S, may be described as:
S= {DDD, DDN, DND, NDD, NND, NDN, DNN, NNN}
3.1.1.1 Calculating Probabilities for a Binomial Random Variable
If X is a binomial random variable with n trials, probability of success p, and probability of failure q,
then by the Fundamental Counting Principle, the probability of any outcome in which there are x
successes (and therefore n x failures) is:
(p · p · ... · p) · (q · q · ... · q) = p^x · q^(n-x)
 (x factors of p)   (n-x factors of q)
To count the number of outcomes with x successes and n x failures, we observe that the x successes
could occur on any x of the n trials. The number of ways of choosing x trials out of n is n Cx , so the
probability of x successes becomes:
P(x) = nCx · p^x · q^(n-x)
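This formula can be wrapped in a small helper. The sketch below uses Python's math.comb for nCx; the function name binom_pmf is ours, not from the text.

```python
from math import comb

def binom_pmf(n, x, p):
    # P(X = x) = nCx * p**x * q**(n - x), with q = 1 - p
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Sanity check: the pmf sums to 1 over x = 0, ..., n.
print(sum(binom_pmf(5, x, 0.5) for x in range(6)))   # 1.0
# e.g. 3 heads in 10 fair flips: 10C3 / 2**10 = 120/1024
print(binom_pmf(10, 3, 0.5))                          # 0.1171875
```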
Example 1: If a coin is flipped 10 times, what is the probability that it will fall heads 3 times?
Solution: Let S denote obtaining a head (success) on a flip, and F obtaining a tail (failure).
Clearly, n = 10, k = 3, p = 1/2, and q = 1/2, so
P(3) = 10C3 · (1/2)^3 · (1/2)^7 = 120/1024 ≈ 0.1172
Example 2: If a basketball player makes 3 out of every 4 free throws, what is the probability that he will
make 6 out of 10 free throws in a game?
Solution: The probability of making a free throw is 3/4. Therefore, p = 3/4, q = 1/4, n = 10, and k = 6.
Therefore, P(6) = 10C6 · (3/4)^6 · (1/4)^4 ≈ 0.146
Example 3: If a medicine cures 80% of the people who take it, what is the probability that of the eight people who take the medicine, 5 will be cured? (Here p = 0.8, q = 0.2, n = 8, k = 5, so P(5) = 8C5 · (0.8)^5 · (0.2)^3 ≈ 0.147.)
Example 4: If a microchip manufacturer claims that only 4% of his chips are defective, what is the
probability that among the 60 chips chosen, exactly three are defective?
Solution: If S denotes the probability that the chip is defective, and F the probability that the
chip is not defective, then p = .04, q = .96, n = 60, and k = 3.
Bin(60, 3; .04) = 60C3 · (.04)^3 · (.96)^57 = .2138
Example 5: If a telemarketing executive has determined that 15% of the people contacted will purchase
the product, what is the probability that among the 12 people who are contacted, 2
will buy the product?
Solution: If S denoted the probability that a person will buy the product, and F the probability
that the person will not buy the product, then p = .15, q = .85, n = 12, and k = 2.
Bin(12, 2; .15) = 12C2 · (.15)^2 · (.85)^10 = .2924.
Example 6: A new medication produces an undesirable reaction in 5% of users. If a sample of 13 users receive the medication, find the probability of the following events (assume independence from user to user):
Solution: n = 13, p = 0.05, q = 0.95
c. P(X ≥ 1) = 1 - P(X = 0) = 1 - 0.5133421 = 0.4866579
Example 7: Suppose that a radio tube inserted into a certain type of set has a probability of 0.2 of
functioning more than 500 hours. If we test 20 tubes, what is the probability that exactly k of these
function more than 500 hours, k = 1, 2, …………….., 20?
If X is the number of tubes functioning more than 500 hours, we shall assume that X has a binomial
distribution. Thus P(X = k) = 20Ck · (0.2)^k · (0.8)^(20-k).
Definition: X is a continuous random variable if there exists a function f, called the probability density function (pdf) of X, satisfying the following conditions:
a. f(x) ≥ 0, for all x
b. ∫ f(x)dx = 1, integrating over the whole real line
c. For any a, b with -∞ < a < b < ∞, we have P(a ≤ X ≤ b) = ∫_a^b f(x)dx
Note: 1. P(X = a) = ∫_a^a f(x)dx = 0
2. For a continuous random variable X, if X may assume all values in some interval [c, d], with the associated probability density function f(x), the following probabilities are all the same:
P(c ≤ X ≤ d) = P(c ≤ X < d) = P(c < X ≤ d) = P(c < X < d) = ∫_c^d f(x)dx
Example 1: Let X be a continuous random variable with pdf
f(x) = 2x, 0 < x < 1
f(x) = 0, otherwise
a. Verify that f(x) is a proper pdf
b. Find P(1/2 ≤ X ≤ 3/4)
c. Evaluate P(X ≤ 1/2 | 1/3 ≤ X ≤ 2/3)
Solution:
a. i. f(x) ≥ 0 for all x ∊ (0, 1)
   ii. ∫ f(x)dx = ∫_0^1 2x dx = x² evaluated from 0 to 1 = 1 - 0 = 1
b. P(1/2 ≤ X ≤ 3/4) = ∫ from 1/2 to 3/4 of 2x dx = x² evaluated from 1/2 to 3/4 = 9/16 - 4/16 = 5/16
c. P(X ≤ 1/2 | 1/3 ≤ X ≤ 2/3) = P(1/3 ≤ X ≤ 1/2) / P(1/3 ≤ X ≤ 2/3)
   = [∫ from 1/3 to 1/2 of 2x dx] / [∫ from 1/3 to 2/3 of 2x dx] = (1/4 - 1/9) / (4/9 - 1/9) = (5/36) / (12/36) = 5/12
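Since the antiderivative of 2x is x², every probability in this example is a difference of squares; a sketch (the helper name prob is ours):

```python
# For f(x) = 2x on (0, 1) the antiderivative is x**2, so every probability
# is a difference of squares (endpoints clipped to the support).
def prob(a, b):
    a, b = max(a, 0.0), min(b, 1.0)
    return b ** 2 - a ** 2

total  = prob(0, 1)                               # part a: 1
part_b = prob(1 / 2, 3 / 4)                       # part b: 5/16
part_c = prob(1 / 3, 1 / 2) / prob(1 / 3, 2 / 3)  # part c: 5/12
print(total, part_b, part_c)
```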
Example 2: Let X be a continuous random variable with the following pdf
f(x) = ax, 0 ≤ x ≤ 1
f(x) = a, 1 ≤ x ≤ 2
f(x) = -ax + 3a, 2 ≤ x ≤ 3
f(x) = 0, otherwise
a. Determine the constant a
b. Evaluate P(X ≤ 3/2)
Solution:
a. ∫ f(x)dx = ∫_0^1 ax dx + ∫_1^2 a dx + ∫_2^3 (-ax + 3a)dx = a/2 + a + a/2 = 2a = 1, so a = 1/2
b. P(X ≤ 3/2) = ∫_0^1 (x/2)dx + ∫ from 1 to 3/2 of (1/2)dx = 1/4 + 1/4 = 1/2
Example 3: Let X be the life length of a certain type of light bulbs (in hours). Assuming X to be
a continuous random variable .we suppose that the pdf of X is given by
f(x) = a/x³, 1500 ≤ x ≤ 2500
f(x) = 0, elsewhere
That is, we are assigning probability zero to the events {X < 1500} and {X > 2500}. To evaluate a we use the condition ∫ from 1500 to 2500 of (a/x³) dx = 1. From this we obtain a = 7,031,250.
Example: Let X be a continuous random variable with pdf
f(x) = x, 0 ≤ x ≤ 1
f(x) = 2 - x, 1 ≤ x ≤ 2
f(x) = 0, otherwise
Evaluate P(1/2 ≤ X ≤ 3/2 | 1/4 ≤ X ≤ 2)
Theorem: (a) If X is a discrete random variable, F(x) = Σj P(xj), where the sum is taken over all indices j satisfying xj ≤ x.
(b) If X is a continuous random variable with pdf f, F(x) = ∫ from -∞ to x of f(s)ds.
Example 1: Suppose the random variable X assumes the three values 0, 1, and 2 with probabilities 1/3, 1/6, and 1/2 respectively. Then
F(x) = 0, if x < 0
F(x) = 1/3, if 0 ≤ x < 1
F(x) = 1/2, if 1 ≤ x < 2
F(x) = 1, if x ≥ 2
Example 2: Let X be a continuous random variable with pdf
f(x) = 2x, if 0 < x < 1
f(x) = 0, elsewhere
Hence the cdf F is given by F(x) = 0 if x ≤ 0; F(x) = ∫_0^x 2s ds = x² if 0 < x ≤ 1; and F(x) = 1 if x > 1. (Draw the graph of this function.)
Note: (a) If X is a discrete random variable with a finite number of possible values, the graph of the cdf F will be made up of horizontal line segments (it is a step function). The function F is continuous except at the possible values of X, namely x1, x2, ..., xn. At each value xj the graph has a "jump" of magnitude P(xj) = P(X = xj).
(c) The cdf F is defined for all values of x, which is an important reason for considering it.
Theorem: (a) Let F be the cdf of a continuous random variable with pdf f , then
f(x) = (d/dx) F(x), for every x at which F is differentiable.
(b) Let X be a discrete random variable with possible values x1, x2, ..., and suppose that it is possible to label these values so that x1 < x2 < .... Let F be the cdf of X; then P(xj) = P(X = xj) = F(xj) - F(x_{j-1}).
Example: Let X have cdf
F(x) = 0, x < 0
F(x) = 1 - e^(-x), x ≥ 0
Then the pdf is
f(x) = e^(-x), x ≥ 0
f(x) = 0, elsewhere
and, for instance, P(-3 < X ≤ 2) = F(2) - F(-3) = (1 - e^(-2)) - 0 = 1 - e^(-2).
The random variable X may assume certain distinct values, say x1, x2, ..., xn, with positive probability and also assume all values in some interval, say a ≤ x ≤ b. The probability distribution of such a random variable would be obtained by combining the ideas considered above for the description of discrete and continuous random variables as follows. To each value xi assign a number P(xi) such that P(xi) ≥ 0 for all i and such that Σ from i = 1 to n of P(xi) = p < 1. Then define a function f satisfying f(x) ≥ 0 and ∫_a^b f(x)dx = 1 - p. For all a and b with a ≤ b,
P(a ≤ X ≤ b) = ∫_a^b f(x)dx + Σ P(xi), summing over those i with a ≤ xi ≤ b,
so that in particular P(-∞ < X < ∞) = (1 - p) + p = 1.
3.5 Uniformly distributed random variables
Definition: Suppose that X is a continuous random variable assuming all values in the interval [a, b]. X is uniformly distributed over [a, b] if its pdf is
f(x) = 1/(b - a), a ≤ x ≤ b
f(x) = 0, elsewhere
Notes: (a) A uniformly distributed random variable has a pdf which is constant over the interval of definition.
(b) P(c ≤ X ≤ d) is the same for all subintervals [c, d] of [a, b] having the same length. That is, P(c ≤ X ≤ d) = (d - c)/(b - a), and thus depends only on the length of the interval and not on the location of that interval.
(c) We can now make precise the intuitive notion of choosing a point P at random on an interval, say [a, b]. By this we shall simply mean that the x-coordinate of the chosen point, say X, is uniformly distributed over [a, b].
The cumulative distribution function for a uniformly distributed random variable X over the interval [a, b] is computed as follows:
F(x) = 0, x < a
F(x) = ∫_a^x (1/(b - a)) ds = (x - a)/(b - a), a ≤ x < b
F(x) = 1, x ≥ b
Example 1: A point is chosen at random on the line segment [0, 2] (assume a uniform distribution). What is the probability that the chosen point lies between 1 and 3/2?
Let X represent the coordinate of the chosen point; then X ~ U(0, 2) and
f(x) = 1/2, 0 ≤ x ≤ 2
f(x) = 0, otherwise
P(1 ≤ X ≤ 3/2) = ∫ from 1 to 3/2 of (1/2) dx = (1/2)(3/2 - 1) = 1/4
Solution: i) f(x) = 1/20, 50 ≤ x ≤ 70
f(x) = 0, otherwise
ii) F(x) = 0, x < 50
F(x) = (x - 50)/20, 50 ≤ x < 70
F(x) = 1, x ≥ 70
CHAPTER FOUR
4. FUNCTIONS OF RANDOM VARIABLES
4.1 Equivalent events
Suppose we know the distribution of X; we may be interested in determining the distribution of Y = H(X).
Let ε be an experiment and S a sample space associated with ε. Let X be a random variable defined on S. Suppose that y = H(x) is a real valued function of x. Then Y = H(X) is a random variable, since for every s ∊ S a value of Y is determined, say y = H(X(s)). RX is the range space of X and RY is the range space of Y.
Definition: Let C be an event (subset) associated with the range space of Y, RY, i.e. C ⊆ RY. Let B ⊆ RX be defined as B = {x ∊ RX : H(x) ∊ C}. Then B and C are called equivalent events.
Note: (a) B and C are equivalent events if and only if B and C occur together. That is, when B occurs C also occurs and conversely.
(b) Suppose that A is an event associated with S which is equivalent to an event B associated with RX. Then, if C is an event associated with RY which is equivalent to B, we have that A is equivalent to C.
Example: Suppose that H(x) = x². Then the events B: {X ≥ 2} and C: {Y ≥ 4} are equivalent. For if Y = X², then {X ≥ 2} occurs if and only if {Y ≥ 4} occurs, since X cannot assume negative values in the present context.
Definition: Let X be a random variable defined on the sample space S. Let RX be the range space of X. Let H be a real valued function and consider the random variable Y = H(X) with range space RY. For any event C ⊆ RY, we define P(C) = P({x ∊ RX : H(x) ∊ C}).
Example: Let X be a continuous random variable with pdf
f(x) = e^(-x), x > 0
(A simple integration reveals that ∫ from 0 to ∞ of e^(-x) dx = 1.) Let Y = H(X) = 2X + 1.
Suppose that the event C is defined as C = {Y ≥ 5}. Now y ≥ 5 if and only if 2x + 1 ≥ 5, which in turn yields x ≥ 2. Hence C is equivalent to B = {X ≥ 2}. Now
P(X ≥ 2) = ∫ from 2 to ∞ of e^(-x) dx = e^(-2)
Hence P(Y ≥ 5) = e^(-2) = 1/e². The two events B and C are equivalent events.
Example: Let X be a continuous random variable with probability density function (pdf) f
f(x) = 1, 0 < x < 1
f(x) = 0, otherwise
Let Y = e^X. Then X = g(Y) = ln Y and dx = (1/y) dy, so
h(y) = f(g(y)) |dx/dy| = 1 · (1/y) = 1/y
Therefore
h(y) = 1/y, 1 < y < e
h(y) = 0, otherwise
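A simulation sketch of this transformation result: if X ~ U(0, 1) and Y = e^X, then h(y) = 1/y on (1, e), so P(Y ≤ t) = ln t for 1 ≤ t ≤ e. (The seed and sample size are arbitrary choices.)

```python
import random, math

# X ~ U(0, 1) and Y = e**X has pdf h(y) = 1/y on (1, e), so
# P(Y <= t) = ln t for 1 <= t <= e; check by simulation.
random.seed(0)
ys = [math.exp(random.random()) for _ in range(100_000)]

t = 2.0
empirical = sum(y <= t for y in ys) / len(ys)
print(empirical, math.log(t))  # empirical frequency vs. ln 2 ≈ 0.693
```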
4.2 FUNCTIONS OF DISCRETE RANDOM VARIABLES
If X is a discrete random variable and Y = H(X), then Y is also discrete. If the possible values of X are x1, x2, ..., xn, then the possible values of Y are y1 = H(x1), y2 = H(x2), ..., yn = H(xn). Some of the yi may be the same, i.e. H may not be a one-to-one function.
Example: Let X have the probability distribution
x      -2    0    2
P(x)   1/3  1/2  1/6
Find the probability distribution of a) Y = X²  b) Y = 2X + 1
Solution:
a) RY = {0, 4}
P(Y = 0) = P(H(X) = 0) = P(X = 0) = 1/2
P(Y = 4) = P(H(X) = 4) = P(X = -2 or X = 2) = P(X = -2) + P(X = 2) = 1/3 + 1/6 = 1/2
Y      0    4
P(Y)   1/2  1/2
b) RY = {-3, 1, 5}
P(Y = -3) = P(H(X) = -3) = P(X = -2) = 1/3
P(Y = 1) = P(H(X) = 1) = P(X = 0) = 1/2
P(Y = 5) = P(H(X) = 5) = P(X = 2) = 1/6
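The bookkeeping in both parts is just "push each P(X = x) through H and add up collisions"; a sketch (the helper name pmf_of_function is ours):

```python
from fractions import Fraction
from collections import defaultdict

# pmf of X from the example above.
pmf_x = {-2: Fraction(1, 3), 0: Fraction(1, 2), 2: Fraction(1, 6)}

def pmf_of_function(pmf, H):
    # Collapse P(X = x) onto Y = H(X); H need not be one-to-one.
    out = defaultdict(Fraction)
    for x, p in pmf.items():
        out[H(x)] += p
    return dict(out)

print(pmf_of_function(pmf_x, lambda x: x * x))      # a) Y = X**2
print(pmf_of_function(pmf_x, lambda x: 2 * x + 1))  # b) Y = 2X + 1
```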
4.3 Functions of continuous random variables
The most important (and most frequently encountered) case arises when X is a continuous
random variable with pdf f and H is a continuous function. Hence Y H ( X ) is continuous
random variable and it will be our task to obtain its pdf say g .
a. Obtain G , the cdf of Y , where G ( y ) P(Y y ) ,by finding the event A (in the range
space of X) which is equivalent to {Y y} .
b. Differentiate G ( y ) with respect to y in order to obtain g ( y )
c. Determine those values of y in the range space of Y for which g ( y ) 0.
Example: Let X be a continuous random variable with pdf
f(x) = 2x, 0 < x < 1
f(x) = 0, elsewhere
Let H(x) = e^(-x). To find the pdf of Y = H(X) we proceed as follows:
G(y) = P(Y ≤ y) = P(e^(-X) ≤ y) = P(X ≥ -ln y) = ∫ from -ln y to 1 of 2x dx = 1 - (ln y)²
Hence g(y) = dG(y)/dy = -2 ln y / y, which is positive here since ln y < 0.
Since f(x) > 0 for 0 < x < 1, we find that g(y) > 0 for 1/e < y < 1.
Therefore
g(y) = -2 ln y / y, 1/e < y < 1
g(y) = 0, elsewhere
Theorem: Let X be a continuous random variable with pdf f, where f(x) > 0 for a < x < b. Suppose that y = H(x) is a strictly monotone (decreasing or increasing) function of x. Assume that this function is differentiable (and hence continuous) for all x. Then the random variable Y defined as Y = H(X) has pdf g given by
g(y) = f(x) |dx/dy| = f(H⁻¹(y)) |dx/dy|, where x = H⁻¹(y) is expressed in terms of y.
Proof: (a) If H is increasing,
G(y) = P(Y ≤ y) = P(H(X) ≤ y) = P(X ≤ H⁻¹(y)) = F(H⁻¹(y))
Differentiating G(y) with respect to y, we obtain, using the chain rule for derivatives, g(y) = dG(y)/dy = (dF(x)/dx)(dx/dy) = f(x) dx/dy.
(b) If H is decreasing,
G(y) = P(Y ≤ y) = P(H(X) ≤ y) = P(X ≥ H⁻¹(y)) = 1 - P(X ≤ H⁻¹(y)) = 1 - F(H⁻¹(y))
so dG(y)/dy = (d/dx)(1 - F(x)) · (dx/dy) = -f(x) dx/dy.
Thus, by using the absolute value sign and combining (a) and (b),
g(y) = f(x) |dx/dy| = f(H⁻¹(y)) |dx/dy|
Example: Let X be a continuous random variable with pdf
f(x) = α/x^(α+1), x > 1
f(x) = 0, otherwise
where α is a positive parameter. This is an example of a Pareto distribution. We want to find the density of Y = ln X. As the support of X, i.e. the range on which the density is non-zero, is x > 1, the support of Y is y > 0.
The inverse transformation is x = e^y and dx/dy = e^y. Therefore
f_Y(y) = f(H⁻¹(y)) |dx/dy| = α (e^y)^(-α-1) · e^y = α e^(-αy)
f_Y(y) = α e^(-αy), y > 0
f_Y(y) = 0, otherwise
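A simulation sketch of this result: sampling X by inversion (X = U^(-1/α) with U ~ U(0, 1)) and taking logs should give an exponential with rate α, i.e. mean 1/α. (Seed, sample size, and the value of α are arbitrary choices.)

```python
import random, math

# If X is Pareto with density alpha / x**(alpha + 1) on x > 1, then Y = ln X
# is exponential with rate alpha (mean 1/alpha). Sample X by inversion:
# X = U**(-1/alpha), where U = 1 - random() lies in (0, 1].
random.seed(0)
alpha = 2.0
ys = [math.log((1.0 - random.random()) ** (-1.0 / alpha)) for _ in range(100_000)]

mean_y = sum(ys) / len(ys)
print(mean_y)  # should be close to 1/alpha = 0.5
```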
Example 3: Suppose that Y ~ U(0, 1). Find the distribution of U = g(Y) = -ln Y.
The cdf of Y is
F_Y(y) = 0, y < 0
F_Y(y) = y, 0 ≤ y ≤ 1
F_Y(y) = 1, y > 1
For u > 0, F_U(u) = P(U ≤ u) = P(-ln Y ≤ u) = P(Y ≥ e^(-u)) = 1 - e^(-u). Taking derivatives, we get, for u > 0, f_U(u) = (d/du) F_U(u) = (d/du)(1 - e^(-u)) = e^(-u).
Summarizing:
f_U(u) = e^(-u), u > 0
f_U(u) = 0, elsewhere
Now let Y be a continuous random variable with pdf
f_Y(y) = 3y², 0 < y < 1
f_Y(y) = 0, otherwise
Suppose we want to find the pdf of U = 2Y + 3. The range of U is 3 < u < 5. Now
F_U(u) = P(U ≤ u) = P(2Y + 3 ≤ u) = P(Y ≤ (u - 3)/2) = ∫ from 0 to (u-3)/2 of 3y² dy = [(u - 3)/2]³
So
F_U(u) = 0, u ≤ 3
F_U(u) = [(u - 3)/2]³, 3 < u < 5
F_U(u) = 1, u ≥ 5
and
f_U(u) = dF_U(u)/du = (3/8)(u - 3)², 3 < u < 5
f_U(u) = 0, otherwise
CHAPTER FIVE
5. Two Dimensional Random Variables
Definition: Let ε be an experiment and S the sample space associated with ε. Let X = X(s) and Y = Y(s) be two functions each assigning a real number to each outcome s ∊ S. Then (X, Y) is called a two dimensional random variable (sometimes called a random vector). This means a pair of random variables defined over a joint sample space.
If X 1 X 1 (s), X 2 X 2 (s),......., X n X n (s) are n functions each assigning real number to every
outcomes s S, we call ( X 1, X 2 ,........, X n ) an n-dimensional random variable (or an n-dimensional
random vector)
Note: As one dimensional case, our concern will be not with the functional nature of X (s) and Y (s) ,
but rather with the values which X and Y assume. We shall again speak of the range space of ( X , Y ) say
R X ,Y , as the set of all possible values of ( X , Y ) . In two dimensional case, for instance, the range space of
( X , Y ) will be a subset of the Euclidean plane. Each outcome X (s) , Y (s) may be represented as a point
( x, y) in the plane. We will again suppress the functional nature of X and Y by writing, for example,
P( X a, Y b) instead of P[ X (s) a, Y (s) b] .
Definition: If the possible values of (X,Y) are finite or countable infinite, (X,Y) is called a two
dimensional discrete random variable. That is, the possible values of (X, Y) may be represented as
(xi , y j ),i 1,2,..........n,......;j 1,2,........,m,........
If (X, Y) can assume all values in a specified region R in the xy-plane (non-countable set of Euclidian
plane), (X, Y) is called a two dimensional continuous random variable. For example, if ( X , Y ) assumes
Definition: (a) Let ( X , Y ) be a two dimensional discrete random variable with each possible outcome
(xi , y j ) we associate a number P( xi , y j ) representing P( X xi , Y y j ) and satisfying the following
conditions:
1. P(xi, yj) ≥ 0 for all (i, j)
2. Σi Σj P(xi, yj) = 1
The function P defined for all (xi , y j ) in the range space of ( X , Y ) is called the probability function of
( X , Y ) .The set of triples (xi , y j , P( xi , y j )), i, j 1,2,.............., is sometimes called the probability
distribution of ( X , Y ) .
(b) Let ( X , Y ) be a continuous random variable assuming all values in some region R of the Euclidean
plane. The joint probability density function f is a function satisfying the following conditions:
3. f(x, y) ≥ 0 for all (x, y) ∊ R
4. ∫∫_R f(x, y) dx dy = 1
Note: If B is in the range space of (X, Y), we have P(B) = P[(X(s), Y(s)) ∊ B] = P({s | (X(s), Y(s)) ∊ B}), and P(B) = Σ P(xi, yj) if (X, Y) is discrete, where the sum is taken over all indices (i, j) for which (xi, yj) ∊ B.
Example 1: Suppose that a machine is used for a particular task in the morning and for a different task in the afternoon. Let X and Y represent the number of times the machine breaks down in the morning and in the afternoon, respectively. The table below gives the joint probability distribution of X and Y.
X /Y 0 1 2
a. What is the probability that the machine breaks down equal number of times in the morning and
in the afternoon?
b. What is the probability that the number of machine break down in the morning is greater than in
the afternoon?
Solution:
a. P( X 0, Y 0) P( X 1, Y 1) P( X 2, Y 2) 0.25 0.08 0.13 0.46
Example 2: Determine the value of k for which f(x, y) = kxy, x = 1, 2, 3 and y = 1, 2, 3, is a joint probability function.
Solution: The possible combination is
(1,1), (1,2), (1,3), ( 2,1), ( 2, 2), ( 2,3), (3,1), (3, 2), (3,3)
f (1,1) k (1)(1) k
f (1, 2) k (1)( 2) 2 k
f (1,3) k (1)(3) 3k
f ( 2,1) k ( 2)(1) 2 k
f ( 2, 2) k ( 2)( 2) 4 k
f ( 2,3) k ( 2)(3) 6 k
f (3,1) k (3)(1) 3k
f (3, 2) k (3)( 2) 6 k
f (3,3) k (3)(3) 9 k
k + 2k + 3k + 2k + 4k + 6k + 3k + 6k + 9k = 36k = 1 ⟹ k = 1/36
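The normalization can be done mechanically by summing the weights x·y over the grid; a sketch:

```python
from fractions import Fraction

# Solve for k in f(x, y) = k*x*y over x, y in {1, 2, 3}: the pmf must sum to 1.
total_weight = sum(x * y for x in (1, 2, 3) for y in (1, 2, 3))
k = Fraction(1, total_weight)
print(total_weight, k)  # 36 and 1/36
```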
Example 3: Two production lines manufacture a certain type of item. Suppose that the capacity
(on any given day) is 5 items for line I and 3 items for line II. Assume that the number of items
actually produced by either production line is a random variable. Let ( X , Y ) represents the two
dimensional random variable yielding the number of items produced by line I and line II
respectively. The table given below gives the joint probability distribution of ( X , Y ) . Each entry
represents P ( xi , y j ) P ( X xi , Y y j )
X /Y 0 1 2 3 4 5
Solution: a) To determine c we use the fact that ∫∫ f(x, y) dx dy = 1; integrating f over the region 5000 ≤ x ≤ 10,000, 4000 ≤ y ≤ 9000 gives c[5000]² = 1, hence c = 1/(5000)².
b) P(B) = 1 - P(X ≥ Y) = 1 - (1/(5000)²) ∫ from 5000 to 9000 of (y - 5000) dy = 1 - 8/25 = 17/25
Example 5: Suppose that the two dimensional continuous random variable (X, Y) has joint pdf given by
f(x, y) = x² + xy/3, 0 ≤ x ≤ 1, 0 ≤ y ≤ 2
f(x, y) = 0, otherwise
Let B = {X + Y ≥ 1}; then find P(B).
Solution: The complement is B̄ = {X + Y < 1}, so
P(B) = 1 - P(B̄) = 1 - ∫ from 0 to 1 ∫ from 0 to 1-x of (x² + xy/3) dy dx = 1 - ∫ from 0 to 1 of [x²(1 - x) + x(1 - x)²/6] dx = 1 - 7/72 = 65/72
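A brute-force check of P(B) = 65/72 by a midpoint-rule double integral over the part of the rectangle where x + y ≥ 1 (a numerical sketch; the grid size is an arbitrary choice):

```python
# Midpoint-rule check of P(X + Y >= 1) = 65/72 for f(x, y) = x**2 + x*y/3
# on the rectangle 0 <= x <= 1, 0 <= y <= 2.
def f(x, y):
    return x * x + x * y / 3.0

n = 400
hx, hy = 1.0 / n, 2.0 / n
prob = 0.0
for i in range(n):
    x = (i + 0.5) * hx
    for j in range(n):
        y = (j + 0.5) * hy
        if x + y >= 1.0:
            prob += f(x, y) * hx * hy
print(prob, 65 / 72)
```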
Example 6: The joint pdf of a two dimensional random variable. (X, Y) is given by,
The joint cumulative distribution function (cdf) of (X, Y) is defined as F(x, y) = P(X ≤ x, Y ≤ y).
If F is the cdf of a two dimensional random variable with joint pdf f, then
f(x, y) = ∂²F(x, y)/∂x∂y wherever F is differentiable.
f(x, y) = x + y, 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
f(x, y) = 0, otherwise
Find the distribution function (cumulative distribution function) of these two random variables.
Solution:
Case 1: x < 0 or y < 0: F(x, y) = 0
Case 2: 0 ≤ x ≤ 1, 0 ≤ y ≤ 1: F(x, y) = ∫ from 0 to y ∫ from 0 to x of (s + t) ds dt = (1/2)xy(x + y)
Case 3: x > 1 and 0 ≤ y ≤ 1: F(x, y) = ∫ from 0 to y ∫ from 0 to 1 of (s + t) ds dt = (1/2)y(y + 1)
Case 4: 0 ≤ x ≤ 1 and y > 1: F(x, y) = ∫ from 0 to 1 ∫ from 0 to x of (s + t) ds dt = (1/2)x(x + 1)
Case 5: x > 1 and y > 1: F(x, y) = 1
This implies that
F(x, y) = 0, x < 0 or y < 0
F(x, y) = (1/2)xy(x + y), 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1
F(x, y) = (1/2)y(y + 1), x > 1 and 0 ≤ y ≤ 1
F(x, y) = (1/2)x(x + 1), 0 ≤ x ≤ 1 and y > 1
F(x, y) = 1, x > 1 and y > 1
5.2 Marginal and conditional probability
For a discrete (X, Y) with probability function Pij = P(X = xi, Y = yj), the marginal probability functions are Pi* = Σj Pij = P(X = xi) and P*j = Σi Pij = P(Y = yj). For a continuous (X, Y) with joint pdf f, the marginal pdf's are g(x) = ∫ f(x, y) dy and h(y) = ∫ f(x, y) dx, so that P(a ≤ X ≤ b) = P(a ≤ X ≤ b, -∞ < Y < ∞) = ∫_a^b g(x) dx.
Similarly, P(c ≤ Y ≤ d) = P(-∞ < X < ∞, c ≤ Y ≤ d) = ∫_c^d h(y) dy.
Example 1: The following table represents the joint probability distribution of the discrete
random variable ( X , Y )
Example 2: Two characteristics of a rocket engine's performance are thrust X and mixture ratio Y. Suppose that (X, Y) is a two dimensional continuous random variable with pdf:
f(x, y) = 2(x + y - 2xy), 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
f(x, y) = 0, otherwise
(The units have been adjusted in order to use values between 0 and 1.)
Find the marginal distributions of X and Y.
Solution: Let g(x) and h(y) be the marginal distributions of X and Y respectively; then
g(x) = ∫ from 0 to 1 of 2(x + y - 2xy) dy = 2[xy + y²/2 - xy²] from 0 to 1 = 2(x + 1/2 - x) = 1, 0 ≤ x ≤ 1
h(y) = ∫ from 0 to 1 of 2(x + y - 2xy) dx = 2[x²/2 + xy - x²y] from 0 to 1 = 2(1/2 + y - y) = 1, 0 ≤ y ≤ 1
That is, X and Y are each uniformly distributed over [0, 1].
If (X, Y) is uniformly distributed over a region R, the pdf is constant on R; because of the requirement ∫∫ f(x, y) dx dy = 1, this constant equals 1/area(R).
Example 3: Suppose that the two dimensional random variable ( X , Y ) is uniformly distributed
over the shaded region R indicated in the figure below
(Figure: the shaded region R lies between the curves y = x² and y = x, which intersect at (0, 0) and (1, 1).)
Since f(x, y) = 1/area(R) for (x, y) ∊ R, we find that area(R) = ∫ from 0 to 1 of (x - x²) dx = 1/6. Therefore the pdf is given by:
f(x, y) = 6, (x, y) ∊ R
f(x, y) = 0, (x, y) ∉ R
Therefore the marginal pdf of X will be
g(x) = ∫ f(x, y) dy = ∫ from x² to x of 6 dy = 6(x - x²), 0 ≤ x ≤ 1
g(x) = 0, otherwise
The marginal pdf of Y will be
h(y) = ∫ f(x, y) dx = ∫ from y to √y of 6 dx = 6(√y - y), 0 ≤ y ≤ 1
h(y) = 0, otherwise
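Both marginals should themselves integrate to 1; a midpoint-rule sketch:

```python
import math

# Marginals of the uniform density f(x, y) = 6 on the region between
# y = x**2 and y = x; each must integrate to 1 over [0, 1].
def g(x):          # integrate 6 dy from y = x**2 to y = x
    return 6.0 * (x - x * x)

def h(y):          # integrate 6 dx from x = y to x = sqrt(y)
    return 6.0 * (math.sqrt(y) - y)

n = 10_000
gx = sum(g((i + 0.5) / n) for i in range(n)) / n
hy = sum(h((i + 0.5) / n) for i in range(n)) / n
print(gx, hy)  # both ≈ 1
```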
Definition: Let X and Y be two random variables, discrete or continuous. The conditional distribution of the random variable Y given that X = x is
f(y | x) = f(x, y)/g(x), provided g(x) > 0,
and similarly the conditional distribution of X given that Y = y is f(x | y) = f(x, y)/h(y), h(y) > 0.
The above conditional pdf's satisfy all the requirements for a one dimensional probability distribution.
If we wish to find the probability that the discrete random variable X falls between a and b when it is known that the discrete variable Y = y, we evaluate P(a < X < b | Y = y) = Σ f(x | y), where the summation extends over all values of X between a and b. When X and Y are continuous, we evaluate P(a < X < b | Y = y) = ∫_a^b f(x | y) dx.
Example 1: The following table represents the joint probability distribution of the discrete
random variable ( X , Y )
b. The conditional distribution of Y given X= 3
Example 2: The joint density for the random variables (X, Y), where X is the unit temperature change
and Y is the proportion of spectrum shift that a certain atomic particle produces, is
Example 3: Given the joint density function
Example 4: Suppose that the joint density of X and Y is given by
Example 1: Suppose that the pmf for the discrete random vector (Y 1, Y 2 ) is given by
Example 2: Let Y1 and Y2 denote the proportions of time (out of one workday) during which
employees I and II, respectively, perform their assigned tasks. Suppose that the random vector
(Y 1, Y 2 ) has joint pdf
Example 3: The joint pdf of the random variable (X, Y) is given by
5.4 Functions of two dimensional random variables
Let the joint pdf of (X, Y) be f(x, y). Let U = g1(X, Y) and V = g2(X, Y). The mapping from (X, Y) to (U, V) is assumed to be one-to-one and onto. Hence there are functions h1 and h2 such that x = h1(u, v) and y = h2(u, v).
Example 1: Let X and Y be independent random variables uniformly distributed on [0, 1]. Find
the distribution of X+Y.
Because V is the variable we introduced, to get the pdf of U, we just need to find the marginal pdf
from the joint pdf . From figure below, the regions of integration are
Example 2: Let X and Y be independent random variables with common pdf
f(x) = e^(-x), x > 0
f(x) = 0, otherwise
Find the joint pdf of U = X/(X + Y), V = X + Y.
Example 3: If X and Y each follow exponential distribution with parameter 1 and are independent, find
the pdf of U = X – Y.
Solution
Since X and Y are independent random variables following exponential distribution with
parameter 1,
When the joint density function of the n random variables X1, X2, ..., Xn is given and we want
Note: Suppose that (X1, X2, ..., Xn) may assume all values in some region of n dimensional space. Then there exists a joint probability density function f satisfying the following conditions:
a) f(x1, x2, ..., xn) ≥ 0 for all (x1, x2, ..., xn)
b) ∫ ... ∫ f(x1, x2, ..., xn) dx1 ... dxn = 1
Example 2: If the joint probability mass function of the three discrete random variables X, Y and Z is given by f(x, y, z) = (x + y)z/63, for x = 1, 2, 3; y = 1, 2, 3; z = 1, 2, find P(X = 2, Y + Z ≤ 3).
Solution: The possible combinations for P(X = 2, Y + Z ≤ 3) are f(2,1,1), f(2,1,2), f(2,2,1), which implies that
P(X = 2, Y + Z ≤ 3) = [(2 + 1)·1 + (2 + 1)·2 + (2 + 2)·1]/63 = 13/63
Example 3: If the joint probability density function of the three continuous random variables X1, X2 and X3 is given by
f(x1, x2, x3) = (x1 + x2)e^(-x3), 0 < x1 < 1, 0 < x2 < 1, x3 > 0
f(x1, x2, x3) = 0, otherwise
find P((X1, X2, X3) ∊ A), where A is the region
A = {(x1, x2, x3) | 0 < x1 < 1/2, 1/2 < x2 < 1, x3 < 1}
Solution: P(0 < X1 < 1/2, 1/2 < X2 < 1, X3 < 1) = ∫ from 0 to 1 ∫ from 1/2 to 1 ∫ from 0 to 1/2 of (x1 + x2)e^(-x3) dx1 dx2 dx3 = (1/4)(1 - e^(-1)) ≈ 0.158
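Because the integrand separates as (x1 + x2)·e^(-x3), the x3 factor integrates to 1 - e^(-1) and the remaining double integral can be checked with the midpoint rule (which is exact for a linear integrand); a sketch:

```python
import math

# f = (x1 + x2) * exp(-x3) separates; the double integral of (x1 + x2) over
# 0 < x1 < 1/2, 1/2 < x2 < 1 is computed by the midpoint rule.
n = 200
h = 0.5 / n
inner = 0.0
for i in range(n):
    x1 = (i + 0.5) * h            # x1 in (0, 1/2)
    for j in range(n):
        x2 = 0.5 + (j + 0.5) * h  # x2 in (1/2, 1)
        inner += (x1 + x2) * h * h
prob = inner * (1.0 - math.exp(-1.0))
print(inner, prob)  # 0.25 and ≈ 0.158
```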
CHAPTER SIX
6. Mathematical expectations
6.1 Mean of a Random Variable
Definition: Let X be a random variable with probability distribution f(x). The mean, expected value, or expectation of X is denoted by E(X) or µX, where
E(X) = Σ x f(x), summing over all x, if X is discrete
E(X) = ∫ x f(x) dx, integrating over the whole real line, if X is continuous
Example 1:
Experiment: Tossing a coin twice.
Example 2: A lot containing 7 components is sampled by a quality inspector; the lot contains 4 good
component and 3 defective components. A sample of 3 is taken by the inspector. Find the expected
value of the number of good components in this sample.
Solution: Let X represent the number of good components in the sample. The probability distribution of X is
f(x) = 4Cx · 3C(3-x) / 7C3, x = 0, 1, 2, 3, which gives E(X) = 12/7 ≈ 1.7.
Thus, if a sample of size 3 is selected at random over and over again from a lot of 4 good components
and 3 defective components, it would contain, on average, 1.7 good components.
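This expectation can be reproduced by direct enumeration of the hypergeometric pmf; a sketch:

```python
from math import comb
from fractions import Fraction

# Sample n = 3 from N = 7 components of which K = 4 are good;
# E(X) should be n*K/N = 12/7 ≈ 1.7.
N, K, n = 7, 4, 3
EX = sum(Fraction(x * comb(K, x) * comb(N - K, n - x), comb(N, n))
         for x in range(n + 1))
print(EX, float(EX))  # 12/7 ≈ 1.714
```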
Example 3: Let X be the random variable that denotes the life in hours of a certain electronic device.
The probability density function is
Solution:
Theorem: Let X be a random variable with probability function f(x). The expected value of the random variable g(X) is
E[g(X)] = Σ g(x) f(x), summing over all x, if X is discrete
E[g(X)] = ∫ g(x) f(x) dx, integrating over the whole real line, if X is continuous
Example 4: Suppose that the number of cars X that pass through a car wash between 4:00 P.M. and
5:00 P.M. on any sunny Friday has the following probability:
Let g ( X )= 2 X 1 represent the amount of money in dollars, paid to the attendant by the manager.
Find the attendant’s expected earnings for this particular time period.
Solution:
Definition: Let X and Y be random variables with joint probability distribution f(x, y). The mean or expected value of g(X, Y) is:
E[g(X, Y)] = Σ Σ g(x, y) f(x, y), if X and Y are discrete
E[g(X, Y)] = ∫∫ g(x, y) f(x, y) dx dy, if X and Y are continuous
Example 1: Let X and Y be the random variables with joint probability distribution indicated in the
following Table.
Solution:
Example 2: Find E(Y/X) for the density function
Solution:
Definition: Let X be a random variable with probability distribution f(x) and mean µ. The variance of X is σ² = Var(X) = E[(X - µ)²]:
σ² = Σ (x - µ)² f(x), summing over all x, if X is discrete
σ² = ∫ (x - µ)² f(x) dx, integrating over the whole real line, if X is continuous
The positive square root of the variance, σ = √Var(X), is called the standard deviation of X.
Example 1: Let the random variable X represent the number of automobiles that are used for
official business purposes on any given workday. The probability distribution for company A is
Show that the variance of the probability distribution for company B is greater than that of
company A.
Theorem: The variance of a random variable X is σ² = E(X²) - µ², where µ = E(X).
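The shortcut σ² = E(X²) - µ² is easy to exercise on a small pmf. The pmf below is an illustrative stand-in, not the table from the examples:

```python
from fractions import Fraction

# sigma^2 = E(X^2) - mu^2 on a small illustrative pmf.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

mu  = sum(x * p for x, p in pmf.items())
ex2 = sum(x * x * p for x, p in pmf.items())
var = ex2 - mu ** 2
print(mu, ex2, var)  # 1, 3/2, 1/2
```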
Example 2: Let the random variable X represent the number of defective parts for a machine when 3
parts are sampled from a production line and tested. The following is the probability
distribution of X.
Solution:
Example 3: The weekly demand for Pepsi, in thousands of liters, from a local chain of efficiency stores,
is a continuous random variable X having the probability density
Solution:
Theorem: Let X be a random variable with probability function f (x ) . The variance of the random
variable g ( X ) is
σ²_g(X) = E[(g(X) - µ_g(X))²] = Σ [g(x) - µ_g(X)]² f(x), summing over all x, if X is discrete
σ²_g(X) = E[(g(X) - µ_g(X))²] = ∫ [g(x) - µ_g(X)]² f(x) dx, if X is continuous
Example 1: Calculate the variance of g ( X )= 2 X 3 , where X is a random variable with probability
distribution
Solution:
Definition: Let X and Y be random variables with probability distribution f(x, y). The covariance of the random variables X and Y is
σXY = E[(X - µX)(Y - µY)] = Σ Σ (x - µX)(y - µY) f(x, y), if X and Y are discrete
σXY = E[(X - µX)(Y - µY)] = ∫∫ (x - µX)(y - µY) f(x, y) dx dy, if both X and Y are continuous
The covariance between two random variables is a measurement of the nature of the association between
the two. If large values of X often result in large values of Y or small values of X result in small values
of Y , positive X − µx will often result in positive Y − µy and negative X − µx will often result in
negative Y − µy. Thus the product (X − µx) (Y − µy) will tend to be positive. On the other hand, if large
X values often result in small Y values, the product (X − µx) (Y − µy) will tend to be negative. Thus the
sign of the covariance indicates whether the relationship between two dependent random variables is
positive or negative. When X and Y are statistically independent, it can be shown that the covariance is
zero. The converse, however, is not generally true. Two variables may have zero covariance and still not
be statistically independent. Note that the covariance only describes the linear relationship between two
random variables. Therefore, if a covariance between X and Y is zero, X and Y may have nonlinear
relationship, which means that they are not necessarily independent.
For example, let X be symmetric about zero and let Y = g(X) = X². Then Cov(X, Y) = 0, but X and Y have an exact quadratic relationship.
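A concrete check of this point: take X uniform on {-1, 0, 1} (symmetric about zero, an illustrative choice of ours) and Y = X². Then E(XY) = E(X³) = 0 and µX = 0, so the covariance vanishes even though Y is completely determined by X.

```python
from fractions import Fraction

# X uniform on {-1, 0, 1} (symmetric about 0), Y = X**2.
pmf_x = {-1: Fraction(1, 3), 0: Fraction(1, 3), 1: Fraction(1, 3)}

mu_x = sum(x * p for x, p in pmf_x.items())             # E(X)   = 0
mu_y = sum(x * x * p for x, p in pmf_x.items())         # E(X^2) = 2/3
e_xy = sum(x * x * x * p for x, p in pmf_x.items())     # E(X*Y) = E(X^3) = 0
cov = e_xy - mu_x * mu_y
print(cov)  # 0, although Y is a function of X
```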
Theorem: The covariance of two random variables X and Y with means µX and µY, respectively, is given by σXY = E(XY) - µX µY.
Proof:
For the discrete case we can write
Example 2: The fraction X of male runners and the fraction Y of female runners who compete in
marathon races are described by the joint probability density function
Definition: Let X and Y be random variables with covariance σ_XY and standard deviations σ_X and σ_Y,
respectively. The correlation coefficient of X and Y is
ρ_XY = σ_XY / (σ_X σ_Y).
It should be clear that ρ_XY is free of the units of X and Y. The correlation coefficient satisfies the
inequality −1 ≤ ρ_XY ≤ 1. It assumes a value of zero when σ_XY = 0. Where there is an exact linear
dependency, say Y = a + bX, ρ_XY = 1 if b > 0 and ρ_XY = −1 if b < 0.
Example 1: Two ballpoint pens are selected at random from a box that contains 3 blue pens, 2 red pens,
and 3 green pens. If X is the number of blue pens selected and Y is the number of red pens selected,
the joint probability distribution is as follows:
Solution:
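Because the joint distribution in this example is determined by counting, f(x, y) = C(3, x) C(2, y) C(3, 2 − x − y) / C(8, 2) for x + y ≤ 2, the covariance and correlation coefficient can be computed directly; a Python sketch:

```python
from math import comb, sqrt

# Joint pmf of (X, Y) when 2 pens are drawn from 3 blue, 2 red, 3 green:
# f(x, y) = C(3, x) C(2, y) C(3, 2 - x - y) / C(8, 2),  x + y <= 2.
total = comb(8, 2)
pmf = {(x, y): comb(3, x) * comb(2, y) * comb(3, 2 - x - y) / total
       for x in range(3) for y in range(3) if x + y <= 2}

mu_x = sum(x * p for (x, y), p in pmf.items())
mu_y = sum(y * p for (x, y), p in pmf.items())
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in pmf.items())

var_x = sum((x - mu_x) ** 2 * p for (x, y), p in pmf.items())
var_y = sum((y - mu_y) ** 2 * p for (x, y), p in pmf.items())
rho = cov / sqrt(var_x * var_y)

print(cov)  # ≈ -0.1607  (= -9/56)
print(rho)  # ≈ -0.4472  (= -1/sqrt(5))
```

The negative sign is expected: each blue pen drawn occupies a slot that can no longer hold a red pen, so large X values go with small Y values.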
Example 2: The fraction X of male runners and the fraction Y of female runners who compete in
marathon races are described by the joint probability density function
6.3 Chebyshev's Inequality
MARKOV'S INEQUALITY: Suppose that X is a nonnegative random variable with pdf (pmf) f(x) and
let c be a positive constant. Markov's inequality puts a bound on the upper tail probability P(X > c); that
is,
P(X > c) ≤ E(X)/c.
CHEBYSHEV'S INEQUALITY: Let Y be a random variable with mean μ and finite variance σ². Then, for
any constant k > 0,
P(|Y − μ| ≥ kσ) ≤ 1/k².
REMARK: The beauty of Chebyshev's result is that it applies to any random variable Y. In words,
P(|Y − μ| ≥ kσ) is the probability that the random variable Y will differ from its mean by more than
k standard deviations, and this probability is at most 1/k².
Example 1: Suppose that Y represents the amount of precipitation (in inches) observed annually in
Barrow, AK. The exact probability distribution of Y is unknown, but, from historical information, it is
posited that μ = 4.5 and σ = 1. What is a lower bound on the probability that there will be between 2.5
and 6.5 inches of precipitation during the next year?
Solution:
The interval (2.5, 6.5) is μ ± 2σ, so take k = 2. By Chebyshev's inequality,
P(|Y − 4.5| ≥ 2) ≤ 1/2² = 0.25;
that is, 0.25 is an upper bound on the probability that Y falls outside the interval. Hence
P(2.5 < Y < 6.5) = P(|Y − 4.5| < 2) ≥ 1 − 0.25 = 0.75,
so 0.75 is the desired lower bound. Note that Chebyshev's bound can be quite far from the exact
probability, which depends on the actual (unknown) distribution of Y.
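As a sanity check, here is a Monte Carlo sketch under one assumed model for Y (a normal distribution with the posited mean and standard deviation; the true distribution of precipitation is unknown), confirming that the 0.75 lower bound from Chebyshev's inequality holds:

```python
import random

# Chebyshev: P(|Y - mu| < 2*sigma) >= 1 - 1/2^2 = 0.75, for ANY distribution.
# Monte Carlo check under an assumed model Y ~ Normal(4.5, 1).
random.seed(0)
mu, sigma, k = 4.5, 1.0, 2.0
n = 200_000
hits = sum(1 for _ in range(n) if abs(random.gauss(mu, sigma) - mu) < k * sigma)
estimate = hits / n

chebyshev_lower_bound = 1 - 1 / k**2      # 0.75
print(estimate)                           # ≈ 0.954 under the normal model
print(estimate >= chebyshev_lower_bound)  # True: the bound holds, loosely
```

For the normal model the true probability within 2σ is about 0.9545, illustrating how conservative the distribution-free bound of 0.75 can be.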
6.4 Moments and moment generating functions
6.4.1 Moments
The rth moment about the origin of a random variable X, denoted by μ'_r, is the expected value of X^r;
symbolically,
μ'_r = E(X^r) = Σ_x x^r f(x)   for r = 0, 1, 2, … when X is discrete, and
μ'_r = E(X^r) = ∫_{−∞}^{∞} x^r f(x) dx   when X is continuous.
Definition: μ'_1 is called the mean of the distribution of X, or simply the mean of X; it is denoted
by μ, so that μ = μ'_1 = E(X).
Definition: The rth moment about the mean of a random variable X, denoted by μ_r, is the expected
value of (X − μ)^r; symbolically,
μ_r = E[(X − μ)^r] = Σ_x (x − μ)^r f(x)   for r = 0, 1, 2, … when X is discrete, and
μ_r = E[(X − μ)^r] = ∫_{−∞}^{∞} (x − μ)^r f(x) dx   when X is continuous.
These moments are defined for any random variable for which μ exists. The second moment about the
mean is of special importance in statistics because it is indicative of the spread, or dispersion, of the
distribution of a random variable; thus, it is given a special symbol and a special name.
Definition: μ_2 is called the variance of the distribution of X, or simply the variance of X; it is
denoted by σ², σ²_X, or Var(X). The positive square root σ is called the standard deviation of X.
Theorem: σ² = μ'_2 − μ²; that is, Var(X) = E(X²) − [E(X)]².
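The shortcut σ² = μ'_2 − μ² can be verified numerically for any pmf; a sketch with an arbitrary illustrative distribution (not one from the text):

```python
# Checking sigma^2 = mu'_2 - mu^2 for a hypothetical discrete pmf.
pmf = {1: 0.2, 2: 0.5, 4: 0.3}

def moment_origin(r):
    """rth moment about the origin: mu'_r = E[X^r]."""
    return sum(x ** r * p for x, p in pmf.items())

mu = moment_origin(1)                                         # mean: mu = mu'_1
mu2_central = sum((x - mu) ** 2 * p for x, p in pmf.items())  # mu_2 = Var(X)

print(mu2_central)                 # ≈ 1.24, the variance computed directly
print(moment_origin(2) - mu ** 2)  # ≈ 1.24, same value via the shortcut
```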
Example:
Example:
Definition: The moment-generating function of a random variable X, where it exists, is given by
M_X(t) = E(e^{tX}) = Σ_x e^{tx} f(x)   when X is discrete, and
M_X(t) = E(e^{tX}) = ∫_{−∞}^{∞} e^{tx} f(x) dx   when X is continuous.
Example:
Theorem: d^r M_X(t)/dt^r evaluated at t = 0 equals μ'_r; that is, differentiating the moment-generating
function r times with respect to t and setting t = 0 yields the rth moment about the origin.
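Moments about the origin can be recovered from the moment-generating function by differentiating at t = 0 (μ'_r = M_X^(r)(0)); a numerical sketch for a hypothetical pmf, approximating the derivatives by central finite differences:

```python
from math import exp

# M_X(t) = E[e^{tX}] for a small hypothetical pmf; its derivatives at t = 0
# give the moments about the origin.  Derivatives are approximated here by
# central finite differences with a small step h.
pmf = {0: 0.3, 1: 0.4, 2: 0.3}

def M(t):
    return sum(exp(t * x) * p for x, p in pmf.items())

h = 1e-5
d1 = (M(h) - M(-h)) / (2 * h)          # ≈ M'(0)  = mu'_1 = E[X]
d2 = (M(h) - 2 * M(0) + M(-h)) / h**2  # ≈ M''(0) = mu'_2 = E[X^2]

print(d1)  # ≈ 1.0  (exact E[X]   = 0.4 + 2*0.3 = 1.0)
print(d2)  # ≈ 1.6  (exact E[X^2] = 0.4 + 4*0.3 = 1.6)
```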
Example:
Theorem: If a and b are constants, then M_{X+a}(t) = e^{at} M_X(t), M_{bX}(t) = M_X(bt), and
M_{(X+a)/b}(t) = e^{at/b} M_X(t/b).
Definition: If we let u(X) = X, we obtain the conditional mean of the random variable X given Y = y,
which we denote by
μ_{X|y} = E(X | y).
Similarly, the conditional variance of X given Y = y is
σ²_{X|y} = E[(X − μ_{X|y})² | y] = E(X² | y) − μ²_{X|y},
where E(X² | y) is the conditional expectation obtained with u(X) = X².
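The conditional mean and conditional variance can be computed from a joint pmf by first forming f(x | y) = f(x, y)/f(y); a sketch with a small hypothetical table (not one from the text):

```python
# Conditional mean E(X | y) and variance Var(X | y) from a joint pmf.
# The joint table below is an illustrative, made-up example.
joint = {(0, 0): 0.10, (1, 0): 0.30, (2, 0): 0.10,
         (0, 1): 0.20, (1, 1): 0.10, (2, 1): 0.20}

y = 0
fy = sum(p for (xv, yv), p in joint.items() if yv == y)          # marginal f(y)
cond = {xv: p / fy for (xv, yv), p in joint.items() if yv == y}  # f(x | y)

mean_x_given_y = sum(x * p for x, p in cond.items())             # E(X | y)
ex2_given_y = sum(x**2 * p for x, p in cond.items())             # E(X^2 | y)
var_x_given_y = ex2_given_y - mean_x_given_y ** 2                # Var(X | y)

print(mean_x_given_y)  # ≈ 1.0
print(var_x_given_y)   # ≈ 0.4
```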