CSC 124 Lec 10 Probability and Statistics
1 Continuous Probability Distribution Functions
• Uniform distribution functions
• Exponential distribution function
• Normal distribution functions
1.1 Normal distribution function
The normal distribution is the most important continuous distribution. It describes well many random variables that arise in practice, such as the heights or weights of people, the total annual sales of a firm, and exam scores. It is also central to the central limit theorem and to the approximation of other distributions, such as the binomial.
Definition:
A random variable X is defined to have a Normal distribution if the proba-
bility distribution function of X is given by
$$f(x) = f(x; \mu, \sigma) = \begin{cases} \dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(x-\mu)^2}, & -\infty < x < \infty \\[4pt] 0, & \text{elsewhere} \end{cases} \qquad (1)$$
where the parameters µ and σ satisfy −∞ < µ < ∞ and σ > 0. Here µ and σ² represent the mean and variance of the normal distribution, respectively. That is, if X has the p.d.f. given by (1), with E(X) = µ and Var(X) = σ², then it is normally written as
$$X \sim N(\mu, \sigma^2).$$
Normally we use the notation Φ_{µ,σ²}(x), or simply Φ, for the cumulative distribution function (c.d.f.) of X. That is,
$$\Phi_{\mu,\sigma^2}(x) = \Phi(x) = P(X \le x) = P(-\infty < X \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(t-\mu)^2}\, dt \qquad (2)$$
Theorem:
If X is a normal random variable, then E(X) = µ, Var(X) = σ², and its moment generating function is
$$M_X(t) = e^{\mu t + \sigma^2 t^2/2}.$$
Proof:
Assume
$$M_X(t) = e^{\mu t + \sigma^2 t^2/2}.$$
Then the mean and the variance follow from the m.g.f.
Mean:
$$M_X'(t) = \frac{d}{dt}\, e^{\mu t + \sigma^2 t^2/2} = (\mu + \sigma^2 t)\, e^{\mu t + \sigma^2 t^2/2}$$
Then,
$$E(X) = M_X'(0) = (\mu + \sigma^2 \cdot 0)\, e^{\mu \cdot 0 + \sigma^2 \cdot 0/2} = \mu$$
For the variance, we have
$$M_X''(t) = \frac{d}{dt} M_X'(t) = \frac{d}{dt}\left[(\mu + \sigma^2 t)\, e^{\mu t + \sigma^2 t^2/2}\right] = \sigma^2 e^{\mu t + \sigma^2 t^2/2} + (\mu + \sigma^2 t)^2\, e^{\mu t + \sigma^2 t^2/2}$$
so that
$$E(X^2) = M_X''(0) = \sigma^2 + \mu^2,$$
leading to
$$Var(X) = E(X^2) - [E(X)]^2 = \sigma^2 + \mu^2 - \mu^2 = \sigma^2$$
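As a quick check, the derivatives above can be reproduced symbolically. A minimal sketch (assuming SymPy is available; the check itself is not part of the original notes):

```python
import sympy as sp

t, mu, sigma = sp.symbols('t mu sigma', real=True)

# m.g.f. of N(mu, sigma^2) as assumed in the proof
M = sp.exp(mu*t + sigma**2 * t**2 / 2)

EX = sp.diff(M, t).subs(t, 0)          # first moment: mu
EX2 = sp.diff(M, t, 2).subs(t, 0)      # second moment: mu**2 + sigma**2
print(EX, sp.simplify(EX2 - EX**2))    # prints: mu  sigma**2
```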
Note:
The graph of
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(x-\mu)^2}, \quad -\infty < x < \infty$$
(whose antiderivative has no closed form) has the following properties:
1. Is a bell-shaped curve; and since X is continuous, P(X ≥ a) = P(X > a) for any a.
2. Is symmetric about a vertical axis through x = µ. Therefore the area to
the left of µ is equal to the area to the right of µ; 50% each. That is,
P (X > µ + α) = P (X < µ − α)
3. Thus, a useful rule is: the interval µ ± 1σ covers approximately the middle 68% of the distribution; the interval µ ± 2σ covers approximately the middle 95% of the distribution; the interval µ ± 3σ covers approximately the middle 99.7% of the distribution.
4. Has its maximum of $\frac{1}{\sigma\sqrt{2\pi}}$ at x = µ
5. Has the x-axis as a horizontal asymptote
6. Has points of inflection at x = µ ± σ
7. How do we compute probabilities? Because the following integral has no closed-form solution (meaning it cannot be evaluated analytically),
$$P(X > \alpha) = \int_{\alpha}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(x-\mu)^2}\, dx = \cdots$$
the computation of normal distribution probabilities is done through the standard normal variable
$$Z = \frac{X - \mu}{\sigma}$$
Very importantly, we can give two important properties of normal random variables. If we let X ∼ N(µ, σ²) with c.d.f. P(X ≤ x) = F(x), then:
1. Linear transformations of normal random variables are also normal random variables, i.e. if Y = aX + b, then Y ∼ N(aµ + b, a²σ²).
Proof:
E[Y] = E[aX + b] = aE[X] + b = aµ + b (linearity of expectation)
Var(Y) = Var(aX + b) = a² Var(X) = a²σ²
Normality itself follows from the m.g.f.: $M_Y(t) = E[e^{t(aX+b)}] = e^{bt} M_X(at) = e^{(a\mu+b)t + a^2\sigma^2 t^2/2}$, which is the m.g.f. of N(aµ + b, a²σ²). Thus Y is normal.
2. The p.d.f. of a normal random variable is symmetric about the mean µ:
F(µ − x) = 1 − F(µ + x), i.e. P(X ≤ µ − x) = 1 − P(X ≤ µ + x).
Definition:
If Z is a normal random variable with µ = 0 and σ² = 1, then Z is called a standard normal random variable. Its density is given by
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad -\infty < z < \infty$$
Its cumulative distribution function (c.d.f.) is given by
$$\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt$$
Values of Φ(z) are given on various tables.
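Since Φ has no closed form, in practice it is computed numerically. A minimal sketch using Python's standard-library error function, to which Φ is related by Φ(z) = ½[1 + erf(z/√2)] (the helper name `phi` is ours, for illustration):

```python
from math import erf, sqrt

def phi(z: float) -> float:
    """Standard normal c.d.f.: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

print(phi(0.0))   # 0.5 (half the area lies below the mean)
print(phi(1.41))  # ~0.9207, matching the table entry for z = 1.41
```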
1.1.1 Calculating Probabilities
Using symmetry, let Z ∼ N(0, 1) with c.d.f. P(Z ≤ z) = F(z), and suppose we know the values F(z) and F(y) for some z, y ≥ 0. Then we can compute the probabilities:
(i) P (Z ≤ z) = F (z)
(ii) P (Z < z) = F (z)
(iii) P (Z ≥ z) = 1 − F (z)
(iv) P (Z ≤ −z) = 1 − F (z)
(v) P (Z ≥ −z) = F (z)
(vi) P (y < Z < z) = F (z) − F (y)
Let X ∼ N(µ, σ²). To compute the c.d.f. P(X ≤ x) = F(x), we use Φ, the c.d.f. of the standard normal Z ∼ N(0, 1):
$$F(x) = \Phi\!\left(\frac{x-\mu}{\sigma}\right)$$
Proof:
$$F(x) = P(X \le x) \qquad \text{by definition of the c.d.f.}$$
$$= P(X - \mu \le x - \mu) = P\!\left(\frac{X-\mu}{\sigma} \le \frac{x-\mu}{\sigma}\right) = P\!\left(Z \le \frac{x-\mu}{\sigma}\right)$$
since $\frac{X-\mu}{\sigma} = \frac{1}{\sigma}X - \frac{\mu}{\sigma}$ is a linear transform of X, distributed as $N\!\left(\frac{\mu}{\sigma} - \frac{\mu}{\sigma},\; \frac{\sigma^2}{\sigma^2}\right) = N(0, 1)$, as can be verified by computing $E\!\left[\frac{X-\mu}{\sigma}\right]$ and $Var\!\left(\frac{X-\mu}{\sigma}\right)$. Hence
$$F(x) = \Phi\!\left(\frac{x-\mu}{\sigma}\right)$$
Thus $\frac{X-\mu}{\sigma} = Z \sim N(0, 1)$ with c.d.f. Φ.
Thus, for any problem:
(i) Compute z = (x − µ)/σ.
(ii) Look up Φ(z) in the Standard Normal Table, which tabulates the c.d.f. of Z, defined as P(Z ≤ z) = Φ(z).
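These two steps translate directly into code. A sketch assuming SciPy is installed (its `norm.cdf` plays the role of the table; the helper name `normal_cdf` is ours):

```python
from scipy.stats import norm

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Step 1: standardize; Step 2: evaluate Phi(z) (computed, not tabulated)."""
    z = (x - mu) / sigma
    return norm.cdf(z)

# The identities (i)-(vi) above can be checked the same way, e.g.
z = 1.0
assert abs(norm.cdf(-z) - (1 - norm.cdf(z))) < 1e-12  # P(Z <= -z) = 1 - F(z)
```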
1.1.2 Examples
1. Students spend some minutes, X, traveling between lecture halls. The average time spent is µ = 4 minutes, with a variance of σ² = 2 min². Suppose X is normally distributed; what is the probability that a student spends 6 minutes or more traveling?
SOLUTION: X ∼ N(µ = 4, σ² = 2).
$$P(X \ge 6) = \int_6^{\infty} f(x)\, dx = \int_6^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx$$
which cannot be solved analytically! The c.d.f. of a normal random variable likewise has no closed form:
$$P(X \le x) = F(x) = \int_{-\infty}^{x} f(y)\, dy = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(y-\mu)^2}{2\sigma^2}}\, dy$$
However, the probability can be found numerically using the function Φ (itself computed numerically) as
$$F(x) = \Phi\!\left(\frac{x-\mu}{\sigma}\right)$$
where F is the c.d.f. of X ∼ N(µ, σ²).
Step 1: Compute z = (x − µ)/σ:
$$P(X \ge 6) = 1 - F_X(6) = 1 - \Phi\!\left(\frac{6-4}{\sqrt{2}}\right) \approx 1 - \Phi(1.41)$$
Step 2: Look up Φ(z) from the table:
$$1 - \Phi(1.41) \approx 1 - 0.9207 = 0.0793$$
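As a check of this example, a sketch assuming SciPy (`norm.sf` is the survival function 1 − cdf):

```python
import math
from scipy.stats import norm

mu, var = 4.0, 2.0

# P(X >= 6) for X ~ N(4, 2)
p = norm.sf(6, loc=mu, scale=math.sqrt(var))
print(round(p, 4))  # ~0.0786; the table value at the rounded z = 1.41 gives 0.0793
```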
2. Suppose the diameter of a certain car component follows the normal distribution with mean 10 mm and standard deviation 3 mm, i.e. X ∼ N(10, 3²). Find the proportion of these components that have diameter larger than 13.4 mm. Or, if we randomly select one of these components, find the probability that its diameter will be larger than 13.4 mm.
SOLUTION:
$$P(X > 13.4) = P\!\left(\frac{X-10}{3} > \frac{13.4-10}{3}\right) = P(Z > 1.13) = 1 - P(Z < 1.13) = 1 - 0.8708 = 0.1292$$
We read the number 0.8708 from the table. First we find the value of z = 1.13 (first column and first row of the table; the first row gives the second decimal of the value of z). Therefore the probability that the diameter is larger than 13.4 mm is 12.92%. What is z? The value of z gives the number of standard deviations the particular value of X lies above or below the mean µ. Thus X = µ ± zσ, in which case x = 13.4 lies 1.13 standard deviations above the mean. Of course z will be negative when the value of x is below the mean. For example, to find the proportion of these components with diameter less than 5.1 mm, we have
$$P(X < 5.1) = P\!\left(\frac{X-10}{3} < \frac{5.1-10}{3}\right) = P(Z < -1.63) = 1 - P(Z < 1.63) = 1 - 0.9484 = 0.0516$$
Here the value of x = 5.1 lies 1.63 standard deviations below the mean µ.
Finding percentiles of the normal distribution: for example, looking for the 25th percentile (or 1st quartile) of the distribution of X, i.e. looking for c such that P(X < c) = 0.25. First we find (approximately) the probability 0.25 in the body of the table and read off the corresponding value of z. Here it is z = −0.675. It is negative because the first quartile is below the mean. Therefore,
$$x_{25} = 10 - 0.675(3) = 7.975$$
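Checking this example numerically (SciPy assumed; `norm.ppf` is the inverse c.d.f., used for percentiles):

```python
from scipy.stats import norm

mu, sigma = 10, 3

print(round(norm.sf(13.4, mu, sigma), 4))   # ~0.1285 (table, with z rounded: 0.1292)
print(round(norm.cdf(5.1, mu, sigma), 4))   # ~0.0512 (table: 0.0516)
print(round(norm.ppf(0.25, mu, sigma), 3))  # ~7.977, the 25th percentile
```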
3. If Z is a standard normal random variable, find (i) P(Z < 1) (ii) P(Z < 1.5) (iii) P(Z < −1) (iv) P(1 < Z < 1.5).
SOLUTION: (i)
$$P(Z < 1) = P(-\infty < Z < 1) = P(Z < 0) + P(0 < Z < 1) = 0.5 + 0.3413 = 0.8413$$
(ii)
$$P(Z < 1.5) = P(Z < 0) + P(0 < Z < 1.5) = 0.5 + 0.4332 = 0.9332$$
(iii)
$$P(Z < -1) = 0.5 - P(0 < Z < 1) = 0.5 - 0.3413 = 0.1587$$
(iv)
$$P(1 < Z < 1.5) = P(0 < Z < 1.5) - P(0 < Z < 1) = 0.4332 - 0.3413 = 0.0919$$
4. Suppose that X has a normal distribution with mean 5 and variance 4; find (i) P(2 < X < 10) (ii) P(−1 < X < 2) (iii) P(X > 4).
SOLUTION: Since µ = 5 and σ = 2,
$$z = \frac{x - \mu}{\sigma} = \frac{x - 5}{2}$$
has a standard normal distribution.
(i)
$$P(2 < X < 10) = P\!\left(\frac{2-5}{2} < \frac{X-5}{2} < \frac{10-5}{2}\right) = P(-1.5 < Z < 2.5)$$
$$= P(Z < 2.5) - P(Z < -1.5) = 0.9938 - (0.5 - 0.4332) = 0.9270$$
(ii)
$$P(-1 < X < 2) = P\!\left(\frac{-1-5}{2} < \frac{X-5}{2} < \frac{2-5}{2}\right) = P(-3 < Z < -1.5)$$
$$= P(0 < Z < 3) - P(0 < Z < 1.5) = 0.4987 - 0.4332 = 0.0655$$
(iii)
$$P(X > 4) = P\!\left(\frac{X-5}{2} > \frac{4-5}{2}\right) = P(Z > -0.5) = 0.5 + P(0 < Z < 0.5) = 0.5 + 0.1915 = 0.6915$$
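A numerical check of Examples 3 and 4 (SciPy assumed):

```python
from scipy.stats import norm

# Example 3: standard normal
print(round(norm.cdf(1), 4))                  # 0.8413
print(round(norm.cdf(-1), 4))                 # 0.1587
print(round(norm.cdf(1.5) - norm.cdf(1), 4))  # 0.0919

# Example 4: X ~ N(5, 4), so sigma = 2
mu, sigma = 5, 2
print(round(norm.cdf(10, mu, sigma) - norm.cdf(2, mu, sigma), 4))  # ~0.9270
print(round(norm.sf(4, mu, sigma), 4))                             # 0.6915
```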
5. Suppose that the weight of navel oranges is normally distributed with mean µ = 8 kg and standard deviation σ = 1.5 kg. We can write X ∼ N(µ = 8, σ² = 2.25). Answer the following questions:
(a) What proportion of oranges weigh more than 11.5 kg? Or, if you randomly select a navel orange, what is the probability that it weighs more than 11.5 kg?
SOLUTION:
$$P(X > 11.5) = P\!\left(Z > \frac{11.5-8}{1.5}\right) = P(Z > 2.33) = 1 - P(Z < 2.33) = 1 - 0.9901 = 0.0099$$
(b) What proportion of oranges weigh less than 8.7 kg?
SOLUTION:
$$P(X < 8.7) = P\!\left(Z < \frac{8.7-8}{1.5}\right) = P(Z < 0.47) = 0.6808$$
(c) What proportion of oranges weigh between 6.8 and 8.9 kg?
SOLUTION:
$$P(6.8 < X < 8.9) = P\!\left(\frac{6.8-8}{1.5} < Z < \frac{8.9-8}{1.5}\right) = P(-0.8 < Z < 0.6) = 0.7257 - 0.2119 = 0.5138$$
(d) Find the 80th percentile of the distribution of X. This question could also be asked as follows: find the value of X below which you find the lightest 80% of all the oranges.
SOLUTION:
$$z = \frac{x-\mu}{\sigma} \;\Rightarrow\; 0.845 = \frac{x-8}{1.5} \;\Rightarrow\; x = 9.27$$
(e) Find the 5th percentile of the distribution of X.
SOLUTION:
$$z = \frac{x-\mu}{\sigma} \;\Rightarrow\; -1.645 = \frac{x-8}{1.5} \;\Rightarrow\; x = 5.53$$
(f) Find the interquartile range of the distribution of X.
SOLUTION: We look for the 25th and 75th percentiles.
For the 25th percentile:
$$-0.675 = \frac{x-8}{1.5} \;\Rightarrow\; x = 6.9875$$
For the 75th percentile:
$$0.675 = \frac{x-8}{1.5} \;\Rightarrow\; x = 9.0125$$
Thus
$$IQR = 9.0125 - 6.9875 = 2.025$$
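A sketch checking parts (d)-(f) with the inverse c.d.f. (SciPy assumed):

```python
from scipy.stats import norm

mu, sigma = 8, 1.5

print(round(norm.ppf(0.80, mu, sigma), 2))  # ~9.26 (table z = 0.845 gives 9.27)
print(round(norm.ppf(0.05, mu, sigma), 2))  # ~5.53, the 5th percentile
q1, q3 = norm.ppf([0.25, 0.75], mu, sigma)
print(round(q3 - q1, 3))                    # IQR ~2.023 (table z = 0.675 gives 2.025)
```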
1.1.3 Exercise
1. The marks in a certain examination follow the normal distribution with
mean 45 and standard deviation 10. If 1000 students appeared at the
examination, calculate the number of students scoring: (i) Less than 40
marks (ii) More than 60 marks (iii) Between 40 and 50 marks
2. In an intelligence test administered to 1000 students, the average score
was 42 with a standard deviation of 24. Find (i) The number of students
exceeding a score of 50 (ii) The number of students lying between 30 and
54 (iii) The value of the score exceeded by the top 100 students.
2 Probability
Definition:
The probability of a given event is an expression of the likelihood or chance of occurrence of that event. It is a number ranging from 0 to 1: 0 for an event which cannot occur and 1 for an event certain to occur. How the number is assigned depends on the interpretation of the term "probability", about which there is no general agreement. Thus there are four different schools of thought on the concept of probability.
Figure 1: Standard Normal Table (Φ has been numerically computed)
An entry in the table is the area under the curve to the left of z, P(Z ≤ z) = Φ(z)
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517
0.7 0.7580 0.7611 0.7642 0.7673 0.7703 0.7734 0.7764 0.7793 0.7823
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810
1.2 0.8849 0.8869 0.8888 0.8906 0.8925 0.8943 0.8962 0.8980 0.8997
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625
1. Classical/ Priori Probability: The oldest and simplest. Basic assumption
underlying the classical theory is that the outcomes of a random experi-
ment are “equally likely”. The event whose probability is sought consists
of one or more possible outcomes of the given activity, e.g. when a die is
rolled once any one of the possible outcomes i.e. 1, 2, 3, 4, 5, 6, can occur.
These activities are referred to as ‘experiment’, a term referring to pro-
cesses which result in different possible outcomes or observations. Term
‘equally likely’, conveys the notion that each outcome of an experiment
has the same chance of appearing as any other. Therefore in a throw of
a die, the occurrences of 1, 2, 3, 4, 5, 6 are equally likely events. Definition of probability by the French mathematician Laplace: it is the ratio of the number of "favourable" cases to the total number of equally likely cases. If the probability of occurrence of A is denoted by P(A), then by this definition we have:
$$P(A) = \frac{\text{No. of favourable cases}}{\text{Total no. of equally likely cases}}$$
Illustration: From a bag containing 10 black and 20 white balls, a ball is drawn at random. What is the probability that it is black?
Solution:
$$P(A) = \frac{\text{No. of favourable cases}}{\text{Total no. of equally likely cases}} = \frac{10}{30} = \frac{1}{3}$$
Shortcomings of the Classical Approach:
(a) The definition cannot be applied whenever it is not possible to enumerate cases that can be considered equally likely. For example, how does it apply to the probability of rain? There are two possibilities, 'rain' or 'no rain', but at any given time it will not usually be agreed that they are equally likely. Similarly, for a person jumping from the top of a tower, the probability of survival is not 50%, since survival and death, the two mutually exclusive and exhaustive outcomes, are not equally likely.
(b) Real-life situations, disorderly as they often are, make it difficult and at times impossible to apply the classical probability concept.
2. Relative Frequency Theory of Probability / Relative Frequency of Occurrence: The probability of an event can be defined as the relative frequency with which it occurs in an indefinitely large number of trials. If an event occurs a times out of n trials, its relative frequency is a/n; the value approached by a/n as n becomes infinite is called the limit of the relative frequency. Symbolically,
$$P(A) = \lim_{n \to \infty} \frac{a}{n}$$
Theoretically, we can never obtain the probability of an event as given by the above limit; in practice we can only obtain a close estimate of P(A) based on a large number of observations n (a simulation sketch illustrating this long-run behaviour appears after this list). For convenience, the estimate of P(A) is written as if it were actually P(A), and the relative frequency definition of probability may be expressed as
$$P(A) = \frac{a}{n},$$
implying that probability is a long-run concept: it is the value approached by a/n as n becomes infinite.
3. Subjective Approach to Probability: Also known as the personalistic school
of probability. Defined as the probability assigned to an event by an in-
dividual based on whatever evidence is available. Such probabilities are
based on the beliefs of the person making the probability statement. E.g.,
if a teacher wants to find out the probability of Mr. X topping in Statis-
tics Exam, he may assign a value between 0 and 1 according to his degree
of belief in its possible occurrence. Factors he may take into account include past academic performance, the views of his other colleagues, the attendance record, performance in periodic tests, etc.
ity assignment to events for which there may be no objective data, or for
which there may be a combination of subjective and objective data. One
has to be very careful and consistent in the assignment of these probabil-
ities. This approach is useful in business decision-making when applied with care.
4. Axiomatic Approach to Probability: Kolmogorov axiomatized the theory of probability, introducing probability as a set function. Following his approach, no precise definition of probability is given; rather, certain axioms or postulates are laid down on which probability calculations are based. Thus, the whole field of probability theory for finite sample spaces is based upon the following 3 axioms: (i) The probability of an event ranges
between 0 and 1. If the event cannot take place, its probability shall be
0, and if it’s certain or bound to occur, its probability shall be 1. (ii)
The probability of the entire sample space is 1., i.e. P (S) = 1. (iii) If A
and B are mutually exclusive (or disjoint) events then the probability of
occurrence of either A or B denoted by P (A ∪ B) shall be given by:
P (A ∪ B) = P (A) + P (B)
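Returning to the relative-frequency school (item 2 above), the long-run idea can be illustrated by simulation, as promised. A minimal sketch (assuming NumPy; the die-rolling setup is our illustration, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Relative frequency a/n of rolling a six approaches 1/6 ~ 0.1667 as n grows
for n in (100, 10_000, 1_000_000):
    rolls = rng.integers(1, 7, size=n)   # n fair-die rolls
    print(n, (rolls == 6).mean())
```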
3 Bayes’ Theorem
This is based on the formula for conditional probability. The contribution of Rev. Thomas Bayes (1702-1761) consists primarily of a unique method for calculating conditional probability.
The Bayesian approach addresses the question of determining the probability of some event A, given that another event B has been (or will be) observed, i.e. determining the value of P(A|B).
The event A is usually thought of as sample information so that Bayes’ rule
is concerned with determining the probability of an event given certain sample
information.
Examples:
1. A sample output of two defectives in 50 trials (event A) might be used to
estimate the probability that the machine is not working correctly (event B).
2. You might use the results of your 1st Examination in Statistics (event A)
as sample evidence in estimating the probability of getting a 1st Class Honours
(event B).
Let A1 and A2 be events which are mutually exclusive (the two events cannot occur together) and exhaustive (the combination of the two events makes up the entire experiment), and let B be a single event which intersects each of the A events.
As a simple illustration, the part of B which lies within A1 represents the area "A1 and B", and the part of B within A2 represents the area "A2 and B". This implies that
$$P(B) = P(A_1 \text{ and } B) + P(A_2 \text{ and } B) \qquad (3)$$
Then the probability of event $A_1$ given $B$ is
$$P(A_1|B) = \frac{P(A_1 \text{ and } B)}{P(B)} \qquad (4)$$
Since also
$$P(B|A_1) = \frac{P(A_1 \text{ and } B)}{P(A_1)},$$
we have
$$P(A_1 \text{ and } B) = P(A_1) \times P(B|A_1).$$
Similarly, the probability of event $A_2$ given $B$ is
$$P(A_2|B) = \frac{P(A_2 \text{ and } B)}{P(B)} \qquad (5)$$
and, since $P(B|A_2) = P(A_2 \text{ and } B)/P(A_2)$,
$$P(A_2 \text{ and } B) = P(A_2) \times P(B|A_2).$$
Thus, applying (3) above,
$$P(A_1|B) = \frac{P(A_1 \text{ and } B)}{P(B)} = \frac{P(A_1) \times P(B|A_1)}{P(B|A_1) \times P(A_1) + P(B|A_2) \times P(A_2)} = \frac{P(A_1) \times P(B|A_1)}{\sum_{i=1}^{2} P(A_i) \times P(B|A_i)} \qquad (6)$$
and
$$P(A_2|B) = \frac{P(A_2 \text{ and } B)}{P(B)} = \frac{P(A_2) \times P(B|A_2)}{P(B|A_1) \times P(A_1) + P(B|A_2) \times P(A_2)} = \frac{P(A_2) \times P(B|A_2)}{\sum_{i=1}^{2} P(A_i) \times P(B|A_i)} \qquad (7)$$
Generally, let $A_1, A_2, \cdots, A_i, \cdots, A_n$ be a set of mutually exclusive and collectively exhaustive events. If $B$ is another event such that $P(B)$ is not zero, then
$$P(A_1|B) = \frac{P(B|A_1) \times P(A_1)}{\sum_{i=1}^{n} P(B|A_i) \times P(A_i)} \qquad (8)$$
which is the Bayes' Theorem!
3.1 Remarks:
An a priori or prior probability is the probability before revision by Bayes' rule; it is determined before the sample information is taken into account.
A posterior or revised probability is one that has undergone revision in the light of sample information (via Bayes' rule); it represents a probability calculated after this information is taken into account. Posterior probabilities are obtained by revising the prior probabilities in the light of the additional information gained. A posterior probability is always a conditional probability, the conditioning event being the sample information.
Thus a prior probability, which is an unconditional probability, becomes a posterior probability, which is a conditional probability, by using Bayes' rule.
Note:
• Classical theory is mainly empirical, since it employs only sample information as the basis for estimation and testing.
• The Bayesian approach employs any and all available information, whether personal judgement or empirical evidence. Bayesian inference can be made on prior information alone or on both prior and sample information.
3.1.1 Examples
1. Assume that a factory has 2 machines. Past records show that machine 1 (M1) produces 30% of the items of output and machine 2 (M2) produces 70% of the items. Further, 5% of the items produced by M1 were defective and only 1% produced by M2 were defective. If a defective item is drawn at random, what is the probability that the defective item was produced by M1 or M2? SOLUTION: Let A1 = the event of drawing an item produced by M1, A2 = the event of drawing an item produced by M2, and B = the event of drawing a defective item (produced either by M1 or M2).
Figure 2: Computation of Posterior Probabilities

Events   Prior Prob.      Conditional Prob.   Joint Prob.    Posterior Prob.
A1       P(A1) = 0.30     P(B|A1) = 0.05      0.015          0.015/0.022 = 0.682
A2       P(A2) = 0.70     P(B|A2) = 0.01      0.007          0.007/0.022 = 0.318
Total    1.000                                P(B) = 0.022   1.000
From the first information, P(A1) = 30% = 0.3 and P(A2) = 70% = 0.7; from the additional information, P(B|A1) = 5% = 0.05 and P(B|A2) = 1% = 0.01. Thus, as illustrated in Figure 2, the probability that the defective item was produced by M1 is 0.682 = 68.2% and by M2 is 0.318 = 31.8%. Hence, the defective item is more likely to have been drawn from the output produced by M1.
CHECKING THE ANSWER: If 10,000 items were produced by the 2 machines in a given period, then:
(i) No. of items produced by M1 = 10,000 × 0.3 = 3,000
(ii) No. of items produced by M2 = 10,000 × 0.7 = 7,000
(iii) No. of defective items produced by M1 = 3,000 × 0.05 = 150, and no. of defective items produced by M2 = 7,000 × 0.01 = 70.
Thus, the probability that a defective item was produced by M1 is
$$\frac{150}{150 + 70} = 0.682$$
and by M2 is
$$\frac{70}{150 + 70} = 0.318$$
2. A manufacturing firm produces units of a product in 4 plants. Define event Ai and event B as follows. Event Ai: a unit is produced in plant i, i = 1, 2, 3, 4; Event B: a unit is defective. From the past records of the proportions of defectives produced at each plant, the following conditional probabilities are set:
P(B|A1) = 0.05, P(B|A2) = 0.10, P(B|A3) = 0.15, P(B|A4) = 0.02.
The 1st plant produces 30% of the units of the product, the 2nd plant produces 25%, the 3rd produces 40%, and the 4th the remaining 5%.
Figure 3: Computation of Posterior Probabilities using Bayes' Theorem

Plant   P(Ai)   P(B|Ai)   P(Ai)P(B|Ai)   P(Ai|B) = P(B|Ai)P(Ai) / Σ_{i=1}^{4} P(B|Ai)P(Ai)
1       0.30    0.05      0.015          0.015/0.101 = 0.1485
2       0.25    0.10      0.025          0.025/0.101 = 0.2475
3       0.40    0.15      0.060          0.060/0.101 = 0.5941
4       0.05    0.02      0.001          0.001/0.101 = 0.0099
Total   1.00              P(B) = 0.101   1.00
A unit of the product made at one of these plants is tested and is found to be defective. What is the probability that the unit was produced in plant 3? SOLUTION: Applying Bayes' Theorem, we observe the following:
P(A1) = 0.30; P(A2) = 0.25; P(A3) = 0.40; P(A4) = 0.05.
Thus from (8) above we have:
$$P(A_i|B) = \frac{P(B|A_i) \times P(A_i)}{\sum_{i=1}^{4} P(B|A_i) \times P(A_i)}$$
$$P(A_3|B) = \frac{P(A_3)P(B|A_3)}{P(A_1)P(B|A_1) + P(A_2)P(B|A_2) + P(A_3)P(B|A_3) + P(A_4)P(B|A_4)} \qquad (9)$$
$$= \frac{0.060}{0.015 + 0.025 + 0.060 + 0.001} = 0.5941$$
Hence, tabulating as in Figure 3, we have
$$P(A_3|B) = 0.5941 = 59.41\%$$
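The same arithmetic as a quick script (plain Python; a sketch, not part of the notes):

```python
# Plant example: priors P(Ai) and defect rates P(B|Ai) for the four plants
priors = [0.30, 0.25, 0.40, 0.05]
likelihoods = [0.05, 0.10, 0.15, 0.02]

joints = [p * l for p, l in zip(priors, likelihoods)]  # P(Ai) * P(B|Ai)
print(sum(joints))              # P(B) = 0.101
print(joints[2] / sum(joints))  # P(A3|B) ~ 0.5941
```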
3.1.2 Exercise
1. Suppose a STA 101 class has 70% males and 30% females. It’s known that
in a test, 5% of males and 10% of females got an “A” grade. If one student
from this class is randomly selected and observed to have an “A” grade,
what is the probability that this is a male student?
2. A company uses a "selling aptitude test" in the selection of salesmen. Past experience has shown that only 70% of all persons applying for a sales position achieved a classification of "satisfactory" in actual selling, whereas the remainder were classified as "unsatisfactory". Of those classified satisfactory, 85% had scored a passing grade on the aptitude test; only 25% of those classified unsatisfactory had passed the test. On the basis of this information, what is the probability that a candidate would be a satisfactory salesman given that he passed the aptitude test?
4 Define the following terms as used in statistics
• Experiments and events
• Mutually exclusive events
• Exhaustive events
• Independent and dependent events
• Equally likely events
• Simple and compound events
• Complementary events
• The Addition and Multiplication Theorems