UNIT: 2
Sample space is the universal set that consists of all possible outcomes of
an experiment. Sample space is usually represented using the letter ‘S’
and individual outcomes are called the elementary events.
The sample space can be finite or infinite.
Definition: A sample space is the set of all possible outcomes of a random
experiment.
Example: Sample space = S = {all people in class}
A few random experiments and their sample spaces are
discussed below:
Experiment 1 : Outcome of a football match
Sample Space = S = {Win, Draw, Lose}
Experiment 2 : Predicting customer churn at an individual customer
level
Sample Space = S = {Churn, No Churn}
Experiment 3: Predicting percentage of customer churn
Sample Space = S = {X | X ∈ R, 0 ≤ X ≤ 100}, that is, X is a real
number that can take any value between 0 and 100 percent.
Experiment 4: Life of a turbine blade used in an aircraft engine
Sample Space = S = {X | X ∈ R, 0 ≤ X < ∞}, that is X is a real
number that can take any value between 0 and ∞.
EXAMPLE: When we flip a coin, the sample space is
S = {H, T},
where H denotes that the coin lands "Heads up" and
T denotes that the coin lands "Tails up".
For a "fair coin" we expect H and T to have the same "chance" of
occurring, i.e., if we flip the coin many times then about 50% of the
outcomes will be H.
We say that the probability of H occurring is 0.5 (or 50%). The
probability of T occurring is then also 0.5.
Problem :1
When we roll a fair die then the sample space is
S = { 1 ,2 , 3 , 4 , 5 , 6 }
The probability that the die lands with k up is 1/6 for each k ∈ {1, 2, 3, 4, 5, 6}, and when we roll it
1200 times we expect a 5 up about 200 times.
The probability the die lands with an even number up is
1/6+1/6+1/6 = 1/2
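The die calculations above can be checked by listing the sample space and counting. This is a minimal sketch that assumes all six faces are equally likely; the helper name prob is illustrative and not part of the notes.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die.
S = {1, 2, 3, 4, 5, 6}

def prob(event, space=S):
    """Probability of an event when every outcome in `space` is equally likely."""
    return Fraction(len(event & space), len(space))

even = {k for k in S if k % 2 == 0}   # the event "an even number is up"
p_five = prob({5})                    # probability of a 5
p_even = prob(even)                   # probability of an even number
expected_fives = 1200 * p_five        # expected number of 5s in 1200 rolls

print(p_five, p_even, expected_fives)
```

Exact fractions are used instead of floating-point numbers so that the results can be compared directly with 1/6 and 1/2.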
Problem : 2
EXAMPLE : When we toss a coin 3 times and record the results in the sequence that
they occur, then the sample space is
S = { HHH , HHT , HTH , HTT , THH , THT , TTH , TTT } .
Elements of S are "vectors", "sequences", or "ordered outcomes".
We may expect each of the 8 outcomes to be equally likely. Thus the probability of the
sequence HTT is 1/8.
The probability that a sequence contains exactly two Heads (HHT, HTH, or THH) is
1/8 + 1/8 + 1/8 = 3/8
Problem 3 : When we toss a coin 3 times and record the results without
paying attention to the order in which they occur, e.g., if we only record the
number of Heads, then the sample space is
S = { {H, H, H}, {H, H, T}, {H, T, T}, {T, T, T} }
The outcomes in S are now unordered collections; i.e., order is not important.
Recall that the ordered outcomes are
{ HHH , HHT , HTH , HTT , THH , THT , TTH , TTT } .
Note that
{H, H, H} corresponds to one of the ordered outcomes,
{H, H, T} corresponds to three of the ordered outcomes,
{H, T, T} corresponds to three of the ordered outcomes,
{T, T, T} corresponds to one of the ordered outcomes.
Thus {H, H, H} and {T, T, T} each occur with probability 1/8,
while {H, H, T} and {H, T, T} each occur with probability 3/8.
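The link between ordered and unordered outcomes can be reproduced by enumeration. The sketch below assumes a fair coin, so each ordered sequence has probability 1/8, and it identifies each unordered outcome by its number of Heads; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All ordered outcomes of three tosses, e.g. ('H', 'H', 'T').
ordered = list(product("HT", repeat=3))

# Ignore order by grouping the ordered outcomes on their number of Heads.
heads_count = Counter(seq.count("H") for seq in ordered)

# Each ordered outcome is equally likely, so a group's probability is
# (number of ordered outcomes in the group) / 8.
p_by_heads = {k: Fraction(n, len(ordered)) for k, n in heads_count.items()}

for k in sorted(p_by_heads, reverse=True):
    print(k, "Heads:", heads_count[k], "ordered outcomes, probability", p_by_heads[k])
```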
Definition: A probability event can be defined as a set of outcomes of an
experiment. In other words, an event in probability is the subset of the
respective sample space.
Pick a person in this class at random.
Sample space: S = {all people in class}
Event A: A = {all males in class}.
Event B: B = {all females in class}.
Thus, an event is a subset of the sample space, i.e., E is a subset of S.
In the example above, event A occurs if the person we pick is male.
The entire set of possible outcomes of a random experiment is the sample
space. The likelihood of occurrence of an event is known as probability.
The probability of occurrence of any event lies between 0 and 1.
Example:
The sample space for the tossing of three coins simultaneously is given by:
S = {(T , T , T) , (T , T , H) , (T , H , T) , (T , H , H ) , (H , T , T ) , (H , T , H) ,
(H , H, T) ,(H , H , H)}
Suppose we want only the outcomes which have at least two heads;
then the set of all such outcomes is given by:
E = { (H , T , H) , (H , H ,T) , (H , H ,H) , (T , H , H)}
Thus, an event is a subset of the sample space, i.e., E is a subset of S.
What is the Probability of Occurrence of an Event?
When all outcomes are equally likely, the probability of occurrence of an event is
defined as the ratio of the number of favorable outcomes to the total number of
outcomes. So, the probability that an event will occur is given as:
P(E) = Number of Favorable Outcomes / Total Number of Outcomes
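The counting formula can be applied to the event "at least two heads" in three tosses. This is a small sketch assuming the eight outcomes are equally likely; the names S, E and p_E mirror the notation of the notes.

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing three coins: 8 ordered triples.
S = set(product("HT", repeat=3))

# Event E: outcomes with at least two heads (a subset of S).
E = {outcome for outcome in S if outcome.count("H") >= 2}

# P(E) = number of favorable outcomes / total number of outcomes.
p_E = Fraction(len(E), len(S))
print(sorted(E), p_E)
```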
1. Simple Events
Any event consisting of a single point of the sample space is known as a simple
event in probability. For example, if S = {56 , 78 , 96 , 54 , 89} and E = {78}
then E is a simple event.
2. Compound Events
If an event consists of more than one point of the sample space, then such an
event is called a compound event. Considering the same example again, if S =
{56, 78, 96, 54, 89}, E1 = {56, 54}, E2 = {78, 56, 89}, then E1 and E2 represent
two compound events.
3. Independent Events and Dependent Events
If the occurrence of an event is completely unaffected by the occurrence of any
other event, such events are known as independent events in probability, and
events which are affected by other events are known as dependent events.
Example of Independent Events :
Tossing a coin twice
Sample Space (S) of each coin toss = {H, T}
Getting H on the first toss and getting T on the second toss are independent
events, because the result of one toss does not affect the other.
4. Mutually Exclusive Events
If the occurrence of one event excludes the occurrence of another event,
such events are mutually exclusive events i.e. two events don’t have any
common point.
For example, if S = {1 , 2 , 3 , 4 , 5 , 6} and E1, E2 are two events such
that E1 consists of numbers less than 3 and E2 consists of numbers greater
than 4.
So, E1 = {1,2} and E2 = {5,6} .
Then, E1 and E2 are mutually exclusive.
5. Exhaustive Events
A set of events is called exhaustive if all the events together cover the
entire sample space.
Ex: Let us consider the experiment of throwing a die.
Sample space S = {1, 2, 3, 4, 5, 6}
Let A be the event of getting a number greater than 3,
B be the event of getting a number greater than 2 but less than 5, and
C be the event of getting a number less than 3.
We can write these events as:
A = {4, 5, 6}
B = {3, 4}
and C = {1, 2}
We observe that
A ⋃ B ⋃ C = {4, 5, 6} ⋃ {3, 4} ⋃ {1, 2} = {1, 2, 3, 4, 5, 6} = S
Therefore, A, B, and C are called exhaustive events.
6. Complementary Events
For any event E1 there exists another event E1' which represents the remaining
elements of the sample space S.
E1' = S − E1
If a die is rolled then the sample space S is given as S = {1, 2, 3, 4, 5, 6}.
If event E1 represents all the outcomes which are greater than 4, then
E1 = {5, 6} and E1' = {1, 2, 3, 4}.
Thus E1' is the complement of the event E1.
Events Associated with “OR”
If two events E1 and E2 are associated with OR then it means that either E1 or
E2 or both occur. The union symbol (∪) is used to represent OR in probability.
Thus, the event E1 ∪ E2 denotes E1 OR E2.
Events Associated with “AND”
If two events E1 and E2 are associated with AND then it means that both E1 and E2
occur, i.e., the outcomes common to both the events. The intersection symbol (∩) is used
to represent AND in probability.
Thus, the event E1 ∩ E2 denotes E1 AND E2.
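The event types above map directly onto set operations. The sketch below reuses the die examples from this section (E1 = {1, 2}, E2 = {5, 6}, A, B, C) and Python's built-in set type; the variable names on the left are illustrative.

```python
# Sample space for one throw of a die and the events used in the examples.
S = {1, 2, 3, 4, 5, 6}
E1, E2 = {1, 2}, {5, 6}
A, B, C = {4, 5, 6}, {3, 4}, {1, 2}

# Mutually exclusive: the events have no common point.
mutually_exclusive = E1.isdisjoint(E2)

# Exhaustive: together the events cover the whole sample space.
exhaustive = (A | B | C) == S

# Complement: everything in S that is not in the event.
complement_E2 = S - E2

# OR is the union, AND is the intersection.
E1_or_E2 = E1 | E2
A_and_B = A & B

print(mutually_exclusive, exhaustive, complement_E2, E1_or_E2, A_and_B)
```

Note that A and B are exhaustive together with C but not mutually exclusive, since both contain 4.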
Measures of probability :
A probability measure assigns probabilities to sets of experimental outcomes
(events). It is a function on a collection of events that assigns a probability between 0 and 1
to every event, meeting certain conditions.
Probability Measure Examples
For a roll of one six-faced die, the
sample space = {1, 2, 3, 4, 5, 6}.
If A = {1, 3, 5} is the event that the roll is odd, then P(A) = ½.
1. AXIOMS OF PROBABILITY:
According to axiomatic theory of probability, the probability of an event
E satisfies the following axioms:
1. The probability of event E always lies between 0 and 1. That is, 0 ≤
P(E) ≤ 1.
2. The probability of the universal set S is 1. That is, P(S) = 1.
3. P(X ∪Y) = P(X) + P(Y), where X and Y are two mutually exclusive
events.
The following elementary rules of probability are directly deduced from the
original three axioms of probability, using the set theory relationships:
1. For any event A, the probability of the complementary event, written A^C, is
given by
P(A^C) = 1 – P(A)
If P(A) is the probability of observing a fraudulent transaction at an e-commerce
portal, then P(A^C) is the probability of observing a genuine transaction.
2. The probability of an empty or impossible event, ∅, is zero:
P(∅) = 0
3. If occurrence of an event A implies that an event B also occurs, so that the event
class A is a subset of event class B, then the probability of A is less than or equal
to the probability of B:
P(A) ≤ P(B)
If A is students with a CGPA (cumulative grade point average) of more than 3.5 out of 4 and
B is students with a CGPA of more than 3.0, then P(A) ≤ P(B)
4. The probability that either event A or event B occurs, or both occur, is given by
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
5. If A and B are mutually exclusive events, so that P(A ∩ B) = 0, then
P(A ∪ B) = P(A) + P(B)
6. If A1, A2, …, An are n events that form a partition of sample space S,
then their probabilities must add up to 1:
P(A1) + P(A2) + … + P(An) = 1
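The elementary rules can be checked numerically on a small sample space. The following sketch assumes one roll of a fair die; the events A and B and the partition are illustrative choices, not taken from the notes.

```python
from fractions import Fraction

S = frozenset(range(1, 7))  # one roll of a fair die

def P(event):
    """Classical probability of an event (a subset of S)."""
    return Fraction(len(event & S), len(S))

A = {1, 3, 5}   # odd number (illustrative choice)
B = {4, 5, 6}   # number greater than 3 (illustrative choice)

complement_rule = P(S - A) == 1 - P(A)
empty_rule = P(set()) == 0
subset_rule = P({5, 6}) <= P(B)                       # {5, 6} is a subset of B
addition_rule = P(A | B) == P(A) + P(B) - P(A & B)
partition_total = sum(P(part) for part in ({1, 2}, {3, 4}, {5, 6}))

print(complement_rule, empty_rule, subset_rule, addition_rule, partition_total)
```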
Joint Probability :
Let A and B be two events in a sample space. Then the joint probability of the two events,
written as P(A ∩ B), is given by
P(A ∩ B) = Number of observations in A ∩ B / Total number of observations
For example, with 1000 observations in total:
P(Divorced ∩ Default) = 13/1000 = 0.013
P(Single ∩ Default) = 42/1000 = 0.042
P(Divorced) = 50/1000 = 0.05
P(Single) = 300/1000 = 0.3
1. Let there be a bag containing 5 white and 4 red balls. Two balls are
drawn from the bag one after the other without replacement. Consider
the following events.
A = Drawing a white ball in the first draw
B = Drawing a red ball in the second draw.
Sol: P(B|A) = Probability of drawing a red ball in the second draw given
that a white ball has already been drawn in the first draw.
P(B|A) = Probability of drawing a red ball from a bag containing 4
white and 4 red balls.
P(B|A) = 4/8 = 1/2
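For this random experiment, P(A|B) has no direct sequential interpretation,
because the first draw (event A) takes place before the second draw (event B);
it can still be computed from the definition P(A|B) = P(A ∩ B)/P(B).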
2. A die is thrown twice and the sum of the numbers appearing is observed
to be 6. What is the conditional probability that the number 4 has appeared
at least once?
Sol: Let
A = the sum of the numbers appearing is 6
B = the number 4 has appeared at least once
The required probability is P(B|A).
A = {(1,5), (2,4), (3,3), (4,2), (5,1)}, so n(A) = 5 and P(A) = 5/36
B = {(1,4), (4,1), (2,4), (4,2), (3,4), (4,3), (4,4), (4,5), (5,4), (4,6), (6,4)}, so n(B) = 11
A ∩ B = {(2,4), (4,2)}, so n(A ∩ B) = 2 and P(A ∩ B) = 2/36
Required probability = P(B|A)
= P(A ∩ B)/P(A) = (2/36)/(5/36) = 2/5
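The conditional probability in the two-dice problem can be verified by enumerating all 36 ordered outcomes. This sketch assumes two fair dice; the list names follow the events A and B of the problem.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes of throwing a die twice.
S = list(product(range(1, 7), repeat=2))

# A: the sum of the two numbers is 6.
A = [o for o in S if sum(o) == 6]

# A and B: the sum is 6 and the number 4 appears at least once.
A_and_B = [o for o in A if 4 in o]

# P(B|A) = P(A and B) / P(A); the common factor 1/36 cancels.
p_B_given_A = Fraction(len(A_and_B), len(A))
print(A, A_and_B, p_B_given_A)
```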
Question 3:
Fifteen cards are numbered from 1 to 15, and two cards are
chosen at random such that the sum of the numbers on both the
cards is even. Find the probability that both chosen cards are
odd-numbered.
Let A ≡ event of selecting two odd-numbered cards
B ≡ event of selecting two cards whose sum is even.
Sol: There are 8 odd-numbered and 7 even-numbered cards, and the sum is even
only when both cards are odd or both are even. Then,
n(B) = number of ways of choosing two cards whose sum is even
= 8C2 + 7C2 = 28 + 21 = 49.
n(A ∩ B) = number of ways of choosing two odd-numbered cards (their sum is
always even)
= 8C2 = 28.
Both counts are divided by the same total 15C2, so
P(A|B) = P(A ∩ B)/P(B)
= 8C2 / (8C2 + 7C2) = 28/49 = 4/7.
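The card problem can be checked by listing every pair of cards. The sketch assumes 15 cards numbered 1 to 15 (8 odd, 7 even), which is what the counts 8C2 and 7C2 in the solution imply.

```python
from fractions import Fraction
from itertools import combinations

cards = range(1, 16)                    # cards numbered 1..15
pairs = list(combinations(cards, 2))    # all unordered pairs of distinct cards

# B: the sum of the two numbers is even.
B = [p for p in pairs if sum(p) % 2 == 0]

# A and B: both cards are odd-numbered (their sum is then automatically even).
A_and_B = [p for p in B if p[0] % 2 == 1 and p[1] % 2 == 1]

# P(A|B) = n(A and B) / n(B), since every pair is equally likely.
p_A_given_B = Fraction(len(A_and_B), len(B))
print(len(B), len(A_and_B), p_A_given_B)
```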
Bayes’ theorem is one of the most important concepts in analytics
since several problems are solved using Bayesian statistics. Consider
two events A and B. We can write the following two conditional
probabilities:
P(A|B) = P(A ∩ B) / P(B)
P(B|A) = P(A ∩ B) / P(A)
Using the two equations, we can show that
P(B|A) = P(A|B) P(B) / P(A)
Bayes’ theorem helps data scientists update the probability of an
event (B) when additional information is provided.
The following terminologies are used to describe various
components:
1. P(B) is called the prior probability (estimate of the probability
without any additional information).
2. P(B|A) is called the posterior probability (that is, given that the
event A has occurred, what is the probability of occurrence of event
B). That is, post the additional information (or additional evidence)
that A has occurred, what is estimated probability of occurrence of B.
3. P(A|B) is called the likelihood of observing evidence A if B is true.
4. P(A) is the prior probability of A.
A great example of the difficulty humans have in making decisions is the famous
Monty Hall problem, in which the contestants of a game show are
shown three doors. Behind one of the doors is an expensive item
(such as a car or gold), while there are inexpensive items behind the
remaining two doors (such as a goat).
The contestant is asked to choose one of the doors. Assume that the
contestant chooses door 1; the game host would then open one of
the remaining two doors. Assume that the game host opens door 2,
which has a goat behind it. Now the contestant is given a chance to
change his initial choice (from door 1 to door 3).
In this problem, the contestant — the decision maker — has two
choices: he/she can either change his/her initial choice or stick with
his/her initial choice.
Let C1 , C2 , and C3 be the events that the car is behind door 1, 2, and 3,
respectively. Let D1 , D2 , and D3 be the events that Monty opens door 1, 2,
and 3, respectively.
Prior probabilities of C1 , C2 , and C3 are P(C1 ) = P(C2 ) = P(C3 ) = 1/3
Assume that the player has chosen door 1 and Monty opens door 2 to reveal a
goat.
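The posterior probability P(C1|D2) is obtained using Bayes’ theorem. Since the
player has chosen door 1, Monty opens door 2 or door 3 with equal probability if
the car is behind door 1, never opens door 2 if the car is behind it, and must
open door 2 if the car is behind door 3:
P(D2|C1) = 1/2, P(D2|C2) = 0, P(D2|C3) = 1
P(C1|D2) = P(D2|C1) P(C1) / [P(D2|C1) P(C1) + P(D2|C2) P(C2) + P(D2|C3) P(C3)]
= (1/2 × 1/3) / (1/2 × 1/3 + 0 × 1/3 + 1 × 1/3) = 1/3
Similarly, P(C3|D2) = 2/3, so switching to door 3 doubles the chance of winning the car.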
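The Monty Hall posterior can be computed with a few lines of exact arithmetic. The sketch below assumes the standard rules: the player has picked door 1, Monty never opens the player's door or the door hiding the car, and he picks at random when he has a choice. The dictionary names are illustrative.

```python
from fractions import Fraction

# Prior probabilities that the car is behind door 1, 2, 3.
prior = {"C1": Fraction(1, 3), "C2": Fraction(1, 3), "C3": Fraction(1, 3)}

# Likelihood of Monty opening door 2 (event D2) given the car's location,
# under the assumed rules with the player on door 1.
likelihood_D2 = {"C1": Fraction(1, 2), "C2": Fraction(0), "C3": Fraction(1)}

# Total probability of D2, then Bayes' theorem for each location.
evidence = sum(likelihood_D2[c] * prior[c] for c in prior)
posterior = {c: likelihood_D2[c] * prior[c] / evidence for c in prior}

print(posterior["C1"], posterior["C3"])
```

Under these assumptions, sticking with door 1 wins with probability posterior["C1"] and switching to door 3 wins with probability posterior["C3"].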
Generalization of Bayes’ Theorem:
Three machines A, B, and C produce identical items. Of their respective
outputs, 5%, 4%, and 3% of the items are defective. On a certain day A has
produced 25% of the total output, B has produced 30%, and C the
balance. An item is selected at random and is found defective. What is
the probability that it was produced by the machine with the greatest
output?
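Sol: Let E1, E2, E3 denote the events that an item selected at random is
manufactured by machines A, B, and C respectively, and let D be the event of its
being defective. Then we have P(E1) = 25/100, P(E2) = 30/100, P(E3) = 45/100.
The probability of drawing a defective item manufactured by machine A is
P(D|E1) = 5/100 = 0.05
Similarly P(D|E2) = 4% = 0.04 and P(D|E3) = 3% = 0.03
The machine with the greatest output is C, so the required probability is
P(E3|D) = P(D|E3) P(E3) / [P(D|E1) P(E1) + P(D|E2) P(E2) + P(D|E3) P(E3)]
= 0.0135 / (0.0125 + 0.0120 + 0.0135) = 0.0135 / 0.038 ≈ 0.355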
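The three-machine problem is an application of Bayes' theorem over a partition of the sample space. The sketch below uses the priors and defect rates given in the problem; the function name bayes_posterior is a hypothetical helper, not something defined in the notes.

```python
from fractions import Fraction

# Share of total output per machine (priors) and defect rate per machine.
prior = {"A": Fraction(25, 100), "B": Fraction(30, 100), "C": Fraction(45, 100)}
defect_rate = {"A": Fraction(5, 100), "B": Fraction(4, 100), "C": Fraction(3, 100)}

def bayes_posterior(prior, likelihood):
    """Posterior P(E_i | D) for every E_i in a partition, given P(D | E_i)."""
    evidence = sum(likelihood[k] * prior[k] for k in prior)  # P(D)
    return {k: likelihood[k] * prior[k] / evidence for k in prior}

posterior = bayes_posterior(prior, defect_rate)

# The machine with the greatest output is the one with the largest prior.
largest = max(prior, key=prior.get)
print(largest, posterior[largest], float(posterior[largest]))
```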
A random variable is any function that assigns a numerical value to each possible
outcome.
The numerical value should be chosen to quantify an important characteristic of the
outcome. Random variables are denoted by capital letters X,Y, and so on, to
distinguish them from their possible values given in lowercase x, y.
Ex:
Suppose that a coin is tossed twice so that the sample space is S = {HH, HT, TH,
TT}. Let X represent the number of heads that can come up. For example, in the
case of HH (i.e., 2 heads), X = 2, while for TH (1 head), X = 1. It follows that X is a
random variable.
Outcome   HH   HT   TH   TT
X         2    1    1    0
Random variables can be classified as discrete or continuous depending on the values that
the random variable can take.
Discrete Random Variables :
A random variable which takes a finite or at most countably infinite
number of values is known as a discrete random variable. In other words, a discrete
random variable takes a countable number of possible values.
Ex: i) Marks obtained by a student in a test
ii) Number of defective nuts in a lot
iii) The number of cars that pass through a given intersection in an
hour
iv) Number of errors on a page of a book
v) Number of accidents taking place on a busy road
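For example, when a die is rolled, the number that comes up is a discrete random
variable with possible values X = {1, 2, 3, 4, 5, 6}.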
Another popular example of a discrete random variable is the number of heads when
tossing two coins. In this case, the random variable X can take only one of three
values, i.e., 0, 1, or 2.
Continuous Random variable :
A random variable which takes all the possible values in an interval is called a
continuous random variable.
Examples: i) Waiting time for a bus
ii) Weight and height of students
Generally, discrete random variables represent counted data while continuous random
variables represent measured data.
Probability Mass Function and Cumulative Distribution Function of a
Discrete Random Variable :
For a discrete random variable, the probability that the random variable X takes a
specific value xi, P(X = xi), is called the probability mass function P(xi).
Probability Mass Function :
For the number of heads X in two tosses of a coin, the probabilities of all possible
values add up to 1:
P(X=0) + P(X=1) + P(X=2)
= 1/4 + 1/2 + 1/4
= 1
Cumulative distribution function, F(xi), is the probability that the random
variable X takes values less than or equal to xi. That is, F(xi) = P(X ≤ xi).
From the above problem, P(X < 2), the probability that the number of heads is less
than two, is
F(1) = P(X ≤ 1) = P(X=0) + P(X=1)
= 1/4 + 1/2
= 0.75
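The probability mass function and the cumulative distribution function of the number of heads in two coin tosses can be built by enumeration. This is a sketch assuming a fair coin; pmf and cdf are illustrative dictionary names.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate, product

# Sample space of two tosses and the random variable X = number of heads.
outcomes = list(product("HT", repeat=2))
counts = Counter(o.count("H") for o in outcomes)
values = sorted(counts)                      # possible values of X: 0, 1, 2

# PMF: P(X = x) for each possible value x.
pmf = {x: Fraction(counts[x], len(outcomes)) for x in values}

# CDF: F(x) = P(X <= x), the running total of the PMF.
cdf = dict(zip(values, accumulate(pmf[x] for x in values)))

print(pmf, cdf)
```

The value 0.75 obtained above is cdf[1], the probability of at most one head.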
Example 2:
The Cumulative Distribution Function (CDF) is another important concept in
probability theory and statistics, especially when dealing with random variables, whether
discrete or continuous. The CDF provides the probability that a random variable X takes
on a value less than or equal to a specific point x.
The cumulative distribution function is denoted by F(x) and its formula is given by:
F(x)=P(X≤x)
Probability Density Function and Cumulative Distribution Function of a
Continuous Random Variable :
See the figure below.
A probability distribution is a mathematical function that describes
the probabilities of the different possible values of
a random variable. Probability distributions are often depicted using
graphs or probability tables.
Example: Probability distribution
We can describe the probability distribution of one coin flip using a
probability table:
Outcome   Probability
Heads     0.5
Tails     0.5
Probability distributions are classified into two types.
1) Discrete probability distribution
2) Continuous probability distribution
A distribution is said to be discrete if the values taken by the corresponding
random variable are discrete, whereas a distribution is said to be continuous
if the random variable takes any value in a specified interval.
In this chapter we discuss the following distributions:
1) Discrete probability distributions
a) Binomial Distribution
b) Poisson Distribution
c) Geometric Distribution
d) Bernoulli Distribution
2) Continuous probability distributions
a) Normal Distribution
b) Exponential Distribution
c) Weibull Distribution
Binomial Distribution :
Binomial distribution is one of the most important discrete probability
distributions due to its applications in several contexts. A random variable X is said
to follow a Binomial distribution when
1. Each trial can have only two outcomes, success and failure (such trials are
known as Bernoulli trials).
2. The objective is to find the probability of getting k successes out of n trials.
3. The probability of success is p and thus the probability of failure is (1 − p).
4. The probability p is constant and does not change between trials.
5. Success and failure are generic terminologies used in binomial distribution;
based on the context, the interpretation will change (winning a lottery can be
considered as success and not winning as failure).
Probability Mass Function (PMF) of Binomial Distribution : The PMF of the
Binomial distribution (probability that the number of successes will be exactly x out of
n trials) is given by
PMF(x) = P(X = x) = nCx p^x (1 − p)^(n − x), 0 ≤ x ≤ n
where
nCx = n! / (x! (n − x)!)
In Microsoft Excel, the function ‘BINOM.DIST(x, n, p, false)’ can be used for
calculating the probability mass function of a binomial distribution.
Cumulative Distribution Function (CDF) of Binomial Distribution : CDF of a
binomial distribution function, F(a), representing the probability that the random
variable X takes a value less than or equal to a, is given by
F(a) = P(X ≤ a) = Σ (k = 0 to a) nCk p^k (1 − p)^(n − k)
In Microsoft Excel, the function ‘BINOM.DIST(x, n, p, true)’ can be used for
calculating the cumulative distribution function of a binomial distribution.
Mean and Variance of Binomial Distribution:
Mean of a binomial distribution is given by
Mean = E(X) = np
The variance of a binomial distribution is given by
Var(X) = np(1 − p)
Approximation of Binomial Distribution using Normal Distribution : If the
number of trials (n) in a binomial distribution is large, then it can be
approximated by a normal distribution with mean np and variance npq, where
q = 1 − p.
Binomial Probability:
Let X be a binomial random variable. Then, its probability mass function is:
P(X = x) = nCx p^x (1 − p)^(n − x)
for x = 0, 1, 2, . . . , n. The values of n and p are called the parameters of the
distribution.
Ex:
Consider an exam that contains 10 multiple-choice questions
with 4 possible choices for each question, only one of which
is correct.
Suppose a student is to select the answer for every question randomly.
Let X be the number of questions the student answers correctly. Then,
X has a binomial distribution with parameters n = 10 and p = 0.25.
(Convince yourself that all assumptions for a binomial distribution are
reasonable in this setting.)
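What is the probability for the student to get no answer correct?
Answer: P(X = 0) = 10C0 (0.25)^0 (0.75)^10 ≈ 0.0563
What is the probability for the student to get two answers correct?
Answer: P(X = 2) = 10C2 (0.25)^2 (0.75)^8 ≈ 0.2816
What is the probability for the student to fail the test (i.e., to have fewer
than 6 correct answers)?
Answer: P(X < 6) = P(X ≤ 5) = P(X = 0) + P(X = 1) + … + P(X = 5) ≈ 0.9803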
Binomial Mean and Variance:
Mean= np
Variance=np(1-p)
Binomial Mean E(X) = 10 * 0.25 = 2.5.
Variance V (X) = 10 * (0.25) * (1 − 0.25) = 1.875.
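The exam example can be worked out with Python's standard library. The sketch below implements the binomial PMF and CDF directly from their definitions; binom_pmf and binom_cdf are illustrative helper names, and n = 10, p = 0.25 come from the example.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for a binomial random variable with parameters n and p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_cdf(a, n, p):
    """P(X <= a): sum of the PMF over 0, 1, ..., a."""
    return sum(binom_pmf(k, n, p) for k in range(a + 1))

n, p = 10, 0.25                 # 10 questions, 4 choices each, random guessing

p_none = binom_pmf(0, n, p)     # no answer correct
p_two = binom_pmf(2, n, p)      # exactly two answers correct
p_fail = binom_cdf(5, n, p)     # fewer than 6 correct, i.e. at most 5
mean = n * p
variance = n * p * (1 - p)

print(round(p_none, 4), round(p_two, 4), round(p_fail, 4), mean, variance)
```

These two helpers play the same role as Excel's BINOM.DIST with the last argument set to false (PMF) and true (CDF).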
Poisson Distribution
Poisson distribution is a probability distribution that is used to show how many times
an event is likely to occur over a specific period.
It is the discrete probability distribution of the number of events occurring in a given
time period, given the average number of times the event occurs over that time
period. It is the distribution related to probabilities of events that are extremely rare
but have a large number of independent opportunities for occurrence.
Poisson Distribution Definition
Poisson distribution is used to model the number of events that occur in a fixed
interval of time or space, given the average rate of occurrence, assuming that the
events happen independently and at a constant rate.
Poisson distribution formula
P(X = k) = e^(−λ) λ^k / k!, k = 0, 1, 2, …
where λ is the average number of events in the interval.
Mean and Variance of Poisson distribution:
The Poisson distribution has only one parameter, called λ.
The mean (µ) of a Poisson distribution is λ.
The variance (σ²) of a Poisson distribution is also λ.
In most distributions, the mean is represented by µ (mu) and the variance
is represented by σ² (sigma squared). Because these two parameters are
the same in a Poisson distribution, we use the λ symbol to represent
both.
1. An average of 0.61 soldiers died by horse kicks per year in each Prussian army corps. You
want to calculate the probability that exactly two soldiers died in the VII Army Corps in
1898, assuming that the number of horse kick deaths per year follows a Poisson
distribution.
Sol: Here λ = 0.61 and k = 2, so
P(X = 2) = e^(−0.61) (0.61)^2 / 2! ≈ 0.101
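2. The number of typographical errors in a "big" textbook is Poisson
distributed with a mean of 1.5 per 100 pages.
Suppose 100 pages of the book are randomly selected. What is the
probability that there are no typos?
Sol: Here λ = 1.5, so P(X = 0) = e^(−1.5) ≈ 0.2231
Suppose 400 pages of the book are randomly selected. What are the
probabilities for having no typos and for having five or fewer typos?
Sol: For 400 pages the mean is λ = 4 × 1.5 = 6, so
P(X = 0) = e^(−6) ≈ 0.0025
P(X ≤ 5) = e^(−6) (1 + 6 + 6^2/2! + 6^3/3! + 6^4/4! + 6^5/5!) ≈ 0.4457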
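Both Poisson problems can be evaluated with a direct implementation of the Poisson probability P(X = k) = e^(−λ) λ^k / k!. This is a sketch using only the standard library; poisson_pmf and poisson_cdf are illustrative helper names.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

def poisson_cdf(k, lam):
    """P(X <= k): sum of the PMF over 0, 1, ..., k."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

# Problem 1: horse-kick deaths, lam = 0.61 per corps per year, exactly 2 deaths.
p_two_deaths = poisson_pmf(2, 0.61)

# Problem 2: typos, 1.5 per 100 pages, so the mean scales with the page count.
p_no_typos_100 = poisson_pmf(0, 1.5)        # 100 pages -> lam = 1.5
lam_400 = 1.5 * 400 / 100                   # 400 pages -> lam = 6
p_no_typos_400 = poisson_pmf(0, lam_400)
p_at_most_5_400 = poisson_cdf(5, lam_400)

print(round(p_two_deaths, 4), round(p_no_typos_100, 4),
      round(p_no_typos_400, 4), round(p_at_most_5_400, 4))
```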
NORMAL DISTRIBUTION (GAUSSIAN DISTRIBUTION) :
The normal distribution is the most widely known and used of all
distributions. Because the normal distribution approximates many natural
phenomena so well, it has developed into a standard of reference for many
probability problems.
Let X be a continuous random variable. Then X is said to follow a normal
distribution if its probability density function is given by
f(x) = (1 / (σ √(2π))) e^(−(x − μ)² / (2σ²)), −∞ < x < ∞
Here μ and σ are the mean and standard deviation of X.
Properties Of Normal Distribution :
It is a two-parameter distribution, where the parameter μ is the mean
(location parameter) and the parameter σ is the standard deviation (scale
parameter).
1. Normal curve is always centered at mean
2. Mean, median and mode coincide (i.e., equal)
3. It is unimodal.
4. It is a symmetrical, bell-shaped curve.
5. The x-axis is an asymptote to the normal curve.
6. The total area under the normal curve from −∞ to ∞ is 1.
7. The points of inflection of the normal curve are at μ ± σ.
8. The area under the normal curve between
μ − σ and μ + σ = 68.27%
μ − 2σ and μ + 2σ = 95.45%
μ − 3σ and μ + 3σ = 99.73%
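The areas within one, two and three standard deviations of the mean can be reproduced from the standard normal cumulative distribution function. The sketch below expresses that function through math.erf from the standard library; the helper names are illustrative.

```python
from math import erf, sqrt

def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal variable, written via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def area_within(k):
    """Area under the normal curve between mu - k*sigma and mu + k*sigma."""
    return standard_normal_cdf(k) - standard_normal_cdf(-k)

areas = {k: area_within(k) for k in (1, 2, 3)}
for k, a in areas.items():
    print(f"within {k} sigma: {100 * a:.2f}%")
```

The result does not depend on the particular values of the mean and standard deviation, because the area is measured in units of the standard deviation.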
Standard Normal Variable : A normal variable with mean 0 and variance 1 is
said to be a standard normal variable.
Standard Normal Distribution :
The normal distribution with mean 0 and variance 1 is said to be the standard normal
distribution. Its probability density function is defined by
f(z) = (1 / √(2π)) e^(−z² / 2), −∞ < z < ∞
By using the following transformation, any normal random variable X can be
converted into a standard normal variable:
Z = (X − μ) / σ
The random variable X can be written in the form of a standard normal
random variable using the relationship
X = μ + σZ
Thus, any normal random variable X can be expressed using the standard
normal random variable Z.
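The standardizing transformation Z = (X − μ)/σ can be sketched as follows. The values μ = 2, σ = 3 and x = 5 are illustrative assumptions chosen for this sketch, and the function names are hypothetical helpers.

```python
from math import erf, sqrt

def z_score(x, mu, sigma):
    """Convert a value x of a normal variable X into its standard normal value z."""
    return (x - mu) / sigma

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal variable, computed through the standard normal CDF."""
    z = z_score(x, mu, sigma)
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma = 2.0, 3.0          # illustrative mean and standard deviation
x = 5.0                       # illustrative value of X

z = z_score(x, mu, sigma)     # how many standard deviations x lies above the mean
p = normal_cdf(x, mu, sigma)  # probability that X is at most x
x_back = mu + sigma * z       # inverse relationship X = mu + sigma * Z

print(z, round(p, 4), x_back)
```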
Solved Examples
1. Calculate the probability density of the normal distribution with population mean
2 and standard deviation 3 at the value 5 of the random variable.
Solution:
x=5
Mean = μ = 2
Standard Deviation = σ = 3
We will solve the question with the help of the above normal
probability density formula: