Introduction to Probability
Unit 3
Learning Objectives
Understand uncertainty and how probability concepts are used for
measuring and modelling uncertainty.
Learn basic concepts in probability: axioms of probability, frequency
estimate of probability, conditional probability and Bayes’ theorem.
Learn how simple probability rules are used for solving business
problems using association rule mining and its applications in market
basket analysis and recommender systems.
Understand the concept of random variables, discrete and continuous
random variables, probability density function, and cumulative
distribution function.
Understand various discrete distributions such as binomial
distribution, Poisson distribution, and geometric distribution and their
applications for solving business problems.
Understand various continuous distributions such as uniform,
exponential, normal, chi-square, t, and F distributions and their
applications for solving business problems.
Introduction to Probability
One of the primary objectives in analytics is to measure the
uncertainty associated with an event or key performance
indicator.
Axioms of probability and the concept of random variable
are fundamental building blocks of analytics that are used
for measuring uncertainty associated with key performance
indicators of importance for a business.
Probability theory is the foundation on which descriptive
and predictive analytics models are built.
Introduction to Probability
Analytics applications involve tasks such as
prediction of probability of occurrence of an
event, testing a hypothesis, building models to
explain variation in a variable of importance to
the business such as profitability, market share,
demand, etc.
Many important tasks in analytics deal with
uncertain events and it is essential to understand
probability theory that can be used to predict and
measure uncertain events.
Introduction to Probability
Probability quantifies the uncertainty of the outcomes
of a random variable; that is, it quantifies the
likelihood of an event.
Specifically, it quantifies how likely a specific outcome
is for a random variable, such as the flip of a coin, the
roll of a die, or drawing a playing card from a deck.
For a random variable X, P(X) is a function that assigns
a probability to every value that X can take:
P(X = x) = probability that X takes the value x
PROBABILITY THEORY – TERMINOLOGY
Random Experiment
Random experiment is an experiment in which the
outcome is not known with certainty. That is, the output of
a random experiment cannot be predicted with certainty.
Predictive analytics mainly deals with random
experiments such as:
predicting quarterly revenue of an organization
customer churn (whether a customer is likely to churn or
how many customers are likely to churn before next quarter)
demand for a product at a future time period
number of views for a YouTube video
outcome of a football match (win, draw or lose), etc.
PROBABILITY THEORY – TERMINOLOGY
Sample Space
Sample space is the universal set that consists of all possible
outcomes of an experiment. Sample space is usually represented
using the letter ‘S’ and individual outcomes are called the
elementary events.
The sample space can be finite or infinite.
A few random experiments and their sample spaces are discussed
below:
Experiment: Outcome of a football match
Sample Space = S = {Win, Draw, Lose}
Experiment: Predicting customer churn at an individual customer
level
Sample Space = S = {Churn, No Churn}
Experiment: Predicting percentage of customer churn
Sample Space = S = {X | X ∈ R, 0 ≤ X ≤ 100}, that is, X is a real
number that can take any value between 0 and 100 percent.
Experiment: Life of a turbine blade used in an aircraft engine
Sample Space = S = {X | X ∈ R, 0 ≤ X < ∞}, that is X is a real
number that can take any value between 0 and ∞.
PROBABILITY THEORY – TERMINOLOGY
Event
Event (E) is a subset of a sample space and probability is
usually calculated with respect to an event.
An event can be represented using the Venn diagram in
Figure below
The Venn diagram in Figure indicates that the event E is a
subset of the sample space S, that is, E ⊂ S (E is a subset of
S).
Consider the random experiment that predicts number of
customers who are likely to churn within a quarter from a
customer base of 100 customers.
PROBABILITY THEORY – TERMINOLOGY
The corresponding sample space = {X | X ∈ Z,
0 ≤ X ≤ 100}, that is, X is an integer that
can take any value between 0 and 100.
Now we can define several events such as:
Event A = Number of customers who churn is less
than 10
Event B = Number of customers who churn is
between 10 and 30
Event C = Number of customers who churn
exceeds 30
PROBABILITY THEORY – TERMINOLOGY
Probability Estimation using Relative
Frequency
The classical approach to probability estimation of an
event is based on the relative frequency of the
occurrence of that event. According to frequency
estimation, the probability of an event X, P(X), is given
by:
P(X) = Number of times event X occurred / Total number of observations
For example, say a company has 1000 employees and
every year about 200 employees leave the job. Then the
probability of attrition of an employee per annum is
200/1000 = 0.2.
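The frequency estimate above can be sketched in a couple of lines of Python; the function name is ours, chosen for illustration:

```python
def relative_frequency(favourable: int, total: int) -> float:
    """Estimate P(X) as (number of times X occurred) / (total observations)."""
    return favourable / total

# Attrition example from the text: 200 of 1000 employees leave per year.
p_attrition = relative_frequency(200, 1000)
print(p_attrition)  # 0.2
```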
Algebra of Events
Assume that X, Y and Z are three events of a sample space. Then the
following algebraic relationships are valid and are useful while
deriving probabilities of events:
Commutative rule: X ∪ Y = Y ∪ X and X ∩ Y = Y ∩ X
Associative rule: (X ∪ Y) ∪ Z = X ∪ (Y ∪ Z) and (X ∩ Y) ∩ Z = X ∩ (Y
∩ Z)
Distributive rule: X ∪ (Y ∩ Z) = (X ∪ Y) ∩ (X ∪ Z)
X ∩ (Y ∪ Z) = (X ∩ Y) ∪ (X ∩ Z)
The above rules of algebra will be useful while calculating the
probability of events. The following rules known as DeMorgan’s Laws
on complementary sets are useful while deriving probabilities:
(X ∪ Y)C = XC ∩ YC
(X ∩ Y)C = XC ∪ YC
where XC and YC are the complementary events of X and Y,
respectively.
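DeMorgan's laws can be checked directly with Python sets; S below is a small toy universal set chosen only for illustration:

```python
S = set(range(10))        # toy universal set
X = {1, 2, 3, 4}
Y = {3, 4, 5, 6}

def complement(E):
    """Complement of event E with respect to the universal set S."""
    return S - E

# (X ∪ Y)^C = X^C ∩ Y^C
print(complement(X | Y) == complement(X) & complement(Y))  # True
# (X ∩ Y)^C = X^C ∪ Y^C
print(complement(X & Y) == complement(X) | complement(Y))  # True
```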
FUNDAMENTAL CONCEPTS IN PROBABILITY
– AXIOMS OF PROBABILITY
According to axiomatic theory of probability, the probability
of an event E satisfies the following axioms:
1. The probability of event E always lies between 0 and
1. That is, 0 ≤ P(E) ≤ 1.
2. The probability of the universal set S is 1. That is,
P(S) = 1.
3. P(X ∪ Y) = P(X) + P(Y), where X and Y are two
mutually exclusive events.
The following elementary rules of probability are directly deduced
from the original three axioms of probability, using the set theory
relationships:
Example
The probability of an event not occurring is called its
complement.
It can be calculated as one minus the probability of the
event, or 1 – P(A).
For example, the probability of not rolling a 5 on a fair die is
1 – P(5) = 1 – 1/6 ≈ 0.833, or about 83.3%.
Probability of Not Event A = 1 – P(A)
Probability ranges from 0 to 1, where 0 means the
event is impossible and 1 means the event is certain.
The probability of all the events in a sample space
adds up to 1.
Basic Probability Concepts
Marginal Probability
Joint Probability
Conditional Probability
Probability Trees and Bayes’ Theorem
Problems and Solutions on Probability
Question 1: Find the probability of ‘getting
3 on rolling a die’.
Solution:
Sample Space = S = {1, 2, 3, 4, 5, 6}
Total number of outcomes = n(S) = 6
Let A be the event of getting 3.
Number of favorable outcomes = n(A) = 1
i.e. A = {3}
Probability, P(A) = n(A)/n(S) = 1/6
Hence, P(getting 3 on rolling a die) = 1/6
Question 2: Draw a random card from a pack of cards.
What is the probability that the card drawn is a face
card?
Solution:
A standard deck has 52 cards.
Total number of outcomes = n(S) = 52
Let E be the event of drawing a face card.
Number of favorable outcomes = n(E) = 4 x 3 = 12
(the Jack, Queen and King of each of the four suits)
Probability, P = Number of Favorable Outcomes/Total
Number of Outcomes
P(E) = n(E)/n(S)
= 12/52
= 3/13
P(the card drawn is a face card) = 3/13
Question 3: A vessel contains 4 blue balls, 5 red balls
and 11 white balls. If three balls are drawn from the
vessel at random, what is the probability that the first
ball is red, the second ball is blue, and the third ball is
white?
Solution:
Since there are 4 + 5 + 11 = 20 balls in total, the probability
that the first ball drawn is red is 5/20.
Since one ball has been drawn for the first event, the number of
balls left for the second draw is 20 – 1 = 19.
Hence, the probability that the second ball is blue is 4/19.
Again, with the first and second draws done, the number of
balls left for the third draw is 19 – 1 = 18.
The probability that the third ball is white is therefore 11/18.
Therefore, the probability is 5/20 x 4/19 x 11/18 = 44/1368 ≈
0.032.
Or we can express it as: P ≈ 3.2%.
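The arithmetic above can be verified exactly with Python's fractions module (note that 44/1368 reduces to 11/342):

```python
from fractions import Fraction

# Multiply the three sequential-draw probabilities exactly.
p = Fraction(5, 20) * Fraction(4, 19) * Fraction(11, 18)
print(p, float(p))  # 11/342, about 0.032
```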
Question 4: Two dice are rolled, find the probability that the sum
is:
1. equal to 1
2. less than 13
Solution:
To find the probability that the sum is equal to 1 we have to first determine
the sample space S of two dice as shown below.
S = { (1,1),(1,2),(1,3),(1,4),(1,5),(1,6)
(2,1),(2,2),(2,3),(2,4),(2,5),(2,6)
(3,1),(3,2),(3,3),(3,4),(3,5),(3,6)
(4,1),(4,2),(4,3),(4,4),(4,5),(4,6)
(5,1),(5,2),(5,3),(5,4),(5,5),(5,6)
(6,1),(6,2),(6,3),(6,4),(6,5),(6,6) }
So, n(S) = 36
1) Let E be the event “sum equal to 1”. Since there are no outcomes in
which the sum equals 1, P(E) = n(E) / n(S) = 0 / 36 = 0
2) Let B be the event that the sum of the numbers on the dice is less than 13.
From the sample space, we can see that every possible outcome gives a sum
less than 13; even the largest sum, obtained when both dice show 6, i.e.
(6,6), is only 12. Thus, n(B) = 36
Hence, P(B) = n(B) / n(S) = 36 / 36 = 1
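Both answers can be confirmed by enumerating the sample space in Python:

```python
from itertools import product

# All 36 equally likely outcomes of rolling two dice.
S = list(product(range(1, 7), repeat=2))

p_sum_eq_1 = sum(1 for a, b in S if a + b == 1) / len(S)   # impossible event
p_sum_lt_13 = sum(1 for a, b in S if a + b < 13) / len(S)  # certain event
print(p_sum_eq_1, p_sum_lt_13)  # 0.0 1.0
```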
Equally Likely Events
When the events have the same theoretical probability of
happening, then they are called equally likely events.
The outcomes of a random experiment are called equally likely
if all of them have the same probability of occurring. Examples:
Getting 3 and 5 on throwing a die
Getting an even number and an odd number on a die
Getting 1, 2 or 3 on rolling a die
Complementary Events
Complementary events arise when there are exactly two
possible outcomes: an event occurs or it does not.
The complement of an event is the event not occurring, so
its probability is one minus the probability of the event. Some
more examples are:
• It will rain or not rain today
• The student will pass the exam or not pass.
• You win the lottery or you don’t.
Independent Events
Independent events are those events whose occurrence is not
dependent on any other event. For example, if we flip a coin in the
air and get Heads, and then flip the coin again, the outcome of the
second flip is not affected by the first; the two flips are
independent of each other.
If the probability of occurrence of an event A is not affected by the
occurrence of another event B, then A and B are said to be
independent events.
Consider an example of rolling a die.
If A is the event ‘the number appearing is odd’ and B be the event
‘the number appearing is a multiple of 3’, then
P(A)= 3/6 = 1/2 and P(B) = 2/6 = 1/3
Also, A ∩ B is the event ‘the number appearing is odd and a
multiple of 3’, so that
P(A ∩ B) = 1/6
P(A│B) = P(A ∩ B)/ P(B) = (1/6) / (1/3) = 1/2
P(A) = P(A│B) = 1/2 , which implies that the occurrence of event B
has not affected the probability of occurrence of the event A .
If A and B are independent events, then P(A│B) = P(A)
Using the multiplication rule of probability, P(A ∩ B) = P(B) . P(A│B),
so for independent events P(A ∩ B) = P(A) . P(B)
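The independence check for this die example can be reproduced in Python:

```python
from fractions import Fraction

S = set(range(1, 7))                 # sample space of one die roll
A = {x for x in S if x % 2 == 1}     # odd: {1, 3, 5}
B = {x for x in S if x % 3 == 0}     # multiple of 3: {3, 6}

def P(E):
    """Probability of event E under equally likely outcomes."""
    return Fraction(len(E), len(S))

print(P(A), P(B), P(A & B))          # 1/2 1/3 1/6
print(P(A & B) == P(A) * P(B))       # True, so A and B are independent
```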
Mutually Exclusive Events
Two events are said to be mutually exclusive if they
cannot occur at the same time or simultaneously.
They are also called disjoint events.
If two events are considered disjoint events, then the probability
of both events occurring at the same time will be zero.
If the events A and B are not mutually exclusive, the probability
of getting A or B, that is P (A ∪ B), is given as follows:
P (A ∪ B) = P(A) + P(B) – P (A ∩ B)
When A and B are mutually exclusive, P(A ∩ B) is zero, so
P (A ∪ B) = P(A) + P(B)
When tossing a coin, the event of getting head and tail are
mutually exclusive.
In a six-sided die, the events “2” and “5” are mutually exclusive.
Marginal Probability
The probability of a single event occurring, p(A), may be
thought of as an unconditional probability.
It is not conditioned on another event.
Example: the probability that a card drawn is red
(p(red) = 0.5).
Another example: the probability that a card drawn is 4
(p(four)=1/13).
Marginal Probability
Joint Probability
It is the probability of two different events, A and B,
occurring at the same time.
It is the probability of the intersection of two or more
events.
The probability of the intersection of A and B may be
written p(A ∩ B) or p(A and B).
Example: the probability that a card is a four and red
=
p(four and red) = 2/52=1/26.
(There are two red fours in a deck of 52, the 4 of
hearts and the 4 of diamonds).
Joint Probability
Question 1 At an e-commerce customer service centre a
total of 112 complaints were received. 78 customers
complained about late delivery of the items and 40
complained about poor product quality.
(a) Calculate the probability that a customer complaint will be
about both late delivery and product quality.
(b) What is the probability that a complaint is only about poor
quality of the product?
Solution
Let A = Late delivery and B = Poor quality of the product. Let
n(A) and n(B) be the number of cases in favour of A and B. So
n(A) = 78 and n(B) = 40. Since the total number of
complaints is 112 (here complaints is treated as the sample
space), hence
n(A ∩ B) = n(A) + n(B) – n(S) = 78 + 40 – 112 = 118 – 112 = 6
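A quick sketch in Python completes the arithmetic: inclusion–exclusion gives n(A ∩ B), from which both requested probabilities follow (the answers are computed here, not quoted from the text):

```python
from fractions import Fraction

n_S, n_A, n_B = 112, 78, 40                    # total, late delivery, poor quality
n_both = n_A + n_B - n_S                       # inclusion-exclusion: 6 complaints mention both
p_both = Fraction(n_both, n_S)                 # (a) 6/112 = 3/56
p_only_quality = Fraction(n_B - n_both, n_S)   # (b) 34/112 = 17/56
print(n_both, p_both, p_only_quality)
```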
Joint Probability
Conditional Probability
If A and B are events in a sample space, then the
conditional probability of the event B given that the event A
has already occurred, denoted by P(B|A), is defined as:
P(B|A) = P(A ∩ B) / P(A)
The conditional probability symbol P(B|A) is read as the
probability of B given A. It is necessary to satisfy the
condition that P(A) > 0, because it does not make sense to
consider the probability of B given that event A is
impossible.
The conditional probability of default given divorced is
P(Default|Divorced) = 0.013/0.05 = 0.26, and
similarly the probability of default given single is
P(Default|Single) = 0.042/0.3 = 0.14
APPLICATION OF SIMPLE PROBABILITY
RULES – ASSOCIATION RULE LEARNING
We can use simple probability concepts such as joint probability and
conditional probability to solve analytics problems such as market
basket analysis and recommender systems using algorithms such as
Association Rule Learning (aka Association Rule Mining).
Association rule mining is one of the popular algorithms used to
solve problems such as market basket analysis and recommender
systems.
Market basket analysis (MBA) is used frequently by retailers to
predict products a customer is likely to buy together, which can
further be used for designing planograms and product promotions.
The primary objective of MBA is to find the probability of buying
two products (A and B) together.
Recommender systems are models that produce a list of
recommendations to a customer for products such as books, movies,
news items, etc., and are an important analytics technique.
Association Rule Learning
In general, association rule learning (also known as
association rule mining) is a method of finding association
between different entities in a database.
In a retail context, association rule learning is a method for
finding association relationships that exist in frequently
purchased items.
Association rule is a relationship of the form X → Y (that is,
X implies Y). Here, X and Y are two mutually exclusive sets
(sets of stock keeping units or SKUs).
In Table 3.2, the transaction ID is
the transaction reference
number and apple, orange, etc.
are the different SKUs sold by
the store. Binary code is used
to represent whether the SKU
was purchased (equal to 1) or
not (equal to 0) during a
transaction. The strength of
association between two
mutually exclusive subsets can
be measured using ‘support’,
‘confidence’ and ‘lift’.
Association Rule Learning
Support between two sets (of products purchased) is
calculated using the joint probability of those events:
Support(X, Y) = P(X ∩ Y) = n(X ∩ Y) / N
where n(X ∩ Y) is the number of times both X and Y are
purchased together and N is the total number of transactions.
That is, support is the proportion of times X and Y are purchased
together.
Confidence is the conditional probability of purchasing
product Y given that product X is purchased:
Confidence(X → Y) = P(Y│X) = P(X ∩ Y) / P(X)
The third measure in association rule mining is lift, which is
given by
Lift(X → Y) = P(Y│X) / P(Y)
Lift overcomes one of the disadvantages of using confidence:
confidence does not account for the baseline popularity P(Y), so a
rule can look strong simply because Y is purchased frequently,
which makes raw confidence less attractive for MBA and
recommendation across millions of transactions.
Association Rule Learning
In Table 3.2, assume that X = Apple and Y = Banana. Then
Association rules can be generated based on threshold
values of support, confidence and lift. For example, assume
that the cut-off for support is 0.25 and confidence is 0.5 (lift
should be greater than 1). Then we can conclude that X
implies Y (that is, purchase of apple implies purchase of
banana, however this rule will be ineffective since lift is
less than 1).
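A minimal sketch of the three measures in Python; the toy transaction list below is illustrative and is not the book's Table 3.2:

```python
transactions = [                      # each row: the set of SKUs in one basket
    {"apple", "banana"},
    {"apple", "orange"},
    {"banana", "orange"},
    {"apple", "banana", "orange"},
]
N = len(transactions)

def support(items):
    """Proportion of transactions containing every SKU in `items`."""
    return sum(items <= t for t in transactions) / N

X, Y = {"apple"}, {"banana"}
supp = support(X | Y)        # joint probability P(X and Y)
conf = supp / support(X)     # conditional probability P(Y | X)
lift = conf / support(Y)     # how much buying X raises the chance of Y
print(supp, conf, lift)
```

Here lift < 1, so, as in the apple/banana discussion above, the rule X → Y would be considered ineffective.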
Bayes’ Theorem
It describes the probability of an event, based on prior
knowledge of conditions that might be related to that
event.
It can also be considered for conditional probability
examples.
It is used where the probability of occurrence of a
particular event is calculated based on other conditions
which are also called conditional probability.
For example: There are 3 bags, each containing some
white marbles and some black marbles. If a white marble
is drawn at random, what is the probability that it came
from the first bag? In cases like this, we use the Bayes’
Theorem.
Bayes’ Theorem
Bayes’ theorem is one of the most important concepts in analytics since several
problems are solved using Bayesian statistics. Consider two events A and B. We can
write the following two conditional probabilities:
P(A│B) = P(A ∩ B) / P(B) and P(B│A) = P(A ∩ B) / P(A)
Using the two equations (both contain P(A ∩ B)), we can show that
P(B│A) = P(A│B) × P(B) / P(A)      (3.13)
Equation (3.13) is the Bayes’ theorem. Bayes’ theorem helps the data scientists to
update the probability of an event (B) when any additional information is provided.
This makes Bayesian statistics a very attractive technique since it helps the
decision maker to fine-tune his/her belief with every additional data that is
received.
The following terminologies are used to describe various components in Eq.
(3.13).
P(B) is called the prior probability (estimate of the probability without any additional
information).
P(B|A) is called the posterior probability (that is, given that the event A has
occurred, what is the probability of occurrence of event B). That is, post the
additional information (or additional evidence) that A has occurred, what is
estimated probability of occurrence of B.
P(A|B) is called the likelihood of observing evidence A if B is true.
Generalization of Bayes’ Theorem
The probability of evidence P(A) may come from mutually
exclusive subsets (events) B1, B2, ..., Bn, as described in
Figure 3.2.
For better understanding, consider a part manufactured by
different suppliers B1, B2, ..., Bn. Let A denote a
defective part. P(A) can be written as:
P(A) = P(A│B1) P(B1) + P(A│B2) P(B2) + ... + P(A│Bn) P(Bn)
Bayes’ Theorem-example
There are three urns containing 3 white and 2 black balls; 2
white and 3 black balls; 1 black and 4 white balls respectively.
There is an equal probability of each urn being chosen. One
ball is then chosen at random from the selected urn. What is
the probability that a white ball is drawn?
Let E1, E2, and E3 be the events of choosing the first,
second, and third urn respectively. Then,
P(E1) = P(E2) = P(E3) =1/3
Let E be the event that a white ball is drawn. Then,
P(E/E1) = 3/5, P(E/E2) = 2/5, P(E/E3) = 4/5
By theorem of total probability, we have
P(E) = P(E/E1) . P(E1) + P(E/E2) . P(E2) + P(E/E3) . P(E3)
= (3/5 * 1/3) + (2/5 * 1/3) + (4/5 * 1/3)
= 9/15 = 3/5
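The total-probability calculation can be verified exactly in Python:

```python
from fractions import Fraction

priors = [Fraction(1, 3)] * 3                                   # P(E1), P(E2), P(E3)
likelihoods = [Fraction(3, 5), Fraction(2, 5), Fraction(4, 5)]  # P(E | Ei) for each urn

# Theorem of total probability: P(E) = sum of P(E | Ei) * P(Ei)
p_white = sum(p * l for p, l in zip(priors, likelihoods))
print(p_white)  # 3/5
```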
Bayes’ Theorem- example
At an electronics plant, it is known from past
experience that the probability is 0.83 that a new
worker who has attended the company’s training
program will meet the production quota and that
the corresponding probability is 0.35 for a new
worker who has not attended the company’s
training program. If 80 % of all new workers
attend the training program, what is the
probability that a new worker will meet the
production quota? Also, find the probability that a
new worker who meets the production quota will
have attended the company’s training program.
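A sketch of one way to work this problem in Python, using the total probability rule and Bayes' theorem (the answers are computed here; they are not quoted from the text):

```python
p_quota_given_trained = 0.83     # P(meets quota | attended training)
p_quota_given_untrained = 0.35   # P(meets quota | did not attend)
p_trained = 0.80                 # P(attended training)

# Total probability: P(meets quota)
p_quota = (p_quota_given_trained * p_trained
           + p_quota_given_untrained * (1 - p_trained))

# Bayes' theorem: P(attended training | meets quota)
p_trained_given_quota = p_quota_given_trained * p_trained / p_quota
print(round(p_quota, 3), round(p_trained_given_quota, 3))  # 0.734 0.905
```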
Bayes’ Theorem- example
60% of the companies that increased their share
price by more than 5% in the last three years
replaced their CEOs during the period.
At the same time, only 35% of the companies that
did not increase their share price by more than
5% in the same period replaced their CEOs.
Knowing that the probability that the stock prices
grow by more than 5% is 4%, find the probability
that the shares of a company that fires its CEO
will increase by more than 5%.
Bayes’ Theorem- example
Before finding the probabilities, you must first
define the notation of the probabilities.
• P(A) – the probability that the stock price
increases by 5%
• P(B) – the probability that the CEO is replaced
• P(A|B) – the probability that the stock price
increases by 5% given that the CEO has been
replaced
• P(B|A) – the probability of the CEO being replaced
given that the stock price has increased by 5%.
Using the Bayes’ theorem, we can find the
required probability:
P(B) = P(B|A) P(A) + P(B|not A) P(not A) = 0.60 × 0.04 + 0.35 × 0.96 = 0.36
P(A|B) = P(B|A) × P(A) / P(B) = 0.024 / 0.36 ≈ 0.0667
Thus, the probability that the shares of a company
that replaces its CEO will grow by more than 5% is
6.67%.
Random Variable
A variable is defined as any symbol that can take any
particular set of values.
If the value of a variable depends upon the outcome of
a random experiment, it is a random variable and can take
up any real value.
Such an experiment, where we know the set of all possible
results but find it impossible to predict the outcome of any
particular execution, is a random experiment.
Mathematically, a random variable is a real-valued function
whose domain is the sample space S of a random experiment.
A random variable is usually denoted by a capital letter like
X, Y, M.
Lowercase letters like x, y, z, m etc. represent its values.
Random Variable
P(X) denotes the probability distribution of the random variable
X.
P(X = x) denotes the probability that the random variable X takes
any particular value, represented by x.
Example: the experiment is tossing a coin 2 times.
Sample space(S) is {HH, TH, HT, TT}, and each outcome has
probability 0.25.
X (the random variable) is the number of heads when we
toss a coin 2 times.
Then, X(HH) = 2, X(TH) = 1, X(HT) = 1, X(TT) = 0, so
P(X=0) = 0.25, P(X=1) = 0.5, P(X=2) = 0.25.
Random Variable
Since there are two forms of data, discrete and
continuous, there are two types of random
variables.
It can be categorized into two types:
Discrete Random Variable
Continuous Random variable
Discrete random variable
If the random variable X can assume only a finite or countably
infinite set of values, then it is called a discrete random variable.
There are many situations where the random variable X can
assume only a finite or countably infinite set of values.
Examples of discrete random variables are:
Credit rating (usually classified into different categories such as
low, medium and high or using labels such as AAA, AA, A, BBB,
etc.).
Number of orders received at an e-commerce retailer which can
be countably infinite.
Customer churn [the random variables take binary values: (a)
Churn and (b) Do not churn].
Fraud [the random variables take binary values: (a) Fraudulent
transaction and (b) Genuine transaction].
Any experiment that involves counting (for example, number of
returns in a day from customers of e-commerce portals such as
Amazon, Flipkart; number of customers not accepting job offers).
Continuous random variable
A random variable X which can take any value in an interval (an
uncountably infinite set of values) is called a continuous random
variable.
Examples of continuous random variables are listed below:
Market share of a company (which can take any value
between 0 and 100%).
Percentage of attrition among employees of an organization.
Time to failure of engineering systems.
Time taken to complete an order placed at an e-commerce
portal.
Time taken to resolve a customer complaint at call and service
centers.
Height, Weight, Amount of rainfall, etc.
In many situations, a continuous variable may be converted to a
discrete random variable for modelling purpose.
Probability Distributions
A probability distribution is a function that calculates the
likelihood of all possible values for a random variable.
For any event of a random experiment, we can find its
corresponding probability.
For different values of the random variable, we can find
its respective probability.
The values of random variables along with the
corresponding probabilities are the probability
distribution of the random variable.
A discrete probability distribution is defined using a
probability mass function.
A continuous probability distribution is described using a
probability density function and the corresponding cumulative
distribution function.
Discrete Random Variables
The probability distribution of a discrete random variable is
a list of probabilities associated with each of its possible
values.
It is also sometimes called the probability function or the
probability mass function.
More formally, the probability distribution of a discrete
random variable X is a function which gives the probability
p(xi) that the random variable equals xi, for each value xi:
p(xi) = P(X=xi)
It satisfies the following conditions:
• 0 <= p(xi) <= 1
• sum of all p(xi) is 1
Discrete Random Variables
Consider a random variable X= number of heads after tossing
a coin thrice.
x ∈ {0,1,2,3}.
All the possible outcomes after a coin is flipped thrice are,
{HHH,HHT,HTT,TTT,TTH,THH,THT,HTH}.
What will be the probability that 0 heads occur?
We denote it as P(X=0)=1/8=0.125
probability of getting exactly 1 head=P(X=1)=3/8=0.375
P(X=2)=3/8=0.375
P(x=3)=1/8=0.125
If we sum up the probabilities of all outcomes, it will be equal
to one. This gives us the Probability Distribution of that
random variable.
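The pmf above can be reproduced by enumerating all eight equally likely sequences in Python:

```python
from itertools import product
from collections import Counter

outcomes = list(product("HT", repeat=3))             # all 8 sequences of 3 flips
counts = Counter(seq.count("H") for seq in outcomes)  # how many sequences give k heads
pmf = {k: counts[k] / len(outcomes) for k in sorted(counts)}
print(pmf)  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
```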
Probability Mass Function(PMF) of Discrete
Random Variable
In the case of Discrete Random Variables, the function that
denotes the probability of the random variable for each x in
the range of X is known as the Probability Mass
Function(PMF).
It can be shown using tables or graph or mathematical
equation.
Probability Distribution Function(PDF)
of Discrete Random Variable
In case of rolling of a die, the probability of each value X can
take is the same. So the probability distribution in this case will
be:
P(X=1) = 1/6, P(X=2) = 1/6 and so on.
Note that the values of x take on all possible cases. And the sum
of the probabilities add to 1.
Probability Distribution Function(PDF)
of Discrete Random Variable
Mathematically, this can be written as f(x) = P(X = x).
The set of ordered pairs (x, f(x)) is called the probability
function, probability mass function or probability
distribution function of the discrete random variable X.
f(x) is considered a probability mass function if it satisfies
the following conditions:
1. f(x) ≥ 0 for every value x
2. Σ f(x) = 1, where the sum is taken over all possible values of x
3. P(X = x) = f(x)
Example
Cumulative distribution Function(CDF) of
Discrete Random Variable
However, many times we may wish to compute the
probability that the random variable X is less than or
equal to some real number x.
Writing F(x) = P(X ≤ x) for every real number x, we define
F(x) to be the cumulative distribution function of the
random variable X.
Example 1
Cumulative distribution function, F(xi), is the probability that the
random variable X takes values less than or equal to xi. That is,
F(xi) = P(X ≤ xi).
F(2) = P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0.60
Example 2
The probability that X is less than or equal to 1 is 0.1.
Similarly, the probability that X is less than or equal to 2 is
(0.1 + 0.3) = 0.4, and so on.
PROBABILITY DENSITY FUNCTION (PDF) AND CUMULATIVE
DISTRIBUTION FUNCTION (CDF) OF A CONTINUOUS RANDOM
VARIABLE
The probability of a continuous random variable assuming
exactly any of its values is 0.
Hence, the probability distribution for a continuous random
variable cannot be given in tabular form.
The probability density function of a continuous random variable
is a function which can be integrated to obtain the probability
that the random variable takes a value in a given interval.
The probability for a continuous random variable is always
computed over an interval: P(a ≤ X ≤ b).
The probability distribution of a continuous random variable can be
stated as a formula; and f(x) is called the probability density function,
or simply a density function, of X.
PROBABILITY DENSITY FUNCTION (PDF) AND CUMULATIVE
DISTRIBUTION FUNCTION (CDF) OF A CONTINUOUS RANDOM
VARIABLE
Here probability is given by the area under the
curve over an interval. To find the probability of a certain
interval, say a to b, we find the area under that curve by
integrating the PDF over that interval:
P(a ≤ X ≤ b) = ∫ fₓ(x) dx, integrated from a to b
where fₓ(x) is the Probability Density Function.
And the cumulative distribution function F(x) of a
continuous random variable is given by:
F(x) = P(X ≤ x) = ∫ fₓ(t) dt, integrated from −∞ to x
PROBABILITY DENSITY FUNCTION (PDF) AND CUMULATIVE
DISTRIBUTION FUNCTION (CDF) OF A CONTINUOUS RANDOM
VARIABLE
Probability density function and cumulative distribution
function of a continuous random variable satisfy the
following properties:
fₓ(x) ≥ 0 for all x, and ∫ fₓ(x) dx = 1 when integrated over the
entire range of X
The expected value of a continuous random variable, E(X),
is given by
E(X) = ∫ x fₓ(x) dx, integrated from −∞ to ∞
The variance of a continuous random variable, Var(X), is
given by
Var(X) = ∫ (x − E(X))² fₓ(x) dx, integrated from −∞ to ∞
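As a numeric sanity check of these integrals, the sketch below approximates E(X) and Var(X) for a uniform random variable on [0, 1] (where fₓ(x) = 1) with a midpoint Riemann sum; the exact values are 1/2 and 1/12:

```python
n = 100_000
dx = 1.0 / n
mids = [(i + 0.5) * dx for i in range(n)]   # midpoint-rule sample points on [0, 1]

e_x = sum(x * dx for x in mids)                  # approximates E(X) = 1/2
var_x = sum((x - e_x) ** 2 * dx for x in mids)   # approximates Var(X) = 1/12
print(round(e_x, 4), round(var_x, 4))  # 0.5 0.0833
```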
Probability Distribution
Summary
Depending on type of random variables, its
probability distribution can be categorized into :
1. Discrete probability distributions
2. Continuous probability distributions
Bernoulli Distribution
This distribution is generated when we perform an experiment
once and it has only two possible outcomes – success and
failure.
The trials of this type are called Bernoulli trials, which form
the basis for many distributions
Let p be the probability of success and 1 – p the probability
of failure.
The PMF is given as:
P(X = x) = p^x (1 – p)^(1−x), x ∈ {0, 1}
Examples:
flipping a coin once. p is the probability of getting a head and
1 – p is the probability of getting a tail.
Will you pass or fail a test?
Will your favourite sports team win or lose their next match?
Will you be accepted or rejected for that job you applied for?
Binomial Distribution
Binomial distribution is one of the most important discrete
probability distributions due to its applications in several contexts.
A random variable X is said to follow a Binomial distribution when
1. The random variable can have only two outcomes success and failure
(also known as Bernoulli trials).
2. The objective is to find the probability of getting k successes out of n
trials.
3. The probability of success is p and thus the probability of failure is (1 −
p).
4. The probability p is constant and does not change between trials .
Success and failure are generic terminologies used in binomial
distribution; based on the context, the interpretation will change (winning
a lottery can be considered as success and not winning as failure).
Binomial Distribution
In analytics, the following are few example problems that can
be associated with Binomial distribution:
Customer churn where the outcomes are: (a) Customer churn and (b) No
customer churn.
Fraudulent insurance claims where the outcomes are: (a) Fraudulent
claim and (b) Genuine claim.
Loan repayment default by a customer where the outcomes are: (a)
Default and (b) No default.
Cart abandonment in e-commerce (a situation where the customer adds
items to his/her cart but does not make the purchase), where the
outcomes are: (a) Cart abandonment and (b) No cart abandonment.
Employee attrition at a company where the outcomes are: (a) The
employee leaves (exits) the company and (b) The employee does not
leave the company.
Any business context in which there are only two outcomes can be
analysed using the binomial distribution.
Binomial Distribution
Example:
Flipping a coin n number of times and calculating the
probabilities of getting a particular number of heads.
More real-world examples include the number of successful
sales calls for a company or whether a drug works for a
disease or not.
Number of winning lottery tickets when you buy 10 tickets
of the same kind
Number of left-handers in a randomly selected sample of
100 unrelated people
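The examples above all reduce to evaluating the binomial PMF, P(X = k) = C(n, k) p^k (1 − p)^(n−k). A minimal standard-library sketch:

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Ten fair coin flips: probability of exactly 5 heads.
print(binomial_pmf(5, 10, 0.5))  # 0.24609375
```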
Binomial Distribution
For example, suppose we shuffle a standard deck of cards,
and we turn over the top card. We put the card back in the
deck and reshuffle. We repeat this process five times. Let X
equal the number of Jacks we observe. Is this a binomial
distribution?
B – binary – yes, either it’s a Jack or it isn’t
I – independent – yes, because we replace the card each
time, the trials are independent.
N – number of trials fixed in advance – yes, we are told to
repeat the process five times.
S – successes (probability of success) are the same – yes, the
likelihood of getting a Jack is 4 out of 52 each time you turn
over a card.
Therefore, this is an example of a binomial distribution.
Suppose that the probability Charlie makes a free throw is
0.82 on any one try. Assuming that this probability doesn't
change, find the chance that Charlie makes 4 of his next
7 free throws.
Then let's determine the number of free throws
Charlie should expect to make and the standard
deviation.
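A quick check of this example (using n = 7 and p = 0.82 as stated; the mean is np and the standard deviation is √(np(1 − p))):

```python
from math import comb, sqrt

n, p = 7, 0.82  # seven free throws, 0.82 chance of making each one

# P(exactly 4 makes) = C(7, 4) * 0.82^4 * 0.18^3
p_four = comb(n, 4) * p ** 4 * (1 - p) ** (n - 4)
print(round(p_four, 4))  # 0.0923

mean = n * p                # expected number of makes
sd = sqrt(n * p * (1 - p))  # standard deviation
print(round(mean, 2), round(sd, 3))  # 5.74 1.016
```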
Poisson Distribution
It describes the number of events that occur in a fixed
interval of time or space.
Examples:
Consider the case of the number of calls received by
a customer care center per hour. We can estimate the
average number of calls per hour but we cannot
determine the exact number and the exact time at
which there is a call. Each occurrence of an event is
independent of the other occurrences.
The PMF is given as P(X = x) = (e^(−λ) λ^x)/x!
Poisson Distribution
where λ is the average number of times the event
occurs in the given interval of time,
x is the Poisson random variable (the desired number
of occurrences),
and e is Euler's number, the base of the natural
logarithm, e ≈ 2.71828.
Properties of Poisson
Distribution
The occurrences of the event are independent within an
interval.
An infinite number of occurrences of the event is
possible in the interval.
The probability of a single event in the interval is
proportional to the length of the interval.
In an infinitely small portion of the interval, the
probability of more than one occurrence of the event is
negligible.
The Poisson distribution is the limiting case of the binomial
distribution when the number of trials n is indefinitely large
and p is small.
If the mean is large, then the Poisson distribution is
approximately a normal distribution.
Poisson Distribution
In the Poisson distribution, the mean is represented
as μ = E(X) = λ.
The mean and the variance of the Poisson
distribution are equal: E(X) = V(X) = λ,
where V(X) is the variance.
The standard deviation is the square root of the
mean, i.e., √λ.
Applications of Poisson Distribution
• To count the number of defects of a finished
product
• To count the number of deaths in a country
by any disease or natural calamity
• To count the number of infected plants in the
field
• To count the number of bacteria in the
organisms or the radioactive decay in atoms
• To calculate the waiting time between the
events.
Poisson Distribution-Example
In a cafe, customers arrive at a mean rate of 2 per
minute. Find the probability that 5 customers arrive in 1
minute using the Poisson distribution formula.
Solution:
Given: λ = 2 and x = 5.
Using the Poisson distribution formula:
P(X = x) = (e^(−λ) λ^x)/x!
P(X = 5) = (e^(−2) 2^5)/5!
P(X = 5) ≈ 0.036
Answer: The probability of 5 customers arriving
in a minute is about 3.6%.
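The PMF can be evaluated directly (a minimal sketch using the café numbers):

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) = e^(-lam) * lam^x / x!"""
    return exp(-lam) * lam ** x / factorial(x)

# Cafe example: lambda = 2 arrivals per minute, P(X = 5)
print(round(poisson_pmf(5, 2), 3))  # 0.036
```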
Poisson Distribution-Example
Find the probability mass function at x = 6, if
the value of the mean is 3.4.
Solution:
Given: λ = 3.4 and x = 6.
Using the Poisson distribution formula:
P(X = x) = (e^(−λ) λ^x)/x!
P(X = 6) = (e^(−3.4) 3.4^6)/6!
P(X = 6) ≈ 0.072
Answer: The probability is about 7.2%.
Poisson Distribution-Example
Suppose 3% of electronic units manufactured by a company are
defective. Find the probability that in a sample of 200 units,
fewer than 2 units are defective.
Solution:
The probability of a defective unit is p = 3/100 = 0.03.
Given n = 200.
We observe that p is small and n is large here, so the Poisson
approximation applies.
Mean λ = np = 200 × 0.03 = 6
P(X = x) is given by the Poisson distribution formula as (e^(−λ) λ^x)/x!
P(X < 2) = P(X = 0) + P(X = 1)
= (e^(−6) 6^0)/0! + (e^(−6) 6^1)/1!
= e^(−6) + 6e^(−6)
≈ 0.00248 + 0.01487
P(X < 2) ≈ 0.0174
Answer: The probability that fewer than 2 units are defective
is about 1.7%.
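Since this Poisson value is only an approximation, it can be checked against the exact binomial probability (a sketch; the two results differ slightly because n is finite):

```python
from math import exp, comb

n, p = 200, 0.03
lam = n * p  # 6

# Poisson approximation: P(X < 2) = P(X = 0) + P(X = 1) = e^-lam * (1 + lam)
p_poisson = exp(-lam) * (1 + lam)
print(round(p_poisson, 4))  # 0.0174

# Exact binomial probability, for comparison
p_binom = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(2))
print(round(p_binom, 4))  # about 0.0162
```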
Continuous probability distributions
These distributions model the probabilities of random
variables that can take any value in a continuous range,
including all real values in an interval.
Two functions are associated with such continuous
random variables:
Probability Density Function (PDF)
Cumulative Distribution Function (CDF)
For example, a random variable X representing the
weights of citizens in a town can take any value,
like 34.5, 47.7, etc.
Examples: normal, Student's t, chi-square,
exponential, etc.
Probability Density function
The probability density function (PDF) describes the
relative likelihood of each outcome of a continuous
random variable; integrating it over a range of values
gives the probability that the variable falls within
that range.
Cumulative Distribution Function
The cumulative distribution function of X,
evaluated at x, is the probability that X will
take a value less than or equal to x.
Continuous probability distributions
When working with a continuous random variable,
such as X, we only calculate the probability
that X lies within a certain interval,
like P(X ≤ k) or P(a ≤ X ≤ b).
We don't calculate the probability of X being
equal to a specific value k.
In fact, P(X = k) = 0 will always be true:
a continuous random variable X has infinitely
many possible values, so the likelihood of any
one single outcome tends towards 0.
Continuous probability distributions
The idea is to integrate the probability density
function f(x) to define a new function F(x),
known as the cumulative distribution function.
To calculate the probability that X lies within a
certain range,
say a ≤ X ≤ b, we calculate F(b) − F(a), using
the cumulative distribution function.
Put simply, we calculate probabilities as:
P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx = F(b) − F(a)
where f(x) is the variable's probability density
function.
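To make the F(b) − F(a) computation concrete, here is a small sketch comparing the CDF difference with direct numerical integration of the PDF (the exponential density with λ = 1 is chosen purely as an illustrative f(x)):

```python
from math import exp

lam = 1.0  # illustrative rate for the example density

def f(x):  # PDF: f(x) = lam * e^(-lam * x) for x >= 0
    return lam * exp(-lam * x)

def F(x):  # CDF: F(x) = 1 - e^(-lam * x)
    return 1 - exp(-lam * x)

a, b = 0.5, 2.0

# P(a <= X <= b) via the CDF difference F(b) - F(a) ...
p_cdf = F(b) - F(a)

# ... and via numerical integration of the PDF (trapezoidal rule)
n = 100_000
h = (b - a) / n
p_int = (f(a) / 2 + sum(f(a + i * h) for i in range(1, n)) + f(b) / 2) * h

print(round(p_cdf, 4), round(p_int, 4))  # both about 0.4712
```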
Normal Distribution
It has two parameters, the mean and the standard
deviation.
The mean has the highest probability density, and all
other values are distributed symmetrically on either
side of the mean.
The standard normal distribution is the special case
where the mean is 0 and the standard deviation is
1.
About 68% of the values lie within 1 standard
deviation of the mean, 95% within 2 standard
deviations, and 99.7% within 3 standard deviations
of the mean.
Normal Distribution
The standard normal distribution is one of the forms of the
normal distribution. It occurs when a normal random variable
has a mean equal to zero and a standard deviation equal to one.
In other words, a normal distribution with a mean 0 and
standard deviation of 1 is called the standard normal
distribution. Also, the standard normal distribution is centred
at zero, and the standard deviation gives the degree to which a
given measurement deviates from the mean.
A Z score represents how many standard deviations an
observation is away from the mean. The mean of the standard
normal distribution is 0. Z scores above the mean are positive
and Z scores below the mean are negative.
Once you have computed a Z-score, you can look up the
probability in a table for the standard normal distribution
Standard Normal Distribution
The random variable of a standard normal distribution is known as
the standard score or a z-score. It is possible to transform every
normal random variable X into a z score using the following
formula:
z = (X – μ) / σ
where X is a normal random variable, μ is the mean of X, and σ is
the standard deviation of X. In probability theory, the normal or
Gaussian distribution is a very common continuous probability
distribution.
Standardizing a normal distribution: when you standardize a
normal distribution, the mean becomes 0 and the standard
deviation becomes 1.
This allows you to easily calculate the probability of certain
values occurring in your distribution, or to compare data sets
with different means and standard deviations.
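The standardize-then-look-up procedure can be sketched in code; the mean and standard deviation below are assumed illustrative numbers, and `math.erf` replaces the printed Z table (Φ(z) = ½(1 + erf(z/√2))):

```python
from math import erf, sqrt

def std_normal_cdf(z: float) -> float:
    """Phi(z): area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Hypothetical data: measurements with mean 170 and standard deviation 10.
mu, sigma = 170, 10
x = 185
z = (x - mu) / sigma  # standardize: z = (X - mu) / sigma
print(z)                            # 1.5
print(round(std_normal_cdf(z), 4))  # P(X <= 185): 0.9332
```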
Normal Distribution
The PDF is given by f(x) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²)),
where μ is the mean of the random variable X and σ is the standard
deviation.
[Worked examples using the Z table shown on slides]
Exponential Distribution
The exponential distribution models the waiting time
until the next event in a Poisson process (a success,
failure, arrival, etc.).
For example, we want to predict the following:
• The amount of time until the customer finishes
browsing and actually purchases something in
your store (success).
• The amount of time until the hardware on AWS
EC2 fails (failure).
• The amount of time you need to wait until the bus
arrives (arrival).
Exponential Distribution
The PDF is given by f(x) = λe^(−λx) for x ≥ 0,
where λ is the rate parameter, λ = 1/(average time between
events) = 1/μ,
and e ≈ 2.71828.
The mean of the exponential distribution is 1/λ,
and the variance of the exponential distribution is 1/λ².
Exponential Distribution
For example, suppose the mean number of minutes
between eruptions for a certain geyser is 40 minutes.
If a geyser just erupts, what is the probability that
we’ll have to wait less than 50 minutes for the next
eruption?
To solve this, we first calculate the rate parameter:
λ = 1/μ = 1/40 = 0.025
Then plug λ = 0.025 and x = 50 into the formula for the
CDF:
P(X ≤ x) = 1 − e^(−λx) => P(X ≤ 50) = 1 − e^(−0.025 × 50)
P(X ≤ 50) ≈ 0.7135
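A one-line check of this calculation using the exponential CDF:

```python
from math import exp

mu = 40        # mean minutes between eruptions
lam = 1 / mu   # rate parameter: 0.025 per minute

# P(X <= 50) = 1 - e^(-lam * 50)
p = 1 - exp(-lam * 50)
print(round(p, 4))  # 0.7135
```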
Exponential Distribution
Assume that you usually get 2 phone calls per hour.
Calculate the probability that a phone call will come
within the next quarter of an hour.
Solution:
It is given that there are 2 phone calls per hour,
so the rate parameter is λ = 2 per hour,
and a quarter of an hour is t = 0.25 hours.
So, the computation is as follows:
P(X ≤ 0.25) = 1 − e^(−2 × 0.25) = 1 − e^(−0.5)
= 0.393469
Therefore, the probability that a phone call arrives
within the next quarter of an hour is 0.393469.
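The arithmetic behind the 0.393469 figure (which equals 1 − e^(−0.5)) can be checked directly, assuming a rate of 2 calls per hour and a quarter-hour window:

```python
from math import exp

lam = 2   # calls per hour
t = 0.25  # a quarter of an hour, so lam * t = 0.5

# Exponential CDF: P(X <= t) = 1 - e^(-lam * t)
p = 1 - exp(-lam * t)
print(round(p, 6))  # 0.393469
```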