Introduction to Probability
Unit 3
Learning Objectives
Understand uncertainty and how probability concepts are used for
measuring and modelling uncertainty.
Learn basic concepts in probability: axioms of probability, frequency
estimate of probability, conditional probability and Bayes’ theorem.
Learn how simple probability rules are used for solving business
problems using association rule mining and its applications in market
basket analysis and recommender systems.
Understand the concept of random variables, discrete and continuous
random variables, probability density function, and cumulative
distribution function.
Understand various discrete distributions such as binomial
distribution, Poisson distribution, and geometric distribution and their
applications for solving business problems.
Understand various continuous distributions such as uniform,
exponential, normal, chi-square, t, and F distributions and their
applications for solving business problems.
Introduction to Probability
One of the primary objectives in analytics is to measure the
uncertainty associated with an event or key performance
indicator.
Axioms of probability and the concept of random variable
are fundamental building blocks of analytics that are used
for measuring uncertainty associated with key performance
indicators of importance for a business.
Probability theory is the foundation on which descriptive
and predictive analytics models are built.
Introduction to Probability
Analytics applications involve tasks such as
prediction of probability of occurrence of an
event, testing a hypothesis, building models to
explain variation in a variable of importance to
the business such as profitability, market share,
demand, etc.
Many important tasks in analytics deal with
uncertain events and it is essential to understand
probability theory that can be used to predict and
measure uncertain events.
Introduction to Probability
Probability quantifies the uncertainty of the outcomes
of a random variable; that is, it quantifies the
likelihood of an event.
Specifically, it quantifies how likely a specific outcome
is for a random variable, such as the flip of a coin, the
roll of a die, or drawing a playing card from a deck.
For a random variable X, P(X) is a function that assigns
a probability to every value that X can take:
P(X = x) = probability that X takes the value x
PROBABILITY THEORY – TERMINOLOGY
Random Experiment
Random experiment is an experiment in which the
outcome is not known with certainty. That is, the output of
a random experiment cannot be predicted with certainty.
Predictive analytics mainly deals with random
experiments such as:
predicting quarterly revenue of an organization
customer churn (whether a customer is likely to churn or
how many customers are likely to churn before next quarter)
demand for a product at a future time period
number of views for a YouTube video
outcome of a football match (win, draw or lose), etc.
PROBABILITY THEORY – TERMINOLOGY
Sample Space
Sample space is the universal set that consists of all possible
outcomes of an experiment. Sample space is usually represented
using the letter ‘S’ and individual outcomes are called the
elementary events.
The sample space can be finite or infinite.
A few random experiments and their sample spaces are discussed
below:
Experiment: Outcome of a football match
Sample Space = S = {Win, Draw, Lose}
Experiment: Predicting customer churn at an individual customer
level
Sample Space = S = {Churn, No Churn}
Experiment: Predicting percentage of customer churn
Sample Space = S = {X | X ∈ R, 0 ≤ X ≤ 100}, that is, X is a real
number that can take any value between 0 and 100 percent.
Experiment: Life of a turbine blade used in an aircraft engine
Sample Space = S = {X | X ∈ R, 0 ≤ X < ∞}, that is X is a real
number that can take any value between 0 and ∞.
PROBABILITY THEORY – TERMINOLOGY
Event
Event (E) is a subset of a sample space and probability is
usually calculated with respect to an event.
An event can be represented using the Venn diagram in
Figure below
The Venn diagram in Figure indicates that the event E is a
subset of the sample space S, that is, E ⊂ S (E is a subset of
S).
Consider the random experiment that predicts number of
customers who are likely to churn within a quarter from a
customer base of 100 customers.
PROBABILITY THEORY – TERMINOLOGY
The corresponding sample space = {X | X ∈ Z,
0 ≤ X ≤ 100}, that is, X is an integer that
can take any value between 0 and 100.
Now we can define several events such as:
Event A = Number of customers who churn is less
than 10
Event B = Number of customers who churn is
between 10 and 30
Event C = Number of customers who churn
exceeds 30
PROBABILITY THEORY – TERMINOLOGY
Probability Estimation using Relative
Frequency
The classical approach to probability estimation of an
event is based on the relative frequency of the
occurrence of that event. According to frequency
estimation, the probability of an event X, P(X), is given
by:
P(X) = Number of times event X occurred / Total number of observations
For example, say a company has 1000 employees and
every year about 200 employees leave the job. Then the
probability of attrition of an employee per annum is
200/1000 = 0.2.
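The frequency estimate above can be sketched in a couple of lines of Python; the function name is ours, chosen for illustration:

```python
def relative_frequency(favourable: int, total: int) -> float:
    """Estimate P(X) as (number of times X occurred) / (total observations)."""
    return favourable / total

# Attrition example from the text: 200 of 1000 employees leave per year.
p_attrition = relative_frequency(200, 1000)
print(p_attrition)  # 0.2
```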
Algebra of Events
Assume that X, Y and Z are three events of a sample space. Then the
following algebraic relationships are valid and are useful while
deriving probabilities of events:
Commutative rule: X ∪ Y = Y ∪ X and X ∩ Y = Y ∩ X
Associative rule: (X ∪ Y) ∪ Z = X ∪ (Y ∪ Z) and (X ∩ Y) ∩ Z = X ∩ (Y
∩ Z)
Distributive rule: X ∪ (Y ∩ Z) = (X ∪ Y) ∩ (X ∪ Z)
X ∩ (Y ∪ Z) = (X ∩ Y) ∪ (X ∩ Z)
The above rules of algebra will be useful while calculating the
probability of events. The following rules known as DeMorgan’s Laws
on complementary sets are useful while deriving probabilities:
(X ∪ Y)C = XC ∩ YC
(X ∩ Y)C = XC ∪ YC
where XC and YC are the complementary events of X and Y,
respectively.
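DeMorgan's laws can be checked directly with Python sets; S below is a small toy universal set chosen only for illustration:

```python
S = set(range(10))        # toy universal set
X = {1, 2, 3, 4}
Y = {3, 4, 5, 6}

def complement(E):
    """Complement of event E with respect to the universal set S."""
    return S - E

# (X ∪ Y)^C = X^C ∩ Y^C
print(complement(X | Y) == complement(X) & complement(Y))  # True
# (X ∩ Y)^C = X^C ∪ Y^C
print(complement(X & Y) == complement(X) | complement(Y))  # True
```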
FUNDAMENTAL CONCEPTS IN PROBABILITY
– AXIOMS OF PROBABILITY
According to axiomatic theory of probability, the probability
of an event E satisfies the following axioms:
1. The probability of event E always lies between 0 and
1. That is, 0 ≤ P(E) ≤ 1.
2. The probability of the universal set S is 1. That is,
P(S) = 1.
3. P(X ∪ Y) = P(X) + P(Y), where X and Y are two
mutually exclusive events.
The following elementary rules of probability are directly deduced
from the original three axioms of probability, using the set theory
relationships:
Example
The probability of an event not occurring is called its
complement.
It can be calculated as one minus the probability of the
event, or 1 – P(A).
For example, the probability of not rolling a 5 on a fair die is
1 – P(5) = 1 – 1/6 ≈ 0.833, or about 83.3%.
Probability of Not Event A = 1 – P(A)
Probability ranges from 0 to 1, where 0 means the
event is impossible and 1 means the event is certain.
The probability of all the events in a sample space
adds up to 1.
Basic Probability Concepts
Marginal Probability
Joint Probability
Conditional Probability
Probability Trees and Bayes’ Theorem
Problems and Solutions on Probability
Question 1: Find the probability of ‘getting
3 on rolling a die’.
Solution:
Sample Space = S = {1, 2, 3, 4, 5, 6}
Total number of outcomes = n(S) = 6
Let A be the event of getting 3.
Number of favorable outcomes = n(A) = 1
i.e. A = {3}
Probability, P(A) = n(A)/n(S) = 1/6
Hence, P(getting 3 on rolling a die) = 1/6
Question 2: Draw a random card from a pack of cards.
What is the probability that the card drawn is a face
card?
Solution:
A standard deck has 52 cards.
Total number of outcomes = n(S) = 52
Let E be the event of drawing a face card.
Number of favorable outcomes = n(E) = 4 x 3 = 12
(the Jack, Queen and King of each of the four suits)
Probability, P = Number of Favorable Outcomes/Total
Number of Outcomes
P(E) = n(E)/n(S)
= 12/52
= 3/13
P(the card drawn is a face card) = 3/13
Question 3: A vessel contains 4 blue balls, 5 red balls
and 11 white balls. If three balls are drawn from the
vessel at random, what is the probability that the first
ball is red, the second ball is blue, and the third ball is
white?
Solution:
Since there are 4 + 5 + 11 = 20 balls in total, the probability
that the first ball drawn is red is 5/20.
Since one ball has been drawn for the first event, the number of
balls left for the second draw is 20 – 1 = 19.
Hence, the probability that the second ball is blue is 4/19.
Again, with the first and second draws done, the number of
balls left for the third draw is 19 – 1 = 18.
The probability that the third ball is white is therefore 11/18.
Therefore, the probability is 5/20 x 4/19 x 11/18 = 44/1368 ≈
0.032.
Or we can express it as: P ≈ 3.2%.
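The arithmetic above can be verified exactly with Python's fractions module (note that 44/1368 reduces to 11/342):

```python
from fractions import Fraction

# Multiply the three sequential-draw probabilities exactly.
p = Fraction(5, 20) * Fraction(4, 19) * Fraction(11, 18)
print(p, float(p))  # 11/342, about 0.032
```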
Question 4: Two dice are rolled, find the probability that the sum
is:
1. equal to 1
2. less than 13
Solution:
To find the probability that the sum is equal to 1 we have to first determine
the sample space S of two dice as shown below.
S = { (1,1),(1,2),(1,3),(1,4),(1,5),(1,6)
(2,1),(2,2),(2,3),(2,4),(2,5),(2,6)
(3,1),(3,2),(3,3),(3,4),(3,5),(3,6)
(4,1),(4,2),(4,3),(4,4),(4,5),(4,6)
(5,1),(5,2),(5,3),(5,4),(5,5),(5,6)
(6,1),(6,2),(6,3),(6,4),(6,5),(6,6) }
So, n(S) = 36
1) Let E be the event “sum equal to 1”. Since there are no outcomes in
which the sum equals 1, P(E) = n(E) / n(S) = 0 / 36 = 0
2) Let B be the event that the sum of the numbers on the dice is less than 13.
From the sample space, we can see that every possible outcome gives a sum
less than 13; even the largest sum, obtained when both dice show 6, i.e.
(6,6), is only 12. Thus, n(B) = 36
Hence, P(B) = n(B) / n(S) = 36 / 36 = 1
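Both answers can be confirmed by enumerating the sample space in Python:

```python
from itertools import product

# All 36 equally likely outcomes of rolling two dice.
S = list(product(range(1, 7), repeat=2))

p_sum_eq_1 = sum(1 for a, b in S if a + b == 1) / len(S)   # impossible event
p_sum_lt_13 = sum(1 for a, b in S if a + b < 13) / len(S)  # certain event
print(p_sum_eq_1, p_sum_lt_13)  # 0.0 1.0
```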
Equally Likely Events
When the events have the same theoretical probability of
happening, then they are called equally likely events.
The outcomes of a random experiment are called equally likely
if all of them have the same probability of occurring. Examples:
Getting 3 and 5 on throwing a die
Getting an even number and an odd number on a die
Getting 1, 2 or 3 on rolling a die
Complementary Events
Complementary events arise when there are exactly two
possible outcomes: an event occurs or it does not.
The complement of an event is the event not occurring, so
its probability is one minus the probability of the event. Some
more examples are:
• It will rain or not rain today
• The student will pass the exam or not pass.
• You win the lottery or you don’t.
Independent Events
Independent events are those events whose occurrence is not
dependent on any other event. For example, if we flip a coin in the
air and get Heads, and then flip the coin again, the outcome of the
second flip is not affected by the first; the two flips are
independent of each other.
If the probability of occurrence of an event A is not affected by the
occurrence of another event B, then A and B are said to be
independent events.
Consider an example of rolling a die.
If A is the event ‘the number appearing is odd’ and B be the event
‘the number appearing is a multiple of 3’, then
P(A)= 3/6 = 1/2 and P(B) = 2/6 = 1/3
Also, A ∩ B is the event ‘the number appearing is odd and a
multiple of 3’, so that
P(A ∩ B) = 1/6
P(A│B) = P(A ∩ B)/ P(B) = (1/6) / (1/3) = 1/2
P(A) = P(A│B) = 1/2 , which implies that the occurrence of event B
has not affected the probability of occurrence of the event A .
If A and B are independent events, then P(A│B) = P(A)
Using the multiplication rule of probability, P(A ∩ B) = P(B) . P(A│B),
so for independent events P(A ∩ B) = P(A) . P(B)
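The independence check for this die example can be reproduced in Python:

```python
from fractions import Fraction

S = set(range(1, 7))                 # sample space of one die roll
A = {x for x in S if x % 2 == 1}     # odd: {1, 3, 5}
B = {x for x in S if x % 3 == 0}     # multiple of 3: {3, 6}

def P(E):
    """Probability of event E under equally likely outcomes."""
    return Fraction(len(E), len(S))

print(P(A), P(B), P(A & B))          # 1/2 1/3 1/6
print(P(A & B) == P(A) * P(B))       # True, so A and B are independent
```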
Mutually Exclusive Events
Two events are said to be mutually exclusive if they
cannot occur at the same time or simultaneously.
They are also called disjoint events.
If two events are considered disjoint events, then the probability
of both events occurring at the same time will be zero.
If the events A and B are not mutually exclusive, the probability
of getting A or B, that is P (A ∪ B), is given as follows:
P (A ∪ B) = P(A) + P(B) – P (A ∩ B)
When A and B are mutually exclusive, P(A ∩ B) is zero, so
P (A ∪ B) = P(A) + P(B)
When tossing a coin, the event of getting head and tail are
mutually exclusive.
In a six-sided die, the events “2” and “5” are mutually exclusive.
Marginal Probability
The probability of a single event occurring, p(A), may be
thought of as an unconditional probability.
It is not conditioned on another event.
Example: the probability that a card drawn is red
(p(red) = 0.5).
Another example: the probability that a card drawn is 4
(p(four)=1/13).
Marginal Probability
Joint Probability
It is the probability of two different events, A and B,
occurring at the same time.
It is the probability of the intersection of two or more
events.
The probability of the intersection of A and B may be
written p(A ∩ B) or p(A and B).
Example: the probability that a card is a four and red
=
p(four and red) = 2/52=1/26.
(There are two red fours in a deck of 52, the 4 of
hearts and the 4 of diamonds).
Joint Probability
Question 1 At an e-commerce customer service centre a
total of 112 complaints were received. 78 customers
complained about late delivery of the items and 40
complained about poor product quality.
(a) Calculate the probability that a customer complaint will be
about both late delivery and product quality.
(b) What is the probability that a complaint is only about poor
quality of the product?
Solution
Let A = Late delivery and B = Poor quality of the product. Let
n(A) and n(B) be the number of cases in favour of A and B. So
n(A) = 78 and n(B) = 40. Since the total number of
complaints is 112 (here complaints is treated as the sample
space), hence
n(A ∩ B) = n(A) + n(B) – n(S) = 78 + 40 – 112 = 118 – 112 = 6
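A quick sketch in Python completes the arithmetic: inclusion–exclusion gives n(A ∩ B), from which both requested probabilities follow (the answers are computed here, not quoted from the text):

```python
from fractions import Fraction

n_S, n_A, n_B = 112, 78, 40                    # total, late delivery, poor quality
n_both = n_A + n_B - n_S                       # inclusion-exclusion: 6 complaints mention both
p_both = Fraction(n_both, n_S)                 # (a) 6/112 = 3/56
p_only_quality = Fraction(n_B - n_both, n_S)   # (b) 34/112 = 17/56
print(n_both, p_both, p_only_quality)
```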
Joint Probability
Conditional Probability
If A and B are events in a sample space, then the
conditional probability of the event B given that the event A
has already occurred, denoted by P(B|A), is defined as:
P(B|A) = P(A ∩ B) / P(A)
The conditional probability symbol P(B|A) is read as the
probability of B given A. It is necessary to satisfy the
condition that P(A) > 0, because it does not make sense to
consider the probability of B given that event A is
impossible.
The conditional probability of default given divorced is
P(Default|Divorced) = 0.013/0.05 = 0.26, and
similarly the probability of default given single is
P(Default|Single) = 0.042/0.3 = 0.14
APPLICATION OF SIMPLE PROBABILITY
RULES – ASSOCIATION RULE LEARNING
We can use simple probability concepts such as joint probability and
conditional probability to solve analytics problems such as market
basket analysis and recommender systems using algorithms such as
Association Rule Learning (aka Association Rule Mining).
Association rule mining is one of the popular algorithms used to
solve problems such as market basket analysis and recommender
systems.
Market basket analysis (MBA) is used frequently by retailers to
predict products a customer is likely to buy together, which can
further be used for designing planograms and product promotions.
The primary objective of MBA is to find the probability of buying
two products (A and B) together.
Recommender systems are models that produce a list of
recommendations to a customer for products such as books, movies,
news items, etc., and are an important analytics technique.
Association Rule Learning
In general, association rule learning (also known as
association rule mining) is a method of finding association
between different entities in a database.
In a retail context, association rule learning is a method for
finding association relationships that exist in frequently
purchased items.
Association rule is a relationship of the form X → Y (that is,
X implies Y). Here, X and Y are two mutually exclusive sets
(sets of stock keeping units or SKUs).
In Table 3.2, the transaction ID is
the transaction reference
number and apple, orange, etc.
are the different SKUs sold by
the store. Binary code is used
to represent whether the SKU
was purchased (equal to 1) or
not (equal to 0) during a
transaction. The strength of
association between two
mutually exclusive subsets can
be measured using ‘support’,
‘confidence’ and ‘lift’.
Association Rule Learning
Support between two sets (of products purchased) is
calculated using the joint probability of those events:
Support(X, Y) = P(X ∩ Y) = n(X ∩ Y) / N
where n(X ∩ Y) is the number of times both X and Y are
purchased together and N is the total number of transactions.
That is, support is the proportion of times X and Y are purchased
together.
Confidence is the conditional probability of purchasing
product Y given that product X is purchased:
Confidence(X → Y) = P(Y│X) = P(X ∩ Y) / P(X)
The third measure in association rule mining is lift, which is
given by
Lift(X → Y) = P(Y│X) / P(Y)
Lift overcomes one of the disadvantages of using confidence:
confidence does not account for the baseline popularity P(Y), so a
rule can look strong simply because Y is purchased frequently,
which makes raw confidence less attractive for MBA and
recommendation across millions of transactions.
Association Rule Learning
In Table 3.2, assume that X = Apple and Y = Banana. Then
Association rules can be generated based on threshold
values of support, confidence and lift. For example, assume
that the cut-off for support is 0.25 and confidence is 0.5 (lift
should be greater than 1). Then we can conclude that X
implies Y (that is, purchase of apple implies purchase of
banana, however this rule will be ineffective since lift is
less than 1).
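A minimal sketch of the three measures in Python; the toy transaction list below is illustrative and is not the book's Table 3.2:

```python
transactions = [                      # each row: the set of SKUs in one basket
    {"apple", "banana"},
    {"apple", "orange"},
    {"banana", "orange"},
    {"apple", "banana", "orange"},
]
N = len(transactions)

def support(items):
    """Proportion of transactions containing every SKU in `items`."""
    return sum(items <= t for t in transactions) / N

X, Y = {"apple"}, {"banana"}
supp = support(X | Y)        # joint probability P(X and Y)
conf = supp / support(X)     # conditional probability P(Y | X)
lift = conf / support(Y)     # how much buying X raises the chance of Y
print(supp, conf, lift)
```

Here lift < 1, so, as in the apple/banana discussion above, the rule X → Y would be considered ineffective.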
Bayes’ Theorem
It describes the probability of an event, based on prior
knowledge of conditions that might be related to that
event.
It can also be considered for conditional probability
examples.
It is used where the probability of occurrence of a
particular event is calculated based on other conditions
which are also called conditional probability.
For example: There are 3 bags, each containing some
white marbles and some black marbles. If a white marble
is drawn at random, what is the probability that it came
from the first bag? In cases like this, we use the Bayes’
Theorem.
Bayes’ Theorem
Bayes’ theorem is one of the most important concepts in analytics since several
problems are solved using Bayesian statistics. Consider two events A and B. We can
write the following two conditional probabilities:
P(A│B) = P(A ∩ B) / P(B) and P(B│A) = P(A ∩ B) / P(A)
Using the two equations (both contain P(A ∩ B)), we can show that
P(B│A) = P(A│B) × P(B) / P(A)      (3.13)
Equation (3.13) is the Bayes’ theorem. Bayes’ theorem helps the data scientists to
update the probability of an event (B) when any additional information is provided.
This makes Bayesian statistics a very attractive technique since it helps the
decision maker to fine-tune his/her belief with every additional data that is
received.
The following terminologies are used to describe various components in Eq.
(3.13).
P(B) is called the prior probability (estimate of the probability without any additional
information).
P(B|A) is called the posterior probability (that is, given that the event A has
occurred, what is the probability of occurrence of event B). That is, post the
additional information (or additional evidence) that A has occurred, what is
estimated probability of occurrence of B.
P(A|B) is called the likelihood of observing evidence A if B is true.
Generalization of Bayes’ Theorem
The probability of evidence P(A) may come from mutually
exclusive subsets (events) B1, B2, ..., Bn, as described in
Figure 3.2.
For better understanding, consider a part manufactured by
different suppliers B1, B2, ..., Bn. Let A denote a
defective part. P(A) can be written as:
P(A) = P(A│B1) P(B1) + P(A│B2) P(B2) + ... + P(A│Bn) P(Bn)
Bayes’ Theorem-example
There are three urns containing 3 white and 2 black balls; 2
white and 3 black balls; 1 black and 4 white balls respectively.
There is an equal probability of each urn being chosen. One
ball is then chosen at random from the selected urn. What is
the probability that a white ball is drawn?
Let E1, E2, and E3 be the events of choosing the first,
second, and third urn respectively. Then,
P(E1) = P(E2) = P(E3) =1/3
Let E be the event that a white ball is drawn. Then,
P(E/E1) = 3/5, P(E/E2) = 2/5, P(E/E3) = 4/5
By theorem of total probability, we have
P(E) = P(E/E1) . P(E1) + P(E/E2) . P(E2) + P(E/E3) . P(E3)
= (3/5 * 1/3) + (2/5 * 1/3) + (4/5 * 1/3)
= 9/15 = 3/5
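The total-probability calculation can be verified exactly in Python:

```python
from fractions import Fraction

priors = [Fraction(1, 3)] * 3                                   # P(E1), P(E2), P(E3)
likelihoods = [Fraction(3, 5), Fraction(2, 5), Fraction(4, 5)]  # P(E | Ei) for each urn

# Theorem of total probability: P(E) = sum of P(E | Ei) * P(Ei)
p_white = sum(p * l for p, l in zip(priors, likelihoods))
print(p_white)  # 3/5
```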
Bayes’ Theorem- example
At an electronics plant, it is known from past
experience that the probability is 0.83 that a new
worker who has attended the company’s training
program will meet the production quota and that
the corresponding probability is 0.35 for a new
worker who has not attended the company’s
training program. If 80 % of all new workers
attend the training program, what is the
probability that a new worker will meet the
production quota? Also, find the probability that a
new worker who meets the production quota will
have attended the company’s training program.
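A sketch of one way to work this problem in Python, using the total probability rule and Bayes' theorem (the answers are computed here; they are not quoted from the text):

```python
p_quota_given_trained = 0.83     # P(meets quota | attended training)
p_quota_given_untrained = 0.35   # P(meets quota | did not attend)
p_trained = 0.80                 # P(attended training)

# Total probability: P(meets quota)
p_quota = (p_quota_given_trained * p_trained
           + p_quota_given_untrained * (1 - p_trained))

# Bayes' theorem: P(attended training | meets quota)
p_trained_given_quota = p_quota_given_trained * p_trained / p_quota
print(round(p_quota, 3), round(p_trained_given_quota, 3))  # 0.734 0.905
```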
Bayes’ Theorem- example
60% of the companies that increased their share
price by more than 5% in the last three years
replaced their CEOs during the period.
At the same time, only 35% of the companies that
did not increase their share price by more than
5% in the same period replaced their CEOs.
Knowing that the probability that the stock prices
grow by more than 5% is 4%, find the probability
that the shares of a company that fires its CEO
will increase by more than 5%.
Bayes’ Theorem- example
Before finding the probabilities, you must first
define the notation of the probabilities.
• P(A) – the probability that the stock price
increases by 5%
• P(B) – the probability that the CEO is replaced
• P(A|B) – the probability that the stock price
increases by 5% given that the CEO has been
replaced
• P(B|A) – the probability of the CEO being replaced
given that the stock price has increased by 5%.
Using the Bayes’ theorem, we can find the
required probability:
P(B) = P(B|A) P(A) + P(B|not A) P(not A) = 0.60 × 0.04 + 0.35 × 0.96 = 0.36
P(A|B) = P(B|A) × P(A) / P(B) = 0.024 / 0.36 ≈ 0.0667
Thus, the probability that the shares of a company
that replaces its CEO will grow by more than 5% is
6.67%.
Random Variable
A variable is defined as any symbol that can take any
particular set of values.
If the value of a variable depends upon the outcome of
a random experiment, it is a random variable and can take
up any real value.
Such an experiment, where we know the set of all possible
results but find it impossible to predict the outcome of any
particular execution, is a random experiment.
Mathematically, a random variable is a real-valued function
whose domain is the sample space S of a random experiment.
A random variable is usually denoted by a capital letter like
X, Y, M.
Lowercase letters like x, y, z, m etc. represent its values.
Random Variable
P(X) denotes the probability distribution of the random variable
X.
P(X = x) denotes the probability that the random variable X takes
any particular value, represented by x.
Example: the experiment is tossing a coin 2 times.
Sample space(S) is {HH, TH, HT, TT}, and each outcome has
probability 0.25.
X (the random variable) is the number of heads when we
toss a coin 2 times.
Then, X(HH) = 2, X(TH) = 1, X(HT) = 1, X(TT) = 0, so
P(X=0) = 0.25, P(X=1) = 0.5, P(X=2) = 0.25.
Random Variable
Since there are two forms of data, discrete and
continuous, there are two types of random
variables.
It can be categorized into two types:
Discrete Random Variable
Continuous Random variable
Discrete random variable
If the random variable X can assume only a finite or countably
infinite set of values, then it is called a discrete random variable.
There are many situations where the random variable X can
assume only a finite or countably infinite set of values.
Examples of discrete random variables are:
Credit rating (usually classified into different categories such as
low, medium and high or using labels such as AAA, AA, A, BBB,
etc.).
Number of orders received at an e-commerce retailer which can
be countably infinite.
Customer churn [the random variables take binary values: (a)
Churn and (b) Do not churn].
Fraud [the random variables take binary values: (a) Fraudulent
transaction and (b) Genuine transaction].
Any experiment that involves counting (for example, number of
returns in a day from customers of e-commerce portals such as
Amazon, Flipkart; number of customers not accepting job offers).
Continuous random variable
A random variable X which can take any value in an interval (an
uncountably infinite set of values) is called a continuous random
variable.
Examples of continuous random variables are listed below:
Market share of a company (which can take any value
between 0 and 100%).
Percentage of attrition among employees of an organization.
Time to failure of engineering systems.
Time taken to complete an order placed at an e-commerce
portal.
Time taken to resolve a customer complaint at call and service
centers.
Height, Weight, Amount of rainfall, etc.
In many situations, a continuous variable may be converted to a
discrete random variable for modelling purpose.
Probability Distributions
A probability distribution is a function that calculates the
likelihood of all possible values for a random variable.
For any event of a random experiment, we can find its
corresponding probability.
For different values of the random variable, we can find
its respective probability.
The values of random variables along with the
corresponding probabilities are the probability
distribution of the random variable.
A discrete probability distribution is defined using a
probability mass function.
A continuous probability distribution is described using a
probability density function and the corresponding cumulative
distribution function.
Discrete Random Variables
The probability distribution of a discrete random variable is
a list of probabilities associated with each of its possible
values.
It is also sometimes called the probability function or the
probability mass function.
More formally, the probability distribution of a discrete
random variable X is a function which gives the probability
p(xi) that the random variable equals xi, for each value xi:
p(xi) = P(X=xi)
It satisfies the following conditions:
• 0 <= p(xi) <= 1
• sum of all p(xi) is 1
Discrete Random Variables
Consider a random variable X= number of heads after tossing
a coin thrice.
x ∈ {0,1,2,3}.
All the possible outcomes after a coin is flipped thrice are,
{HHH,HHT,HTT,TTT,TTH,THH,THT,HTH}.
What will be the probability that 0 heads occur?
We denote it as P(X=0)=1/8=0.125
probability of getting exactly 1 head=P(X=1)=3/8=0.375
P(X=2)=3/8=0.375
P(x=3)=1/8=0.125
If we sum up the probabilities of all outcomes, it will be equal
to one. This gives us the Probability Distribution of that
random variable.
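The pmf above can be reproduced by enumerating all eight equally likely sequences in Python:

```python
from itertools import product
from collections import Counter

outcomes = list(product("HT", repeat=3))             # all 8 sequences of 3 flips
counts = Counter(seq.count("H") for seq in outcomes)  # how many sequences give k heads
pmf = {k: counts[k] / len(outcomes) for k in sorted(counts)}
print(pmf)  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
```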
Probability Mass Function(PMF) of Discrete
Random Variable
In the case of Discrete Random Variables, the function that
denotes the probability of the random variable for each x in
the range of X is known as the Probability Mass
Function(PMF).
It can be shown using tables or graph or mathematical
equation.
Probability Distribution Function(PDF)
of Discrete Random Variable
In case of rolling of a die, the probability of each value X can
take is the same. So the probability distribution in this case will
be:
P(X=1) = 1/6, P(X=2) = 1/6 and so on.
Note that the values of x take on all possible cases. And the sum
of the probabilities add to 1.
Probability Distribution Function(PDF)
of Discrete Random Variable
Mathematically, this can be written as f(x) = P(X = x).
The set of ordered pairs (x, f(x)) is called the probability
function, probability mass function or probability
distribution function of the discrete random variable X.
f(x) is considered a probability mass function if it satisfies
the following conditions:
1. f(x) ≥ 0 for every value x
2. Σ f(x) = 1, where the sum is taken over all possible values of x
3. P(X = x) = f(x)
Example
Cumulative distribution Function(CDF) of
Discrete Random Variable
However, many times we may wish to compute the
probability that the random variable X is less than or
equal to some real number x.
Writing F(x) = P(X ≤ x) for every real number x, we define
F(x) to be the cumulative distribution function of the
random variable X.
Example 1
Cumulative distribution function, F(xi), is the probability that the
random variable X takes values less than or equal to xi. That is,
F(xi) = P(X ≤ xi).
F(2) = P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0.60
Example 2
The probability that X is less than or equal to 1 is 0.1.
Similarly, the probability that X is less than or equal to 2 is
(0.1 + 0.3) = 0.4, and so on.
PROBABILITY DENSITY FUNCTION (PDF) AND CUMULATIVE
DISTRIBUTION FUNCTION (CDF) OF A CONTINUOUS RANDOM
VARIABLE
The probability of a continuous random variable assuming
exactly any of its values is 0.
Hence, the probability distribution for a continuous random
variable cannot be given in tabular form.
The probability density function of a continuous random variable
is a function which can be integrated to obtain the probability
that the random variable takes a value in a given interval.
The probability for a continuous random variable is always
computed over an interval: P(a ≤ X ≤ b).
The probability distribution of a continuous random variable can be
stated as a formula; and f(x) is called the probability density function,
or simply a density function, of X.
PROBABILITY DENSITY FUNCTION (PDF) AND CUMULATIVE
DISTRIBUTION FUNCTION (CDF) OF A CONTINUOUS RANDOM
VARIABLE
Here probability is given by the area under the
curve over an interval. To find the probability of a certain
interval, say a to b, we find the area under that curve by
integrating the PDF over that interval:
P(a ≤ X ≤ b) = ∫ fₓ(x) dx, integrated from a to b
where fₓ(x) is the Probability Density Function.
And the cumulative distribution function F(x) of a
continuous random variable is given by:
F(x) = P(X ≤ x) = ∫ fₓ(t) dt, integrated from −∞ to x
PROBABILITY DENSITY FUNCTION (PDF) AND CUMULATIVE
DISTRIBUTION FUNCTION (CDF) OF A CONTINUOUS RANDOM
VARIABLE
Probability density function and cumulative distribution
function of a continuous random variable satisfy the
following properties:
fₓ(x) ≥ 0 for all x, and ∫ fₓ(x) dx = 1 when integrated over the
entire range of X
The expected value of a continuous random variable, E(X),
is given by
E(X) = ∫ x fₓ(x) dx, integrated from −∞ to ∞
The variance of a continuous random variable, Var(X), is
given by
Var(X) = ∫ (x − E(X))² fₓ(x) dx, integrated from −∞ to ∞
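As a numeric sanity check of these integrals, the sketch below approximates E(X) and Var(X) for a uniform random variable on [0, 1] (where fₓ(x) = 1) with a midpoint Riemann sum; the exact values are 1/2 and 1/12:

```python
n = 100_000
dx = 1.0 / n
mids = [(i + 0.5) * dx for i in range(n)]   # midpoint-rule sample points on [0, 1]

e_x = sum(x * dx for x in mids)                  # approximates E(X) = 1/2
var_x = sum((x - e_x) ** 2 * dx for x in mids)   # approximates Var(X) = 1/12
print(round(e_x, 4), round(var_x, 4))  # 0.5 0.0833
```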
Probability Distribution
Summary
Depending on type of random variables, its
probability distribution can be categorized into :
1. Discrete probability distributions
2. Continuous probability distributions
Bernoulli Distribution
This distribution is generated when we perform an experiment
once and it has only two possible outcomes – success and
failure.
The trials of this type are called Bernoulli trials, which form
the basis for many distributions
Let p be the probability of success and 1 – p the probability
of failure.
The PMF is given as:
P(X = x) = p^x (1 – p)^(1−x), x ∈ {0, 1}
Examples:
flipping a coin once. p is the probability of getting a head and
1 – p is the probability of getting a tail.
Will you pass or fail a test?
Will your favourite sports team win or lose their next match?
Will you be accepted or rejected for that job you applied for?
Binomial Distribution
Binomial distribution is one of the most important discrete
probability distributions due to its applications in several contexts.
A random variable X is said to follow a Binomial distribution when
1. The random variable can have only two outcomes success and failure
(also known as Bernoulli trials).
2. The objective is to find the probability of getting k successes out of n
trials.
3. The probability of success is p and thus the probability of failure is (1 −
p).
4. The probability p is constant and does not change between trials .
Success and failure are generic terminologies used in binomial
distribution; based on the context, the interpretation will change (winning
a lottery can be considered as success and not winning as failure).
Binomial Distribution
In analytics, the following are few example problems that can
be associated with Binomial distribution:
Customer churn where the outcomes are: (a) Customer churn and (b) No
customer churn.
Fraudulent insurance claims where the outcomes are: (a) Fraudulent
claim and (b) Genuine claim.
Loan repayment default by a customer where the outcomes are: (a)
Default and (b) No default.
Cart abandonment in e-commerce (a situation where the customer adds
items to his/her cart but does not make the purchase), where the
outcomes are: (a) Cart abandonment and (b) No cart abandonment.
Employee attrition at a company where the outcomes are: (a) The
employee leaves (exits) the company and (b) The employee does not
leave the company.
Any business context in which there are only two outcomes can be
analysed using the binomial distribution.
Binomial Distribution
Example:
Flipping a coin n number of times and calculating the
probabilities of getting a particular number of heads.
More real-world examples include the number of successful
sales calls for a company or whether a drug works for a
disease or not.
Number of winning lottery tickets when you buy 10 tickets
of the same kind
Number of left-handers in a randomly selected sample of
100 unrelated people
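The examples above all reduce to evaluating the binomial PMF, P(X = k) = C(n, k) p^k (1 − p)^(n−k). A minimal standard-library sketch:

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Ten fair coin flips: probability of exactly 5 heads.
print(binomial_pmf(5, 10, 0.5))  # 0.24609375
```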
Binomial Distribution
For example, suppose we shuffle a standard deck of cards,
and we turn over the top card. We put the card back in the
deck and reshuffle. We repeat this process five times. Let X
equal the number of Jacks we observe. Is this a binomial
distribution?
B – binary – yes, either it’s a Jack or it isn’t
I – independent – yes, because we replace the card each
time, the trials are independent.
N – number of trials fixed in advance – yes, we are told to
repeat the process five times.
S – successes (probability of success) are the same – yes, the
likelihood of getting a Jack is 4 out of 52 each time you turn
over a card.
Therefore, this is an example of a binomial distribution.
Suppose that the probability Charlie makes a free throw is
0.82 on any one try. Assuming that this probability doesn't
change, find the chance that Charlie makes 4 of his next
7 free throws.
Then let's determine the number of free throws
Charlie should expect to make and the standard
deviation.
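A quick check of this example (using n = 7 and p = 0.82 as stated; the mean is np and the standard deviation is √(np(1 − p))):

```python
from math import comb, sqrt

n, p = 7, 0.82  # seven free throws, 0.82 chance of making each one

# P(exactly 4 makes) = C(7, 4) * 0.82^4 * 0.18^3
p_four = comb(n, 4) * p ** 4 * (1 - p) ** (n - 4)
print(round(p_four, 4))  # 0.0923

mean = n * p                # expected number of makes
sd = sqrt(n * p * (1 - p))  # standard deviation
print(round(mean, 2), round(sd, 3))  # 5.74 1.016
```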
Poisson Distribution
It describes the number of events that occur in a fixed
interval of time or space.
Examples:
Consider the case of the number of calls received by
a customer care center per hour. We can estimate the
average number of calls per hour but we cannot
determine the exact number and the exact time at
which there is a call. Each occurrence of an event is
independent of the other occurrences.
The PMF is given as P(X = x) = (e^(−λ) λ^x)/x!
Poisson Distribution
where λ is the average number of times the event
occurs in the given interval of time,
x is the Poisson random variable (the desired number
of occurrences),
and e is Euler's number, the base of the natural
logarithm, e ≈ 2.71828.
Properties of Poisson
Distribution
The occurrences of the event are independent within an
interval.
An infinite number of occurrences of the event is
possible in the interval.
The probability of a single event in the interval is
proportional to the length of the interval.
In an infinitely small portion of the interval, the
probability of more than one occurrence of the event is
negligible.
The Poisson distribution is the limiting case of the binomial
distribution when the number of trials n is indefinitely large
and p is small.
If the mean is large, then the Poisson distribution is
approximately a normal distribution.
Poisson Distribution
In the Poisson distribution, the mean is represented
as μ = E(X) = λ.
The mean and the variance of the Poisson
distribution are equal: E(X) = V(X) = λ,
where V(X) is the variance.
The standard deviation is the square root of the
mean, i.e., √λ.
Applications of Poisson Distribution
• To count the number of defects of a finished
product
• To count the number of deaths in a country
by any disease or natural calamity
• To count the number of infected plants in the
field
• To count the number of bacteria in the
organisms or the radioactive decay in atoms
• To calculate the waiting time between the
events.
Poisson Distribution-Example
In a cafe, customers arrive at a mean rate of 2 per
minute. Find the probability that 5 customers arrive in 1
minute using the Poisson distribution formula.
Solution:
Given: λ = 2 and x = 5.
Using the Poisson distribution formula:
P(X = x) = (e^(−λ) λ^x)/x!
P(X = 5) = (e^(−2) 2^5)/5!
P(X = 5) ≈ 0.036
Answer: The probability of 5 customers arriving
in a minute is about 3.6%.
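The PMF can be evaluated directly (a minimal sketch using the café numbers):

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) = e^(-lam) * lam^x / x!"""
    return exp(-lam) * lam ** x / factorial(x)

# Cafe example: lambda = 2 arrivals per minute, P(X = 5)
print(round(poisson_pmf(5, 2), 3))  # 0.036
```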
Poisson Distribution-Example
Find the probability mass function at x = 6, if
the value of the mean is 3.4.
Solution:
Given: λ = 3.4 and x = 6.
Using the Poisson distribution formula:
P(X = x) = (e^(−λ) λ^x)/x!
P(X = 6) = (e^(−3.4) 3.4^6)/6!
P(X = 6) ≈ 0.072
Answer: The probability is about 7.2%.
Poisson Distribution-Example
Suppose 3% of electronic units manufactured by a company are
defective. Find the probability that in a sample of 200 units,
fewer than 2 units are defective.
Solution:
The probability of a defective unit is p = 3/100 = 0.03.
Given n = 200.
We observe that p is small and n is large here, so the Poisson
approximation applies.
Mean λ = np = 200 × 0.03 = 6
P(X = x) is given by the Poisson distribution formula as (e^(−λ) λ^x)/x!
P(X < 2) = P(X = 0) + P(X = 1)
= (e^(−6) 6^0)/0! + (e^(−6) 6^1)/1!
= e^(−6) + 6e^(−6)
≈ 0.00248 + 0.01487
P(X < 2) ≈ 0.0174
Answer: The probability that fewer than 2 units are defective
is about 1.7%.
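Since this Poisson value is only an approximation, it can be checked against the exact binomial probability (a sketch; the two results differ slightly because n is finite):

```python
from math import exp, comb

n, p = 200, 0.03
lam = n * p  # 6

# Poisson approximation: P(X < 2) = P(X = 0) + P(X = 1) = e^-lam * (1 + lam)
p_poisson = exp(-lam) * (1 + lam)
print(round(p_poisson, 4))  # 0.0174

# Exact binomial probability, for comparison
p_binom = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(2))
print(round(p_binom, 4))  # about 0.0162
```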
Continuous probability distributions
These distributions model the probabilities of random
variables that can take any value in a continuous range,
including all real values in an interval.
Two functions are associated with such continuous
random variables:
Probability Density Function (PDF)
Cumulative Distribution Function (CDF)
For example, a random variable X representing the
weights of citizens in a town can take any value,
like 34.5, 47.7, etc.
Examples: normal, Student's t, chi-square,
exponential, etc.
Probability Density function
The probability density function (PDF) describes the
relative likelihood of each outcome of a continuous
random variable; integrating it over a range of values
gives the probability that the variable falls within
that range.
Cumulative Distribution Function
The cumulative distribution function of X,
evaluated at x, is the probability that X will
take a value less than or equal to x.
Continuous probability distributions
When working with a continuous random variable,
such as X, we only calculate the probability
that X lies within a certain interval,
like P(X ≤ k) or P(a ≤ X ≤ b).
We don't calculate the probability of X being
equal to a specific value k.
In fact, P(X = k) = 0 will always be true:
a continuous random variable X has infinitely
many possible values, so the likelihood of any
one single outcome tends towards 0.
Continuous probability distributions
The idea is to integrate the probability density
function f(x) to define a new function F(x),
known as the cumulative distribution function.
To calculate the probability that X lies within a
certain range,
say a ≤ X ≤ b, we calculate F(b) − F(a), using
the cumulative distribution function.
Put simply, we calculate probabilities as:
P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx = F(b) − F(a)
where f(x) is the variable's probability density
function.
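To make the F(b) − F(a) computation concrete, here is a small sketch comparing the CDF difference with direct numerical integration of the PDF (the exponential density with λ = 1 is chosen purely as an illustrative f(x)):

```python
from math import exp

lam = 1.0  # illustrative rate for the example density

def f(x):  # PDF: f(x) = lam * e^(-lam * x) for x >= 0
    return lam * exp(-lam * x)

def F(x):  # CDF: F(x) = 1 - e^(-lam * x)
    return 1 - exp(-lam * x)

a, b = 0.5, 2.0

# P(a <= X <= b) via the CDF difference F(b) - F(a) ...
p_cdf = F(b) - F(a)

# ... and via numerical integration of the PDF (trapezoidal rule)
n = 100_000
h = (b - a) / n
p_int = (f(a) / 2 + sum(f(a + i * h) for i in range(1, n)) + f(b) / 2) * h

print(round(p_cdf, 4), round(p_int, 4))  # both about 0.4712
```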
Normal Distribution
It has two parameters, the mean and the standard
deviation.
The mean has the highest probability density, and all
other values are distributed symmetrically on either
side of the mean.
The standard normal distribution is the special case
where the mean is 0 and the standard deviation is
1.
About 68% of the values lie within 1 standard
deviation of the mean, 95% within 2 standard
deviations, and 99.7% within 3 standard deviations
of the mean.
Normal Distribution
The standard normal distribution is one of the forms of the
normal distribution. It occurs when a normal random variable
has a mean equal to zero and a standard deviation equal to one.
In other words, a normal distribution with a mean 0 and
standard deviation of 1 is called the standard normal
distribution. Also, the standard normal distribution is centred
at zero, and the standard deviation gives the degree to which a
given measurement deviates from the mean.
A Z score represents how many standard deviations an
observation is away from the mean. The mean of the standard
normal distribution is 0. Z scores above the mean are positive
and Z scores below the mean are negative.
Once you have computed a Z-score, you can look up the
probability in a table for the standard normal distribution
Standard Normal Distribution
The random variable of a standard normal distribution is known as
the standard score or a z-score. It is possible to transform every
normal random variable X into a z score using the following
formula:
z = (X – μ) / σ
where X is a normal random variable, μ is the mean of X, and σ is
the standard deviation of X. In probability theory, the normal or
Gaussian distribution is a very common continuous probability
distribution.
Standardizing a normal distribution: when you standardize a
normal distribution, the mean becomes 0 and the standard
deviation becomes 1.
This allows you to easily calculate the probability of certain
values occurring in your distribution, or to compare data sets
with different means and standard deviations.
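The standardize-then-look-up procedure can be sketched in code; the mean and standard deviation below are assumed illustrative numbers, and `math.erf` replaces the printed Z table (Φ(z) = ½(1 + erf(z/√2))):

```python
from math import erf, sqrt

def std_normal_cdf(z: float) -> float:
    """Phi(z): area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Hypothetical data: measurements with mean 170 and standard deviation 10.
mu, sigma = 170, 10
x = 185
z = (x - mu) / sigma  # standardize: z = (X - mu) / sigma
print(z)                            # 1.5
print(round(std_normal_cdf(z), 4))  # P(X <= 185): 0.9332
```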
Normal Distribution
The PDF is given by f(x) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²)),
where μ is the mean of the random variable X and σ is the standard
deviation.
[Worked examples using the Z table shown on slides]
Exponential Distribution
The exponential distribution models the waiting time
until the next event in a Poisson process (a success,
failure, arrival, etc.).
For example, we want to predict the following:
• The amount of time until the customer finishes
browsing and actually purchases something in
your store (success).
• The amount of time until the hardware on AWS
EC2 fails (failure).
• The amount of time you need to wait until the bus
arrives (arrival).
Exponential Distribution
The PDF is given by f(x) = λe^(−λx) for x ≥ 0,
where λ is the rate parameter, λ = 1/(average time between
events) = 1/μ,
and e ≈ 2.71828.
The mean of the exponential distribution is 1/λ,
and the variance of the exponential distribution is 1/λ².
Exponential Distribution
For example, suppose the mean number of minutes
between eruptions for a certain geyser is 40 minutes.
If a geyser just erupts, what is the probability that
we’ll have to wait less than 50 minutes for the next
eruption?
To solve this, we first calculate the rate parameter:
λ = 1/μ = 1/40 = 0.025
Then plug λ = 0.025 and x = 50 into the formula for the
CDF:
P(X ≤ x) = 1 − e^(−λx) => P(X ≤ 50) = 1 − e^(−0.025 × 50)
P(X ≤ 50) ≈ 0.7135
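A one-line check of this calculation using the exponential CDF:

```python
from math import exp

mu = 40        # mean minutes between eruptions
lam = 1 / mu   # rate parameter: 0.025 per minute

# P(X <= 50) = 1 - e^(-lam * 50)
p = 1 - exp(-lam * 50)
print(round(p, 4))  # 0.7135
```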
Exponential Distribution
Assume that you usually get 2 phone calls per hour.
Calculate the probability that a phone call will come
within the next quarter of an hour.
Solution:
It is given that there are 2 phone calls per hour,
so the rate parameter is λ = 2 per hour,
and a quarter of an hour is t = 0.25 hours.
So, the computation is as follows:
P(X ≤ 0.25) = 1 − e^(−2 × 0.25) = 1 − e^(−0.5)
= 0.393469
Therefore, the probability that a phone call arrives
within the next quarter of an hour is 0.393469.
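The arithmetic behind the 0.393469 figure (which equals 1 − e^(−0.5)) can be checked directly, assuming a rate of 2 calls per hour and a quarter-hour window:

```python
from math import exp

lam = 2   # calls per hour
t = 0.25  # a quarter of an hour, so lam * t = 0.5

# Exponential CDF: P(X <= t) = 1 - e^(-lam * t)
p = 1 - exp(-lam * t)
print(round(p, 6))  # 0.393469
```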