Module 2: Statistical Distributions
Why statistical distributions for data science?
Statistical distributions are the backbone of data science.
Key distributions: normal, binomial, Poisson, exponential, uniform, and beta.
With them, data scientists can uncover patterns in data.
They help unlock the full potential of data and drive informed decision-making.
Why statistical distributions for data science?
Data scientists deal with many kinds of data, such as categorical, numerical, text, image, and voice data.
Numerical data is either discrete or continuous:
1. Discrete: it can take only specific, countable values. For example, the number of employees in a company, or the result of rolling a die, where the possible outcomes are the integers 1 to 6.
2. Continuous: it can take any value within a range. For example, the height or weight of a person can be any value, such as 45.6 or 87.9.
Probability Distributions
1.Bernoulli
2.Binomial
3.Uniform
4.Poisson
5.Normal
Flow Chart
Discrete random variables are described by Probability Mass Functions (PMFs), which give the probability of each individual value.
Continuous random variables are described by Probability Density Functions (PDFs), which give probabilities as areas under a curve.
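The difference between a PMF and a PDF can be sketched with the Python standard library. This is an illustrative example, not taken from the slides: the fair die and the standard normal variable are assumed here only to have one discrete and one continuous case.

```python
from statistics import NormalDist

# PMF of a fair six-sided die (assumed example): each face has probability 1/6.
die_pmf = {face: 1 / 6 for face in range(1, 7)}
pmf_total = sum(die_pmf.values())            # the probabilities sum to 1

# PDF of a standard normal variable: a density, not a probability.
z = NormalDist(mu=0.0, sigma=1.0)
density_at_zero = z.pdf(0.0)                 # height of the curve at 0
prob_within_one = z.cdf(1.0) - z.cdf(-1.0)   # area under the curve on [-1, 1]

print(f"sum of die PMF  = {pmf_total:.4f}")
print(f"normal pdf at 0 = {density_at_zero:.4f}")
print(f"P(-1 <= Z <= 1) = {prob_within_one:.4f}")
```

A PMF value is itself a probability, while a PDF value only becomes a probability once it is integrated over an interval.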
The Bernoulli Distribution
A discrete distribution, also called the binary distribution.
A trial has two outcomes: success, with probability p, and failure, with probability 1 − p. Such a trial is called a Bernoulli trial.
The Bernoulli Distribution
Two outcomes: the probability of success is denoted by p.
The Bernoulli distribution applies when:
1. Only one Bernoulli trial is performed.
2. There are only two possible outcomes for that single trial.
The random variable X is then said to have the Bernoulli distribution with parameter p, written X ∼ Bernoulli(p).
The Bernoulli Distribution
Mean and Variance of a Bernoulli Random Variable
Let X = 1 when we have a “success” and X = 0 when we have a “failure”. Then we have:
Mean: μ = 0·(1 − p) + 1·p = p
Variance: σ² = (0 − p)²·(1 − p) + (1 − p)²·p = p(1 − p)
Data science perspective: application
Bernoulli trials set the stage for the Binomial distribution, which in turn is the foundation of binary classification models.
The Binomial Distribution
A Bernoulli trial is always a single trial.
What if we perform multiple Bernoulli trials, say n of them, as part of a single experiment? An example from epidemiology: testing a random set of 1000 subjects for COVID-19 amounts to n = 1000 Bernoulli trials, each with the outcome positive or negative.
If a Bernoulli trial is repeated n times, independently and with the same success probability p, the experiment as a whole is known as a Binomial experiment.
The Binomial Distribution
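Sampling a single component from a lot and determining whether it is defective is an example of a Bernoulli trial.
If n such trials are conducted independently, each with the same success probability p, the number of successes is a random variable X, which is said to have a binomial distribution with parameters n and p, written X ∼ Bin(n, p).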
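A minimal sketch of the binomial probability mass function, P(X = x) = C(n, x) p^x (1 − p)^(n − x), using only the standard library. The numbers (10 sampled components, each defective with probability 0.1) and the function name binomial_pmf are assumptions chosen for illustration, not values from the slides.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p): C(n, x) * p^x * (1 - p)^(n - x)."""
    if not 0 <= x <= n:
        return 0.0                      # impossible counts have zero probability
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Assumed scenario: sample n = 10 components, each defective with p = 0.1.
n, p = 10, 0.1
p_exactly_two = binomial_pmf(2, n, p)
p_at_most_one = binomial_pmf(0, n, p) + binomial_pmf(1, n, p)
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))

print(f"P(X = 2)  = {p_exactly_two:.4f}")
print(f"P(X <= 1) = {p_at_most_one:.4f}")
print(f"sum over all x = {total:.4f}")
```

With n = 1 the same function reduces to the Bernoulli case: binomial_pmf(1, 1, p) returns p.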
Poisson Distribution- Case study
Jenny wants to make sure every customer at her shop has a minimal wait time and that there is always someone to help them, so the customer experience is the best she can provide. At times, that hasn’t been the case. Jenny has learned the hard way that when there are more than 10 customers at the store, there is not enough staff to help them, and some customers end up leaving, frustrated with the long wait and lack of assistance.
You’re a data scientist. How will you help her?
Ultimately, Jenny wants you to help her figure out how many customers she should expect at her shop in any given hour.
In particular, Jenny wants to know how likely it is that 10 customers will be at the shop at the same time in any given hour.
Poisson Distribution
Suppose there is a bakery on the corner of the street and on average 10 customers arrive at the
bakery per hour. For this case, we can calculate the probabilities of different numbers of
customers arriving at the bakery at any hour using the Poisson distribution
Poisson Distribution- Case study
Random variable: customers arriving at Jenny’s ice cream shop. Treating each arrival as a “success”, a first candidate model is the Binomial distribution.
On average, about 5 customers enter Jenny’s shop per hour, i.e., one customer every 12 minutes.
Poisson Distribution
The Poisson distribution deals with the frequency with which an event occurs within a specific interval.
A Poisson process is represented with the notation Po(λ), where λ is the expected number of events in the period. X denotes the discrete random variable that counts the events; its expected value and its variance are both λ.
The main characteristics that describe a Poisson process are:
• The events are independent of each other.
• An event can occur any number of times within the defined period.
• Two events cannot take place simultaneously.
Python code for Poisson
distribution
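A minimal sketch in plain Python (standard library only) of the Poisson probability mass function, P(X = x) = e^(−λ) λ^x / x!, applied to the bakery example with an average of 10 customers per hour. The function name poisson_pmf is illustrative and not taken from the slides.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam): e^(-lam) * lam^x / x!."""
    if x < 0 or x != int(x):
        return 0.0                      # only non-negative integers carry mass
    x = int(x)
    return exp(-lam) * lam**x / factorial(x)

lam = 10  # average of 10 customers per hour, as in the bakery example

p_exactly_10 = poisson_pmf(10, lam)
p_more_than_10 = 1 - sum(poisson_pmf(k, lam) for k in range(11))

print(f"P(X = 10) = {p_exactly_10:.4f}")
print(f"P(X > 10) = {p_more_than_10:.4f}")
```

Under this model, exactly 10 arrivals in an hour has probability of about 0.125, and more than 10 arrivals about 0.417.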
Poisson Distribution
❖ In Machine Learning, the Poisson distribution is used in probabilistic models. For example, in a Generalized Linear Model you can use the Poisson distribution to model the distribution of the target variable.
❖ It has also been applied to extreme weather events[2], cascades of Twitter messages, and Wikipedia revision histories.
Poisson Distribution- Solved example
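Q. If X ∼ Poisson(3), compute P(X = 2), P(X = 10), P(X = 0), P(X = −1), and P(X = 0.5).
Using the probability mass function P(X = x) = e^(−λ) λ^x / x! for x = 0, 1, 2, …, with λ = 3, we obtain
P(X = 2) = e^(−3) 3² / 2! = 0.2240
P(X = 10) = e^(−3) 3¹⁰ / 10! = 0.0008
P(X = 0) = e^(−3) 3⁰ / 0! = 0.0498
P(X = −1) = 0, because −1 is not a non-negative integer
P(X = 0.5) = 0, because 0.5 is not a non-negative integer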
Poisson Distribution- Solved example
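Q. If X ∼ Poisson(4), compute P(X ≤ 2) and P(X > 1).
Using P(X = x) = e^(−λ) λ^x / x! with λ = 4, we obtain
P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = e^(−4)(1 + 4 + 8) = 0.2381
P(X > 1) = 1 − P(X = 0) − P(X = 1) = 1 − e^(−4)(1 + 4) = 0.9084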
Poisson Distribution- Solved example
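Q. Assume that the number of cars that pass through a certain intersection during a fixed time interval follows a Poisson distribution. Assume that the mean rate is five cars per minute. Find the probability that exactly 17 cars will pass through the intersection in the next three minutes.
Solution: Let X be the number of cars that pass in three minutes. The mean number of cars in three minutes is (5)(3) = 15, so X ∼ Poisson(15) and
P(X = 17) = e^(−15) 15¹⁷ / 17! = 0.0847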
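The three solved examples (Poisson(3), Poisson(4), and the cars at the intersection) can be checked numerically. This is a sketch using only the standard library; poisson_pmf is an illustrative helper name, and the probability mass function it implements is P(X = x) = e^(−λ) λ^x / x!.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam); zero unless x is a non-negative integer."""
    if x < 0 or x != int(x):
        return 0.0
    x = int(x)
    return exp(-lam) * lam**x / factorial(x)

# X ~ Poisson(3): probabilities at individual points.
poisson3 = {x: poisson_pmf(x, 3) for x in (2, 10, 0, -1, 0.5)}

# X ~ Poisson(4): cumulative probabilities built from the PMF.
p_le_2 = sum(poisson_pmf(k, 4) for k in range(3))       # P(X <= 2)
p_gt_1 = 1 - sum(poisson_pmf(k, 4) for k in range(2))   # P(X > 1)

# Cars: 5 per minute over 3 minutes gives lam = 15.
p_17_cars = poisson_pmf(17, 5 * 3)

for x, prob in poisson3.items():
    print(f"Poisson(3):  P(X = {x}) = {prob:.4f}")
print(f"Poisson(4):  P(X <= 2) = {p_le_2:.4f}, P(X > 1) = {p_gt_1:.4f}")
print(f"Poisson(15): P(X = 17) = {p_17_cars:.4f}")
```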
The Mean and Variance of a Poisson Random Variable
If X ∼ Poisson(λ), then the mean and the variance of X are both equal to λ: μ = λ and σ² = λ.
CODE-Bernoulli distribution in the case of a biased coin
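A minimal sketch of a biased coin as a repeated Bernoulli trial, using only the standard library. The bias p = 0.7, the seed, and the helper name bernoulli_trial are assumptions for illustration; the original slide code is not available. The simulation compares the sample mean and variance with the theoretical values p and p(1 − p).

```python
import random

def bernoulli_trial(p, rng):
    """Return 1 (success) with probability p, otherwise 0 (failure)."""
    return 1 if rng.random() < p else 0

p = 0.7                       # assumed bias: P(heads) = 0.7
rng = random.Random(42)       # fixed seed so the run is repeatable
flips = [bernoulli_trial(p, rng) for _ in range(100_000)]

sample_mean = sum(flips) / len(flips)
sample_var = sum((x - sample_mean) ** 2 for x in flips) / len(flips)

print(f"sample mean     = {sample_mean:.3f}  (theory: p = {p})")
print(f"sample variance = {sample_var:.3f}  (theory: p(1-p) = {p * (1 - p):.2f})")
```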
CODE-Uniform distribution in the case of a biased coin
CODE-Binomial distribution in the case of a biased coin
CODE-Normal distribution in the case of a biased coin
Student's t-Distribution
Small-sample approximation of a normal distribution
The Student's t-distribution, also known as the t distribution, is a statistical distribution that resembles the normal distribution in its bell shape but has heavier tails. The t distribution is used instead of the normal distribution when the sample size is small and the population standard deviation is unknown.
Student's t-Distribution
For example, suppose we want to estimate a shopkeeper's average daily apple sales. With a large sample, such as daily records covering many months, we can use the normal distribution. With a small sample, such as records from only a few days, we use the t distribution.
The Normal Distribution
The Normal Distribution- Solved Example
Q1. Calculate the probability density of the normal distribution with population mean 2 and standard deviation 3 at the value x = 5 of the random variable.
The Normal Distribution- Solved Example
Q2. If the value of the random variable is 4, the mean is 4, and the standard deviation is 3, find the value of the probability density function of the Gaussian distribution.
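Both questions ask for the value of the normal density f(x) = 1/(σ√(2π)) · e^(−(x − μ)²/(2σ²)) at a given point. A minimal sketch with the standard library follows; the function name normal_pdf is illustrative.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

q1 = normal_pdf(5, mu=2, sigma=3)   # Q1: x = 5, mean 2, standard deviation 3
q2 = normal_pdf(4, mu=4, sigma=3)   # Q2: x = 4, mean 4, standard deviation 3

print(f"Q1: f(5) = {q1:.4f}")
print(f"Q2: f(4) = {q2:.4f}")
```

In Q2 the point x equals the mean, so the exponent is zero and the density is at its peak value 1/(σ√(2π)).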
The Normal Distribution- Example
The heights of human beings are approximately normally distributed. The average height is found to be roughly 175 cm (5' 9"), counting both males and females.
Financial phenomena, such as expected stock-market returns, do not fall neatly within a normal distribution.
Properties of Normal Distribution
For normally distributed data, the mean, median, and mode are equal (i.e., Mean = Median = Mode).
The total area under the normal distribution curve is equal to 1.
The normal curve is symmetric about the mean.
Exactly half of the values lie to the left of the central value and exactly half lie to the right.
The normal distribution is fully defined by its mean and standard deviation.
The normal distribution curve is unimodal, i.e., a curve with only one peak.
Z Score
For a value x from a normal population with mean μ and standard deviation σ, the number z = (x − μ)/σ is sometimes called the “z-score” of x. The z-score is an item sampled from a normal population with mean 0 and standard deviation 1. This normal population is called the standard normal population.
A z-score (also called a standard score) gives you an idea of how far from the mean a data point is, measured in standard deviations.
Q1. Ball bearings manufactured for a certain application have diameters (in mm) that are normally distributed with mean 5 and standard deviation 0.08. A particular ball bearing has a diameter of 5.06 mm. Find the z-score.
Q2. The diameter of a certain ball bearing has a z-score of −1.5. Find the diameter in the original units of mm.
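The two ball-bearing questions use the conversion z = (x − μ)/σ and its inverse x = μ + zσ. A minimal sketch follows; the helper names z_score and from_z_score are illustrative.

```python
def z_score(x, mu, sigma):
    """Convert a value x to standard units: z = (x - mu) / sigma."""
    return (x - mu) / sigma

def from_z_score(z, mu, sigma):
    """Convert a z-score back to original units: x = mu + z * sigma."""
    return mu + z * sigma

mu, sigma = 5.0, 0.08                      # ball-bearing diameters in mm

q1_z = z_score(5.06, mu, sigma)            # Q1: diameter 5.06 mm
q2_x = from_z_score(-1.5, mu, sigma)       # Q2: z-score of -1.5

print(f"Q1: z = {q1_z:.2f}")
print(f"Q2: diameter = {q2_x:.2f} mm")
```

The results are z = 0.75 for Q1 and a diameter of 4.88 mm for Q2.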
Z Score
Q3. Michael scored 86 on the exam. Find the z-score for Michael's exam grade.
Application
Models like LDA
Gaussian Naive Bayes
Logistic Regression
Linear Regression
Also, Sigmoid functions work most naturally with normally distributed data.
Estimating the Parameters of a Normal Distribution
μ and σ² of a normal distribution represent its mean and variance.
If X1, …, Xn are a random sample from a N(μ, σ²) distribution, μ is estimated with the sample mean X̄ and σ² is estimated with the sample variance s².
As with any sample mean, the uncertainty in X̄ is σ/√n, which is replaced with s/√n when σ is unknown.
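A minimal sketch of estimating μ and σ² from a sample with the standard library. The six measurements are hypothetical values invented for illustration. Note that statistics.variance uses the n − 1 denominator, which is the sample variance s².

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical sample, assumed to come from a normal population.
sample = [4.9, 5.1, 5.0, 5.2, 4.8, 5.0]

x_bar = mean(sample)                         # estimate of mu
s2 = variance(sample)                        # sample variance s^2 (n - 1 denominator)
std_error = sqrt(s2 / len(sample))           # s / sqrt(n): uncertainty in x_bar

print(f"sample mean     = {x_bar:.3f}")
print(f"sample variance = {s2:.4f}")
print(f"standard error  = {std_error:.4f}")
```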
Apply probability distributions in real data (Boston data set)
In the Titanic dataset, we want to know: what is the probability that a person on the ship will survive?
https://www.kaggle.com/code/abdelrhmaneltawagny/apply-probability-distributions-in-real-data
Binomial
The Lognormal Distribution
For data that are highly skewed or that contain outliers, the normal distribution is generally not
appropriate.
The lognormal distribution, which is related to the normal distribution, is often a good choice for
these data sets.
A lognormal distribution is the probability distribution of a random variable whose logarithm is normally distributed: if Y ∼ N(μ, σ²), then X = e^Y is lognormal.
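The link between the lognormal and normal distributions can be illustrated by simulation. This sketch uses random.lognormvariate from the standard library; the parameters μ = 1.0 and σ = 0.5 and the seed are arbitrary assumptions. Taking logarithms of the lognormal samples should recover a normal sample with that mean and standard deviation.

```python
import random
from math import log
from statistics import mean, median, stdev

rng = random.Random(0)            # fixed seed for repeatability
mu, sigma = 1.0, 0.5              # assumed parameters of the underlying normal

samples = [rng.lognormvariate(mu, sigma) for _ in range(50_000)]
logs = [log(x) for x in samples]  # should look like N(mu, sigma^2)

print(f"mean of log(X)  = {mean(logs):.3f}  (target {mu})")
print(f"stdev of log(X) = {stdev(logs):.3f}  (target {sigma})")
print(f"mean of X = {mean(samples):.3f}, median of X = {median(samples):.3f}")
```

The mean of X exceeds its median, which reflects the right skew that makes the lognormal a candidate for skewed data.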
Uniform Distribution
The probability density function of the continuous uniform distribution with parameters a and b is
f(x) = 1/(b − a) for a < x < b, and f(x) = 0 otherwise.
If X is a random variable with this probability density function f(x), we say that X is uniformly distributed on the interval (a, b).
Since the probability density function is constant on the interval (a, b), we can think of the probability as being distributed “uniformly” on the interval.
Uniform Distribution
Example: When a motorist stops at a red light at a certain intersection, the waiting time for the light to
turn green, in seconds, is uniformly distributed on the interval (0, 30). Find the probability that
the waiting time is between 10 and 15 seconds.
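For a uniform distribution on (a, b), the probability of an interval is its length divided by b − a. A minimal sketch for the red-light example follows; the function name uniform_prob is illustrative.

```python
def uniform_prob(lo, hi, a, b):
    """P(lo < X < hi) for X uniform on (a, b): overlap length / (b - a)."""
    lo, hi = max(lo, a), min(hi, b)        # clip the interval to (a, b)
    return max(hi - lo, 0.0) / (b - a)

# Waiting time uniform on (0, 30) seconds; probability of waiting 10 to 15 s.
p_wait = uniform_prob(10, 15, a=0, b=30)

print(f"P(10 < X < 15) = {p_wait:.4f}")
```

The interval has length 5 out of 30, so the probability is 1/6, about 0.1667.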
Mean, Variance
Mean = Median = (a + b)/2
Variance (σ²) = (b − a)²/12
Question 1: If X is uniformly distributed in (−1, 4), then
(i) its mean is ______________.
(ii) its variance is ______________.
(iii) its standard deviation is ___________.
(iv) its median is ______________.
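Question 1 can be worked by substituting a = −1 and b = 4 into the formulas mean = median = (a + b)/2 and variance = (b − a)²/12. A minimal sketch:

```python
from math import sqrt

a, b = -1, 4                         # X is uniform on (-1, 4)

u_mean = (a + b) / 2                 # (i) mean
u_variance = (b - a) ** 2 / 12       # (ii) variance
u_std_dev = sqrt(u_variance)         # (iii) standard deviation
u_median = (a + b) / 2               # (iv) median equals the mean by symmetry

print(f"mean = {u_mean}, variance = {u_variance:.4f}")
print(f"standard deviation = {u_std_dev:.4f}, median = {u_median}")
```

This gives a mean and median of 1.5, a variance of 25/12 (about 2.0833), and a standard deviation of about 1.4434.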
Visualization of Uniform Distribution
Distributions at a Glance
Some Principles of Point Estimation
A quantity calculated from data is called a statistic, and a statistic that is used to estimate an unknown constant, or parameter, is called a point estimator or point estimate.
The quantity most often used to evaluate the overall goodness of an estimator is the mean squared error (abbreviated MSE), which combines both bias and uncertainty: MSE = bias² + variance, where the variance is the square of the estimator's uncertainty (standard deviation).
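The decomposition of the mean squared error into squared bias plus variance can be illustrated by simulation. The setup below is an assumption chosen for illustration: two estimators of the variance of a normal population (dividing the sum of squares by n or by n − 1), evaluated over many small samples. The names and parameters are hypothetical.

```python
import random
from statistics import mean, pvariance

def summarize(estimates, truth):
    """Return (bias, variance, MSE) of a list of estimates of `truth`."""
    bias = mean(estimates) - truth
    var = pvariance(estimates)
    mse = mean([(e - truth) ** 2 for e in estimates])
    return bias, var, mse

rng = random.Random(1)               # fixed seed for repeatability
sigma, n, trials = 2.0, 5, 20_000    # assumed population SD, sample size, repetitions
true_var = sigma**2

divide_by_n, divide_by_n_minus_1 = [], []
for _ in range(trials):
    xs = [rng.gauss(0.0, sigma) for _ in range(n)]
    m = sum(xs) / n
    ss = sum((x - m) ** 2 for x in xs)        # sum of squared deviations
    divide_by_n.append(ss / n)                # biased estimator of the variance
    divide_by_n_minus_1.append(ss / (n - 1))  # unbiased estimator

for name, est in [("ss/n", divide_by_n), ("ss/(n-1)", divide_by_n_minus_1)]:
    bias, var, mse = summarize(est, true_var)
    print(f"{name:9s} bias={bias:+.3f} variance={var:.3f} "
          f"MSE={mse:.3f} bias^2+variance={bias**2 + var:.3f}")
```

For each estimator the last two printed columns agree, which is the identity MSE = bias² + variance computed on the simulated estimates.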
Maximum Likelihood
In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data.
❖ Maximum likelihood estimation is a method that determines values for the parameters of a model. For example, a linear model can be written as y = mx + c, where m and c are the parameters.
❖ The parameter values are found such that they maximise the likelihood that the process described by the model produced the data that were actually observed.
Maximum Likelihood
When we perform MLE, we are trying to find the distribution that best fits our data. The
resulting value of the distribution’s parameter is called the maximum likelihood estimate.
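Likelihood Function
For independent observations x1, …, xn from a distribution with probability mass or density function f(x; θ), the likelihood function is L(θ) = f(x1; θ) · f(x2; θ) · … · f(xn; θ). The maximum likelihood estimate is the value of θ that maximises L(θ) or, equivalently, log L(θ).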
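A minimal sketch of maximum likelihood estimation for the simplest case, a Bernoulli parameter p. The ten observations are hypothetical. The log-likelihood is log L(p) = k·log(p) + (n − k)·log(1 − p), where k is the number of successes; it is maximised here by a plain grid search and compared with the closed-form answer k/n.

```python
from math import log

# Hypothetical data: 10 Bernoulli trials with 7 successes.
data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

def log_likelihood(p, data):
    """Log-likelihood of Bernoulli(p) for 0/1 observations."""
    k, n = sum(data), len(data)
    return k * log(p) + (n - k) * log(1 - p)

# Grid search over p in (0, 1); the end points are excluded to avoid log(0).
grid = [i / 1000 for i in range(1, 1000)]
p_hat_grid = max(grid, key=lambda p: log_likelihood(p, data))

p_hat_closed_form = sum(data) / len(data)   # k / n

print(f"grid-search MLE  = {p_hat_grid:.3f}")
print(f"closed-form MLE  = {p_hat_closed_form:.3f}")
```

Both approaches give 0.7, the observed proportion of successes. The grid search is used only to make the idea of maximising the likelihood visible; for other distributions a numerical optimiser or a closed-form result would normally replace it.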
Module 2- Statistical Modelling for Data Science ( 20CSE743)