Module 2

Uploaded by Kent Wells
Module 2: Statistical Distributions

Why statistical distributions for data science?
Statistical distributions are the backbone of data science.
Common examples include the normal, binomial, Poisson, exponential, uniform, and beta distributions.
With them, data scientists can uncover patterns, unlock the full potential of their data, and drive informed decision-making.
Data scientists deal with many kinds of data, such as categorical, numerical, text, image, and voice data. Numerical data comes in two types:

1. Discrete: the data can take only specific, countable values. For example, the number of employees in a company, or the result of rolling a die, where each outcome is an integer in [1, 6].
2. Continuous: the data can take any value within a range. For example, a person's height or weight can be any value, such as 45.6 or 87.9.
Probability Distributions
1. Bernoulli
2. Binomial
3. Uniform
4. Poisson
5. Normal
Flow Chart
Discrete random variables are described by probability mass functions (PMFs), while continuous random variables are described by probability density functions (PDFs).
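A minimal standard-library sketch of the PMF/PDF distinction. A PMF assigns a probability to each possible value (here, a fair die), while a PDF gives a density, so probabilities come from areas under the curve. The height model (mean 175 cm, taken from the height example later in this module, with an assumed standard deviation of 10 cm) is illustrative only.

```python
from fractions import Fraction
from statistics import NormalDist

# Discrete: PMF of a fair six-sided die, each face has probability 1/6
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
print(sum(die_pmf.values()))   # probabilities over the support sum to 1
print(die_pmf[3])              # P(X = 3)

# Continuous: PDF of an assumed height model (mean 175 cm, sd 10 cm)
height = NormalDist(mu=175, sigma=10)
print(height.pdf(175))         # a density value, not a probability
# Probabilities are areas under the PDF, e.g. P(165 < X < 185)
print(height.cdf(185) - height.cdf(165))
```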
The Bernoulli Distribution
A discrete distribution with exactly two outcomes, sometimes called the binary distribution. The probability of success is denoted by p, so the probability of failure is 1 − p. Such a trial is called a Bernoulli trial.
A Bernoulli setting requires that:
1. Only one Bernoulli trial is performed.
2. There are only two possible outcomes for that single trial.

The random variable X is then said to have the Bernoulli distribution with parameter p, written X ∼ Bernoulli(p).
Mean and Variance of a Bernoulli Random Variable

Let x = 1 when we have a “success” and x = 0 when we have a “failure”. Then:
μ_X = E(X) = 1 · p + 0 · (1 − p) = p
σ²_X = Var(X) = (1 − p)² · p + (0 − p)² · (1 − p) = p(1 − p)
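A quick simulation can confirm the formulas μ = p and σ² = p(1 − p). This is a sketch with an assumed success probability p = 0.3; each trial is generated by comparing a uniform random number with p.

```python
import random

def bernoulli_trial(p, rng):
    """Return 1 ("success") with probability p, else 0 ("failure")."""
    return 1 if rng.random() < p else 0

p = 0.3                    # assumed success probability
rng = random.Random(42)    # seeded for reproducibility
samples = [bernoulli_trial(p, rng) for _ in range(100_000)]

mean = sum(samples) / len(samples)
variance = sum((x - mean) ** 2 for x in samples) / len(samples)

print(f"sample mean     {mean:.4f}  vs theory p        = {p}")
print(f"sample variance {variance:.4f}  vs theory p(1 - p) = {p * (1 - p):.4f}")
```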
Data science perspective: application
Bernoulli trials set the stage for the binomial distribution, which in turn lays the foundation for binary classification models.
The Binomial Distribution
A Bernoulli experiment always consists of a single trial.

What if we perform multiple Bernoulli trials, say n of them, as part of a single experiment? As an example from epidemiology, testing a random set of 1000 subjects for COVID-19 is exactly this kind of experiment: each test is a trial with two possible outcomes.

If a Bernoulli trial is repeated n times, independently and with the same success probability p, the experiment as a whole is called a binomial experiment.
Sampling a single component from a lot and determining whether it is defective is an example of a Bernoulli trial. If we sample n components and count the defective ones (the "successes"), that count is a random variable with a binomial distribution.
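For reference, here is the standard binomial probability mass function for X, the number of successes in n independent Bernoulli(p) trials, together with its mean and variance (standard results, stated in the notation of the Bernoulli section):

```latex
\begin{aligned}
P(X = k) &= \binom{n}{k}\, p^{k} (1 - p)^{n - k}, \qquad k = 0, 1, \dots, n,\\
\mu_X &= np, \qquad \sigma_X^2 = np(1 - p).
\end{aligned}
```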
The Binomial Distribution: code and examples
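A minimal standard-library sketch of the binomial PMF and CDF, applied to the defective-component setting described above. The sample size (20) and defect rate (5%) are hypothetical. The sketch also checks that the mean and variance computed from the PMF match np and np(1 − p).

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p): C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_cdf(k, n, p):
    """P(X <= k), summing the PMF from 0 to k."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

# Hypothetical setting: sample 20 components from a lot where 5% are defective
n, p = 20, 0.05
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(f"P(no defectives)        = {binomial_pmf(0, n, p):.4f}")
print(f"P(at most 2 defectives) = {binomial_cdf(2, n, p):.4f}")
print(f"mean = {mean:.4f} (np = {n * p}), "
      f"variance = {var:.4f} (np(1 - p) = {n * p * (1 - p):.4f})")
```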
Poisson Distribution- Case study

Jenny wants to make sure every customer has a minimal wait time and that there's always someone to help them, so that the customer experience is the best she can provide. But at times, that hasn't been the case. Jenny has learned the hard way that when there are more than 10 customers at the store, there isn't enough staff to help them, and some customers end up leaving, frustrated with the long wait and lack of assistance.

You’re a data scientist. How would you help her?

Ultimately, Jenny wants you to help her figure out how many customers she should expect at her shop in any given hour. In particular, she wants to know how likely it is that 10 customers will be at the shop at the same time in any given hour.
Poisson Distribution
Suppose there is a bakery on the corner of the street and on average 10 customers arrive at the
bakery per hour. For this case, we can calculate the probabilities of different numbers of
customers arriving at the bakery at any hour using the Poisson distribution
Poisson Distribution- Case study
The random variable is the number of customers arriving at Jenny’s ice cream shop, which can be modeled with a binomial distribution.

Approximately 5 customers per hour enter Jenny’s shop, i.e., one customer every 12 minutes on average.
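One way to connect the binomial and Poisson views is sketched below. The assumption, not stated in the case study, is that each minute of the hour is a Bernoulli trial with at most one arrival, with p = 5/60. The resulting Bin(60, 5/60) probabilities are then compared with Poisson(5), and Jenny's "more than 10 customers" risk is computed under the Poisson model.

```python
from math import comb, exp, factorial

RATE_PER_HOUR = 5   # from the case study: about 5 customers per hour
MINUTES = 60        # assumption: at most one arrival per minute

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

p_minute = RATE_PER_HOUR / MINUTES   # chance that a customer arrives in a given minute

for k in (5, 10):
    print(f"P({k:2d} customers): binomial {binomial_pmf(k, MINUTES, p_minute):.4f}"
          f"  Poisson {poisson_pmf(k, RATE_PER_HOUR):.4f}")

# Jenny's worry: more than 10 customers in an hour
p_more_than_10 = 1 - sum(poisson_pmf(k, RATE_PER_HOUR) for k in range(11))
print(f"P(more than 10 customers) under Poisson(5): {p_more_than_10:.4f}")
```

Splitting the hour into finer slots (seconds instead of minutes) moves the binomial probabilities closer to the Poisson ones. This is why the Poisson distribution is the natural model for counts of arrivals per interval.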
Poisson Distribution
The Poisson distribution describes how often an event occurs within a specific interval.
A Poisson random variable is written X ∼ Po(λ), where λ is the expected number of events in the interval. Both the expected value and the variance of X equal λ. X is a discrete random variable taking the values 0, 1, 2, ….
The main characteristics of a Poisson process are:
• The events are independent of each other.
• An event can occur any number of times within the defined period.
• Two events can’t take place simultaneously.
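For reference, here is the standard Poisson probability mass function. It is consistent with the statement above that the mean and variance both equal λ:

```latex
P(X = k) = \frac{e^{-\lambda} \lambda^{k}}{k!}, \qquad k = 0, 1, 2, \dots,
\qquad E(X) = \operatorname{Var}(X) = \lambda .
```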
Python code for the Poisson distribution
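A minimal standard-library sketch of the Poisson PMF, applied to the bakery example (on average 10 customers per hour). It also recovers the mean and variance numerically from the PMF. The PMF is truncated at k = 99, where the remaining tail is negligible for λ = 10.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    if k < 0 or k != int(k):
        return 0.0   # Poisson variables take only non-negative integer values
    return exp(-lam) * lam**k / factorial(int(k))

lam = 10  # bakery example: on average 10 customers per hour
for k in (5, 10, 15):
    print(f"P(X = {k:2d}) = {poisson_pmf(k, lam):.4f}")

# Mean and variance computed from the PMF (truncated far into the tail)
ks = range(100)
mean = sum(k * poisson_pmf(k, lam) for k in ks)
var = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)
print(f"mean = {mean:.4f}, variance = {var:.4f}")
```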
Poisson Distribution
❖ In machine learning, the Poisson distribution is used in probabilistic models. For example, in a generalized linear model (GLM), you can use the Poisson distribution to model the distribution of the target variable.

❖ Other applications include modeling extreme weather events[2], cascades of Twitter messages, and Wikipedia revision history.
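The GLM idea can be sketched with a Poisson regression using a log link, log E[y] = b0 + b1·x, fitted by hand-written Newton-Raphson on simulated data. Everything here is hypothetical: the "true" coefficients 0.5 and 1.2, the sample size, and the helper names. In practice a library model such as scikit-learn's `PoissonRegressor` or a statsmodels GLM with a Poisson family would normally be used instead.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def fit_poisson_glm(xs, ys, iterations=25):
    """Newton-Raphson for log(mu) = b0 + b1 * x (Poisson GLM, log link)."""
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0   # start from the intercept-only fit
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            mu = math.exp(b0 + b1 * x)
            g0 += y - mu          # gradient of the log-likelihood
            g1 += (y - mu) * x
            h00 += mu             # Fisher information (negative Hessian)
            h01 += mu * x
            h11 += mu * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # beta += I^{-1} g (2x2 inverse)
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

rng = random.Random(0)
TRUE_B0, TRUE_B1 = 0.5, 1.2   # hypothetical "true" coefficients
xs = [rng.random() for _ in range(2000)]
ys = [sample_poisson(math.exp(TRUE_B0 + TRUE_B1 * x), rng) for x in xs]

b0, b1 = fit_poisson_glm(xs, ys)
print(f"estimated b0 = {b0:.3f}, b1 = {b1:.3f}")
```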
Poisson Distribution- Solved example
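Q. If X ∼ Poisson(3), compute P(X = 2), P(X = 10), P(X = 0), P(X = −1), and P(X = 0.5).

Using P(X = x) = e^(−λ) λ^x / x! with λ = 3, we obtain:
P(X = 2) = e^(−3) 3² / 2! ≈ 0.2240
P(X = 10) = e^(−3) 3¹⁰ / 10! ≈ 0.0008
P(X = 0) = e^(−3) 3⁰ / 0! ≈ 0.0498
P(X = −1) = 0, because −1 is not a possible value of X
P(X = 0.5) = 0, because X takes only non-negative integer values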
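The answers can be checked with a small Poisson PMF helper. This is a sketch: the helper returns 0 for values outside the support {0, 1, 2, …}, which covers the −1 and 0.5 cases.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam); zero off the support {0, 1, 2, ...}."""
    if k < 0 or k != int(k):
        return 0.0
    return exp(-lam) * lam**k / factorial(int(k))

for k in (2, 10, 0, -1, 0.5):
    print(f"P(X = {k}) = {poisson_pmf(k, 3):.4f}")
```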
Poisson Distribution- Solved example
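Q. If X ∼ Poisson(4), compute P(X ≤ 2) and P(X > 1).

With λ = 4, we obtain:
P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = e^(−4)(1 + 4 + 8) ≈ 0.2381
P(X > 1) = 1 − P(X ≤ 1) = 1 − e^(−4)(1 + 4) ≈ 0.9084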
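A sketch of the same calculation in code: the CDF is a running sum of the PMF, and P(X > 1) uses the complement rule.

```python
from math import exp, factorial

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summing the PMF over 0..k."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))

lam = 4
p_at_most_2 = poisson_cdf(2, lam)
p_more_than_1 = 1 - poisson_cdf(1, lam)   # complement rule
print(f"P(X <= 2) = {p_at_most_2:.4f}")
print(f"P(X > 1)  = {p_more_than_1:.4f}")
```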
Poisson Distribution- Solved example
Q. Assume that the number of cars that pass through a certain intersection during a fixed time interval follows a Poisson distribution, with a mean rate of five cars per minute. Find the probability that exactly 17 cars will pass through the intersection in the next three minutes.

Over three minutes, the mean is λ = 5 × 3 = 15, so P(X = 17) = e^(−15) 15¹⁷ / 17! ≈ 0.0847.
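The key step is scaling the rate to the length of the interval before evaluating the PMF, as this short sketch shows:

```python
from math import exp, factorial

rate_per_minute = 5
minutes = 3
lam = rate_per_minute * minutes   # the Poisson mean scales with the interval length
k = 17
p = exp(-lam) * lam**k / factorial(k)
print(f"lambda = {lam}, P(X = {k}) = {p:.4f}")
```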
The Mean and Variance of a Poisson Random Variable
If X ∼ Poisson(λ), the mean and the variance of X are both equal to λ: μ_X = λ and σ²_X = λ.
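A simulation sketch of the equal-mean-and-variance property. Draws are generated with Knuth's multiplication method (an implementation choice suitable for small λ), using an assumed λ = 3. The sample mean and sample variance should both land close to 3.

```python
import math
import random
import statistics

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

rng = random.Random(1)
lam = 3.0   # assumed rate
draws = [sample_poisson(lam, rng) for _ in range(50_000)]
print(f"sample mean     = {statistics.mean(draws):.3f}")
print(f"sample variance = {statistics.pvariance(draws):.3f}")
```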
CODE-Bernoulli distribution in the case of a biased coin
CODE-Uniform distribution in the case of a biased coin
CODE-Binomial distribution in the case of a biased coin
CODE-Normal distribution in the case of a biased coin
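The slides above pair each distribution with a biased coin. The sketch below uses only the standard library to show how they connect. Uniform(0, 1) draws generate Bernoulli flips, and the number of heads in many flips is binomial. That count is approximately normal with mean np and standard deviation √(np(1 − p)). The coin bias P(heads) = 0.7, the 100 flips per experiment, and the threshold of 75 heads are all assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, pstdev

P_HEADS = 0.7   # assumed bias of the coin
N_FLIPS = 100   # flips per experiment
rng = random.Random(7)

def flip(rng):
    """Bernoulli(P_HEADS) flip built from a Uniform(0, 1) draw: 1 = heads."""
    return 1 if rng.random() < P_HEADS else 0

# Binomial: number of heads in N_FLIPS independent Bernoulli flips
heads_counts = [sum(flip(rng) for _ in range(N_FLIPS)) for _ in range(10_000)]

# Normal approximation to Bin(n, p): mean np, standard deviation sqrt(np(1 - p))
approx = NormalDist(mu=N_FLIPS * P_HEADS,
                    sigma=sqrt(N_FLIPS * P_HEADS * (1 - P_HEADS)))

print(f"simulated mean {mean(heads_counts):.2f} vs np = {approx.mean:.2f}")
print(f"simulated sd   {pstdev(heads_counts):.2f} vs sqrt(np(1-p)) = {approx.stdev:.2f}")
empirical = sum(c <= 75 for c in heads_counts) / len(heads_counts)
# Continuity correction: P(X <= 75) is approximated by the normal CDF at 75.5
print(f"P(heads <= 75): simulated {empirical:.3f}, normal approx {approx.cdf(75.5):.3f}")
```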
Student’s t-Distribution

A small-sample counterpart of the normal distribution

Student’s t-distribution, also known as the t distribution, resembles the normal distribution in its bell shape but has heavier tails. The t distribution is used instead of the normal distribution when the sample size is small and the population standard deviation has to be estimated from the sample.
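A sketch that makes "heavier tails" concrete. It evaluates the standard t density (written out with `math.gamma`) at a few points and compares it with the standard normal density. The degrees of freedom (3 and 30) are arbitrary choices. With few degrees of freedom, the t density is lower at the center and higher in the tails. With more degrees of freedom, it approaches the normal curve.

```python
from math import gamma, pi, sqrt
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

std_normal = NormalDist()   # mean 0, standard deviation 1
for x in (0, 2, 3, 4):
    row = "  ".join(f"t(df={df}): {t_pdf(x, df):.5f}" for df in (3, 30))
    print(f"x = {x}:  {row}  normal: {std_normal.pdf(x):.5f}")
```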
For example, suppose we are dealing with the total number of apples a shopkeeper sells in a month; in that case, we would use the normal distribution. If instead we are dealing with the number of apples sold in a single day, i.e., a smaller sample, we can use the t distribution.
The Normal Distribution
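For reference, here is the standard probability density function of the normal distribution N(μ, σ²), the formula used in the solved examples that follow:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty .
```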
The Normal Distribution- Solved Example

Q1. Calculate the value of the normal probability density function at x = 5 for a population with mean μ = 2 and standard deviation σ = 3.
f(5) = 1/(3√(2π)) · e^(−(5 − 2)²/(2 · 3²)) = 0.1330 × e^(−0.5) ≈ 0.0807
The Normal Distribution- Solved Example

Q2. If the value of the random variable is x = 4, the mean is μ = 4, and the standard deviation is σ = 3, find the value of the Gaussian probability density function.
f(4) = 1/(3√(2π)) · e⁰ ≈ 0.1330
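Both solved examples can be checked by coding the density formula directly and cross-checking it against the standard library's `statistics.NormalDist.pdf`:

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """f(x) = 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)^2 / (2 * sigma^2))"""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

q1 = normal_pdf(5, mu=2, sigma=3)
q2 = normal_pdf(4, mu=4, sigma=3)
print(f"Q1: f(5) = {q1:.4f}")
print(f"Q2: f(4) = {q2:.4f}")
# Cross-check against the standard library's implementation
print(NormalDist(2, 3).pdf(5), NormalDist(4, 3).pdf(4))
```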
The Normal Distribution- Example

The heights of human beings follow an approximately normal distribution. The average height is roughly 175 cm (5' 9"), counting both males and females.

By contrast, financial phenomena, such as expected stock-market returns, do not fall neatly within a normal distribution.
Properties of Normal Distribution

• For normally distributed data, the mean, median, and mode are equal (Mean = Median = Mode).
• The total area under the normal distribution curve is equal to 1.
• The normal curve is symmetric about its center, the mean.
• Exactly half of the values lie to the left of the center and exactly half lie to the right.
• A normal distribution is fully defined by its mean and standard deviation.
• The normal curve is unimodal, i.e., it has only one peak.
Z Score

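For a normal population with mean μ and standard deviation σ, the number z = (x − μ)/σ is sometimes called the “z-score” of x. If x is sampled from that population, its z-score is an item sampled from a normal population with mean 0 and standard deviation 1. This normal population is called the standard normal population.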
A z-score (also called a standard score) gives you an idea of how far from the mean a data point is.

Q1. Ball bearings manufactured for a certain application have diameters (in mm) that are normally distributed with mean 5 and standard deviation 0.08. A particular ball bearing has a diameter of 5.06 mm. Find the z-score.

Q2. The diameter of a certain ball bearing has a z-score of −1.5. Find the diameter in the original units of mm.
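A short sketch of both directions of the z-score transformation, using the ball-bearing parameters from Q1 (mean 5 mm, standard deviation 0.08 mm). The helper names are illustrative.

```python
MEAN_MM, SD_MM = 5.0, 0.08   # ball bearing diameters from Q1

def z_score(x, mu, sigma):
    """How many standard deviations x lies from the mean."""
    return (x - mu) / sigma

def from_z(z, mu, sigma):
    """Invert the z-score: x = mu + z * sigma."""
    return mu + z * sigma

print(f"Q1: z = {z_score(5.06, MEAN_MM, SD_MM):.2f}")
print(f"Q2: diameter = {from_z(-1.5, MEAN_MM, SD_MM):.2f} mm")
```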
Z Score

Q3. Michael scored 86 on the exam. Find the z-score for Michael's exam grade.
Application
The normal distribution is applied in models such as:
• Linear discriminant analysis (LDA)
• Gaussian Naive Bayes
• Logistic regression
• Linear regression
Sigmoid functions also work most naturally with normally distributed data.
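To show how Gaussian Naive Bayes uses the normal distribution, here is a minimal from-scratch sketch. For each class, it fits a normal distribution (sample mean and standard deviation) to every feature. It then predicts the class with the largest log prior plus the sum of per-feature log densities. The toy data set is hypothetical. scikit-learn provides a production implementation as `GaussianNB` in `sklearn.naive_bayes`.

```python
from collections import defaultdict
from math import log, pi, sqrt
from statistics import mean, stdev

def gaussian_log_density(v, mu, sd):
    """Log of the normal density N(mu, sd^2) evaluated at v."""
    return -log(sd * sqrt(2 * pi)) - (v - mu) ** 2 / (2 * sd**2)

def fit_gaussian_nb(X, y):
    """Per class: prior probability plus a (mean, sd) pair for every feature."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        params = [(mean(col), stdev(col)) for col in zip(*rows)]
        model[label] = (len(rows) / len(X), params)
    return model

def predict(model, row):
    """Return the class with the largest log prior + sum of per-feature log densities."""
    def score(label):
        prior, params = model[label]
        return log(prior) + sum(gaussian_log_density(v, mu, sd)
                                for v, (mu, sd) in zip(row, params))
    return max(model, key=score)

# Hypothetical toy data: two features, two well-separated classes
X = [(1.0, 2.1), (1.2, 1.9), (0.8, 2.0), (1.1, 2.2),
     (3.0, 4.1), (3.2, 3.8), (2.9, 4.0), (3.1, 4.2)]
y = ["a", "a", "a", "a", "b", "b", "b", "b"]

model = fit_gaussian_nb(X, y)
print(predict(model, (1.05, 2.0)), predict(model, (3.05, 4.0)))
```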
Estimating the Parameters of a Normal Distribution
μ and σ² of a normal distribution represent its mean and variance.
Let X₁, …, Xₙ be a random sample from a N(μ, σ²) distribution.
μ is estimated with the sample mean X̄, and σ² is estimated with the sample variance s².
As with any sample mean,
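A minimal sketch of these estimates on a small hypothetical sample. Note that `statistics.variance` divides by n − 1, which gives the usual sample variance s².

```python
from statistics import mean, stdev, variance

# Hypothetical sample X1, ..., Xn, assumed drawn from N(mu, sigma^2)
sample = [4.0, 8.0, 6.0, 5.0, 3.0, 7.0]

mu_hat = mean(sample)   # sample mean estimates mu
s2 = variance(sample)   # sample variance s^2 (divides by n - 1) estimates sigma^2
s = stdev(sample)       # sample standard deviation s
print(f"mu_hat = {mu_hat}, s^2 = {s2}, s = {s:.4f}")
```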
Applying probability distributions to real data (Boston data set)
In the Titanic dataset, we want to know: what is the probability that a person on the ship will survive?

https://www.kaggle.com/code/abdelrhmaneltawagny/apply-probability-distributions-in-real-data
Binomial
The Lognormal Distribution
For data that are highly skewed or that contain outliers, the normal distribution is generally not
appropriate.
The lognormal distribution, which is related to the normal distribution, is often a good choice for
these data sets.
A log-normal distribution is the probability distribution of a random variable whose logarithm is normally distributed.
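A sketch of the defining property using `random.lognormvariate`. Taking logarithms of log-normal draws recovers a normal sample, while the raw values are right-skewed, so their mean exceeds their median. The parameters μ = 0 and σ = 0.8 of the underlying normal are assumptions.

```python
import math
import random
from statistics import mean, median, stdev

rng = random.Random(3)
MU, SIGMA = 0.0, 0.8   # assumed parameters of the underlying normal
data = [rng.lognormvariate(MU, SIGMA) for _ in range(50_000)]

logs = [math.log(v) for v in data]
print(f"log-data mean {mean(logs):.3f} (MU = {MU}), sd {stdev(logs):.3f} (SIGMA = {SIGMA})")
# The raw data are right-skewed: the mean sits above the median
print(f"raw mean {mean(data):.3f} vs raw median {median(data):.3f}")
```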
Uniform Distribution
The probability density function of the continuous uniform distribution with parameters a and b is
f(x) = 1/(b − a) for a < x < b, and f(x) = 0 otherwise.

If X is a random variable with probability density function f(x), we say that X is uniformly distributed on the interval (a, b).

Since the probability density function is constant on the interval (a, b), we can think of the probability as being distributed “uniformly” on the interval.
Example: When a motorist stops at a red light at a certain intersection, the waiting time for the light to
turn green, in seconds, is uniformly distributed on the interval (0, 30). Find the probability that
the waiting time is between 10 and 15 seconds.
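For a uniform distribution, a probability is just the length of the sub-interval divided by (b − a). The sketch below, with an illustrative helper name, applies this to the traffic-light example.

```python
def uniform_prob(lo, hi, a, b):
    """P(lo < X < hi) for X ~ Uniform(a, b): overlap length divided by (b - a)."""
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (b - a)

p = uniform_prob(10, 15, a=0, b=30)
print(f"P(10 < X < 15) = {p:.4f}")   # 5 / 30
```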
Mean, Variance
Mean = Median = (a + b)/2
Variance (σ²) = (b − a)²/12

Question 1: If X is uniformly distributed in (−1, 4), then
(i) its mean is ______________.
(ii) its variance is ______________.
(iii) its standard deviation is ___________.
(iv) its median is ______________.
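The formulas answer Question 1 directly. The sketch below computes them and then checks them against a simulation with `random.uniform`; the seed and number of draws are arbitrary.

```python
import random
import statistics
from math import sqrt

a, b = -1, 4                 # interval from Question 1
mean_x = (a + b) / 2         # also the median, by symmetry
var_x = (b - a) ** 2 / 12
sd_x = sqrt(var_x)
print(f"(i) mean = {mean_x}  (ii) variance = {var_x:.4f}  "
      f"(iii) sd = {sd_x:.4f}  (iv) median = {mean_x}")

# Simulation check with Uniform(a, b) draws
rng = random.Random(5)
draws = [rng.uniform(a, b) for _ in range(100_000)]
print(f"simulated mean {statistics.fmean(draws):.3f}, "
      f"variance {statistics.pvariance(draws):.3f}, median {statistics.median(draws):.3f}")
```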
Visualization of Uniform Distribution
Distributions at a Glance
Some Principles of Point Estimation
A quantity calculated from data is called a statistic, and a statistic that is used to estimate an unknown constant or parameter is called a point estimator or point estimate.
The quantity most often used to evaluate the overall goodness of an estimator θ̂ of a parameter θ is the mean squared error (MSE), which combines both bias and uncertainty: MSE(θ̂) = E[(θ̂ − θ)²] = (E[θ̂] − θ)² + Var(θ̂), i.e., bias² + variance.
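A simulation sketch of the bias/variance trade-off behind MSE. It compares two estimators of σ² computed from samples of size 5 drawn from N(0, 1): dividing the sum of squared deviations by n − 1 (unbiased) and by n (biased). The population, sample size, and number of repetitions are assumptions. The biased estimator typically has the smaller MSE here because its variance is lower, and in each case MSE equals bias² plus variance.

```python
import random
import statistics

TRUE_VAR = 1.0    # assumed population: N(0, 1), so sigma^2 = 1
N = 5             # assumed sample size
REPS = 20_000     # number of simulated samples
rng = random.Random(11)

def sum_sq_dev(xs):
    """Sum of squared deviations from the sample mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

estimates = {"divide by n - 1": [], "divide by n": []}
for _ in range(REPS):
    ss = sum_sq_dev([rng.gauss(0.0, 1.0) for _ in range(N)])
    estimates["divide by n - 1"].append(ss / (N - 1))   # unbiased s^2
    estimates["divide by n"].append(ss / N)             # biased, but less variable

results = {}
for name, vals in estimates.items():
    bias = statistics.fmean(vals) - TRUE_VAR
    var = statistics.pvariance(vals)
    mse = statistics.fmean([(v - TRUE_VAR) ** 2 for v in vals])
    results[name] = (bias, var, mse)
    print(f"{name:16s} bias {bias:+.3f}  variance {var:.3f}  MSE {mse:.3f}")
```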
Maximum Likelihood

In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data.

For example, a linear model can be written as y = mx + c, where the slope m and the intercept c are the parameters to be estimated.

❖ Maximum likelihood estimation determines values for the parameters of a model.

❖ The parameter values are chosen to maximize the likelihood that the process described by the model produced the data that were actually observed.
Maximum Likelihood
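When we perform MLE, we are trying to find the distribution that best fits our data. The resulting value of the distribution’s parameter is called the maximum likelihood estimate.

Likelihood Function
For independent observations x₁, …, xₙ with density (or mass) function f(x; θ), the likelihood is L(θ) = f(x₁; θ) · f(x₂; θ) ⋯ f(xₙ; θ). The maximum likelihood estimate is the value of θ that maximizes L(θ), or equivalently its logarithm.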
Maximum Likelihood
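A minimal sketch of MLE for a Poisson rate λ. It evaluates the log-likelihood of a hypothetical set of observed counts over a grid of λ values and picks the maximizer. For the Poisson distribution, the closed-form MLE is the sample mean, so the grid search should land on it. A grid search is used here only for transparency; real code would typically use the closed form or a numerical optimizer.

```python
import math

# Hypothetical observed counts, assumed to come from a Poisson(lam) distribution
counts = [2, 4, 3, 5, 1, 3, 4, 2, 3, 3]

def poisson_log_likelihood(lam, data):
    """log L(lam) = sum_i [x_i * log(lam) - lam - log(x_i!)]"""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in data)

# Crude grid search over lam in (0, 10] in steps of 0.001
grid = [i / 1000 for i in range(1, 10_001)]
lam_mle = max(grid, key=lambda lam: poisson_log_likelihood(lam, counts))

print(f"grid-search MLE = {lam_mle}")
print(f"sample mean     = {sum(counts) / len(counts)}")   # closed-form Poisson MLE
```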

Module 2: Statistical Modelling for Data Science (20CSE743)
