Statistics & Probability
Probability vs. Statistics
Both deal with uncertainty and randomness.
Probability: It deals with predicting the likelihood of future events
Logically self-contained
Follows few rules
Has one correct answer
Statistics: It involves the analysis of the frequency of past events
Works on experimental data
No single correct answer
Helps to understand patterns, relationships, and trends within data
Statistics – Classification
Descriptive Statistics: It involves methods for summarizing and presenting data in a
meaningful and concise manner. Provides a snapshot of the data’s characteristics.
e.g.: Mean, Median, Mode, Range, Variance, Standard Deviation
Inferential Statistics: It involves using sample data to draw inferences, predictions, or
decisions about a larger population.
e.g.: Hypothesis Testing, Confidence Intervals
Terminology
A population is the entire group that you want to draw conclusions about.
A sample is the specific group that you will collect data from using random
sampling. The size of the sample is always less than the total size of the population.
[Figure: samples are drawn from the population (sampling), and conclusions about the population are drawn from the sample (inference). Source: Quora]
I.I.D
Independent and identically distributed (IID) random variables are mutually independent of one another and are identically distributed in the sense that they are all drawn from the same probability distribution.
Examples: coin flips, rolls of a fair six-sided die, and survey responses are typically assumed to be IID
Who is best? X vs. Y
Student scores (out of 50) on 10 exams:

Exam:       1   2   3   4   5   6   7   8   9  10
Student X: 28  32  44  33  43  30  41  36  42  27
Student Y: 48  19  22  45  26  50  31  28  38  49

                       Student X   Student Y
SUM                       356         356
MEAN                      35.6        35.6
MEDIAN                    34.5        34.5
STANDARD DEVIATION         6.4        11.9

Both students have identical sums, means, and medians; only the standard deviation reveals that X's scores are far more consistent than Y's.
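As a quick sketch (not from the original slides), these summary statistics can be reproduced with Python's standard library:

```python
# Reproduce the X vs. Y summary statistics (illustrative sketch).
from statistics import mean, median, stdev

x = [28, 32, 44, 33, 43, 30, 41, 36, 42, 27]
y = [48, 19, 22, 45, 26, 50, 31, 28, 38, 49]

for name, scores in [("X", x), ("Y", y)]:
    # stdev() computes the *sample* standard deviation (n - 1 denominator);
    # statistics.pstdev() would give the population version instead.
    print(f"Student {name}: sum={sum(scores)}, mean={mean(scores):.1f}, "
          f"median={median(scores):.1f}, stdev={stdev(scores):.1f}")
```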
Probability
In a random experiment, the probability of an event is a number indicating how likely that event is to occur.
The sample space associated with a random experiment is the set of all
possible outcomes. An event is a subset of the sample space.
e.g., for a roll of a six-sided die, Ω = {1, 2, 3, 4, 5, 6}
A real number P(A) is assigned to every event A, called the probability of A.
This number P(A) is always between 0 and 1, where 0 indicates impossibility and
1 indicates certainty.
Probability – Axioms
To qualify as a probability, P must satisfy three axioms:
Axiom 1: P(A) ≥ 0 for any event A
Axiom 2: Probability of the sample space, P(Ω) = 1
Axiom 3: If A1, A2, . . . are disjoint (mutually exclusive) events, then
P(A1 ∪ A2 ∪ A3 ∪ ...) = P(A1) + P(A2) + P(A3) + ...
Probability Models – Discrete Probability
Discrete Probability Model: In discrete probability models, we can compute the probability of an event by adding the probabilities of the outcomes it contains.
Bernoulli Model: A simple model that represents binary outcomes
with a single parameter
Binomial Model: Describes the number of successes in a fixed
number of independent Bernoulli trials
Poisson Model: Models the number of events occurring in a fixed
interval of time or space
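As an illustrative sketch (assuming NumPy is available), samples can be drawn from each of these discrete models:

```python
# Sampling from the three discrete models (illustrative sketch).
import numpy as np

rng = np.random.default_rng(seed=0)

bernoulli = rng.binomial(n=1, p=0.3, size=10)    # Bernoulli = Binomial with n=1
binomial  = rng.binomial(n=20, p=0.5, size=10)   # successes in 20 trials
poisson   = rng.poisson(lam=4.0, size=10)        # events per interval, rate λ=4

print(bernoulli, binomial, poisson, sep="\n")
```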
Probability Models – Continuous Probability
Continuous Probability Models: In continuous probability models, a random variable X can take on any value within a continuous range.
Uniform Distribution: Represents a situation where all values in an interval are equally likely, e.g., generating random numbers between 0 and 1. (Rolling a fair six-sided die is the discrete analogue.)
Normal (Gaussian) Distribution: Describes continuous variables that are symmetrically
distributed around a mean, often referred to as the bell curve. e.g.: Height of adult
individuals in a population
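A minimal sketch (again assuming NumPy) draws from both continuous models and checks the empirical moments against theory:

```python
# Sampling from the two continuous models (illustrative sketch).
import numpy as np

rng = np.random.default_rng(seed=0)

u = rng.uniform(low=0.0, high=1.0, size=100_000)   # Uniform on [0, 1)
z = rng.normal(loc=0.0, scale=1.0, size=100_000)   # Normal with µ=0, σ=1

# Uniform[0,1]: mean 0.5, std 1/sqrt(12) ≈ 0.289; Normal(0,1): mean 0, std 1.
print(u.mean(), u.std())
print(z.mean(), z.std())
```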
Probability Models – Joint Probability
Joint Probability: It is the likelihood of more than one event occurring at the same time.
Conditions:
One is that events A and B must happen at the same time. Example: throwing two dice
simultaneously.
The other is that events A and B must be independent of each other, i.e., the outcome of
event A does not influence the outcome of event B. Example: rolling two dice.
If these conditions are met, then P(A∩B) = P(A) * P(B).
Probability Models – Joint Probability [Example]
Tossing a fair coin and Rolling a Die
What is the joint probability of getting heads on the coin toss and rolling a 5 on the die?
The probability of getting heads (event A): P(A) = 0.5
The probability of rolling a 5 (event B): P(B) = 1/6
P(A ∩ B) = P(A) * P(B) = 0.5 * (1/6) = 1/12 ≈ 0.0833
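A Monte Carlo sketch (not from the slides) confirms this value by simulating the two independent events:

```python
# Monte Carlo check of P(heads AND rolling a 5) = 1/2 * 1/6 = 1/12 ≈ 0.0833.
import random

trials = 1_000_000
hits = sum(
    1
    for _ in range(trials)
    if random.random() < 0.5 and random.randint(1, 6) == 5
)
print(hits / trials)  # ≈ 0.0833
```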
What will happen to the joint probability of two dependent events?
Probability Models – Conditional Probability
Conditional probability: It is the probability of occurrence of an event B given the
knowledge that an event A has already occurred. It is denoted by P(B|A).
The joint probability of two dependent events then becomes P(A and B) = P(A)P(B|A)
Note:
Two events A and B are independent if P(AB) = P(A) P(B)
Two events A and B are conditionally independent given C if they are independent after
conditioning on C
P(AB|C) = P(B|AC)P(A|C) = P(B|C)P(A|C)
Conditional Probability - Example
60% of students pass the Final and 45% of students pass both the Final and the
Midterm. What percent of students who passed the Final also passed the Midterm?
In other words, what is the probability a student passed the Midterm given they passed the Final?
P(F) =0.6 , P(M and F) = 0.45, P(M|F)=?
P(M and F) = P(F) P(M|F)
P(M|F) = P(M and F) / P(F) = 0.45 / 0.60 = 0.75, i.e., 75%
Bayes Theorem
Joint probability of two dependent events:
P(A and B) = P(A)P(B|A)
P(B and A) = P(B)P(A|B)
Since P(A and B) = P(B and A), it follows that
P(A)P(B|A) = P(B)P(A|B), i.e., P(A|B) = P(A)P(B|A) / P(B)
This is the Bayes theorem
It tells us: how often A happens given that B happens: P(A|B) – Posterior Probability
When we know:
how often B happens given that A happens, P(B|A) – Likelihood
how likely A is on its own, P(A) – Prior Probability
how likely B is on its own, P(B) – Evidence
Bayes Theorem [Example]
A rare disease affects 0.1% of the population. A test proposed for this disease is 99% accurate
for people who have the disease; assume it is also 99% accurate for people who do not
(a 1% false-positive rate). What is the probability that a person actually has the disease
if they test positive?
P(D|+) = P(+|D)P(D) / P(+) = (0.99 * 0.001) / (0.99 * 0.001 + 0.01 * 0.999)
       = 0.00099 / 0.01098 ≈ 0.09
There is only about a 9% chance that the person actually has the disease!
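The same computation as a short Python sketch (the 99% specificity is an assumption, as noted above):

```python
# Bayes theorem for the rare-disease example (sketch; assumes the test is
# also 99% accurate for healthy people, i.e. a 1% false-positive rate).
p_disease = 0.001                 # prior: 0.1% prevalence
p_pos_given_disease = 0.99        # likelihood (sensitivity)
p_pos_given_healthy = 0.01        # assumed false-positive rate

# Evidence: total probability of testing positive.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior: P(disease | positive test).
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"{p_disease_given_pos:.3f}")  # ≈ 0.090, i.e. about 9%
```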
Probability Distribution
A probability distribution is a mathematical function that describes the likelihood of
various outcomes in a random experiment or process.
Probability Mass Function (PMF):
For discrete distributions, the PMF gives the probability that the random variable takes on a specific value.
Probability Density Function (PDF):
For continuous distributions, the PDF gives the relative likelihood of the random variable taking on a particular value.
The area under the PDF curve over an interval represents the probability of the variable
falling within that interval
Probability Mass Function
A PMF describes the probabilities of
discrete random variables taking on specific
values.
It provides a complete distribution of
probabilities for all possible outcomes.
e.g., for a fair six-sided die:
E[X] = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 3.5
PMF-Example
Consider a biased coin with P(H) = 0.7 and P(T) = 0.3
X: The number of times the coin lands heads up in two consecutive flips.
Possible Outcomes: X={0,1,2}
P(X=0) = P(TT) = 0.3*0.3 = 0.09
P(X=1) = P(HT) + P(TH) = 0.7*0.3 + 0.3*0.7 = 0.42
P(X=2) = P(HH) = 0.7*0.7 = 0.49
(This PMF sums to 1: 0.09 + 0.42 + 0.49 = 1.0)
Expected Value (Mean):
E[X] = 0(0.09) + 1(0.42) + 2(0.49) = 1.4
Variance:
Var(X) = E[X^2] - (E[X])^2 = [0^2(0.09) + 1^2(0.42) + 2^2(0.49)] - (1.4)^2
       = 2.38 - 1.96 = 0.42
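A small helper (an illustrative sketch, not from the slides) computes both quantities directly from any PMF, reproducing the die and biased-coin results above:

```python
# Expected value and variance straight from a PMF (illustrative sketch).
def pmf_mean_var(pmf):
    """pmf: dict mapping value -> probability (must sum to 1)."""
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, var

die = {x: 1 / 6 for x in range(1, 7)}
coin2 = {0: 0.09, 1: 0.42, 2: 0.49}   # biased coin flipped twice (above)

print(pmf_mean_var(die))    # (3.5, 2.9166...)
print(pmf_mean_var(coin2))  # (1.4, 0.42)
```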
Types of PMFs – Bernoulli Distribution
The Bernoulli distribution models a single binary trial with two possible outcomes:
success (S) with probability "p" and failure (F) with probability "q" (where q = 1 - p).
Each trial in the Bernoulli distribution has only two possible outcomes, often labeled as 1
(success) and 0 (failure).
Types of PMFs – Binomial Distribution
It is a discrete probability distribution that models the number of successes in a fixed
number of independent Bernoulli trials
It summarizes the probabilities of all possible counts of successes in N trials,
given the parameters N and p
[Figure: Binomial PMF with N = 20 trials and p = q = 1/2]
The probability of obtaining exactly n successes out of N Bernoulli trials is given by:
P(n) = C(N, n) p^n (1 - p)^(N - n), where C(N, n) = N! / (n! (N - n)!)
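A direct translation of this formula into Python (sketch, standard library only):

```python
# Binomial PMF from the formula above (illustrative sketch).
from math import comb

def binomial_pmf(n, N, p):
    """P(exactly n successes in N independent Bernoulli(p) trials)."""
    return comb(N, n) * p**n * (1 - p) ** (N - n)

# N = 20 fair trials, as in the figure:
probs = [binomial_pmf(n, N=20, p=0.5) for n in range(21)]
print(probs[10])   # P(exactly 10 successes) ≈ 0.176
print(sum(probs))  # ≈ 1.0 (a PMF sums to one)
```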
Types of PMFs – Poisson Distribution
It is a discrete probability distribution that models the number of events occurring in a
fixed interval of time or space, given the average rate of occurrence.
The distribution is used to model rare events that occur randomly and independently
over a specified time or space interval
Its PMF: P(X = k) = λ^k e^(-λ) / k!, where
λ: the average rate of occurrence
k: the number of events
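The same formula as a Python sketch (the call-centre rate below is an assumed example, not from the slides):

```python
# Poisson PMF from the formula above (illustrative sketch).
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(exactly k events when the average rate is lam)."""
    return lam**k * exp(-lam) / factorial(k)

# E.g., if a call centre averages 4 calls per hour (assumed rate),
# the probability of exactly 2 calls in an hour:
print(poisson_pmf(2, lam=4.0))  # ≈ 0.147
```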
Probability Density Function
It is used to describe the probability distribution of a continuous random variable, where
the set of possible outcomes is an uncountably infinite range, such as real numbers within
an interval.
It defines the likelihood of the variable falling within a particular range of values.
PDF - Uniform Distribution
It describes a situation where all values within a specific interval [a, b] are equally
likely to occur, i.e., have the same probability density:
f(x) = 1/(b - a) for a ≤ x ≤ b, and 0 otherwise
[Figure: constant density of height 1/(b - a) between a and b]
PDF - Normal/Gaussian Distribution
It is a continuous probability distribution that is
characterized by its bell-shaped curve. The curve
tails off towards the extremes.
It is symmetric with the highest point at the
mean, and the spread of the distribution
determined by the standard deviation.
f(x) = (1 / (σ√(2π))) exp(-(x - µ)^2 / (2σ^2))
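Writing the density out directly makes the formula concrete (sketch, standard library only):

```python
# The Normal PDF written out directly (illustrative sketch).
from math import exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    """f(x) = 1/(sigma*sqrt(2*pi)) * exp(-(x - mu)**2 / (2*sigma**2))"""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

print(normal_pdf(0.0))          # peak of the standard bell curve ≈ 0.3989
print(normal_pdf(80, 80, 10))   # peak for the exam-score example below ≈ 0.0399
```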
Normal Distribution - Empirical (68-95-99.7)Rule
It is a statistical guideline that describes the approximate
distribution of data in a normal distribution
About 68.26% of the data falls within 1σ of µ.
About 95.44% of the data falls within 2σ of µ.
About 99.72% of the data falls within 3σ of µ.
The remaining 0.28% of the data lies outside 3σ of µ.
Source: http://www.cs.uni.edu/~campbell/stat/normfact.html
Example
Consider a dataset of exam scores that follows a normal distribution with a mean of 80 and
a standard deviation of 10.
About 68% of the scores fall within the range: 70-90
About 95% of the scores fall within the range: 60-100
About 99.7% of the scores fall within the range: 50-110
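These percentages can be verified analytically with the Gaussian error function (sketch): for a normal variable, P(|X - µ| < kσ) = erf(k/√2).

```python
# Verifying the 68-95-99.7 rule with math.erf (illustrative sketch).
from math import erf, sqrt

for k in (1, 2, 3):
    print(f"within {k} sigma: {erf(k / sqrt(2)):.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973
```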
Central Limit Theorem
It describes the behavior of the sample means from a population, regardless of the
population's underlying distribution
It states that as the sample size increases, the distribution of sample means
approaches a normal distribution, regardless of the original population's distribution.
Provided we have a population with mean µ and standard deviation σ, and we take large
random samples (n ≥ 30) from the population with replacement, the distribution of the
sample means will be approximately normally distributed with mean µ and standard
deviation σ/√n (the standard error).
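A minimal simulation (assuming NumPy; the exponential population is an illustrative choice) shows the theorem in action, since the exponential distribution is strongly skewed yet its sample means come out approximately normal:

```python
# CLT in action (illustrative sketch): means of samples drawn from a very
# non-normal (exponential) population still look normal for n >= 30.
import numpy as np

rng = np.random.default_rng(seed=0)
n, num_samples = 30, 10_000

# Exponential population with mu = sigma = 1.
sample_means = rng.exponential(scale=1.0, size=(num_samples, n)).mean(axis=1)

print(sample_means.mean())  # ≈ mu = 1
print(sample_means.std())   # ≈ sigma / sqrt(n) = 1/sqrt(30) ≈ 0.183
```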