Probability Theory for Data Science
ISHAPATHIK DAS
IIT TIRUPATI
TIRUPATI, AP, INDIA
Probability Theory for Data Science Dr. Ishapathik Das, IIT Tirupati
To introduce the core principles
of probability theory and
fundamental statistical
techniques, and to demonstrate
methods for solving practical
probability problems and
statistical applications.
Textbook(s): Ross S, A First Course in Probability, Prentice Hall of India (2009).
Reference(s):
1. Chung K L, Elementary Probability Theory with Stochastic Processes,
Springer-Verlag (1974).
2. Drake A, Fundamentals of Applied Probability Theory, McGraw-Hill (1967).
3. Kreyszig E, Advanced Engineering Mathematics, John Wiley & Sons (2010).
4. Hsu H P, Schaum's outline of theory and problems of probability, random
variables, and random processes, McGraw-Hill (1997).
5. Gupta S C, Kapoor V K, Fundamentals of mathematical statistics, Sultan Chand
& Sons (2020).
Probability: Probability models and
axioms, conditioning and Bayes'
rule, independence; discrete random
variables; probability mass
functions; expectations, examples,
multiple discrete random variables:
joint PMFs, expectations,
conditioning, independence,
continuous random variables,
probability density functions,
expectations, examples, multiple
continuous random variables,
transformation of random variables,
covariance and correlation, iterated
expectations, convolution; notion of
convergence, weak law of large
numbers, central limit theorem.
• A phenomenon refers to a fact, occurrence, or
circumstance that can be observed or is
observable.
• For example, natural phenomena include weather
patterns, fog, thunder, tornadoes, biological
processes, and decomposition.
• In scientific terms, a phenomenon encompasses
any observable event, often involving the use of
instruments to observe, record, or collect data.
[Diagram: Phenomena divide into Deterministic and Non-Deterministic.]
For a deterministic phenomenon, there is a mathematical model that
enables the "perfect" prediction of the phenomenon's outcome.
Numerous examples of this can be found in the
exact sciences, such as Physics and Chemistry.
Consider predicting the amount of money in a
bank account.
If you know the initial deposit and the interest rate,
you can accurately determine the account
balance after one year.
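The bank account example can be sketched in a few lines. This is only an illustration under an assumption the slide does not state, namely that interest is credited once at the end of the year; the function name and the figures are hypothetical.

```python
def balance_after_one_year(deposit: float, annual_rate: float) -> float:
    """Balance after one year, assuming interest is credited once at year end."""
    return deposit * (1 + annual_rate)


# Hypothetical figures: a deposit of 1000 at an annual rate of 5%.
balance = balance_after_one_year(1000.0, 0.05)
print(balance)
```

The same inputs always give the same balance, which is what makes the phenomenon deterministic.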
• For a non-deterministic phenomenon, there is no
mathematical model that enables "perfect"
prediction of the phenomenon's outcome.
• These phenomena can be divided into two
groups:
• Random phenomena: While individual
outcomes cannot be predicted, the
long-term outcomes exhibit statistical
regularity.
• Haphazard phenomena: Outcomes are
unpredictable, and there is no long-
term statistical regularity.
[Diagram: Phenomena divide into Deterministic and Non-Deterministic; Non-Deterministic phenomena divide into Random and Haphazard.]
• For random phenomena, individual outcomes cannot be predicted,
but the long-term results exhibit statistical regularity.
• For example, when rolling a die, the possible
outcomes are S = {1, 2, 3, 4, 5, 6}.
• Although the outcome of a single roll is
unpredictable, over many rolls, each number will
appear approximately 1/6 of the time.
• This regularity is due to the symmetry of a fair die,
where each side is equally likely to occur.
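A short simulation can make the statistical regularity visible. The sketch below assumes a fair die modelled by Python's pseudo-random generator; the function name, the number of rolls, and the seed are illustrative choices, not part of the course material.

```python
import random
from collections import Counter


def relative_frequencies(num_rolls: int, seed: int = 0) -> dict[int, float]:
    """Roll a simulated fair die num_rolls times and return each face's relative frequency."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    counts = Counter(rng.randint(1, 6) for _ in range(num_rolls))
    return {face: counts[face] / num_rolls for face in range(1, 7)}


freqs = relative_frequencies(60_000)
for face, freq in freqs.items():
    # Each relative frequency should be close to 1/6 (about 0.1667).
    print(face, round(freq, 4))
```

No single roll can be predicted, yet with many rolls every face's relative frequency settles near 1/6.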
For haphazard phenomena, outcomes are unpredictable and do not exhibit statistical regularity over the long run.
For example, consider a scenario where someone is choosing numbers from 1 to 6.
• It is impossible to predict which number they might choose at any given time.
• We cannot determine the probability of observing any specific value from 1 to 6.
• We don't know if the person has a favorite number that they choose more frequently.
• We have no insight into the process by which the person is selecting the numbers.
• The set of all possible outcomes of a random phenomenon is called the sample space S.
• Examples:
1. Random Experiment: Tossing a coin. All possible outcomes: 𝑺𝟏 = {Head, Tail}.
2. Random Experiment: Rolling a die. All possible outcomes: 𝑺𝟐 = {𝟏, 𝟐, 𝟑, 𝟒, 𝟓, 𝟔}.
An event is a subset of the sample space S.
• Random experiment: rolling a die.
• Sample space 𝑺𝟐 = {𝟏, 𝟐, 𝟑, 𝟒, 𝟓, 𝟔}.
• An event E={2, 4, 6}, representing
the outcome of rolling an even
number.
Let S be a non-empty set. A class 𝑪 of subsets of S is called a
field if it contains S itself and is closed under the formation of
complements and finite unions:
1. 𝑆 ∈ 𝑪;
2. 𝐴 ∈ 𝑪 implies 𝐴ᶜ ∈ 𝑪;
3. 𝐴, 𝐵 ∈ 𝑪 implies 𝐴 ∪ 𝐵 ∈ 𝑪.
Let S be a non-empty set. A class 𝑪 of subsets of S is called a
sigma field (σ-field) if it contains S itself and is closed under the
formation of complements and countable unions:
• 𝑆 ∈ 𝑪;
• 𝐴 ∈ 𝑪 implies 𝐴ᶜ ∈ 𝑪;
• 𝐴1 , 𝐴2 , … ∈ 𝑪 implies 𝐴1 ∪ 𝐴2 ∪ ⋯ ∈ 𝑪.
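For a finite set S, the three closure conditions can be checked mechanically. The sketch below is illustrative only: the helper names `power_set` and `is_field` and the example classes are assumptions introduced here. On a finite S a class has only finitely many members, so closure under finite unions already gives closure under countable unions.

```python
from itertools import combinations


def power_set(s):
    """All subsets of s, each as a frozenset."""
    items = list(s)
    return {frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)}


def is_field(S, C):
    """Check the three conditions for a class C of subsets of a finite set S."""
    S = frozenset(S)
    C = {frozenset(A) for A in C}
    if S not in C:                          # condition 1: S belongs to C
        return False
    if any(S - A not in C for A in C):      # condition 2: closed under complements
        return False
    return all(A | B in C for A in C for B in C)  # condition 3: closed under unions


S = {1, 2, 3, 4, 5, 6}
trivial = [set(), S]                          # smallest possible field
even_odd = [set(), {2, 4, 6}, {1, 3, 5}, S]   # generated by the event "even"
not_closed = [set(), {1}, S]                  # the complement of {1} is missing

print(is_field(S, trivial), is_field(S, even_odd),
      is_field(S, power_set(S)), is_field(S, not_closed))
```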
• The null event (empty event, impossible event) is denoted by Φ.
• Φ represents the event that contains no outcomes.
• The entire event (certain event) is denoted by S.
• S represents the event that contains all possible outcomes.
Union of Sets
The union of two sets A and B (A ∪ B) is the set of elements that are in either A, B, or both. This represents the
combined collection of all elements from both sets.
Definition: A ∪ B = {x | x ∈ A or x ∈ B}
Example: If A = {1, 2, 3} and B = {3, 4, 5}, then A ∪ B = {1, 2, 3, 4, 5}.
Intersection of Sets
The intersection of two sets A and B (A ∩ B) is the set of elements that are common to both A and B.
This represents the elements that belong to both sets simultaneously.
Definition: A ∩ B = {x | x ∈ A and x ∈ B}
Example: If A = {1, 2, 3} and B = {3, 4, 5}, then A ∩ B = {3}.
Difference of Sets
The difference of two sets A and B (A - B) is the set of elements that are in A but not in B. This represents the
elements that belong to A but not to B.
Definition: A - B = {x | x ∈ A and x ∉ B}
Example: If A = {1, 2, 3} and B = {3, 4, 5}, then A - B = {1, 2}.
Complement of a Set
The complement of a set A (A') is the set of elements that are not in A. This represents the elements that belong
to the universal set but not to A.
Definition: A' = {x | x ∈ U and x ∉ A}
Example: If the universal set U = {1, 2, 3, 4, 5} and A = {1, 2, 3}, then A' = {4, 5}.
Properties of Set Operations
Set operations exhibit several fundamental properties, including commutativity, associativity, and distributivity,
which are crucial for understanding and manipulating set relationships.
1. Commutative: A ∪ B = B ∪ A and A ∩ B = B ∩ A
2. Associative: (A ∪ B) ∪ C = A ∪ (B ∪ C) and (A ∩ B) ∩ C = A ∩ (B ∩ C)
3. Distributive: A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) and A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
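The set operations and properties above map directly onto Python's built-in `set` type. The sketch below reuses the sets A, B, and U from the examples; the third set C is a hypothetical choice added only to exercise the distributive laws.

```python
A = {1, 2, 3}
B = {3, 4, 5}
U = {1, 2, 3, 4, 5}   # universal set from the complement example
C = {2, 5}            # hypothetical third set, used only for the distributive laws

union = A | B           # elements in A, in B, or in both
intersection = A & B    # elements common to A and B
difference = A - B      # elements in A but not in B
complement_A = U - A    # elements of U that are not in A

# Both distributive laws, checked on these particular sets.
distributive_1 = (A | (B & C)) == ((A | B) & (A | C))
distributive_2 = (A & (B | C)) == ((A & B) | (A & C))

print(union, intersection, difference, complement_A)
print(distributive_1, distributive_2)
```

A check on specific sets illustrates the laws but does not prove them in general.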
Two events A and B are said to be mutually exclusive
events if 𝐴 ∩ 𝐵 = Φ.
There are two important approaches for defining the probability of an
event.
1. Classical approach: If an event can happen in h different ways out of
a total of n equally likely possible ways, the probability of the event is h/n.
2. Frequency approach: After conducting an experiment n times (where
n is very large) and observing the event occur in h of those trials, the
probability of the event is given by h/n. This is also known as the empirical
probability of the event.
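The two approaches can be compared on one event. The sketch below assumes a fair die and the event "an even number is rolled"; the function name, trial count, and seed are illustrative assumptions.

```python
import random
from fractions import Fraction

sample_space = [1, 2, 3, 4, 5, 6]
event = {2, 4, 6}  # "an even number is rolled"

# Classical approach: h favourable outcomes out of n equally likely outcomes.
h = sum(1 for outcome in sample_space if outcome in event)
classical = Fraction(h, len(sample_space))


def empirical_probability(n: int, seed: int = 1) -> float:
    """Frequency approach: fraction of n simulated rolls in which the event occurs."""
    rng = random.Random(seed)
    occurrences = sum(1 for _ in range(n) if rng.choice(sample_space) in event)
    return occurrences / n


print(classical)                        # exact ratio h/n
print(empirical_probability(100_000))   # close to, but generally not equal to, 1/2
```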
Both the classical and frequency approaches
have significant drawbacks.
The phrase "equally likely" is ambiguous.
The term "large number" is ambiguous.
This has led mathematicians to adopt an
axiomatic approach for defining probability.
• Let S be a sample space and C be a sigma field of subsets of S.
• To each event A ∈ C in the class C of events, we associate a real
number P(A).
• P is called a probability function, and P(A) is the probability of
the event A, if the following axioms are satisfied.
• Axiom 1: For any A ∈ C, P(A) ≥ 0.
• Axiom 2: For the certain event S ∈ C, P(S) = 1.
• Axiom 3: If A₁, A₂, …, Aₙ, … is a countable collection of pairwise mutually exclusive
events in the class C, then
P(A₁ ∪ A₂ ∪ ⋯) = P(A₁) + P(A₂) + ⋯ .
• In particular, if A₁ and A₂ are two mutually exclusive events in C, then
P(A₁ ∪ A₂) = P(A₁) + P(A₂).
Theorem 1.1: If A₁ ⊂ A₂, then P(A₂ − A₁) = P(A₂) − P(A₁), and P(A₁) ≤ P(A₂).
Theorem 1.2: P(A) ∈ [0, 1], for any event A ∈ C.
Theorem 1.3: P(Φ) = 0, where Φ is the impossible event.
Theorem 1.4: P(A′) = 1 − P(A), where A′ is the complement of A.
Theorem 1.5: Let A₁, A₂, …, Aₙ be pairwise mutually exclusive events. Then
P(A₁ ∪ A₂ ∪ ⋯ ∪ Aₙ) = P(A₁) + P(A₂) + ⋯ + P(Aₙ).
Theorem 1.6: For any two events A and B,
P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Theorem 1.7: For any two events A and B,
P(A) = P(A ∩ B) + P(A ∩ B′).
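Several of these theorems can be spot-checked with exact arithmetic on a small example. The sketch below assumes the equally likely model for a fair die; the function `P` and the chosen events are illustrative. Note that `|` and `&` here are Python's set union and intersection, not conditional probability.

```python
from fractions import Fraction

S = frozenset(range(1, 7))  # sample space of a fair die


def P(event) -> Fraction:
    """Probability under the equally likely model: |event| / |S|."""
    return Fraction(len(frozenset(event) & S), len(S))


A = {1, 2, 3}
B = {2, 4, 6}
A1, A2 = {2}, {2, 4, 6}   # A1 is a subset of A2

checks = {
    "Theorem 1.1": P(A2 - A1) == P(A2) - P(A1) and P(A1) <= P(A2),
    "Theorem 1.3": P(set()) == 0,
    "Theorem 1.4": P(S - A) == 1 - P(A),
    "Theorem 1.6": P(A | B) == P(A) + P(B) - P(A & B),
    "Theorem 1.7": P(A) == P(A & B) + P(A & (S - B)),
}
for name, holds in checks.items():
    print(name, holds)
```

These checks confirm the identities for one choice of events only; the general statements follow from the axioms.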
• Let 𝐴 be the event that Chennai is among the final 5, and 𝐵 be
the event that Mumbai is among the final 5.
• Given 𝑃(𝐴)=0.2, 𝑃(𝐵)=0.35, and 𝑃(𝐴∩𝐵)=0.08, we need to find
𝑃(𝐴∪𝐵).
• Using Theorem 1.6, we have: 𝑃(𝐴∪𝐵)=𝑃(𝐴)+𝑃(𝐵)−𝑃(𝐴∩𝐵).
• Substituting the given probabilities:
𝑃(𝐴∪𝐵)=0.20+0.35−0.08=0.47.
• Imagine you're provided with
information about the potential
outcome of a random experiment before
it occurs.
• How should this information influence
your prediction of the outcome?
• Specifically, how should probabilities be
modified to incorporate this
information?
• Typically, this information is presented
as follows: You're informed that the
outcome falls within a particular event
(i.e., you're notified that a certain event
has taken place).
• Three prisoners, A, B, and C, are confined in jail.
One of them faces execution, while the
remaining two will be released. Prisoner A
inquires of the guard: "One of my fellow
inmates, either B or C, will be granted freedom.
Could you please inform me which one among
them will be set free?"
• After pondering for a moment, the guard
conveyed to A: "If I refrain from informing you,
your probability of facing death stands at 1/3.
However, if I disclose the information, leaving
only two individuals, you become one of the
candidates for execution, thereby increasing
your chance of death to 1/2. Are you truly
inclined to raise your risk of demise?"
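The guard's reasoning can be tested by simulation. The sketch below rests on two assumptions that the puzzle as stated leaves open: each prisoner is equally likely to be the one executed, and when A is the condemned prisoner the guard names B or C with equal probability. The function name, trial count, and seed are illustrative.

```python
import random


def prob_a_executed_given_b_named(trials: int, seed: int = 2) -> float:
    """Estimate P(A is executed | guard says B will be freed)."""
    rng = random.Random(seed)
    named_b = 0
    a_condemned_and_named_b = 0
    for _ in range(trials):
        condemned = rng.choice("ABC")    # assumption: each prisoner equally likely
        if condemned == "A":
            named = rng.choice("BC")     # assumption: guard picks B or C at random
        elif condemned == "B":
            named = "C"                  # guard may only name a prisoner who goes free
        else:
            named = "B"
        if named == "B":
            named_b += 1
            if condemned == "A":
                a_condemned_and_named_b += 1
    return a_condemned_and_named_b / named_b


estimate = prob_a_executed_given_b_named(200_000)
print(round(estimate, 3))  # stays near 1/3 rather than rising to 1/2
```

Under these assumptions, learning the name does not change A's probability of execution: the estimate remains close to 1/3.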
• Let's say we're interested in finding the probability of event 𝐴,
and we've been informed that event 𝐵 has happened.
• In this case, the conditional probability of A given B is
defined as:
• P(A | B) = P(A ∩ B) / P(B), if P(B) ≠ 0.
If we're informed that event B has taken place, then the sample space is confined
to B. The probability within B must be normalized.
This is accomplished by dividing by P(B). Event A can now only transpire if the
outcome falls within A ∩ B. Therefore, the updated probability of A is:
P(A | B) = P(A ∩ B) / P(B), if P(B) ≠ 0.
[Venn diagram of events A and B showing the overlap A ∩ B.]
Roll a fair die once and note the
number facing upward.
Let E denote the event where a 1
appears on the top face.
Let F represent the event where the
number on the top face is odd.
▪ Find P(E).
▪ What is the probability of event E if we are
informed that the number on the top face
is odd, meaning we know that event F has
occurred?
Central concept: The initial sample
space is no longer applicable.
The updated, or reduced, sample
space is S = {1, 3, 5}.
• Observe that the revised sample space
consists solely of the outcomes in F.
P(E occurs given that F occurs) = 1/3.
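The die example can be reproduced with exact fractions in two ways: from the definition P(E | F) = P(E ∩ F) / P(F), and by counting within the reduced sample space F. This is a minimal sketch assuming equally likely outcomes; the helper names `prob` and `conditional` are introduced here for illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # original sample space
E = {1}                  # a 1 appears on the top face
F = {1, 3, 5}            # the number on the top face is odd


def prob(event, space=S) -> Fraction:
    """Probability of an event when all outcomes in `space` are equally likely."""
    return Fraction(len(event & space), len(space))


def conditional(a, b) -> Fraction:
    """P(a | b) = P(a ∩ b) / P(b), defined only when P(b) is nonzero."""
    if prob(b) == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(a & b) / prob(b)


p_E = prob(E)                       # unconditional probability of E
p_E_given_F = conditional(E, F)     # from the definition
p_E_reduced = prob(E, space=F)      # counting inside the reduced sample space

print(p_E, p_E_given_F, p_E_reduced)
```

Both routes give the same value, which is the point of the reduced sample space view.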
Two events A and B are said to be independent if
P(A ∩ B) = P(A) P(B).
If A and B are two independent events and P(A) ≠ 0 ≠ P(B), then
P(A | B) = P(A ∩ B) / P(B) = P(A) P(B) / P(B) = P(A),
and
P(B | A) = P(A ∩ B) / P(A) = P(A) P(B) / P(A) = P(B).
Therefore, in the scenario of independence, the conditional probability of
an event remains unaffected by the knowledge of another event.
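A concrete pair of independent events makes this tangible. The sketch below assumes a fair die with equally likely outcomes; the events A and B are hypothetical choices made for this illustration, not taken from the slides.

```python
from fractions import Fraction

S = frozenset(range(1, 7))  # fair die, equally likely outcomes


def P(event) -> Fraction:
    return Fraction(len(set(event) & S), len(S))


A = {2, 4, 6}      # hypothetical event: the roll is even
B = {1, 2, 3, 4}   # hypothetical event: the roll is at most 4

independent = P(A & B) == P(A) * P(B)   # the defining product rule
p_A_given_B = P(A & B) / P(B)           # conditional probability of A given B
p_B_given_A = P(A & B) / P(A)           # conditional probability of B given A

print(independent, p_A_given_B == P(A), p_B_given_A == P(B))
```

Here P(A ∩ B) = 1/3 = (1/2)(2/3), so knowing that B occurred leaves the probability of A at 1/2.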
Two events that are mutually exclusive are only independent in the specific scenario where either the probability of event A equals zero or the probability of event B equals zero.
Mutually exclusive events exhibit strong dependence otherwise. A and B cannot happen simultaneously. If one event occurs, the other event does not.
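The contrast between mutual exclusivity and independence can be checked on a fair die. In the sketch below, the events and the helper `are_independent` are hypothetical, chosen only to illustrate the statement.

```python
from fractions import Fraction

S = frozenset(range(1, 7))  # fair die, equally likely outcomes


def P(event) -> Fraction:
    return Fraction(len(set(event) & S), len(S))


def are_independent(a, b) -> bool:
    """Product rule: P(a ∩ b) == P(a) P(b)."""
    return P(a & b) == P(a) * P(b)


A = {1, 2}   # hypothetical event
B = {3, 4}   # hypothetical event, disjoint from A

mutually_exclusive = len(A & B) == 0
p_A_given_B = P(A & B) / P(B)   # once B has occurred, A cannot occur

print(mutually_exclusive, are_independent(A, B), p_A_given_B, P(A))

# With a zero-probability event, the product rule holds trivially.
print(are_independent(set(), B))
```

Both A and B have positive probability, so P(A ∩ B) = 0 cannot equal P(A) P(B) = 1/9, and conditioning on B drops the probability of A from 1/3 to 0.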