Probability for Data Science Students

The document outlines the principles of probability theory and its applications in data science, emphasizing key concepts such as random phenomena, sample spaces, events, and set operations. It discusses various approaches to defining probability, including classical and frequency methods, and introduces axiomatic definitions with associated theorems. The text serves as a foundational resource for understanding probability in the context of data science, supported by recommended textbooks and references.

Uploaded by

Somasekhar Lalam

Probability Theory for Data Science

ISHAPATHIK DAS
IIT TIRUPATI
TIRUPATI, AP, INDIA

Probability Theory for Data Science Dr. Ishapathik Das, IIT Tirupati
To introduce the core principles of probability theory and fundamental statistical techniques, and to demonstrate methods for solving practical probability problems and statistical applications.

Textbook(s): Ross S, A First course in Probability, Prentice Hall of India (2009).

Reference(s):
1. Chung K L, Elementary Probability Theory with Stochastic Processes, Springer-Verlag (1974).
2. Drake A, Fundamentals of Applied Probability Theory, McGraw-Hill (1967).
3. Kreyszig E, Advanced Engineering Mathematics, John Wiley & Sons (2010).
4. Hsu H P, Schaum's Outline of Theory and Problems of Probability, Random Variables, and Random Processes, McGraw-Hill (1997).
5. Gupta S C, Kapoor V K, Fundamentals of Mathematical Statistics, Sultan Chand & Sons (2020).

Probability: probability models and axioms; conditioning and Bayes' rule; independence; discrete random variables: probability mass functions, expectations, examples; multiple discrete random variables: joint PMFs, expectations, conditioning, independence; continuous random variables: probability density functions, expectations, examples; multiple continuous random variables; transformation of random variables; covariance and correlation; iterated expectations; convolution; notion of convergence; weak law of large numbers; central limit theorem.

• A phenomenon refers to a fact, occurrence, or circumstance that can be observed or is observable.

• For example, natural phenomena include weather patterns, fog, thunder, tornadoes, biological processes, and decomposition.

• In scientific terms, a phenomenon encompasses any observable event, often involving the use of instruments to observe, record, or collect data.

Phenomena
• Deterministic
• Non-deterministic

For a deterministic phenomenon, there is a mathematical model that enables the "perfect" prediction of the phenomenon's outcome.

Numerous examples of this can be found in the exact sciences, such as Physics and Chemistry.

Consider predicting the amount of money in a bank account.

If you know the initial deposit and the interest rate, you can accurately determine the account balance after one year.
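
As a minimal sketch of such a deterministic model, the snippet below assumes interest is compounded once per year. The function name and the sample figures are illustrative and do not come from the lecture.

```python
def balance_after(principal, annual_rate, years=1):
    """Deterministic model: the same inputs always give the same balance.

    Assumes interest is compounded once per year (an illustrative assumption).
    """
    return principal * (1 + annual_rate) ** years


# Hypothetical figures: a deposit of 1000 at 5% annual interest.
print(balance_after(1000, 0.05))
```

Running the function twice with the same inputs returns the same value, which is what distinguishes it from the random phenomena discussed next.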

• For a non-deterministic phenomenon, there is no mathematical model that enables "perfect" prediction of the phenomenon's outcome.

• These phenomena can be divided into two groups:
  • Random phenomena: While individual outcomes cannot be predicted, the long-term outcomes exhibit statistical regularity.
  • Haphazard phenomena: Outcomes are unpredictable, and there is no long-term statistical regularity.

Phenomena
• Deterministic
• Non-deterministic
  • Random
  • Haphazard

• In a random phenomenon, individual outcomes cannot be predicted, but the long-term results exhibit statistical regularity.

• For example, when rolling a die, the possible outcomes are S = {1, 2, 3, 4, 5, 6}.

• Although the outcome of a single roll is unpredictable, over many rolls, each number will appear approximately 1/6 of the time.

• This regularity is due to the symmetry of a fair die, where each side is equally likely to occur.
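
The statistical regularity described above can be illustrated with a small simulation. This is only a sketch: it uses Python's pseudo-random generator as a stand-in for a fair die, and the function name and roll counts are illustrative choices.

```python
import random
from collections import Counter


def relative_frequencies(n_rolls, seed=0):
    """Simulate n_rolls of a fair die and return each face's relative frequency."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}


# As the number of rolls grows, every frequency should settle near 1/6.
for n in (60, 6_000, 60_000):
    freqs = relative_frequencies(n)
    print(n, {face: round(f, 3) for face, f in freqs.items()})
```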

In a haphazard phenomenon, outcomes are unpredictable and do not exhibit statistical regularity over the long run.

For example, consider a scenario where someone is choosing numbers from 1 to 6.
• It is impossible to predict which number they might choose at any given time.
• We cannot determine the probability of observing any specific value from 1 to 6.
• We don't know if the person has a favorite number that they choose more frequently.
• We have no insight into the process by which the person is selecting the numbers.

The set of all possible outcomes of a random phenomenon is called the sample space S.

• Examples:

1. Random experiment: Tossing a coin. All possible outcomes: $S_1 = \{\text{Head}, \text{Tail}\}$.

2. Random experiment: Rolling a die. All possible outcomes: $S_2 = \{1, 2, 3, 4, 5, 6\}$.

An event is a subset of the sample space S.

• Random experiment: rolling a die.

• Sample space $S_2 = \{1, 2, 3, 4, 5, 6\}$.

• An event $E = \{2, 4, 6\}$, representing the outcome of rolling an even number.
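
Because an event is just a subset of the sample space, Python's built-in `set` type is a natural way to experiment with it. The sketch below mirrors the die example; the variable names and the helper `occurred` are illustrative.

```python
sample_space = {1, 2, 3, 4, 5, 6}   # sample space for rolling a die
even = {2, 4, 6}                     # event E: an even number is rolled

# An event is a subset of the sample space.
print(even <= sample_space)


def occurred(event, outcome):
    """An event occurs when the observed outcome belongs to it."""
    return outcome in event


print(occurred(even, 4), occurred(even, 5))
```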

Let S be a non-empty set. A class $C$ of subsets of S is called a field if it contains S itself and is closed under the formation of complements and finite unions:
1. $S \in C$;
2. $A \in C$ implies $A^c \in C$;
3. $A, B \in C$ implies $A \cup B \in C$.

Let S be a non-empty set. A class $C$ of subsets of S is called a sigma field ($\sigma$-field) if it contains S itself and is closed under the formation of complements and countable unions:
• $S \in C$;
• $A \in C$ implies $A^c \in C$;
• $A_1, A_2, \ldots \in C$ implies $A_1 \cup A_2 \cup \cdots \in C$.
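
For a finite set S the three closure conditions can be checked mechanically. The sketch below tests complements and pairwise unions; on a finite class of sets a countable union reduces to a finite one, so the same check covers the sigma field conditions there. The function name `is_field` is a hypothetical helper, not part of the lecture.

```python
from itertools import combinations


def is_field(sample_space, collection):
    """Check the field conditions for a finite collection of subsets."""
    S = frozenset(sample_space)
    C = {frozenset(a) for a in collection}
    if S not in C:                                   # condition 1: S is in C
        return False
    if any(S - a not in C for a in C):               # condition 2: complements
        return False
    return all(a | b in C for a, b in combinations(C, 2))  # condition 3: unions


S = {1, 2, 3, 4, 5, 6}
print(is_field(S, [set(), S, {2, 4, 6}, {1, 3, 5}]))  # closed under both operations
print(is_field(S, [S, {2, 4, 6}]))                    # missing complements
```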

• The null event (empty event, impossible event) is denoted by Φ.
• Φ represents the event that contains no outcomes.
• The entire event (certain event) is denoted by S.
• S represents the event that contains all possible outcomes.

Union of Sets
The union of two sets A and B (A ∪ B) is the set of elements that are in either A, B, or both. This represents the combined collection of all elements from both sets.

Definition: A ∪ B = {x | x ∈ A or x ∈ B}
Example: If A = {1, 2, 3} and B = {3, 4, 5}, then A ∪ B = {1, 2, 3, 4, 5}.

Intersection of Sets
The intersection of two sets A and B (A ∩ B) is the set of elements that are common to both A and B. This represents the elements that belong to both sets simultaneously.

Definition: A ∩ B = {x | x ∈ A and x ∈ B}
Example: If A = {1, 2, 3} and B = {3, 4, 5}, then A ∩ B = {3}.

Difference of Sets
The difference of two sets A and B (A - B) is the set of elements that are in A but not in B. This represents the elements that belong to A but not to B.

Definition: A - B = {x | x ∈ A and x ∉ B}
Example: If A = {1, 2, 3} and B = {3, 4, 5}, then A - B = {1, 2}.

Complement of a Set
The complement of a set A (A') is the set of elements that are not in A. This represents the elements that belong to the universal set but not to A.

Definition: A' = {x | x ∈ U and x ∉ A}
Example: If the universal set U = {1, 2, 3, 4, 5} and A = {1, 2, 3}, then A' = {4, 5}.

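
The four operations map directly onto Python's `set` operators. The sketch below reuses the sets from the examples above; taking the complement as `U - A` is one convenient way to express it, since Python has no built-in universal set.

```python
A = {1, 2, 3}
B = {3, 4, 5}
U = {1, 2, 3, 4, 5}   # universal set from the complement example

union = A | B          # elements in A, in B, or in both
intersection = A & B   # elements common to A and B
difference = A - B     # elements in A but not in B
complement = U - A     # elements of U that are not in A

print(union, intersection, difference, complement)
```
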
Properties of Set Operations
Set operations exhibit several fundamental properties, including commutativity, associativity, and distributivity, which are crucial for understanding and manipulating set relationships.

• Commutative: A ∪ B = B ∪ A and A ∩ B = B ∩ A
• Associative: (A ∪ B) ∪ C = A ∪ (B ∪ C) and (A ∩ B) ∩ C = A ∩ (B ∩ C)
• Distributive: A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) and A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)

Two events A and B are said to be mutually exclusive events if $A \cap B = \Phi$.

There are two important approaches for defining the probability of an event.

1. Classical approach: If an event can happen in $h$ different ways out of a total of $n$ equally likely possible ways, the probability of the event is $h/n$.

2. Frequency approach: After conducting an experiment $n$ times (where $n$ is very large) and observing the event occur in $h$ of those trials, the probability of the event is given by $h/n$. This is also known as the empirical probability of the event.
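
To make the two approaches concrete, the sketch below computes both for the event "an even number is rolled" on a die. The choice of event, the seed, and the number of trials are illustrative assumptions; the simulated die stands in for a real experiment.

```python
import random
from fractions import Fraction

sample_space = [1, 2, 3, 4, 5, 6]
event = {2, 4, 6}  # rolling an even number

# Classical approach: h favourable outcomes out of n equally likely outcomes.
h = sum(1 for outcome in sample_space if outcome in event)
classical = Fraction(h, len(sample_space))

# Frequency approach: repeat the experiment n times and count occurrences.
rng = random.Random(1)  # fixed seed for reproducibility
n = 100_000
hits = sum(1 for _ in range(n) if rng.choice(sample_space) in event)
empirical = hits / n

print(classical, empirical)
```

The empirical value depends on the particular run and only approximates the classical one, which hints at the drawbacks of both definitions.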

Both the classical and frequency approaches have significant drawbacks.

The phrase "equally likely" is ambiguous.

The term "large number" is ambiguous.

This has led mathematicians to adopt an axiomatic approach for defining probability.

• Let S be a sample space and $C$ be a sigma field of subsets of S.

• To each event $A \in C$ in the class $C$ of events, we associate a real number $P(A)$.

• $P$ is called a probability function, and $P(A)$ is the probability of the event $A$, if the following axioms are satisfied.

• Axiom 1: For any $A \in C$, $P(A) \geq 0$.

• Axiom 2: For the certain event $S \in C$, $P(S) = 1$.

• Axiom 3: If $A_1, A_2, \ldots, A_n, \ldots$ is a countable collection of pairwise mutually exclusive events in the class $C$, then
$P(A_1 \cup A_2 \cup \cdots) = P(A_1) + P(A_2) + \cdots$.

• In particular, if $A_1$ and $A_2$ are two mutually exclusive events in $C$,
$P(A_1 \cup A_2) = P(A_1) + P(A_2)$.
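
On a finite sample space the axioms can be verified exhaustively. The sketch below assumes the uniform probability on a six-sided die and takes the class of events to be all subsets of S; both are illustrative choices rather than part of the definition. With finitely many events, additivity over disjoint pairs implies additivity over any finite collection, which is all Axiom 3 requires here.

```python
from fractions import Fraction
from itertools import chain, combinations

S = frozenset({1, 2, 3, 4, 5, 6})


def P(event):
    """Uniform probability on S (an illustrative choice of probability function)."""
    return Fraction(len(event), len(S))


def power_set(s):
    """All subsets of s, used here as the class of events."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(c) for c in subsets]


events = power_set(S)

axiom1 = all(P(A) >= 0 for A in events)
axiom2 = P(S) == 1
# Check additivity for every pair of mutually exclusive events.
axiom3 = all(P(A | B) == P(A) + P(B)
             for A in events for B in events if not (A & B))
print(axiom1, axiom2, axiom3)
```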

Theorem 1.1: If $A_1 \subset A_2$, then $P(A_2 - A_1) = P(A_2) - P(A_1)$, and $P(A_1) \leq P(A_2)$.

Theorem 1.2: $P(A) \in [0, 1]$, for any event $A \in C$.

Theorem 1.3: $P(\Phi) = 0$, where $\Phi$ is the impossible event.

Theorem 1.4: $P(A') = 1 - P(A)$, where $A'$ is the complement of $A$.

Theorem 1.5: Let $A_1, A_2, \ldots, A_n$ be pairwise mutually exclusive events. Then
$P(A_1 \cup A_2 \cup \cdots \cup A_n) = P(A_1) + P(A_2) + \cdots + P(A_n)$.

Theorem 1.6: For any two events $A$ and $B$,
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

Theorem 1.7: For any two events $A$ and $B$,
$P(A) = P(A \cap B) + P(A \cap B')$.
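
A quick numerical sanity check of several of these theorems is sketched below. It is not a proof: it assumes the uniform probability on a die, and the events `A`, `B`, and `A1` are hypothetical examples chosen for illustration.

```python
from fractions import Fraction

S = frozenset(range(1, 7))


def P(event):
    """Uniform probability on the die sample space (illustrative assumption)."""
    return Fraction(len(event), len(S))


A = frozenset({1, 2, 3})   # hypothetical event
B = frozenset({2, 4, 6})   # hypothetical event
A1 = frozenset({2})        # a subset of B, for Theorem 1.1

checks = {
    "1.1": P(B - A1) == P(B) - P(A1) and P(A1) <= P(B),
    "1.3": P(frozenset()) == 0,
    "1.4": P(S - A) == 1 - P(A),
    "1.6": P(A | B) == P(A) + P(B) - P(A & B),
    "1.7": P(A) == P(A & B) + P(A & (S - B)),
}
print(checks)
```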

• Let $A$ be the event that Chennai is among the final 5, and $B$ be the event that Mumbai is among the final 5.

• Given $P(A) = 0.2$, $P(B) = 0.35$, and $P(A \cap B) = 0.08$, we need to find $P(A \cup B)$.

• Using Theorem 1.6, we have: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

• Substituting the given probabilities: $P(A \cup B) = 0.20 + 0.35 - 0.08 = 0.47$.

• Imagine you're provided with information about the potential outcome of a random experiment before it occurs.

• How should this information influence your prediction of the outcome?

• Specifically, how should probabilities be modified to incorporate this information?

• Typically, this information is presented as follows: You're informed that the outcome falls within a particular event (i.e., you're notified that a certain event has taken place).

• Three prisoners, A, B, and C, are confined in jail. One of them faces execution, while the remaining two will be released. Prisoner A inquires of the guard: "One of my fellow inmates, either B or C, will be granted freedom. Could you please inform me which one among them will be set free?"

• After pondering for a moment, the guard conveyed to A: "If I refrain from informing you, your probability of facing death stands at 1/3. However, if I disclose the information, leaving only two individuals, you become one of the candidates for execution, thereby increasing your chance of death to 1/2. Are you truly inclined to raise your risk of demise?"
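
One common way to examine the guard's argument is to enumerate the joint outcomes exactly. The sketch below rests on assumptions the slide does not state: each prisoner is equally likely to be the one executed, the guard never names A or the condemned prisoner, and the guard picks B or C at random when A is the condemned one. Under those assumptions, learning that B goes free leaves A's risk at 1/3 rather than 1/2.

```python
from fractions import Fraction

third, half = Fraction(1, 3), Fraction(1, 2)

# Joint probabilities of (prisoner to be executed, prisoner the guard names as freed).
# These values encode the assumptions listed in the text above.
joint = {
    ("A", "B"): third * half,   # A condemned, guard happens to name B
    ("A", "C"): third * half,   # A condemned, guard happens to name C
    ("B", "C"): third,          # B condemned, guard must name C
    ("C", "B"): third,          # C condemned, guard must name B
}

# Probability that the guard names B, then A's risk given that statement.
p_guard_says_B = sum(p for (_, named), p in joint.items() if named == "B")
p_A_given_B = joint[("A", "B")] / p_guard_says_B
print(p_A_given_B)
```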

• Let's say we're interested in finding the probability of event $A$, and we've been informed that event $B$ has happened.

• In this case, the conditional probability of $A$ given $B$ is defined as:

• $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, if $P(B) \neq 0$.

If we're informed that event B has taken place, then the sample space is confined to B. The probability within B must be normalized.

This is accomplished by dividing by P(B). Event A can now only transpire if the outcome falls within $A \cap B$. Therefore, the updated probability of A is:

$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, if $P(B) \neq 0$.

Roll a fair die once and note the number facing upward.

Let E denote the event where a 1 appears on the top face.

Let F represent the event where the number on the top face is odd.

▪ Find P(E).
▪ What is the probability of event E if we are informed that the number on the top face is odd, meaning we know that event F has occurred?

Central concept: The initial sample space is no longer applicable.

The updated or reduced sample space is S = {1, 3, 5}.
• Observe that the revised sample space consists solely of the outcomes in F.

P(E occurs given that F occurs) = 1/3.
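
The same answer can be reached two ways: from the ratio definition of conditional probability, or by counting inside the reduced sample space. The sketch below does both for the die example; the helper `P` and its `space` parameter are illustrative names, and equally likely outcomes are assumed.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
E = {1}            # a 1 appears on the top face
F = {1, 3, 5}      # the number on the top face is odd


def P(event, space=S):
    """Probability of an event when all outcomes in `space` are equally likely."""
    return Fraction(len(event & space), len(space))


p_E = P(E)                        # unconditional probability of E
p_E_given_F = P(E & F) / P(F)     # ratio definition of conditional probability
p_E_reduced = P(E, space=F)       # counting within the reduced sample space F
print(p_E, p_E_given_F, p_E_reduced)
```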

Two events A and B are said to be independent if

$P(A \cap B) = P(A) P(B)$.

If A and B are two independent events and $P(A) \neq 0 \neq P(B)$, then

$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A) P(B)}{P(B)} = P(A)$,

and

$P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{P(A) P(B)}{P(A)} = P(B)$.

Therefore, in the scenario of independence, the conditional probability of an event remains unaffected by the knowledge of another event.
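
As an illustration, the sketch below tests the product rule on a fair die. The three events are hypothetical examples: "even" and "at most 4" turn out to be independent, while "even" and "at least 4" do not.

```python
from fractions import Fraction

S = frozenset(range(1, 7))


def P(event):
    """Uniform probability on a fair die (illustrative assumption)."""
    return Fraction(len(event), len(S))


def independent(X, Y):
    """Product rule: P(X and Y) equals P(X) times P(Y)."""
    return P(X & Y) == P(X) * P(Y)


even = frozenset({2, 4, 6})
at_most_4 = frozenset({1, 2, 3, 4})
at_least_4 = frozenset({4, 5, 6})

print(independent(even, at_most_4))    # product rule holds
print(independent(even, at_least_4))   # product rule fails

# For independent events, conditioning leaves the probability unchanged.
print(P(even & at_most_4) / P(at_most_4) == P(even))
```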

Two events that are mutually exclusive are only independent in the specific scenario where either the probability of event A equals zero or the probability of event B equals zero.

Mutually exclusive events exhibit strong dependence otherwise. A and B cannot happen simultaneously. If one event occurs, the other event does not.
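
The contrast between mutual exclusivity and independence can be checked on a small example. The sketch below assumes a fair die and uses hypothetical events `A` and `B` that cannot occur together.

```python
from fractions import Fraction

S = frozenset(range(1, 7))


def P(event):
    """Uniform probability on a fair die (illustrative assumption)."""
    return Fraction(len(event), len(S))


A = frozenset({1, 2})
B = frozenset({3, 4})
empty = frozenset()

mutually_exclusive = not (A & B)              # no outcome belongs to both
independent = P(A & B) == P(A) * P(B)         # product rule
p_A_given_B = P(A & B) / P(B)                 # drops to 0 once B is known

print(mutually_exclusive, independent, p_A_given_B, P(A))

# With a zero-probability event the product rule holds trivially.
print(P(empty & B) == P(empty) * P(B))
```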
