Machine Learning
Lecture 12: Naïve Bayes Classifier
COURSE CODE: CSE451
2023
Course Teacher
Dr. Mrinal Kanti Baowaly
Associate Professor
Department of Computer Science and
Engineering, Bangabandhu Sheikh
Mujibur Rahman Science and
Technology University, Bangladesh.
Email: [email protected]
Definition: Naïve Bayes Classifier
A probabilistic classifier based on applying Bayes’ theorem
It is called ‘Naïve’ because it makes a strong independence
assumption between the input variables/attributes
Two specific assumptions are required for the attributes:
Attributes are statistically independent given the class value
Attributes are equally important
Bayes’ theorem
Using Bayes’ theorem, we can find the probability of A happening, given that B has occurred.
Here, B is the evidence and A is the hypothesis.
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
P(A|B): Posterior Probability
P(B|A): Likelihood
P(A): Prior Probability
P(B): Evidence Probability
» Read Conditional Probability
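As a minimal sketch (not part of the original slides), Bayes' theorem P(A|B) = P(B|A) * P(A) / P(B) can be written as a small Python function. The name `bayes_posterior`, its parameter names, and the numbers in the call are assumptions chosen only for illustration.

```python
def bayes_posterior(likelihood, prior, evidence):
    """Return P(A|B) given P(B|A), P(A) and P(B)."""
    if evidence <= 0:
        raise ValueError("evidence probability P(B) must be positive")
    return likelihood * prior / evidence


# Hypothetical numbers, chosen only to show the call:
# P(B|A) = 0.5, P(A) = 0.2, P(B) = 0.4
posterior = bayes_posterior(0.5, 0.2, 0.4)
print(posterior)
```

The guard on `evidence` reflects that the formula is undefined when B can never occur.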
Naïve Bayes Classifier
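Given a problem instance X, the task is to predict its class label
from the set of class labels Y. In Bayes’ theorem, if the evidence (B)
is represented by an instance (X) and the hypothesis (A) is
represented by a class label 𝑦 ∈ 𝑌, then the probability of the class
label 𝑦 given an instance X is:
$$P(y \mid X) = \frac{P(X \mid y)\, P(y)}{P(X)}$$
If we have multiple features, i.e., 𝑋 = (𝑥1, 𝑥2, 𝑥3, …, 𝑥𝑛), then, using
the independence assumption, Bayes’ theorem can be rewritten as:
$$P(y \mid x_1, x_2, \dots, x_n) = \frac{P(x_1 \mid y)\, P(x_2 \mid y) \cdots P(x_n \mid y)\, P(y)}{P(x_1, x_2, \dots, x_n)}$$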
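Under the naive independence assumption, the numerator of Bayes' theorem for a class y becomes P(y) * P(x1|y) * ... * P(xn|y). The sketch below computes that product in Python. The function name `class_score` and the example values are hypothetical, not taken from the lecture.

```python
from math import prod


def class_score(prior, feature_likelihoods):
    """Return P(y) * P(x1|y) * ... * P(xn|y), the numerator of Bayes'
    theorem under the naive independence assumption."""
    return prior * prod(feature_likelihoods)


# Hypothetical values: P(y) = 0.5 and three per-feature likelihoods P(xi|y).
score = class_score(0.5, [0.5, 0.25, 0.5])
print(score)  # 0.03125
```

This score is not yet a probability. Dividing it by the evidence P(x1, ..., xn) would give the posterior P(y|X).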
How to classify with Naïve Bayes Classifier
For classification, the Naïve Bayes Classifier finds the posterior
probability of each class label and picks the most probable one to
label the instance
Suppose we have two class labels, 𝑌 = {𝑦𝑒𝑠, 𝑛𝑜}, and an instance 𝑋
Calculate the posterior probabilities: 𝑃(𝑦𝑒𝑠|𝑋) and 𝑃(𝑛𝑜|𝑋)
If 𝑃(𝑦𝑒𝑠|𝑋) > 𝑃(𝑛𝑜|𝑋), then 𝑋 is labeled/classified as 𝑦𝑒𝑠,
otherwise as 𝑛𝑜
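The decision rule above (pick the class with the largest posterior) can be sketched in a few lines of Python. The helper name `classify` and the posterior values are assumptions made for illustration.

```python
def classify(posteriors):
    """Return the class label with the highest posterior probability."""
    return max(posteriors, key=posteriors.get)


# Hypothetical posteriors P(yes|X) and P(no|X) for some instance X.
label = classify({"yes": 0.7, "no": 0.3})
print(label)  # yes
```

The same function works unchanged for more than two class labels, since it simply takes the maximum over all of them.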
Example: Classify with Naïve Bayes Classifier
Problem: If the weather is sunny then can players play or not?
Weather Play
Sunny ?
Solution: Find P(Yes|Sunny) and P(No|Sunny)
Example: Classify with Naïve Bayes Classifier (Cont..)
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
Here, P(Sunny|Yes) = 3/9 = 0.33, P(Yes) = 9/14 = 0.64,
P(Sunny) = 5/14 = 0.36
Now, P(Yes|Sunny) = 0.33 * 0.64 / 0.36 = 0.60
P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
Here, P(Sunny|No) = 2/5 = 0.40, P(No) = 5/14 = 0.36,
P(Sunny) = 5/14 = 0.36
Now, P(No|Sunny) = 0.40 * 0.36 / 0.36 = 0.40
We can see that P(Yes|Sunny) > P(No|Sunny)
So if the weather is sunny then players can play the sport.
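The arithmetic of this example can be checked with a short Python sketch that uses exact fractions. The probabilities are the ones quoted in the example. The variable names are assumptions made for illustration.

```python
from fractions import Fraction

# Probabilities as quoted in the example.
p_yes = Fraction(9, 14)
p_no = Fraction(5, 14)
p_sunny = Fraction(5, 14)
p_sunny_given_yes = Fraction(3, 9)
p_sunny_given_no = Fraction(2, 5)

# Bayes' theorem for each class label.
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
p_no_given_sunny = p_sunny_given_no * p_no / p_sunny

print(p_yes_given_sunny, p_no_given_sunny)  # 3/5 2/5
prediction = "Yes" if p_yes_given_sunny > p_no_given_sunny else "No"
print(prediction)  # Yes
```

With exact fractions the posteriors are 3/5 = 0.60 and 2/5 = 0.40. Multiplying the rounded values 0.33, 0.64 and 0.36 gives roughly 0.59, so the 0.60 in the example corresponds to the unrounded fractions.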
Now it’s your turn
Problem 1: If the weather is overcast then can players play or not?
Problem 2: If the weather is rainy then can players play or not?
Example: Classify with Naïve Bayes Classifier (In case of multiple features)
Suppose we have a Day with the following values:
Outlook = Rain
Humidity = High
Wind = Weak
Play = ?
Let X = (Outlook=Rain, Humidity=High, Wind=Weak)
Find P(Yes|X) = P(X|Yes) * P(Yes) / P(X) and P(No|X) = P(X|No) * P(No) / P(X)
No need to calculate the evidence probability P(X): it is the same for both classes, so it does not affect the comparison
Now, P(X|Yes) * P(Yes)
= P(Outlook=Rain, Humidity=High, Wind=Weak|Yes) * P(Yes)
= P(Outlook=Rain|Yes) * P(Humidity=High|Yes) * P(Wind=Weak|Yes) * P(Yes)
Solution: Dzone - Naive Bayes Tutorial
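The multi-feature computation can be sketched in Python as follows. Because P(X) is the same for every class, it can be recovered as the sum of the per-class products and never has to be estimated separately. The function name `posteriors` and all probability values below are placeholders, NOT the values from the lecture's dataset.

```python
from math import prod


def posteriors(model):
    """model maps each class label to (prior, [P(x1|y), ..., P(xn|y)]).

    Returns the normalized posteriors P(y|X). The evidence P(X) is taken
    as the sum of the per-class scores P(y) * P(x1|y) * ... * P(xn|y).
    """
    scores = {label: prior * prod(likes) for label, (prior, likes) in model.items()}
    evidence = sum(scores.values())
    return {label: score / evidence for label, score in scores.items()}


# Placeholder probabilities, NOT taken from the lecture's dataset.
# Each list is ordered as P(Outlook=Rain|y), P(Humidity=High|y), P(Wind=Weak|y).
model = {
    "Yes": (0.6, [0.5, 0.5, 0.5]),
    "No": (0.4, [0.5, 0.25, 0.5]),
}
result = posteriors(model)
print(max(result, key=result.get))  # Yes
```

Normalizing changes the scale of the scores but not their order, so comparing the unnormalized products P(X|Yes) * P(Yes) and P(X|No) * P(No) leads to the same label.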
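Estimating conditional probabilities for continuous attributes
A Gaussian distribution is usually chosen to represent the
class-conditional probabilities for continuous attributes
For each class y, the class-conditional probability for attribute xi is:
$$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$$
where 𝜇 represents the mean and 𝜎² represents the variance of xi among the training instances of class y.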
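A minimal Python sketch of the Gaussian class-conditional density, (1 / sqrt(2πσ²)) * exp(-(xi - μ)² / (2σ²)), is given below. The function name `gaussian_likelihood` and the example values are assumptions. In practice the mean and variance would be estimated from the training instances of each class.

```python
import math


def gaussian_likelihood(x, mean, variance):
    """Gaussian class-conditional density for a continuous attribute value x,
    given the mean and variance of that attribute within one class."""
    if variance <= 0:
        raise ValueError("variance must be positive")
    exponent = -((x - mean) ** 2) / (2 * variance)
    return math.exp(exponent) / math.sqrt(2 * math.pi * variance)


# Hypothetical attribute value and class statistics.
density = gaussian_likelihood(x=1.0, mean=0.0, variance=1.0)
print(round(density, 4))  # 0.242
```

The returned value is a density rather than a probability, so it can exceed 1 for small variances. It is still usable for comparing classes, because the same attribute value is scored under each class.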
HW: Zero frequency problem
What is the zero frequency problem in the Naïve Bayes Classifier?
How can the zero frequency problem be handled?
Types of Naïve Bayes Classifier
Multinomial Naive Bayes: When features are discrete count variables / categorical
Bernoulli Naive Bayes: When feature vectors are binary (i.e., zeros and ones)
Gaussian Naive Bayes: When features follow a normal distribution
» Read Normal Distribution
Advantages & Disadvantages of Naïve Bayes Classifier
Advantages
Works surprisingly well in practice
Simple
Handling missing values is easier
Robust to irrelevant attributes
Disadvantages
Assumes independent attributes, so it can’t model dependencies between variables
Suffers from “Zero Frequency” problem
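The “Zero Frequency” problem can be illustrated with a short Python sketch. If one attribute value never occurs together with a class in the training data, its estimated conditional probability is 0, and the whole product for that class becomes 0 regardless of the other attributes. All numbers below are hypothetical.

```python
from math import prod

# Hypothetical per-feature likelihoods P(xi|y) for one class.
# The second value is 0 because that attribute value never occurred
# together with this class in the (imaginary) training data.
likelihoods = [0.5, 0.0, 0.75]
prior = 0.5

score = prior * prod(likelihoods)
print(score)  # 0.0
```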
Some Learning Materials
Naïve Bayes
Naive Bayes Tutorial: Naive Bayes Classifier in Python
Naive Bayes Classification using Scikit-learn