CS 60050
Machine Learning
Naïve Bayes Classifier
Some slides taken from course materials of Tan, Steinbach, Kumar
Bayes Classifier
● A probabilistic framework for solving classification
problems
● An approach for modeling probabilistic relationships
between the attribute set and the class variable
– It may not be possible to predict the class label of a
test record with certainty, even if its attributes are
identical to those of some training records
– Reason: noisy data, or the presence of factors that
are not included in the analysis
Probability Basics
● P(A = a, C = c): joint probability that random
variables A and C will take values a and c
respectively
● P(A = a | C = c): conditional probability that A will
take the value a, given that C has taken value c
P(C | A) = P(A, C) / P(A)

P(A | C) = P(A, C) / P(C)
Bayes Theorem
● Bayes theorem:
P(C | A) = P(A | C) P(C) / P(A)
● P(C) is known as the prior probability
● P(C | A) is known as the posterior probability
Example of Bayes Theorem
● Given:
– A doctor knows that meningitis causes stiff neck 50% of the
time
– Prior probability of any patient having meningitis is 1/50,000
– Prior probability of any patient having stiff neck is 1/20
● If a patient has stiff neck, what’s the probability
he/she has meningitis?
P(M | S) = P(S | M) P(M) / P(S) = (0.5 × 1/50000) / (1/20) = 0.0002
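The same arithmetic in Python, as a quick sanity check (a minimal sketch; the variable names are mine, not from the source):

```python
# Bayes theorem: P(M|S) = P(S|M) * P(M) / P(S)
p_s_given_m = 0.5          # meningitis causes stiff neck 50% of the time
p_m = 1 / 50000            # prior probability of meningitis
p_s = 1 / 20               # prior probability of stiff neck

p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)         # 0.0002
```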
Bayesian Classifiers
● Consider each attribute and class label as random
variables
● Given a record with attributes (A1, A2, …, An)
– Goal is to predict class C
– Specifically, we want to find the value of C that
maximizes P(C | A1, A2, …, An)
● Can we estimate P(C | A1, A2, …, An) directly from
data?
Bayesian Classifiers
● Approach:
– Compute the posterior probability P(C | A1, A2, …, An) for
all values of C using the Bayes theorem:

P(C | A1, A2, …, An) = P(A1, A2, …, An | C) P(C) / P(A1, A2, …, An)

– Terminology: P(A1, A2, …, An | C) is the class-conditional
probability, P(C) the prior probability, P(C | A1, A2, …, An)
the posterior probability, and P(A1, A2, …, An) the evidence
– Choose value of C that maximizes
P(C | A1, A2, …, An)
– Equivalent to choosing value of C that maximizes
P(A1, A2, …, An|C) P(C)
● How to estimate P(A1, A2, …, An | C )?
Naïve Bayes Classifier
● Assumes all attributes Ai are conditionally independent,
given the class C:
– P(A1, A2, …, An |C) = P(A1| C) P(A2| C)… P(An| C)
– Can estimate P(Ai | Cj) for all Ai and Cj.
– New point is classified to Cj if P(Cj) Π P(Ai | Cj) is
maximal.
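A minimal sketch of this decision rule in Python, assuming the priors P(Cj) and the per-attribute conditionals P(Ai | Cj) have already been estimated (the function and parameter names are illustrative, not a library API):

```python
import math

def nb_predict(x, classes, prior, cond):
    """Return the class c maximizing P(c) * prod_i P(x_i | c).

    prior: dict mapping each class to P(c)
    cond:  function (i, value, c) -> estimate of P(A_i = value | c)
    """
    best, best_score = None, -math.inf
    for c in classes:
        # accumulate in log space to avoid floating-point underflow
        score = math.log(prior[c])
        for i, v in enumerate(x):
            p = cond(i, v, c)
            score += math.log(p) if p > 0 else -math.inf
        if best is None or score > best_score:
            best, best_score = c, score
    return best
```

Accumulating log-probabilities rather than multiplying raw probabilities is the usual implementation trick: with many attributes the raw product quickly underflows.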
Conditional independence: basics
● Let X, Y, Z denote three sets of random variables
● The variables in X are said to be conditionally
independent of the variables in Y, given Z, if
P( X | Y, Z ) = P( X | Z )
● An example
– People's reading skill tends to increase with the
length of their arm
– Explanation: both increase with the age of a person
– If age is given, arm length and reading skills are
(conditionally) independent
Conditional independence: basics
● If X and Y are conditionally independent, given Z
P( X, Y | Z ) = P(X, Y, Z) / P(Z)
= P(X, Y, Z) / P(Y, Z) * P(Y, Z) / P(Z)
= P(X | Y, Z) * P(Y | Z)
= P(X | Z) * P(Y | Z)
P( X, Y | Z ) = P(X | Z) * P(Y | Z)
NB assumption:
P(A1, A2, …, An |C) = P(A1| C) P(A2| C)… P(An| C)
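To see the identity concretely, one can build a small joint distribution that satisfies the assumption and check both the definition and the consequence numerically (a minimal sketch with made-up numbers):

```python
# Build P(X,Y,Z) = P(X|Z) P(Y|Z) P(Z) for binary X, Y, Z, so that X and Y
# are conditionally independent given Z by construction
p_z = {0: 0.3, 1: 0.7}
p_x_given_z = {0: {0: 0.2, 1: 0.8}, 1: {0: 0.6, 1: 0.4}}  # P(X=x | Z=z)
p_y_given_z = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}  # P(Y=y | Z=z)

def joint(x, y, z):
    return p_x_given_z[z][x] * p_y_given_z[z][y] * p_z[z]

# Check the definition: P(X | Y, Z) = P(X | Z) for every assignment
for z in (0, 1):
    for y in (0, 1):
        p_yz = sum(joint(xx, y, z) for xx in (0, 1))      # P(Y=y, Z=z)
        for x in (0, 1):
            assert abs(joint(x, y, z) / p_yz - p_x_given_z[z][x]) < 1e-12

# Check the consequence: P(X, Y | Z) = P(X | Z) P(Y | Z)
for z in (0, 1):
    for x in (0, 1):
        for y in (0, 1):
            lhs = joint(x, y, z) / p_z[z]
            rhs = p_x_given_z[z][x] * p_y_given_z[z][y]
            assert abs(lhs - rhs) < 1e-12
```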
How to Estimate Probabilities from Data?

Training data (Refund and Marital Status are categorical,
Taxable Income is continuous, Evade is the class):

Tid | Refund | Marital Status | Taxable Income | Evade
 1  | Yes    | Single         | 125K           | No
 2  | No     | Married        | 100K           | No
 3  | No     | Single         | 70K            | No
 4  | Yes    | Married        | 120K           | No
 5  | No     | Divorced       | 95K            | Yes
 6  | No     | Married        | 60K            | No
 7  | Yes    | Divorced       | 220K           | No
 8  | No     | Single         | 85K            | Yes
 9  | No     | Married        | 75K            | No
10  | No     | Single         | 90K            | Yes

● Class prior: P(C) = Nc / N
– e.g., P(No) = 7/10, P(Yes) = 3/10
● For discrete attributes:
P(Ai | Ck) = |Aik| / Nck
– where |Aik| is the number of instances having attribute
value Ai and belonging to class Ck, and Nck is the
number of instances of class Ck
– Examples:
P(Status=Married|No) = 4/7
P(Refund=Yes|Yes) = 0
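These estimates are straightforward counting; a minimal sketch in Python with the training table hard-coded (Taxable Income omitted, since it is continuous):

```python
from collections import Counter

# (Refund, Marital Status, Evade); Taxable Income is handled separately
data = [
    ("Yes", "Single",   "No"),  ("No", "Married",  "No"),
    ("No",  "Single",   "No"),  ("Yes", "Married", "No"),
    ("No",  "Divorced", "Yes"), ("No", "Married",  "No"),
    ("Yes", "Divorced", "No"),  ("No", "Single",   "Yes"),
    ("No",  "Married",  "No"),  ("No", "Single",   "Yes"),
]

n_c = Counter(evade for _, _, evade in data)   # class counts N_c
print(n_c["No"] / len(data))                   # P(No) = 7/10 = 0.7

married_no = sum(1 for _, status, evade in data
                 if status == "Married" and evade == "No")
print(married_no / n_c["No"])                  # P(Status=Married|No) = 4/7
```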
How to Estimate Probabilities from Data?
● For continuous attributes, two options:
– Discretize the range into bins
◆ one ordinal attribute per bin (see the sketch after this list)
– Probability density estimation:
◆ Assume the attribute follows a Gaussian / normal
distribution
◆ Use data to estimate the parameters of the distribution
(e.g., mean and standard deviation)
◆ Once the probability distribution is known, it can be used
to estimate the conditional probability P(Ai | c)
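A sketch of the binning option using numpy, over the Income values from the table above (the choice of four equal-width bins is arbitrary, for illustration only):

```python
import numpy as np

income = np.array([125, 100, 70, 120, 95, 60, 220, 85, 75, 90])  # in K

# Four equal-width bins over the observed range; each record then gets an
# ordinal bin id that is treated like any other discrete attribute value
edges = np.linspace(income.min(), income.max(), num=5)  # 4 bins -> 5 edges
bin_ids = np.digitize(income, edges[1:-1])              # ids in {0, 1, 2, 3}
print(bin_ids)                                          # [1 1 0 1 0 0 3 0 0 0]
```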
How to Estimate Probabilities from Data?
● Normal distribution:

P(Ai | cj) = (1 / √(2π σij²)) exp( −(Ai − μij)² / (2σij²) )

– One distribution for each (Ai, cj) pair
● For (Income, Class=No) in the training data above:
– sample mean = 110
– sample variance = 2975

P(Income = 120 | No) = (1 / (√(2π) × 54.54)) exp( −(120 − 110)² / (2 × 2975) )
                     = 0.0072
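The value 0.0072 can be checked directly; note that 54.54 ≈ √2975, the sample standard deviation (a minimal sketch):

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

print(gaussian_pdf(120, 110, 2975))   # ≈ 0.0072
```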
A complete example

Example of Naïve Bayes Classifier
Given a test record:
X = (Refund = No, Marital Status = Married, Income = 120K)

Training data: the same 10-record table as above.

Naive Bayes classifier learned from the training data:
P(Refund=Yes|No) = 3/7
P(Refund=No|No) = 4/7
P(Refund=Yes|Yes) = 0
P(Refund=No|Yes) = 1
P(Marital Status=Single|No) = 2/7
P(Marital Status=Divorced|No) = 1/7
P(Marital Status=Married|No) = 4/7
P(Marital Status=Single|Yes) = 2/7
P(Marital Status=Divorced|Yes) = 1/7
P(Marital Status=Married|Yes) = 0
For taxable income:
If class=No: sample mean = 110, sample variance = 2975
If class=Yes: sample mean = 90, sample variance = 25
Example of Naïve Bayes Classifier
Given a test record:
X = (Refund = No, Marital Status = Married, Income = 120K)

● P(X|Class=No) = P(Refund=No|Class=No)
                × P(Married|Class=No)
                × P(Income=120K|Class=No)
              = 4/7 × 4/7 × 0.0072 = 0.0024

● P(X|Class=Yes) = P(Refund=No|Class=Yes)
                 × P(Married|Class=Yes)
                 × P(Income=120K|Class=Yes)
               = 1 × 0 × 1.2 × 10⁻⁹ = 0

Since P(X|No) P(No) > P(X|Yes) P(Yes),
P(No|X) > P(Yes|X)
=> Class = No
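The full calculation, assembled into one script using the estimates above (a minimal sketch; the probabilities are hard-coded rather than estimated from the table):

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

prior = {"No": 7 / 10, "Yes": 3 / 10}
p_refund_no = {"No": 4 / 7, "Yes": 1.0}               # P(Refund=No | class)
p_married = {"No": 4 / 7, "Yes": 0.0}                 # P(Married   | class)
income_params = {"No": (110, 2975), "Yes": (90, 25)}  # (mean, variance)

scores = {}
for c in ("No", "Yes"):
    mean, var = income_params[c]
    likelihood = p_refund_no[c] * p_married[c] * gaussian_pdf(120, mean, var)
    scores[c] = likelihood * prior[c]                 # P(X|c) P(c)

print(scores)                        # No: ≈ 0.0016, Yes: 0.0
print(max(scores, key=scores.get))   # 'No'
```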
Naïve Bayes Classifier
● If one of the conditional probabilities is zero, then
the entire expression becomes zero
● Probability estimation:

Original:    P(Ai | C) = Nic / Nc

Laplace:     P(Ai | C) = (Nic + 1) / (Nc + c)

m-estimate:  P(Ai | C) = (Nic + mp) / (Nc + m)

where c: number of classes, p: prior probability,
m: parameter
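A sketch of the three estimators applied to the zero count from the running example, P(Refund=Yes|Yes) with Nic = 0 and Nc = 3 (the m and p values below are arbitrary choices, for illustration only):

```python
def original_estimate(n_ic, n_c):
    return n_ic / n_c

def laplace_estimate(n_ic, n_c, c):
    # c: number of classes, as defined on the slide
    return (n_ic + 1) / (n_c + c)

def m_estimate(n_ic, n_c, m, p):
    # m: parameter, p: prior probability of the attribute value
    return (n_ic + m * p) / (n_c + m)

print(original_estimate(0, 3))          # 0.0 -- wipes out the whole product
print(laplace_estimate(0, 3, c=2))      # 0.2
print(m_estimate(0, 3, m=3, p=1/3))     # 0.1666...
```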
Naïve Bayes: Pros and Cons
● Robust to isolated noise points
● Can handle missing values by ignoring the
instance during probability estimate calculations
● Robust to irrelevant attributes
● Independence assumption may not hold for some
attributes
– Presence of correlated attributes can degrade
performance of NB classifier
Example with correlated attribute
● Two attributes A, B and class Y (all binary)
● Prior probabilities:
– P(Y=0) = P(Y=1) = 0.5
● Class conditional probabilities of A:
– P(A=0 | Y=0) = 0.4 P(A=1 | Y=0) = 0.6
– P(A=0 | Y=1) = 0.6 P(A=1 | Y=1) = 0.4
● Class-conditional probabilities of B are the same as
those of A
● B is perfectly correlated with A when Y=0, but is
independent of A when Y=1
Example with correlated attribute
● Need to classify a record with A=0, B=0
● P(Y=0 | A=0,B=0) = P(A=0,B=0 | Y=0) P(Y=0) / P(A=0,B=0)
                   = P(A=0|Y=0) P(B=0|Y=0) P(Y=0) / P(A=0,B=0)
                   = (0.16 × 0.5) / P(A=0,B=0)
● P(Y=1 | A=0,B=0) = P(A=0,B=0 | Y=1) P(Y=1) / P(A=0,B=0)
                   = P(A=0|Y=1) P(B=0|Y=1) P(Y=1) / P(A=0,B=0)
                   = (0.36 × 0.5) / P(A=0,B=0)
● Hence the naive Bayes prediction is Y=1
Example with correlated attribute
● Need to classify a record with A=0, B=0
● In reality, since B is perfectly correlated with A when
Y=0:
● P(Y=0 | A=0,B=0) = P(A=0,B=0 | Y=0) P(Y=0) / P(A=0,B=0)
                   = P(A=0|Y=0) P(Y=0) / P(A=0,B=0)
                   = (0.4 × 0.5) / P(A=0,B=0)
● Hence the prediction should have been Y=0
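The two sets of numbers side by side (a minimal sketch; the evidence term P(A=0,B=0) cancels, so only the numerators are compared):

```python
p_a0 = {0: 0.4, 1: 0.6}   # P(A=0 | Y=y) for y = 0, 1
p_b0 = {0: 0.4, 1: 0.6}   # P(B=0 | Y=y), same as A
prior = 0.5               # P(Y=0) = P(Y=1)

# Naive Bayes scores (independence assumed for both classes)
nb_y0 = p_a0[0] * p_b0[0] * prior       # 0.16 * 0.5 = 0.08
nb_y1 = p_a0[1] * p_b0[1] * prior       # 0.36 * 0.5 = 0.18
print("NB predicts Y =", int(nb_y1 > nb_y0))           # 1

# True scores: when Y=0, B is a copy of A, so P(A=0,B=0|Y=0) = P(A=0|Y=0)
true_y0 = p_a0[0] * prior               # 0.4  * 0.5 = 0.20
true_y1 = p_a0[1] * p_b0[1] * prior     # 0.36 * 0.5 = 0.18
print("Truth predicts Y =", int(true_y1 > true_y0))    # 0
```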
Other Bayesian classifiers
● If it is suspected that the attributes may be
correlated:
● Can use other techniques such as Bayesian
Belief Networks (BBN)
● A BBN uses a graphical model (network) to capture prior
knowledge in a particular domain, and causal
dependencies among variables