BayesClassifiers Day6

Classification

I) Instance-based methods:
1) Nearest neighbour
II) Probabilistic models:
1) Naïve Bayes
2) Logistic Regression
III) Linear Models:
1) Perceptron
2) Support Vector Machine
IV) Decision Models:
1) Decision Trees
2) Boosted Decision Trees
3) Random Forest
Data sets

• Each training data point is represented as a pair (x, y):
• x is a description of the object: a feature vector
• y is a label (assumed binary for now, y ∈ {+1, -1}) (a small illustration follows)
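As a small illustration (the feature values below are invented, not from the lecture), such a data set can be written as a list of (x, y) pairs in Python:

```python
# A toy binary-labelled data set: each example is a pair (x, y), where x is a
# feature vector and y is a label in {+1, -1}. The feature values are made up
# purely for illustration.
data = [
    ([1.0, 2.5], +1),
    ([0.3, 1.1], -1),
    ([2.2, 0.7], +1),
    ([0.1, 0.2], -1),
]

for x, y in data:
    print(f"features={x}, label={y:+d}")
```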
Nearest neighbour classifier

• Remember all the training data
• Decide on a distance function
• When a test data point arrives, find its nearest neighbour (the closest point) among the training data
• Assign the label of that nearest neighbour to the test data point (see the sketch below)

Nearest neighbour classification boundary

Disadvantage:
Overfitting; the classifier memorises all the training points
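A minimal sketch of this rule, assuming Euclidean distance as the distance function and a tiny invented training set:

```python
import math

def nn_classify(train, x_test):
    # "Remember all training data": train is a list of (x, y) pairs.
    # Find the closest training point under Euclidean distance (the chosen
    # distance function) and return its label.
    nearest_x, nearest_y = min(train, key=lambda xy: math.dist(xy[0], x_test))
    return nearest_y

train = [([1.0, 2.5], +1), ([0.3, 1.1], -1), ([2.2, 0.7], +1)]
print(nn_classify(train, [0.5, 1.0]))   # -> -1, the closest point is ([0.3, 1.1], -1)
```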

K-NN Classifier
• Find the k nearest neighbours
• Have them vote on the label
• This has a smoothing effect
• It is especially helpful when there is noise in the class labels

As K increases:
• The classification boundary becomes smoother
• Training error can increase. What happens if K = N? (Every test point simply receives the overall majority class label.)

Select K by cross-validation (a sketch of the voting rule follows)
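A sketch of the voting rule, again assuming Euclidean distance and invented data; selecting K by cross-validation would amount to evaluating this classifier on held-out folds for several values of K and keeping the best one:

```python
import math
from collections import Counter

def knn_classify(train, x_test, k=3):
    # Find the k training points closest to x_test (Euclidean distance)
    # and let them vote on the label.
    neighbours = sorted(train, key=lambda xy: math.dist(xy[0], x_test))[:k]
    votes = Counter(y for _, y in neighbours)
    return votes.most_common(1)[0][0]

train = [([1.0, 2.5], +1), ([0.3, 1.1], -1), ([2.2, 0.7], +1),
         ([0.1, 0.2], -1), ([1.8, 2.0], +1)]
print(knn_classify(train, [0.5, 1.0], k=3))   # -> -1 (two of the three nearest neighbours are -1)
```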
Probabilistic Model

Two classes: C1 = green, C2 = red.

Decision rule using only the prior probabilities:

Label = +1 [class C1] if P(C1) > P(C2)
Label = -1 [class C2] if P(C2) > P(C1)

P(error) = min{ P(C1), P(C2) }

This rule does not use the feature information x (a small sketch follows).
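A minimal sketch of the prior-only rule, with invented class counts standing in for the green and red points:

```python
# Hypothetical class counts (say, 40 green C1 points and 20 red C2 points).
n_c1, n_c2 = 40, 20
p_c1 = n_c1 / (n_c1 + n_c2)    # P(C1) = 2/3
p_c2 = n_c2 / (n_c1 + n_c2)    # P(C2) = 1/3

# Always predict the class with the larger prior probability.
label = +1 if p_c1 > p_c2 else -1

# This rule is wrong exactly when the true class is the less probable one.
p_error = min(p_c1, p_c2)

print(label, p_error)          # -> 1 0.3333...
```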


Using the x information

[Figure: the two class-conditional densities, p(x|C1) = p(x|y=+1) for class C1 and p(x|C2) = p(x|y=-1) for class C2, with a new point x_t to be classified.]

Bayes Classifier
p(x/C) C1 Classification on posterior
C2 Label =C1
if P(C1/xt) > P(C2/xt)
or p(xt/C1) P(C1) > p(xt/C2) P(C2)

Label =C2
if P(C2/xt) > P(C1/xt)
or p(xt/C2) P(C2) > p(xt/C1) P(C1)
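A sketch of this rule, assuming (purely for illustration) one-dimensional Gaussian class-conditional densities with known parameters; the lecture does not fix a particular form for p(x|C):

```python
import math

def gaussian_pdf(x, mean, std):
    # 1-D Gaussian density, used as the assumed class-conditional model p(x|C).
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

# Assumed parameters (not from the lecture): p(x|C1) = N(0, 1), p(x|C2) = N(3, 1),
# with equal priors P(C1) = P(C2) = 0.5.
P_C1, P_C2 = 0.5, 0.5

def bayes_classify(x_t):
    # Comparing p(x_t|C1) P(C1) with p(x_t|C2) P(C2) is equivalent to comparing
    # the posteriors P(C1|x_t) and P(C2|x_t), since the evidence p(x_t) is a
    # common positive factor.
    score_c1 = gaussian_pdf(x_t, 0.0, 1.0) * P_C1
    score_c2 = gaussian_pdf(x_t, 3.0, 1.0) * P_C2
    return "C1" if score_c1 > score_c2 else "C2"

print(bayes_classify(0.8))   # -> C1 (left of the decision boundary at x = 1.5)
print(bayes_classify(2.4))   # -> C2
```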
Risk or generalisation loss

Expected cost (= average cost over many examples).

For a given point x, the Bayes rule errs with probability

P(error|x) = min{ P(C1|x), P(C2|x) }      (the smaller of the two posteriors)

and the overall probability of error is

P(error) = ∫ P(error, x) dx = ∫ P(error|x) p(x) dx      (joint = posterior × evidence)

The Bayes decision boundary is optimal: it minimises this average probability of error, i.e. the risk.

[Figure: the curves p(x_t|C1) P(C1) and p(x_t|C2) P(C2); x is assigned to C1 where p(x_t|C1) P(C1) > p(x_t|C2) P(C2) and to C2 where p(x_t|C2) P(C2) > p(x_t|C1) P(C1); the area under the minimum of the two curves is P(error).]
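Under the same assumed Gaussian model as in the previous sketch, P(error) can be approximated numerically as the area under the minimum of the two curves p(x|C1) P(C1) and p(x|C2) P(C2); a rough sketch:

```python
import math

def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

# Same assumed model as above: p(x|C1) = N(0, 1), p(x|C2) = N(3, 1), equal priors.
P_C1, P_C2 = 0.5, 0.5

# P(error) = integral of min{ p(x|C1) P(C1), p(x|C2) P(C2) } dx,
# approximated with a simple Riemann sum over a fine grid.
dx = 0.001
grid = [-10.0 + i * dx for i in range(int(23.0 / dx))]
p_error = sum(min(gaussian_pdf(x, 0.0, 1.0) * P_C1,
                  gaussian_pdf(x, 3.0, 1.0) * P_C2) * dx for x in grid)

print(f"approximate Bayes error: {p_error:.4f}")   # about 0.067 for this model
```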
Naive Bayes Classifier

Assumption: the features are conditionally independent given the class.

Training data (chills and fever are Y/N; headache takes the values N, M, S; the last row is the test point with unknown FLU label):

FLU   chills   headache   fever
N     Y        M          Y
Y     Y        N          N
Y     Y        S          Y
Y     N        M          Y
N     N        N          N
Y     N        S          Y
N     N        S          N
Y     Y        M          Y
?     Y        M          N
Probabilities estimated from the table:

Priors:     P(Flu=Y) = 5/8,  P(Flu=N) = 3/8

Chills:     P(Ch=Y|Flu=Y) = 3/5,  P(Ch=N|Flu=Y) = 2/5
            P(Ch=Y|Flu=N) = 1/3,  P(Ch=N|Flu=N) = 2/3

Headache:   P(Ha=M|Flu=Y) = 2/5,  P(Ha=N|Flu=Y) = 1/5,  P(Ha=S|Flu=Y) = 2/5
            P(Ha=M|Flu=N) = 1/3,  P(Ha=N|Flu=N) = 1/3,  P(Ha=S|Flu=N) = 1/3

Fever:      P(Fe=Y|Flu=Y) = 4/5,  P(Fe=N|Flu=Y) = 1/5
            P(Fe=Y|Flu=N) = 1/3,  P(Fe=N|Flu=N) = 2/3

Classifying the test point (Ch=Y, Ha=M, Fe=N); let z = P(Ch=Y, Ha=M, Fe=N):

P(Flu=Y|Ch=Y, Ha=M, Fe=N)
  = P(Ch=Y, Ha=M, Fe=N|Flu=Y) P(Flu=Y) / z
  = P(Ch=Y|Flu=Y) P(Ha=M|Flu=Y) P(Fe=N|Flu=Y) P(Flu=Y) / z
  = (3/5 × 2/5 × 1/5 × 5/8) / z = (3/100) / z

P(Flu=N|Ch=Y, Ha=M, Fe=N)
  = P(Ch=Y, Ha=M, Fe=N|Flu=N) P(Flu=N) / z
  = P(Ch=Y|Flu=N) P(Ha=M|Flu=N) P(Fe=N|Flu=N) P(Flu=N) / z
  = (1/3 × 1/3 × 2/3 × 3/8) / z = (1/36) / z

Since 3/100 > 1/36, P(Flu=Y|Ch=Y, Ha=M, Fe=N) > P(Flu=N|Ch=Y, Ha=M, Fe=N), so the test point is classified as Flu = Y. A sketch reproducing this calculation follows.
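The sketch estimates the priors and conditional probabilities by counting rows of the table and compares the two unnormalised posteriors (the common factor z cancels); exact fractions are used so the output matches 3/100 and 1/36:

```python
from fractions import Fraction

# Training table from the slides: (FLU, chills, headache, fever).
rows = [
    ("N", "Y", "M", "Y"), ("Y", "Y", "N", "N"), ("Y", "Y", "S", "Y"),
    ("Y", "N", "M", "Y"), ("N", "N", "N", "N"), ("Y", "N", "S", "Y"),
    ("N", "N", "S", "N"), ("Y", "Y", "M", "Y"),
]

def prior(flu):
    # P(FLU = flu), estimated by counting rows.
    return Fraction(sum(r[0] == flu for r in rows), len(rows))

def cond_prob(feature_index, value, flu):
    # P(feature = value | FLU = flu), estimated from the rows of that class.
    cls = [r for r in rows if r[0] == flu]
    return Fraction(sum(r[feature_index] == value for r in cls), len(cls))

# Test point: chills = Y, headache = M, fever = N.
test = {1: "Y", 2: "M", 3: "N"}
for flu in ("Y", "N"):
    score = prior(flu)
    for idx, val in test.items():
        score *= cond_prob(idx, val, flu)   # naive (conditional) independence
    print(f"Flu={flu}: unnormalised posterior = {score}")
# -> Flu=Y gives 3/100 and Flu=N gives 1/36, so the point is classified as Flu = Y.
```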
