CLASSIFICATION: Bayesian Classifiers
Uses Bayes' Theorem (Thomas Bayes, 1701-1781) to
build probabilistic models of the relationships between
attributes and classes
Statistical principle for combining prior class
knowledge with new evidence from data
Multiple implementations
Naïve Bayes
Bayesian networks
CLASSIFICATION: Bayesian Classifiers
Requires the concept of conditional probability
Measures the probability of an event given that
another event (the evidence, or known information)
has occurred
Notation: P(A|B) = probability of A given that
B has occurred
P(A|B) = P(AB)/P(B)
Equivalently, if P(B) > 0: P(AB) = P(A|B)P(B)
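A quick worked instance of the definition: with a fair
six-sided die, P(roll is 2 | roll is even) =
P(roll is 2 AND roll is even)/P(roll is even) =
(1/6)/(1/2) = 1/3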
BAYESIAN CLASSIFIERS: Conditional Probability
Example:
Suppose 1% of a specific population has a form of cancer
A new diagnostic test
produces correct positive results for those with the
cancer 99% of the time
produces correct negative results for those without the
cancer 98% of the time
P(cancer) = 0.01
P(positive test | cancer) = 0.99
P(negative test | no cancer) = 0.98
BAYESIAN CLASSIFIERS: Conditional Probability
Example:
But what if you tested positive? What is the
probability that you actually have cancer?
Bayes Theorem reverses the process to provide
us with an answer.
BAYESIAN CLASSIFIERS: Bayes Theorem
P(B|A) = P(BA)/P(A), if P(A) > 0
       = P(AB)/P(A)
       = P(A|B)P(B)/P(A)
Application to our example
P(cancer | positive test)
= P(positive test | cancer) × P(cancer) / P(positive test)
= (0.99 × 0.01)/(0.99 × 0.01 + 0.02 × 0.99)
= 0.0099/0.0297 ≈ 0.33
So even after a positive test, the probability of
actually having cancer is only about one in three
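A minimal sketch of this calculation in Python (variable names are ours; the rates are those stated above):

```python
# Bayes' Theorem for the cancer-test example: P(cancer | positive test).
p_cancer = 0.01             # prior: P(cancer)
p_pos_given_cancer = 0.99   # P(positive test | cancer)
p_neg_given_healthy = 0.98  # P(negative test | no cancer)

# Total probability of a positive test (the "evidence"),
# by summing over the cancer / no-cancer cases.
p_pos = (p_pos_given_cancer * p_cancer
         + (1 - p_neg_given_healthy) * (1 - p_cancer))

# Bayes' Theorem: posterior = likelihood * prior / evidence.
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos
print(p_cancer_given_pos)  # ~0.333
```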
BAYESIAN CLASSIFIERS: Bayes Theorem
Probability tree (prior on the left, test outcome on the right):
cancer (0.01): test positive (0.99), test negative (0.01)
no cancer (0.99): test positive (0.02), test negative (0.98)
BAYESIAN CLASSIFIERS: Naïve Bayes
Bayes Theorem Interpretation
P(class C | F1, F2, …, Fn) =
P(class C) × P(F1, F2, …, Fn | C) / P(F1, F2, …, Fn)
posterior = prior × likelihood / evidence
BAYESIAN CLASSIFIERS: Naïve Bayes
Key concepts
Denominator independent of class C
Denominator effectively constant
Numerator equivalent to joint probability model
P(C, F1, F2, …, Fn)
Naïve conditional independence assumption
P(C | F1, F2, …, Fn) ∝ P(C) P(F1|C) P(F2|C) ⋯ P(Fn|C)
                     = P(C) ∏_{i=1}^{n} P(Fi|C)
BAYESIAN CLASSIFIERS: Naïve Bayes
Multiple distributional assumptions possible (see the sketch after this list)
Gaussian
Multinomial
Bernoulli
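For illustration, scikit-learn implements all three variants behind the same interface; a minimal sketch with made-up toy data:

```python
# Minimal sketch: the three Naive Bayes variants in scikit-learn,
# matching the distributional assumptions above. Toy data is made up.
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

X = np.array([[6.0, 180, 12], [5.92, 190, 11], [5.0, 100, 6], [5.5, 150, 8]])
y = np.array(["male", "male", "female", "female"])

model = GaussianNB()        # continuous features: one Gaussian per feature per class
# model = MultinomialNB()   # count features, e.g. word counts in text
# model = BernoulliNB()     # binary features, e.g. word present/absent
model.fit(X, y)
print(model.predict([[6.0, 130, 8]]))  # predicted class for a test sample
```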
BAYESIAN CLASSIFIERS: Naïve Bayes
Example
Training set (example from Wikipedia)
Sex     Height (feet)   Weight (pounds)   Foot size (inches)
male    6               180               12
male    5.92 (5'11")    190               11
male    5.58 (5'7")     170               12
male    5.92 (5'11")    165               10
female  5               100               6
female  5.5 (5'6")      150               8
female  5.42 (5'5")     130               7
female  5.75 (5'9")     150               9
BAYESIAN CLASSIFIERS: Naïve Bayes
Example
Assumptions
Continuous data
Gaussian (Normal) distribution
p(x) = 1/√(2πσ²) × exp(−(x − μ)²/(2σ²))
P(male) = P(female) = 0.5
BAYESIAN CLASSIFIERS: Naïve Bayes
Example
Classifier generated from training set
Sex     Height mean   Height variance   Weight mean   Weight variance   Foot size mean   Foot size variance
male    5.855         0.035033          176.25        122.92            11.25            0.91667
female  5.4175        0.097225          132.5         558.33            7.5              1.6667
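A small sketch of how these entries are computed; note that the sample variance (ddof=1) reproduces the table's values:

```python
# Sketch: per-class mean and sample variance for each feature
# (height, weight, foot size), from the training table above.
import numpy as np

males = np.array([[6.0, 180, 12], [5.92, 190, 11], [5.58, 170, 12], [5.92, 165, 10]])
females = np.array([[5.0, 100, 6], [5.5, 150, 8], [5.42, 130, 7], [5.75, 150, 9]])

for label, data in [("male", males), ("female", females)]:
    print(label, data.mean(axis=0), data.var(axis=0, ddof=1))
# male   [5.855 176.25 11.25]   [0.035033 122.9167 0.916667]
# female [5.4175 132.5 7.5]     [0.097225 558.3333 1.666667]
```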
BAYESIAN CLASSIFIERS: Naïve Bayes
Example
Test sample
Sex      Height (feet)   Weight (pounds)   Foot size (inches)
sample   6               130               8
BAYESIAN CLASSIFIERS: Naïve Bayes
Example
Calculate posterior probabilities for both
genders
Posterior(male) = P(male) P(height|male) P(weight|male) P(foot size|male) / evidence
Posterior(female) = P(female) P(height|female) P(weight|female) P(foot size|female) / evidence
The evidence (denominator) is the same constant in both,
so we can ignore it
BAYESIAN CLASSIFIERS: Naïve Bayes
Example
Calculations for male
P(male) = 0.5 (assumed)
P(height|male) = 1/√(2π × 0.035033) × exp(−(6 − 5.855)²/(2 × 0.035033)) ≈ 1.5789
P(weight|male) = 1/√(2π × 122.92) × exp(−(130 − 176.25)²/(2 × 122.92)) ≈ 5.9881 × 10^-6
P(foot size|male) = 1/√(2π × 0.91667) × exp(−(8 − 11.25)²/(2 × 0.91667)) ≈ 1.3112 × 10^-3
Posterior numerator (male) = 0.5 × 1.5789 × 5.9881 × 10^-6 × 1.3112 × 10^-3 ≈ 6.1984 × 10^-9
BAYESIAN CLASSIFIERS: Naïve Bayes
Example
Calculations for female
P(female) = 0.5 (assumed)
P(height|female) = 1/√(2π × 0.097225) × exp(−(6 − 5.4175)²/(2 × 0.097225)) ≈ 0.22346
P(weight|female) = 1/√(2π × 558.33) × exp(−(130 − 132.5)²/(2 × 558.33)) ≈ 0.016789
P(foot size|female) = 1/√(2π × 1.6667) × exp(−(8 − 7.5)²/(2 × 1.6667)) ≈ 0.28668
Posterior numerator (female) = 0.5 × 0.22346 × 0.016789 × 0.28668 ≈ 5.3778 × 10^-4
BAYESIAN CLASSIFIERS: Naïve Bayes
Example
Conclusion
Posterior numerator is significantly greater for the
female classification than for the male, so we classify
the sample as female
BAYESIAN CLASSIFIERS: Naïve Bayes
Example
Note
We did not calculate P(evidence), the normalizing
constant, since it is not needed for classification,
but we could:
P(evidence) = P(male) P(height|male) P(weight|male) P(foot size|male)
            + P(female) P(height|female) P(weight|female) P(foot size|female)
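Putting the whole worked example together, a compact pure-Python sketch (values as above; it prints both the posterior numerators and the normalized posteriors):

```python
# Sketch: the full Gaussian Naive Bayes calculation above, end to end.
import math

def gaussian(x, mean, var):
    """Gaussian density: 1/sqrt(2*pi*var) * exp(-(x-mean)^2 / (2*var))."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Per-class (mean, variance) for height, weight, foot size, from the table above.
params = {
    "male":   [(5.855, 0.035033), (176.25, 122.92), (11.25, 0.91667)],
    "female": [(5.4175, 0.097225), (132.5, 558.33), (7.5, 1.6667)],
}
prior = {"male": 0.5, "female": 0.5}
sample = [6.0, 130.0, 8.0]  # height, weight, foot size

# Posterior numerator: prior * product of per-feature likelihoods.
numerator = {
    c: prior[c] * math.prod(gaussian(x, m, v) for x, (m, v) in zip(sample, ps))
    for c, ps in params.items()
}
evidence = sum(numerator.values())  # the normalizing constant
for c in numerator:
    print(c, numerator[c], numerator[c] / evidence)
# male   ~6.1984e-09   ~1.15e-05
# female ~5.3778e-04   ~0.99999
```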
BAYESIAN CLASSIFIERS: Bayesian Networks
Judea Pearl (UCLA Computer Science, Cognitive
Systems Lab): one of the pioneers of Bayesian
Networks
Author of Probabilistic Reasoning in Intelligent
Systems, 1988
Father of journalist Daniel Pearl
Kidnapped and murdered in Pakistan in 2002
by Al-Qaeda
BAYESIAN CLASSIFIERS: Bayesian Networks
Probabilistic graphical model
Represents random variables and conditional
dependencies using a directed acyclic graph
(DAG)
Nodes of graph represent random variables
BAYESIAN CLASSIFIERS: Bayesian Networks
Edges of graph represent conditional
dependencies
Unconnected nodes conditionally independent
of each other
Does not require all attributes to be
conditionally independent
BAYESIAN CLASSIFIERS: Bayesian Networks
Probability table associating each node to its
immediate parent nodes
If node X has no immediate parents, table
contains only prior probability P(X)
If one parent Y, table contains P(X|Y)
If multiple parents {Y1, Y2, …, Yn}, table
contains P(X|Y1, Y2, …, Yn)
BAYESIAN CLASSIFIERS: Bayesian Networks
[Figure: example network in which Rain → Sprinkler and
{Sprinkler, Rain} → Grass wet, with a conditional
probability table at each node]
BAYESIAN CLASSIFIERS: Bayesian Networks
Model encodes relevant probabilities from
which probabilistic inferences can then be
calculated
Joint probability: P(G, S, R) = P(R) P(S|R) P(G|S, R)
G = Grass wet
S = Sprinkler on
R = Raining
BAYESIAN CLASSIFIERS: Bayesian Networks
We can then calculate, for example:
P(it is raining | grass is wet)
= P(it is raining AND grass is wet) / P(grass is wet)
where
P(it is raining AND grass is wet)
= ∑_{sprinkler ∈ {T,F}} P(grass wet = T, sprinkler, raining = T)
P(grass is wet)
= ∑_{sprinkler ∈ {T,F}, raining ∈ {T,F}} P(grass wet = T, sprinkler, raining)
BAYESIAN CLASSIFIERS: Bayesian Networks
That is, writing P(grass wet, sprinkler, raining) with T/F values:
P(it is raining | grass is wet)
= [P(TTT) + P(TFT)] / [P(TTT) + P(TTF) + P(TFT) + P(TFF)]
= (0.00198 + 0.1584) / (0.00198 + 0.288 + 0.1584 + 0.0)
≈ 0.3577
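A sketch that reproduces this inference by brute-force enumeration of the joint; the conditional probability tables are the standard ones for this sprinkler example (assumed here, but they yield exactly the terms above):

```python
# Sketch: P(raining | grass wet) by enumerating the joint P(G, S, R).
from itertools import product

p_rain = {True: 0.2, False: 0.8}                      # P(R)
p_sprinkler = {True: 0.01, False: 0.4}                # P(S=T | R)
p_grass = {(True, True): 0.99, (True, False): 0.9,    # P(G=T | S, R)
           (False, True): 0.8, (False, False): 0.0}

def joint(g, s, r):
    """P(G, S, R) = P(R) * P(S|R) * P(G|S, R)."""
    ps = p_sprinkler[r] if s else 1 - p_sprinkler[r]
    pg = p_grass[(s, r)] if g else 1 - p_grass[(s, r)]
    return p_rain[r] * ps * pg

wet = sum(joint(True, s, r) for s, r in product([True, False], repeat=2))
wet_and_rain = sum(joint(True, s, True) for s in [True, False])
print(wet_and_rain / wet)  # ~0.3577
```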
BAYESIAN CLASSIFIERS: Bayesian Networks
Building the model
Create network structure (graph)
Determine probability values of tables
Simplest case
Network defined by user
Most real-world cases
Defining network too complex
Use machine learning: many algorithms
BAYESIAN CLASSIFIERS: Bayesian Networks
Algorithms built into Weka
User defined network
Conditional independence tests
Genetic search
Hill climber
K2
Simulated annealing
Maximum weight spanning tree
Tabu search
BAYESIAN CLASSIFIERS: Bayesian Networks
Many other versions online
BNT (Bayes Net Toolbox) for Matlab
Kevin Murphy, University of British Columbia
http://www.cs.ubc.ca/~murphyk/Software/