Text Classification
Slides adapted from Lyle Ungar and Dan Jurafsky
Example: Positive or negative movie review?
Example: What is the subject of this article?
Text Classification
Text Classification: Definition
Supervised learning: classification methods
Any kind of classifier
• Naïve Bayes
• Logistic Regression
• Support Vector Machines
• K-Nearest Neighbors
• Neural Networks
Text Classification: Naïve Bayes
Naïve Bayes Intuition
Text: Bag of words representation
Text: Bag of words using a subset of words
Text: Bag of words representation (vectors)
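To make the vector representation concrete, here is a minimal Python sketch that maps documents to word-count vectors over a fixed vocabulary (the toy corpus and the function name are illustrative, not from the slides):

    from collections import Counter

    # Illustrative toy corpus (not from the slides)
    docs = ["great movie great acting", "boring movie terrible plot"]

    # Vocabulary: one dimension per word type seen in training
    vocab = sorted({w for d in docs for w in d.split()})

    def bag_of_words(doc, vocab):
        """Map a document to its vector of word counts over the vocabulary."""
        counts = Counter(doc.split())
        return [counts[w] for w in vocab]

    for d in docs:
        print(bag_of_words(d, vocab))   # first doc -> [1, 0, 2, 1, 0, 0]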
Bayes’ Rule for document and classes
Bayes’ Rule and MAP (I)
Bayes’ Rule and MAP (II)
Bayes’ Rule and MAP (III)
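In standard notation (symbols may differ slightly from the slides), Bayes’ rule gives the posterior over classes for a document d, and the maximum-a-posteriori (MAP) class drops the denominator, which is constant across classes:

\[
c_{MAP} = \arg\max_{c \in C} P(c \mid d)
        = \arg\max_{c \in C} \frac{P(d \mid c)\,P(c)}{P(d)}
        = \arg\max_{c \in C} P(d \mid c)\,P(c)
\]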
Naïve Bayes Independence Assumptions
Naïve Bayes Classifier
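Written out (standard Naïve Bayes formulation), the conditional independence ("naïve") assumption factors the likelihood over word positions, and the classifier picks the class that maximizes the resulting product:

\[
P(x_1, \dots, x_n \mid c) = \prod_{i=1}^{n} P(x_i \mid c),
\qquad
c_{NB} = \arg\max_{c \in C} P(c) \prod_{i=1}^{n} P(x_i \mid c)
\]

In practice the product is computed as a sum of log probabilities to avoid numerical underflow.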
Learning Naïve Bayes Model: Prior
First attempt: maximum likelihood to estimate parameters
Simply use the frequencies in the data
Fraction of documents belonging to topic j
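As a formula (standard maximum-likelihood estimate; the symbols N_{c_j} and N_{doc} are my notation):

\[
\hat{P}(c_j) = \frac{N_{c_j}}{N_{doc}}
\]

where \(N_{c_j}\) is the number of training documents labeled \(c_j\) and \(N_{doc}\) is the total number of training documents.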
Learning Naïve Bayes Model: Conditional Probabilities
Fraction of times word w_i appears among all words in documents of topic c_j
(the feature takes a word value: x_i = w_i)
Create mega-document for topic j by concatenating all docs in the topic
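In formula form (standard maximum-likelihood estimate over the concatenated mega-document; count(·,·) and V are my notation):

\[
\hat{P}(w_i \mid c_j) = \frac{count(w_i, c_j)}{\sum_{w \in V} count(w, c_j)}
\]

where \(count(w_i, c_j)\) is the number of times \(w_i\) occurs in the mega-document of topic \(c_j\) and \(V\) is the vocabulary.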
Zero probability problems
Laplace (add-1) smoothing for Naïve Bayes
Algorithm with smoothing parameter
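A minimal Python sketch of the training-and-classification algorithm with add-1 (Laplace) smoothing, assuming whitespace-tokenized documents (function and variable names are illustrative, not from the slides):

    import math
    from collections import Counter

    def train_naive_bayes(docs, labels):
        """Estimate log-priors and add-1-smoothed log-likelihoods from (doc, label) pairs."""
        vocab = {w for d in docs for w in d.split()}
        log_prior, log_likelihood = {}, {}
        for c in set(labels):
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            # Prior: fraction of training documents with class c
            log_prior[c] = math.log(len(class_docs) / len(docs))
            # "Mega-document": concatenate all docs of class c, then count words
            counts = Counter(w for d in class_docs for w in d.split())
            total = sum(counts.values())
            # Add-1 smoothing: (count + 1) / (total words in class + |V|)
            log_likelihood[c] = {
                w: math.log((counts[w] + 1) / (total + len(vocab))) for w in vocab
            }
        return log_prior, log_likelihood, vocab

    def classify(doc, log_prior, log_likelihood, vocab):
        """Return argmax_c of log P(c) + sum_i log P(w_i | c); unknown words are skipped."""
        scores = {
            c: log_prior[c] + sum(
                log_likelihood[c][w] for w in doc.split() if w in vocab
            )
            for c in log_prior
        }
        return max(scores, key=scores.get)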
Example
Doc   Words                                 Class
d1    Chinese Beijing Chinese               c0
d2    Chinese Chinese Shanghai              c0
d3    Chinese Macao                         c0
d4    Tokyo Japan Chinese                   c1
d5    Chinese Chinese Chinese Tokyo Japan   ?
(d1–d4 are the training set; d5 is the test document.)
Priors:
P(c0)=3/4
P(c1)=1/4
Conditional Probabilities:
P(Chinese|c0) = (5+1)/(8+6) = 6/14 = 3/7      P(Chinese|c1) = (1+1)/(3+6) = 2/9
P(Tokyo|c0)   = (0+1)/(8+6) = 1/14            P(Tokyo|c1)   = (1+1)/(3+6) = 2/9
P(Japan|c0)   = (0+1)/(8+6) = 1/14            P(Japan|c1)   = (1+1)/(3+6) = 2/9
Choosing a class
P(c0|d5) ∝ P(c0) · P(Chinese|c0)^3 · P(Tokyo|c0) · P(Japan|c0) = 3/4 · (3/7)^3 · 1/14 · 1/14 ≈ 0.0003
P(c1|d5) ∝ P(c1) · P(Chinese|c1)^3 · P(Tokyo|c1) · P(Japan|c1) = 1/4 · (2/9)^3 · 2/9 · 2/9 ≈ 0.0001
⇒ d5 is assigned to class c0
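Running the earlier sketch (the illustrative train_naive_bayes / classify functions from the smoothing slide) on the documents of this example reproduces the same decision:

    docs = ["Chinese Beijing Chinese",
            "Chinese Chinese Shanghai",
            "Chinese Macao",
            "Tokyo Japan Chinese"]
    labels = ["c0", "c0", "c0", "c1"]
    log_prior, log_likelihood, vocab = train_naive_bayes(docs, labels)
    print(classify("Chinese Chinese Chinese Tokyo Japan",
                   log_prior, log_likelihood, vocab))   # prints "c0"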
Summary