Text Classification

These slides cover Naive Bayes classification in the context of text sentiment analysis, spam detection, and authorship identification. They explain the basic principles of the Naive Bayes classifier, including the bag of words representation, maximum likelihood estimation, and the challenges of zero probabilities and unknown words, and they cover techniques such as Laplace smoothing and the handling of stop words.


Naive Bayes and Sentiment Classification
The Task of Text Classification
Is this spam?
Who wrote which Federalist papers?
1787-8: anonymous essays tried to convince New York to ratify the U.S. Constitution. Written by Jay, Madison, and Hamilton.
Authorship of 12 of the essays was in dispute.
1963: solved by Mosteller and Wallace using Bayesian methods.

[Portraits: James Madison and Alexander Hamilton]


What is the subject of this medical article?
MEDLINE Article → MeSH Subject Category Hierarchy:
◦ Antagonists and Inhibitors
◦ Blood Supply
◦ Chemistry
◦ Drug Therapy
◦ Embryology
◦ Epidemiology
Positive or negative movie review?

+ ...zany characters and richly applied satire, and some great plot twists
− It was pathetic. The worst part about it was the boxing scenes...
+ ...awesome caramel sauce and sweet toasty almonds. I love this place!
− ...awful pizza and ridiculously overpriced...
Why sentiment analysis?

Movie: is this review positive or negative?
Products: what do people think about the new iPhone?
Public sentiment: how is consumer confidence?
Politics: what do people think about this candidate or issue?
Prediction: predict election outcomes or market trends from sentiment
Scherer Typology of Affective States
Emotion: brief organically synchronized … evaluation of a major event
◦ angry, sad, joyful, fearful, ashamed, proud, elated
Mood: diffuse non-caused low-intensity long-duration change in subjective feeling
◦ cheerful, gloomy, irritable, listless, depressed, buoyant
Interpersonal stances: affective stance toward another person in a specific interaction
◦ friendly, flirtatious, distant, cold, warm, supportive, contemptuous
Attitudes: enduring, affectively colored beliefs, dispositions towards objects or persons
◦ liking, loving, hating, valuing, desiring
Personality traits: stable personality dispositions and typical behavior tendencies
◦ nervous, anxious, reckless, morose, hostile, jealous
Basic Sentiment Classification

Sentiment analysis is the detection of attitudes.
Simple task we focus on in this chapter:
◦ Is the attitude of this text positive or negative?
We return to affect classification in later chapters.
Summary: Text Classification

Sentiment analysis
Spam detection
Authorship identification
Language Identification
Assigning subject categories, topics, or genres

Text Classification: definition

Input:
◦ a document d
◦ a fixed set of classes C = {c1, c2,…, cJ}

Output: a predicted class c ∈ C


Classification Methods: Hand-coded rules

Rules based on combinations of words or other features
◦ spam: black-list-address OR (“dollars” AND “you have been selected”)
Accuracy can be high
◦ If rules carefully refined by expert
But building and maintaining these rules is expensive
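To make the idea concrete, here is a minimal sketch of such a hand-coded rule classifier. The blacklist and the example messages are hypothetical; only the “dollars” / “you have been selected” rule comes from the slide above.

```python
# Hypothetical blacklist for illustration only.
BLACKLIST = {"spammer@example.com"}

def is_spam(sender: str, text: str) -> bool:
    text = text.lower()
    # Rule from the slide: blacklisted address OR ("dollars" AND "you have been selected")
    return sender in BLACKLIST or (
        "dollars" in text and "you have been selected" in text
    )

print(is_spam("spammer@example.com", "hi"))                              # True
print(is_spam("a@b.com", "You have been selected to receive dollars!"))  # True
print(is_spam("friend@mail.com", "Lunch tomorrow?"))                     # False
```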
Classification Methods:
Supervised Machine Learning
Input:
◦ a document d
◦ a fixed set of classes C = {c1, c2,…, cJ}
◦ a training set of m hand-labeled documents (d1,c1), ..., (dm,cm)
Output:
◦ a learned classifier γ: d → c
Classification Methods:
Supervised Machine Learning
Any kind of classifier
◦ Naïve Bayes
◦ Logistic regression
◦ Neural networks
◦ k-Nearest Neighbors
◦ …
Naive Bayes (I)
Naive Bayes Intuition

Simple (“naive”) classification method based on Bayes rule
Relies on a very simple representation of the document
◦ Bag of words
The Bag of Words Representation

γ(document) = c, where the document is reduced to a bag of word counts, e.g.:
seen 2, sweet 1, whimsical 1, recommend 1, happy 1, ...
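As a rough illustration (not part of the slides), a bag-of-words representation can be built as a simple word-count table; the tokenizer here is a deliberately crude lowercase, punctuation-stripping split:

```python
from collections import Counter

def bag_of_words(text: str) -> Counter:
    # Very simple tokenization: drop punctuation, lowercase, split on whitespace.
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text)
    return Counter(cleaned.lower().split())

print(bag_of_words("I love this movie! It is sweet and whimsical. I recommend it."))
# Counter({'i': 2, 'it': 2, 'love': 1, 'this': 1, 'movie': 1, ...})
```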
Formalizing the Naive Bayes Classifier
Bayes’ Rule Applied to Documents and Classes

• For a document d and a class c:

P(c | d) = P(d | c) P(c) / P(d)
Naive Bayes Classifier (I)

c_MAP = argmax_{c ∈ C} P(c | d)                  (MAP is “maximum a posteriori” = most likely class)

      = argmax_{c ∈ C} P(d | c) P(c) / P(d)      (Bayes rule)

      = argmax_{c ∈ C} P(d | c) P(c)             (dropping the denominator)
Naive Bayes Classifier (II)

c_MAP = argmax_{c ∈ C} P(d | c) P(c)                       (“likelihood” × “prior”)

      = argmax_{c ∈ C} P(x_1, x_2, …, x_n | c) P(c)        (document d represented as features x_1 .. x_n)
Naïve Bayes Classifier (IV)

c_MAP = argmax_{c ∈ C} P(x_1, x_2, …, x_n | c) P(c)

P(x_1, x_2, …, x_n | c): O(|X|^n · |C|) parameters; could only be estimated if a very, very large number of training examples was available.
P(c): how often does this class occur? We can just count the relative frequencies in a corpus.
Multinomial Naive Bayes Independence Assumptions

P(x_1, x_2, …, x_n | c)

Bag of Words assumption: assume position doesn’t matter.
Conditional Independence: assume the feature probabilities P(x_i | c_j) are independent given the class c:

P(x_1, …, x_n | c) = P(x_1 | c) · P(x_2 | c) · P(x_3 | c) · ... · P(x_n | c)

Multinomial Naive Bayes Classifier

c_MAP = argmax_{c ∈ C} P(x_1, x_2, …, x_n | c) P(c)

c_NB = argmax_{c ∈ C} P(c_j) ∏_{x ∈ X} P(x | c)
Applying Multinomial Naive Bayes Classifiers to Text Classification

positions ← all word positions in the test document

c_NB = argmax_{c_j ∈ C} P(c_j) ∏_{i ∈ positions} P(x_i | c_j)
Example
Let’s walk through a Multinomial Naive Bayes classifier for filtering out spam messages. Initially, we consider eight normal messages and four spam messages.

Histogram of all the words that occur in the normal messages from family and friends (17 word tokens in total):
Dear – 8, Friend – 5, Lunch – 3, Money – 1

The probability of the word Dear, given that we saw it in a normal message, is:
P(Dear | Normal) = 8/17 = 0.47
Similarly:
P(Friend | Normal) = 5/17 = 0.29
P(Lunch | Normal) = 3/17 = 0.18
P(Money | Normal) = 1/17 = 0.06
Histogram of all the words that occur in the spam messages (7 word tokens in total):
Dear – 2, Friend – 1, Lunch – 0, Money – 4

The probability of the word Dear, given that we saw it in a spam message, is:
P(Dear | Spam) = 2/7 = 0.29
Similarly:
P(Friend | Spam) = 1/7 = 0.14
P(Lunch | Spam) = 0/7 = 0.00
P(Money | Spam) = 4/7 = 0.57
What is the probability that “Dear Friend” is a Normal message?
What is the probability that “Dear Friend” is a Spam message?
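As a quick check, here is a small sketch (my own illustration, not part of the slides) that reproduces the maximum likelihood estimates above from the word histograms:

```python
# Word counts from the histograms above.
normal_counts = {"Dear": 8, "Friend": 5, "Lunch": 3, "Money": 1}
spam_counts = {"Dear": 2, "Friend": 1, "Lunch": 0, "Money": 4}

def mle_likelihoods(counts):
    # P(w | class) = count(w, class) / total word count in the class.
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(mle_likelihoods(normal_counts))  # Dear ~0.47, Friend ~0.29, Lunch ~0.18, Money ~0.06
print(mle_likelihoods(spam_counts))    # Dear ~0.29, Friend ~0.14, Lunch 0.00, Money ~0.57
```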
Problems with multiplying lots of probs

There’s a problem with this:

c_NB = argmax_{c_j ∈ C} P(c_j) ∏_{i ∈ positions} P(x_i | c_j)

Multiplying lots of probabilities can result in floating-point underflow!
Luckily, log(ab) = log(a) + log(b)
Let’s sum logs of probabilities instead of multiplying probabilities!
We actually do everything in log space

Instead of this:

c_NB = argmax_{c_j ∈ C} P(c_j) ∏_{i ∈ positions} P(x_i | c_j)

This:

c_NB = argmax_{c_j ∈ C} [ log P(c_j) + Σ_{i ∈ positions} log P(x_i | c_j) ]

This is ok since log doesn’t change the ranking of the classes (the class with the highest probability still has the highest log probability).
The model is now just a max of a sum of weights: a linear function of the inputs.
So Naive Bayes is a linear classifier.
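A minimal sketch of log-space scoring, using made-up toy probabilities purely for illustration:

```python
import math

# Toy per-class parameters (illustrative numbers only).
log_prior = {"pos": math.log(0.5), "neg": math.log(0.5)}
log_lik = {
    "pos": {"fun": math.log(0.05), "boring": math.log(0.001)},
    "neg": {"fun": math.log(0.005), "boring": math.log(0.05)},
}

def log_score(words, c):
    # Sum of logs replaces the product of probabilities:
    # avoids underflow and preserves the ranking of the classes.
    return log_prior[c] + sum(log_lik[c][w] for w in words)

doc = ["boring", "boring", "fun"]
print(max(("pos", "neg"), key=lambda c: log_score(doc, c)))  # 'neg'
```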
Naive Bayes: Learning
Learning the Multinomial Naive Bayes Model (Sec. 13.3)

First attempt: maximum likelihood estimates
◦ simply use the frequencies in the data

Compute the prior probability of a class c_j:

P̂(c_j) = doccount(C = c_j) / N_doc

e.g., in the spam example: P(Normal) = 8/12

Compute the probability of a word given a class:

P̂(w_i | c_j) = count(w_i, c_j) / Σ_{w ∈ V} count(w, c_j)
Parameter estimation

P̂(w_i | c_j) = count(w_i, c_j) / Σ_{w ∈ V} count(w, c_j)
    = fraction of times word w_i appears among all words in documents of topic c_j

Create a mega-document for topic j by concatenating all docs in this topic
◦ Use the frequency of w in the mega-document

In the spam example: 12 documents in total, 8 Normal and 4 Spam
P(Normal) = 8/12
P(Spam) = 4/12

Word counts in the two mega-documents:

Normal (17 words)    Spam (7 words)
Dear – 8             Dear – 2
Friend – 5           Friend – 1
Lunch – 3            Lunch – 0
Money – 1            Money – 4

Probability that “Dear Friend” belongs to each class:
P(Normal | “Dear Friend”) ∝ (8/17) · (5/17) · (8/12) ≈ 0.092
P(Spam | “Dear Friend”) ∝ (2/7) · (1/7) · (4/12) ≈ 0.014
so “Dear Friend” is classified as Normal.

Probability that “Lunch Money” belongs to each class:
P(Normal | “Lunch Money”) ∝ (3/17) · (1/17) · (8/12) ≈ 0.007
P(Spam | “Lunch Money”) ∝ (0/7) · (4/7) · (4/12) = 0
Problem with Maximum Likelihood

What if we have seen no training documents with the word fantastic classified in the topic positive (thumbs-up)?

P̂(“fantastic” | positive) = count(“fantastic”, positive) / Σ_{w ∈ V} count(w, positive) = 0

Zero probabilities cannot be conditioned away, no matter the other evidence!

c_MAP = argmax_c P̂(c) ∏_i P̂(x_i | c)
Laplace (add-1) smoothing for Naïve Bayes

P̂(w_i | c) = (count(w_i, c) + 1) / Σ_{w ∈ V} (count(w, c) + 1)

           = (count(w_i, c) + 1) / ( (Σ_{w ∈ V} count(w, c)) + |V| )
Applying add-1 smoothing to “Lunch Money”:

P̂(w_i | c) changes from count(w_i, c) / Σ_{w ∈ V} count(w, c) to (count(w_i, c) + 1) / ( (Σ_{w ∈ V} count(w, c)) + |V| )

Unique words: |V| = 4. Total word occurrences: Normal = 17, Spam = 7.

P(Normal | “Lunch Money”) ∝ ((3+1)/(17+4)) · ((1+1)/(17+4)) · (8/12) = (4/21) · (2/21) · (8/12) ≈ 0.012
P(Spam | “Lunch Money”) ∝ ((0+1)/(7+4)) · ((4+1)/(7+4)) · (4/12) = (1/11) · (5/11) · (4/12) ≈ 0.014

With smoothing, “Lunch Money” is now classified as Spam.
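The same calculation as a short sketch (my own illustration; the counts and priors are the ones from the spam example):

```python
normal = {"Dear": 8, "Friend": 5, "Lunch": 3, "Money": 1}
spam = {"Dear": 2, "Friend": 1, "Lunch": 0, "Money": 4}
priors = {"Normal": 8 / 12, "Spam": 4 / 12}
vocab = set(normal) | set(spam)

def smoothed_likelihood(word, counts):
    # Add-1 (Laplace) smoothing: (count + 1) / (total + |V|)
    return (counts.get(word, 0) + 1) / (sum(counts.values()) + len(vocab))

def score(words, counts, prior):
    p = prior
    for w in words:
        p *= smoothed_likelihood(w, counts)
    return p

print(score(["Lunch", "Money"], normal, priors["Normal"]))  # ~0.012
print(score(["Lunch", "Money"], spam, priors["Spam"]))      # ~0.014 -> Spam wins
```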
Unknown words
What about unknown words
◦ that appear in our test data
◦ but not in our training data or vocab
We ignore them
◦ Remove them from the test document!
◦ Pretend they weren't there!
◦ Don't include any probability for them at all.
Why don't we build an unknown word model?
◦ It doesn't help: knowing which class has more unknown words is
not generally a useful thing to know!
Stop words
Some systems ignore another class of words:
Stop words: very frequent words like the and a.
◦ Sort the whole vocabulary by frequency in the training set, and call the top 10 or 50 words the stopword list.
◦ Now we remove all stop words from the training and test sets as if they were never there.
But in most text classification applications, removing stop words doesn’t help, so it’s more common not to use stopword lists and to use all the words in Naive Bayes.
Multinomial Naive Bayes: Learning

• From the training corpus, extract the Vocabulary

• Calculate the P(c_j) terms:
  For each c_j in C do
    docs_j ← all docs with class = c_j
    P(c_j) ← |docs_j| / |total # documents|

• Calculate the P(w_k | c_j) terms:
  Text_j ← single doc containing all docs_j
  For each word w_k in Vocabulary
    n_k ← # of occurrences of w_k in Text_j
    P(w_k | c_j) ← (n_k + α) / (n + α|Vocabulary|)
  (where n is the total number of word tokens in Text_j and α is the smoothing constant, e.g. α = 1)
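A compact sketch of this training procedure in Python (function and variable names are my own; add-α smoothing with α = 1 by default):

```python
from collections import Counter, defaultdict
import math

def train_multinomial_nb(docs, labels, alpha=1.0):
    # docs: list of tokenized documents (lists of words); labels: parallel class labels.
    vocab = {w for doc in docs for w in doc}
    log_prior, log_likelihood = {}, defaultdict(dict)
    for c in set(labels):
        class_docs = [doc for doc, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(w for doc in class_docs for w in doc)  # "mega-document" counts
        total = sum(counts.values())
        for w in vocab:
            # Add-alpha smoothed likelihood, stored in log space.
            log_likelihood[c][w] = math.log((counts[w] + alpha) / (total + alpha * len(vocab)))
    return log_prior, log_likelihood, vocab

def classify(doc, log_prior, log_likelihood, vocab):
    # Unknown test words (not in vocab) are simply ignored, as discussed above.
    scores = {c: log_prior[c] + sum(log_likelihood[c][w] for w in doc if w in vocab)
              for c in log_prior}
    return max(scores, key=scores.get)
```

This mirrors the pseudocode above: priors from document counts, likelihoods from mega-document word counts with add-α smoothing, and log-space scoring at classification time.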
Sentiment and Binary Naive Bayes
Let's do a worked sentiment example!
A worked sentiment example

Training set:
− just plain boring
− entirely predictable and lacks energy
− no surprises and very few laughs
+ very powerful
+ the most fun film of the summer
Test document: predictable with no fun

Prior from training:
P(−) = 3/5
P(+) = 2/5
Drop “with” (it does not occur in the training data).

Likelihoods from training (with add-1 smoothing; 14 word tokens in −, 9 word tokens in +, |V| = 20):
P(predictable | −) = (1+1)/(14+20) = 2/34     P(predictable | +) = (0+1)/(9+20) = 1/29
P(no | −) = (1+1)/(14+20) = 2/34              P(no | +) = (0+1)/(9+20) = 1/29
P(fun | −) = (0+1)/(14+20) = 1/34             P(fun | +) = (1+1)/(9+20) = 2/29

Scoring the test set:
P(−) · P(“predictable no fun” | −) = 3/5 · (2/34) · (2/34) · (1/34) ≈ 6.1 × 10⁻⁵
P(+) · P(“predictable no fun” | +) = 2/5 · (1/29) · (1/29) · (2/29) ≈ 3.2 × 10⁻⁵
so the model predicts the class negative for “predictable with no fun”.
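A standalone sketch (my own illustration) that checks these numbers:

```python
from math import prod

neg_docs = ["just plain boring", "entirely predictable and lacks energy",
            "no surprises and very few laughs"]
pos_docs = ["very powerful", "the most fun film of the summer"]

neg_tokens = " ".join(neg_docs).split()    # 14 tokens
pos_tokens = " ".join(pos_docs).split()    # 9 tokens
vocab = set(neg_tokens) | set(pos_tokens)  # 20 word types

def smoothed(word, tokens):
    # Add-1 smoothed likelihood: (count + 1) / (total tokens + |V|)
    return (tokens.count(word) + 1) / (len(tokens) + len(vocab))

test = [w for w in "predictable with no fun".split() if w in vocab]  # "with" is dropped

neg_score = (3 / 5) * prod(smoothed(w, neg_tokens) for w in test)
pos_score = (2 / 5) * prod(smoothed(w, pos_tokens) for w in test)
print(f"neg: {neg_score:.1e}  pos: {pos_score:.1e}")  # ~6.1e-05 vs ~3.2e-05 -> negative
```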
Optimizing for sentiment analysis
For tasks like sentiment, word occurrence is more
important than word frequency.
◦ The occurrence of the word fantastic tells us a lot
◦ The fact that it occurs 5 times may not tell us much more.
Binary multinomial naive Bayes, or binary NB
◦ Clip our word counts at 1
◦ Note: this is different from Bernoulli naive Bayes; see the textbook at the end of the chapter.
Binary Multinomial Naive Bayes: Learning

• From the training corpus, extract the Vocabulary

• Calculate the P(c_j) terms:
  For each c_j in C do
    docs_j ← all docs with class = c_j
    P(c_j) ← |docs_j| / |total # documents|

• Calculate the P(w_k | c_j) terms:
  Remove duplicates in each doc:
    For each word type w in doc_j, retain only a single instance of w
  Text_j ← single doc containing all docs_j
  For each word w_k in Vocabulary
    n_k ← # of occurrences of w_k in Text_j
    P(w_k | c_j) ← (n_k + α) / (n + α|Vocabulary|)
Binary Multinomial Naive Bayes on a test document d

First remove all duplicate words from d
Then compute NB using the same equation:

c_NB = argmax_{c_j ∈ C} P(c_j) ∏_{i ∈ positions} P(w_i | c_j)
Binary multinomial naive Bayes

Counts can still be 2! Binarization is within-doc!
(Each document contributes each word type at most once, but a word can still be counted once per document across the documents of a class.)
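A quick sketch of within-document binarization (my own illustration):

```python
from collections import Counter

def binarized_class_counts(docs):
    # Each document contributes each word type at most once, so a word's count
    # equals the number of documents (in this class) that contain it.
    counts = Counter()
    for doc in docs:
        counts.update(set(doc.lower().split()))
    return counts

docs_in_class = ["it was great great great", "great acting", "boring plot"]
print(binarized_class_counts(docs_in_class))
# 'great' -> 2 : counts can still exceed 1 across documents
```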
Naive Bayes: Relationship to Language Modeling
Generative Model for Multinomial Naive Bayes

The class generates each word of the document independently:

c = +
X1 = I   X2 = love   X3 = this   X4 = fun   X5 = film
Naïve Bayes and Language Modeling
Naive Bayes classifiers can use any sort of feature
◦ URL, email address, dictionaries, network features
But if, as in the previous slides,
◦ we use only word features
◦ we use all of the words in the text (not a subset)
then
◦ Naive Bayes has an important similarity to language modeling.
Each class = a unigram language model (Sec. 13.2.1)

Assigning each word: P(word | c)
Assigning each sentence: P(s | c) = ∏ P(word | c)

Class pos:
0.1   I
0.1   love
0.01  this
0.05  fun
0.1   film

I     love   this   fun    film
0.1   0.1    0.01   0.05   0.1

P(s | pos) = 0.1 × 0.1 × 0.01 × 0.05 × 0.1 = 0.0000005
Naive Bayes as a Language Model (Sec. 13.2.1)

Which class assigns the higher probability to s?

Model pos      Model neg
0.1    I       0.2     I
0.1    love    0.001   love
0.01   this    0.01    this
0.05   fun     0.005   fun
0.1    film    0.1     film

       I      love    this   fun     film
pos:   0.1    0.1     0.01   0.05    0.1
neg:   0.2    0.001   0.01   0.005   0.1

P(s | pos) > P(s | neg)
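A tiny sketch reproducing this comparison with the toy probabilities from the table above:

```python
from math import prod

pos = {"I": 0.1, "love": 0.1, "this": 0.01, "fun": 0.05, "film": 0.1}
neg = {"I": 0.2, "love": 0.001, "this": 0.01, "fun": 0.005, "film": 0.1}

sentence = "I love this fun film".split()
p_pos = prod(pos[w] for w in sentence)  # 5e-07
p_neg = prod(neg[w] for w in sentence)  # 1e-09
print(p_pos > p_neg)  # True: the pos class assigns the higher probability
```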
Precision, Recall, and F measure
Evaluation
Let's consider just binary text classification tasks
Imagine you're the CEO of Delicious Pie Company
You want to know what people are saying about
your pies
So you build a "Delicious Pie" tweet detector
◦ Positive class: tweets about Delicious Pie Co
◦ Negative class: all other tweets
The 2-by-2 confusion matrix (example counts):

                    gold positive   gold negative
system positive     TP = 10         FP = 2
system negative     FN = 3          TN = 34
Evaluation: Accuracy
Why don't we use accuracy as our metric?
Imagine we saw 1 million tweets
◦ 100 of them talked about Delicious Pie Co.
◦ 999,900 talked about something else
We could build a dumb classifier that just labels every
tweet "not about pie"
◦ It would get 99.99% accuracy!!! Wow!!!!
◦ But useless! Doesn't return the comments we are looking for!
◦ That's why we use precision and recall instead
Evaluation: Precision
% of items the system detected (i.e., items the system labeled as positive) that are in fact positive (according to the human gold labels):
Precision = TP / (TP + FP)
Evaluation: Recall
% of items actually present in the input that were correctly identified by the system:
Recall = TP / (TP + FN)
Why Precision and recall
Our dumb pie-classifier
◦ Just label nothing as "about pie"
Accuracy=99.99%
but
Recall = 0
◦ (it doesn't get any of the 100 Pie tweets)
Precision and recall, unlike accuracy, emphasize true
positives:
◦ finding the things that we are supposed to be looking for.
A combined measure: F
F measure: a single number that combines P and R:

F_β = (β² + 1) P R / (β² P + R)

We almost always use balanced F1 (i.e., β = 1):

F1 = 2 P R / (P + R)
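A short sketch (my own illustration) computing these metrics from the example confusion matrix above:

```python
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Counts from the 2-by-2 confusion matrix above: TP=10, FP=2, FN=3
p, r, f1 = precision_recall_f1(10, 2, 3)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
# precision=0.83 recall=0.77 F1=0.80
```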
