Text Classification
Nakul Dave
Assistant Professor
Computer Engineering Department
Vishwakarma Government Engineering College - Ahmedabad
Table of contents
1 Introduction
Examples of classification
2 Text Classification Phases
Feature Extraction
3 Problem Formulation
4 Methods of Text Classification
Naïve Bayes classification
Smoothing
A Worked Example
Model Training and Testing
5 Types and Evaluation
Types
Evaluation
6 Questions
Introduction
Text Classification
Text Classification is a fundamental task in NLP that involves
categorizing text documents into predefined classes or categories.
It enables automated analysis and organization of large amounts
of textual data.
Examples of text classification tasks:
Sentiment Analysis: Classifying text as positive, negative, or
neutral.
Topic Classification: Assigning documents to specific topics or
themes.
Spam Detection: Identifying spam emails or messages.
What is Text Classification?
Figure: Text Classification use cases (source: https://www.sketchbubble.com/en/presentation-text-classification.html)
Text Classification Applications
Figure: Text Classification applications (source: https://www.sketchbubble.com/en/presentation-text-classification.html)
Examples of Movie Reviews
“Heartfelt and emotional, this is a must-watch.”
“Disappointing entry in the franchise, lacking the thrill and
intrigue of its predecessors. The plot is convoluted, and the
dialogues are uninspiring.”
“This movie is a weak romantic comedy that fails to make a
lasting impression.”
“This is the greatest comedy movie ever filmed.”
What is the subject of this document?
Figure: Document classifier (source: https://www.pericent.com/products/docedge-dms/hot-features/classification/)
Text Classification Phases
Text Classification Pipeline
1 Data Preprocessing: Cleaning and preparing the text data.
2 Feature Extraction: Converting text into numerical features.
3 Model Training: Building a classifier using labeled training data.
4 Model Evaluation: Assessing the performance of the trained
model.
5 Prediction: Applying the trained model to classify new, unseen
text data.
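These phases map naturally onto a vectorize-then-classify pipeline. A minimal sketch, assuming scikit-learn and a made-up two-review training set (data and variable names are illustrative, not from the slides):

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up labeled data: 1 = positive, 0 = negative.
train_texts = ["amazing and captivating movie", "boring movie, did not like it"]
train_labels = [1, 0]

# Preprocessing + feature extraction (CountVectorizer lowercases and
# tokenizes, then builds bag-of-words counts), followed by model training.
model = Pipeline([("bow", CountVectorizer()), ("nb", MultinomialNB())])
model.fit(train_texts, train_labels)

# Prediction: apply the trained pipeline to new, unseen text.
print(model.predict(["what a captivating story"]))  # [1] on this toy data
```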
Feature Extraction: Bag-of-Words
Bag-of-Words (BoW) is a popular technique for feature extraction
in text classification.
It represents text as a collection of word counts or
presence/absence indicators.
Each unique word in the corpus becomes a feature or dimension in
the vector space.
BoW ignores word order and only considers the frequency or
presence of words.
Feature Extraction: Example
Let’s consider two movie reviews:
Review 1: ”The movie was amazing and captivating.”
Review 2: ”I didn’t like the movie. It was boring.”
After preprocessing, the reviews become:
Review 1: ”movie amazing captivating”
Review 2: ”didn’t like movie boring”
Using BoW with the shared vocabulary {movie, amazing, captivating, didn’t, like, boring}, we can represent the reviews as feature vectors:
Review 1: [1, 1, 1, 0, 0, 0] (indicating the presence of ”movie”, ”amazing”, and ”captivating”)
Review 2: [1, 0, 0, 1, 1, 1] (indicating the presence of ”movie”, ”didn’t”, ”like”, and ”boring”)
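A quick sketch of this encoding in plain Python (the vocabulary order is fixed to match the vectors above; the helper name is illustrative):

```python
docs = ["movie amazing captivating", "didn't like movie boring"]

# Fix a vocabulary order so vectors are comparable across documents.
vocab = ["movie", "amazing", "captivating", "didn't", "like", "boring"]

def bow_vector(doc):
    words = doc.split()
    # Presence/absence indicators; use words.count(w) instead for raw counts.
    return [1 if w in words else 0 for w in vocab]

print(bow_vector(docs[0]))  # [1, 1, 1, 0, 0, 0]
print(bow_vector(docs[1]))  # [1, 0, 0, 1, 1, 1]
```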
Problem Formulation
Text Classification: A Problem Formulation
Input
A document d
A fixed set of classes C = {c1, c2, . . . , cn}
Output
A predicted class c ∈ C
Methods of Text Classification
Classification Methods: Hand-coded rules
Rules are framed based on the features or words that occur in the text.
The number of rules depends on the features and the size of the text.
Spam
black-list-address OR (“dollars” AND “have been selected”)
Pros and Cons
Accuracy can be high if rules are carefully refined by experts, but
building and maintaining these rules is expensive.
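As code, the spam rule above might look like the following sketch (the blacklist contents and function name are made up for illustration):

```python
BLACKLIST = {"known.spammer@example.com"}  # hypothetical black-list addresses

def is_spam(sender, body):
    # black-list-address OR ("dollars" AND "have been selected")
    if sender in BLACKLIST:
        return True
    text = body.lower()
    return "dollars" in text and "have been selected" in text

print(is_spam("friend@example.com",
              "You have been selected to receive 1000 dollars!"))  # True
```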
Classification Methods: Supervised Machine Learning
Naive Bayes
Decision Tree Induction
Support Vector Machine
Logistic Regression
···
Naive Bayes Classification
Naive Bayes is a probabilistic classification algorithm based on
Bayes’ theorem.
It assumes that the features are conditionally independent given
the class label.
Naive Bayes is commonly used for text classification tasks.
Bayes’ Theorem
Bayes’ theorem is defined as:

P(c|d) = P(d|c) · P(c) / P(d)

Where:
P(c|d) is the posterior probability of class c given the document d.
P(d|c) is the likelihood of the document d given class c.
P(c) is the prior probability of class c.
P(d) is the probability of the document d (the evidence).
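As a quick numeric illustration (the numbers are made up): if P(positive) = 0.6, P(negative) = 0.4, P(d|positive) = 0.02, and P(d|negative) = 0.01, then P(d) = 0.6 · 0.02 + 0.4 · 0.01 = 0.016, so P(positive|d) = (0.02 · 0.6) / 0.016 = 0.75.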
Naive Bayes Algorithm
1 Calculate the prior probability P(c) for each class c from the training data.
2 For each feature in the input document d, calculate the likelihood P(d|c) for each class c based on the training data.
3 Multiply the prior probabilities and likelihoods to obtain the joint probability P(d, c) for each class c.
4 Normalize the joint probabilities by dividing them by the evidence P(d) to obtain the posterior probabilities P(c|d).
5 Assign the input to the class with the highest posterior probability.
Argmax Notation
Bayes’ theorem

P(c|d) = P(d|c) · P(c) / P(d)

For Naive Bayes classification, the argmax notation can be expressed as:

Naïve Bayes Classifier

ĉ = argmax_{c∈C} P(c|d)
  = argmax_{c∈C} P(d|c) P(c)
  = argmax_{c∈C} P(x1, x2, . . . , xn|c) P(c)

(The denominator P(d) is dropped because it is the same for every class.)
Naïve Bayes classification assumptions
P(x1, x2, . . . , xn|c)

Bag of Words Assumption
Assume that the position of a word in the document doesn’t matter.

Conditional Independence
P(x1, x2, . . . , xn|c) = P(x1|c) · P(x2|c) · · · P(xn|c)

ĉ_NB = argmax_{c∈C} P(c) ∏_{x∈X} P(x|c)
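In practice, multiplying many small probabilities underflows floating-point arithmetic, so implementations usually sum log-probabilities instead. A minimal sketch (the table names log_prior and log_likelihood are illustrative):

```python
import math

def predict(doc_tokens, classes, log_prior, log_likelihood):
    # Score each class by summing logs: log P(c) + sum over x of log P(x|c).
    best_class, best_score = None, -math.inf
    for c in classes:
        score = log_prior[c]  # log P(c)
        for w in doc_tokens:
            # One common choice: simply skip out-of-vocabulary words.
            score += log_likelihood.get((w, c), 0.0)
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```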
Learning the model parameters
Maximum Likelihood Estimate

P̂(c_j) = count(C = c_j) / N_doc

P̂(w_i|c_j) = count(w_i, c_j) / Σ_{w∈V} count(w, c_j)

Problem with MLE
Suppose that in the training data we have never seen the word “Awesome” in a document labeled ‘positive’. Then

P̂(Awesome|positive) = 0

and ĉ_NB = argmax_{c∈C} P(c) ∏_{x∈X} P(x|c) becomes zero for the class ‘positive’, no matter how strong the remaining evidence is.
Laplace (add-1) Smoothing
Laplace Smoothing

P̂(w_i|c) = (count(w_i, c) + 1) / Σ_{w∈V} (count(w, c) + 1)
          = (count(w_i, c) + 1) / (Σ_{w∈V} count(w, c) + |V|)
A Worked Example
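A small end-to-end sketch with a made-up five-document sentiment corpus, applying the prior, the add-1 smoothed likelihood, and the argmax rule from the previous slides (the corpus and helper names are illustrative):

```python
from collections import Counter

# Made-up training corpus: (tokens, label).
train = [
    ("fun couple love love".split(), "positive"),
    ("fast furious shoot".split(), "negative"),
    ("couple fly fast fun fun".split(), "positive"),
    ("furious shoot shoot fun".split(), "negative"),
    ("fly fast shoot love".split(), "negative"),
]

classes = {"positive", "negative"}
vocab = {w for tokens, _ in train for w in tokens}

# Priors: P(c) = count(C = c) / N_doc
prior = {c: sum(1 for _, y in train if y == c) / len(train) for c in classes}

# Per-class word counts for the add-1 smoothed likelihoods.
counts = {c: Counter() for c in classes}
for tokens, y in train:
    counts[y].update(tokens)

def likelihood(w, c):
    # P(w|c) = (count(w, c) + 1) / (sum over w of count(w, c) + |V|)
    return (counts[c][w] + 1) / (sum(counts[c].values()) + len(vocab))

def classify(tokens):
    scores = {}
    for c in classes:
        score = prior[c]
        for w in tokens:
            if w in vocab:  # ignore out-of-vocabulary words
                score *= likelihood(w, c)
        scores[c] = score
    return max(scores, key=scores.get)

print(classify("fast couple shoot fly".split()))  # "negative" on this data
```

Working through one likelihood by hand reproduces what the code computes: count(fast, negative) = 2, the negative class has 11 tokens in total, and |V| = 7, so P̂(fast|negative) = (2 + 1) / (11 + 7) = 1/6.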
Model Training
Once the text data is transformed into numerical features, we can
train a text classification model.
Popular algorithms for text classification include:
Naive Bayes: Based on Bayes’ theorem, assumes independence
between features.
Support Vector Machines (SVM): Constructs hyperplanes to
separate different classes.
Decision Trees: Hierarchical structure of if-else rules for
classification.
Neural Networks: Deep learning models with multiple layers for
feature learning.
We train the model on a labeled dataset, where each review is
associated with its sentiment label.
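Each of the algorithms listed above can fill the same slot after feature extraction. A hedged sketch with a made-up four-review dataset (the score shown is training accuracy, for illustration only):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

texts = ["amazing movie", "boring plot", "loved it", "waste of time"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

X = CountVectorizer().fit_transform(texts)  # BoW features

for name, clf in [("Naive Bayes", MultinomialNB()),
                  ("SVM", LinearSVC()),
                  ("Decision Tree", DecisionTreeClassifier()),
                  ("Logistic Regression", LogisticRegression())]:
    clf.fit(X, labels)
    print(name, clf.score(X, labels))  # accuracy on the training data itself
```

(Neural networks need more data and a different toolkit, so they are omitted from the sketch.)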
Model Evaluation
To assess the performance of the text classification model, we
evaluate it using appropriate metrics.
Common evaluation metrics for text classification include
accuracy, precision, recall, and F1-score.
We split our labeled dataset into training and test sets, and
evaluate the model on the test set.
The evaluation results provide insights into the model’s ability to
generalize and classify new, unseen data.
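A minimal sketch of computing these metrics, assuming scikit-learn and hypothetical label vectors for a held-out test set:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical gold labels and model predictions on a test set.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
```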
Prediction
After training and evaluating the model, we can use it to predict
the sentiment of new, unseen movie reviews.
We preprocess the new text data, extract features using the same
technique (e.g., BoW), and feed it to the trained model.
The model assigns a sentiment label (positive or negative) to each
new review based on its learned patterns.
The predictions can be used for various applications, such as
recommendation systems or sentiment analysis dashboards.
Types and Evaluation
Types of classification
Binary classification - Two Classes only
Yes, No
Positive, Negative
Multinomial classification - More than two classes
Example - High, Low, Medium
Example - Very Poor, Poor, Average, Good, Very Good, Excellent
Multi-value (multi-label) classification - a document can belong to 0, 1,
or more than one class
Naïve Bayes: More than Two Classes
Multi-value classification
A document can belong to 0, 1 or > 1 classes
Handling Multi-value classification
For each class c ∈ C, build a classifier γc to distinguish c from all
other classes c′ ∈ C
Given a test document d, evaluate it for membership in each class using
each γc
d belongs to every class for which γc returns true (see the sketch below)
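A sketch of this one-classifier-per-class scheme, assuming scikit-learn (the documents and topic labels are made up):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Made-up documents, each carrying zero or more topic labels.
texts = ["stock prices fell sharply", "the match ended in a draw",
         "stocks rally after the election", "election results are in"]
labels = [["finance"], ["sports"], ["finance", "politics"], ["politics"]]

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(labels)            # one binary column per class
vec = CountVectorizer().fit(texts)
X = vec.transform(texts)

# OneVsRestClassifier builds one binary classifier (a gamma_c) per class,
# trained to separate that class from all the others.
clf = OneVsRestClassifier(LogisticRegression()).fit(X, y)

pred = clf.predict(vec.transform(["stocks and election news"]))
print(mlb.inverse_transform(pred))       # possibly several (or zero) labels
```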
Evaluation of Text Classification
Accuracy measures the proportion of correctly classified instances
out of all instances in the dataset.
Precision measures the proportion of true positive instances out of
all instances classified as positive.
Recall (Sensitivity) measures the proportion of true positive
instances out of all actual positive instances.
Confusion Matrix - Two Classes
The confusion matrix displays the number of true positive, true
negative, false positive, and false negative predictions.
It provides a more detailed view of the model’s performance for
each class.
Figure: Two-class confusion matrix (image courtesy: https://www.arxiv-vanity.com/papers/2008.05756/)
Precision
Precision
The Precision is the fraction of True Positive elements divided by the
total number of positively predicted units (the column sum of the
predicted positives).
In particular, True Positives are the elements that have been labeled as
positive by the model and are actually positive, while False Positives are
the elements that have been labeled as positive by the model but are
actually negative.
Let’s Calculate Precision
Precision = TP / (TP + FP) = 20 / (20 + 10) ≈ 0.67
Recall
Recall
The Recall is the fraction of True Positive elements divided by the total
number of actual positive units (the row sum of the actual positives).
In particular, False Negatives are the elements that have been labeled as
negative by the model but are actually positive.
Let’s Calculate Recall
Recall = TP / (TP + FN) = 20 / (20 + 5) = 0.80
Accuracy
Accuracy
Accuracy is one of the most popular metrics in multi-class classification
and it is directly computed from the confusion matrix.
Let’s Calculate Accuracy
Accuracy = (TP + TN) / (TP + FP + TN + FN)
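Plugging in the slide’s counts (TP = 20, FP = 10, FN = 5, plus an assumed TN = 65, which the slides do not specify):

```python
TP, FP, FN, TN = 20, 10, 5, 65  # TN is an assumption for illustration

precision = TP / (TP + FP)                   # 20 / 30  ≈ 0.67
recall    = TP / (TP + FN)                   # 20 / 25  = 0.80
accuracy  = (TP + TN) / (TP + FP + TN + FN)  # 85 / 100 = 0.85
print(precision, recall, accuracy)
```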
Confusion Matrix - More than two classes
Figure: Multi-class confusion matrix (image courtesy: https://www.arxiv-vanity.com/papers/2008.05756/)
Micro- vs. Macro-Average
If we have more than one class, how do we combine multiple
performance measures into one quantity?
Macro-averaging
Compute performance for each class, then average
Micro-averaging
Collect decisions for all the classes, compute contingency table, and
evaluate.
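A small sketch of the difference, assuming scikit-learn and made-up three-class labels:

```python
from sklearn.metrics import precision_score

y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0]

# Macro: compute precision per class, then average (classes weigh equally).
print(precision_score(y_true, y_pred, average="macro"))
# Micro: pool all decisions into one contingency table first
# (frequent classes dominate).
print(precision_score(y_true, y_pred, average="micro"))
```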
Figure: Macro vs. micro-averaged precision
Questions?
Thank You All.......