
Text Classification

Nakul Dave
Assistant Professor
Computer Engineering Department
Vishwakarma Government Engineering College - Ahmedabad



Table of contents
1 Introduction
Examples of classification
2 Text Classification Phases
Feature Extraction
3 Problem Formulation
4 Methods of Text Classification
Naïve Bayes classification
Smoothing
A Worked Example
Model Training and Testing
5 Types and Evaluation
Types
Evaluation
6 Questions
7 References
Introduction

Text Classification

Text Classification is a fundamental task in NLP that involves categorizing text documents into predefined classes or categories.
It enables automated analysis and organization of large amounts of textual data.
Examples of text classification tasks:
Sentiment Analysis: Classifying text as positive, negative, or neutral.
Topic Classification: Assigning documents to specific topics or themes.
Spam Detection: Identifying spam emails or messages.


What is Text Classification

Figure: Text Classification Use Cases (source: https://www.sketchbubble.com/en/presentation-text-classification.html)

Text Classification Applications

Figure: Text Classification Applications (source: https://www.sketchbubble.com/en/presentation-text-classification.html)
Introduction: Examples of classification

Example of Movie Reviews

“Heartfelt and emotional. This is a must-watch.”

“Disappointing entry in the franchise, lacking the thrill and intrigue of its predecessors. The plot is convoluted, and the dialogues are uninspiring.”

“This movie is a weak romantic comedy that fails to make a lasting impression.”

“This is the greatest comedy movie ever filmed.”


What is the subject of this document?

Figure: Document Classifier (source: https://www.pericent.com/products/docedge-dms/hot-features/classification/)
Text Classification Phases

Text Classification Pipeline

1 Data Preprocessing: Cleaning and preparing the text data.
2 Feature Extraction: Converting text into numerical features.
3 Model Training: Building a classifier using labeled training data.
4 Model Evaluation: Assessing the performance of the trained model.
5 Prediction: Applying the trained model to classify new, unseen text data.



Text Classification Phases: Feature Extraction

Feature Extraction: Bag-of-Words

Bag-of-Words (BoW) is a popular technique for feature extraction in text classification.
It represents text as a collection of word counts or presence/absence indicators.
Each unique word in the corpus becomes a feature or dimension in the vector space.
BoW ignores word order and only considers the frequency or presence of words.


Feature Extraction: Example

Let’s consider two movie reviews:
Review 1: ”The movie was amazing and captivating.”
Review 2: ”I didn’t like the movie. It was boring.”
After preprocessing, the reviews become:
Review 1: ”movie amazing captivating”
Review 2: ”didn’t like movie boring”
With the shared vocabulary [movie, amazing, captivating, didn’t, like, boring], BoW represents the reviews as feature vectors:
Review 1: [1, 1, 1, 0, 0, 0] (presence of ”movie”, ”amazing”, and ”captivating”)
Review 2: [1, 0, 0, 1, 1, 1] (presence of ”movie”, ”didn’t”, ”like”, and ”boring”)
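To make the vectorization concrete, here is a minimal pure-Python sketch of the example above (variable names are illustrative, not from the slides; the vocabulary keeps first-occurrence order):

# Minimal sketch of Bag-of-Words vectorization for the two preprocessed reviews.
reviews = ["movie amazing captivating",   # preprocessed Review 1
           "didn't like movie boring"]    # preprocessed Review 2

# Build the vocabulary: every unique word becomes one dimension.
vocabulary = []
for review in reviews:
    for word in review.split():
        if word not in vocabulary:
            vocabulary.append(word)
# vocabulary == ['movie', 'amazing', 'captivating', "didn't", 'like', 'boring']

# Represent each review as a presence/absence vector over the vocabulary.
vectors = [[1 if word in review.split() else 0 for word in vocabulary]
           for review in reviews]
print(vectors)   # [[1, 1, 1, 0, 0, 0], [1, 0, 0, 1, 1, 1]]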



Problem Formulation

Text Classification: A Problem Formulation

Input
A document d
A fixed set of classes C = {c_1, c_2, ..., c_n}

Output
A predicted class c ∈ C



Methods of Text Classification

Classification Methods: Hand-coded rules

Rules are framed based on the features or words that occur in the text.
The number of rules depends on the features and the size of the text.

Spam
black-list-address OR (“dollars” AND “have been selected”)

Pros and Cons
Accuracy can be high if rules are carefully refined by experts, but building and maintaining these rules is expensive.
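As a minimal sketch, the spam rule above expressed as code (the blacklist contents and message fields are illustrative assumptions, not part of the slides):

# Minimal sketch of the hand-coded spam rule above.
BLACKLIST = {"spammer@example.com"}   # hypothetical blacklisted sender addresses

def is_spam(sender: str, body: str) -> bool:
    """Rule: black-list-address OR ("dollars" AND "have been selected")."""
    text = body.lower()
    return sender in BLACKLIST or ("dollars" in text and "have been selected" in text)

print(is_spam("friend@example.com", "You have been selected to win 1000 dollars!"))  # True
print(is_spam("friend@example.com", "Lunch tomorrow?"))                              # False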


Classification Methods: Supervised Machine Learning

Naive Bayes
Decision Tree Induction

Support Vector Machine

Logistic Regression

···



Methods of Text Classification: Naïve Bayes classification

Naive Bayes Classification

Naive Bayes is a probabilistic classification algorithm based on Bayes’ theorem.
It assumes that the features are conditionally independent given the class label.
Naive Bayes is commonly used for text classification tasks.


Bayes’ Theorem

Bayes’ theorem, for a class c and a document d, is defined as:

P(c|d) = P(d|c) · P(c) / P(d)

Where:
P(c|d) is the posterior probability of class c given document d.
P(d|c) is the likelihood of document d given class c.
P(c) is the prior probability of class c.
P(d) is the probability of document d (the evidence).


Naive Bayes Algorithm

1 Calculate the prior probabilities P(c) for each class c in the training data.
2 For each feature x_i in the input document, calculate the likelihood P(x_i|c) for each class c based on the training data.
3 Multiply the prior probability and the feature likelihoods to obtain the joint probability P(d, c) for each class c.
4 Normalize the joint probabilities by dividing them by the evidence P(d) to obtain the posterior probabilities P(c|d).
5 Assign the input to the class with the highest posterior probability.
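A minimal sketch of these five steps in Python for a toy two-class setup (the priors and likelihood values are hand-picked for illustration, not estimated from real data):

# Sketch of the Naive Bayes decision steps with illustrative, hand-picked probabilities.
priors = {"positive": 0.6, "negative": 0.4}                  # step 1: P(c)
likelihoods = {                                              # step 2: P(x_i | c)
    "positive": {"amazing": 0.10, "boring": 0.01},
    "negative": {"amazing": 0.01, "boring": 0.12},
}

def classify(features):
    joint = {}
    for c in priors:                                         # step 3: P(c) * product of P(x_i | c)
        p = priors[c]
        for x in features:
            p *= likelihoods[c].get(x, 1e-6)                 # tiny floor for unseen words (assumption)
        joint[c] = p
    evidence = sum(joint.values())                           # step 4: divide by P(d)
    posteriors = {c: p / evidence for c, p in joint.items()}
    return max(posteriors, key=posteriors.get), posteriors   # step 5: pick the highest posterior

print(classify(["amazing"]))   # ('positive', ...)
print(classify(["boring"]))    # ('negative', ...)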


Argmax Notation

Bayes’ theorem

P(c|d) = P(d|c) · P(c) / P(d)

For Naive Bayes classification, the argmax notation can be expressed as:

Naïve Bayes Classifier

ĉ = argmax_{c ∈ C} P(c|d)
  = argmax_{c ∈ C} P(d|c) P(c)
  = argmax_{c ∈ C} P(x_1, x_2, ..., x_n|c) P(c)


Naïve Bayes classification assumptions

P(x_1, x_2, ..., x_n|c)

Bag of Words Assumption
Assume that the position of a word in the document doesn’t matter.

Conditional Independence
P(x_1, x_2, ..., x_n|c) = P(x_1|c) P(x_2|c) ... P(x_n|c)

ĉ_NB = argmax_{c ∈ C} P(c) ∏_{x ∈ X} P(x|c)


Learning the model parameters

Maximum Likelihood Estimate

P̂(c_j) = count(C = c_j) / N_doc

P̂(w_i|c_j) = count(w_i, c_j) / Σ_{w ∈ V} count(w, c_j)

Problem with MLE
Suppose that in the training data we have never seen the word “Awesome” in documents labeled ‘positive’. Then

P̂(Awesome|positive) = 0

ĉ_NB = argmax_{c ∈ C} P(c) ∏_{x ∈ X} P(x|c)

Because the likelihoods are multiplied, this single zero estimate wipes out the entire product, so the class ‘positive’ can never be chosen for any document containing “Awesome”.



Methods of Text Classification: Smoothing

Laplace (add-1) Smoothing

Laplace Smoothing

P̂(w_i|c) = (count(w_i, c) + 1) / Σ_{w ∈ V} (count(w, c) + 1)
          = (count(w_i, c) + 1) / (Σ_{w ∈ V} count(w, c) + |V|)



Methods of Text Classification: A Worked Example

A Worked Example
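The slide’s original worked example (a figure/table) is not present in the extracted text. As a substitute sketch, here is the full procedure on a tiny illustrative corpus: estimate the priors and the add-1 smoothed likelihoods, then classify a new review using log probabilities to avoid underflow (the corpus and test sentence are assumptions, not the slide’s data):

# Sketch: train Naive Bayes with add-1 smoothing on a toy corpus and classify a new review.
from collections import Counter, defaultdict
import math

train = [
    ("amazing captivating movie", "positive"),
    ("heartfelt and emotional must watch", "positive"),
    ("boring and predictable movie", "negative"),
    ("weak plot uninspiring dialogue", "negative"),
]

# Priors: P(c_j) = count(C = c_j) / N_doc
n_doc = len(train)
priors = {c: sum(1 for _, label in train if label == c) / n_doc
          for _, c in train}

# Per-class word counts and vocabulary V
counts = defaultdict(Counter)
for text, label in train:
    counts[label].update(text.split())
vocab = {w for counter in counts.values() for w in counter}

# Add-1 smoothed likelihood: P(w|c) = (count(w, c) + 1) / (sum_w count(w, c) + |V|)
def likelihood(word, c):
    return (counts[c][word] + 1) / (sum(counts[c].values()) + len(vocab))

# Decision rule in log space: argmax_c log P(c) + sum_i log P(x_i|c)
def classify(text):
    scores = {c: math.log(priors[c]) +
                 sum(math.log(likelihood(w, c)) for w in text.split() if w in vocab)
              for c in priors}
    return max(scores, key=scores.get)

print(classify("boring uninspiring movie"))   # -> 'negative'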



Methods of Text Classification: Model Training and Testing

Model Training

Once the text data is transformed into numerical features, we can train a text classification model.
Popular algorithms for text classification include:
Naive Bayes: Based on Bayes’ theorem, assumes independence between features.
Support Vector Machines (SVM): Constructs hyperplanes to separate different classes.
Decision Trees: Hierarchical structure of if-else rules for classification.
Neural Networks: Deep learning models with multiple layers for feature learning.
We train the model on a labeled dataset, where each review is associated with its sentiment label.
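A minimal sketch of this training step with scikit-learn (the review strings and labels are illustrative assumptions; MultinomialNB applies add-1 smoothing by default):

# Sketch: Bag-of-Words features + Naive Bayes training on a tiny illustrative dataset.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

reviews = ["amazing and captivating movie",
           "heartfelt and emotional, a must watch",
           "boring and predictable",
           "weak plot and uninspiring dialogue"]
labels = ["positive", "positive", "negative", "negative"]

vectorizer = CountVectorizer()             # Bag-of-Words feature extraction
X = vectorizer.fit_transform(reviews)      # document-term count matrix
model = MultinomialNB().fit(X, labels)     # Naive Bayes classifier (Laplace smoothing by default)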


Model Evaluation

To assess the performance of the text classification model, we evaluate it using appropriate metrics.
Common evaluation metrics for text classification include accuracy, precision, recall, and F1-score.
We split our labeled dataset into training and test sets, and evaluate the model on the test set.
The evaluation results provide insights into the model’s ability to generalize and classify new, unseen data.
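A sketch of this evaluation step, again with scikit-learn on an illustrative dataset (a real dataset would be far larger; the split and metric calls are standard scikit-learn API):

# Sketch: hold out a test set and report accuracy, precision, recall, and F1.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report

texts = ["great film", "loved it", "wonderful acting", "a must watch",
         "boring plot", "terrible movie", "weak dialogue", "fell asleep"]
labels = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg"]

train_texts, test_texts, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels)

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)   # fit the vocabulary on training data only
X_test = vectorizer.transform(test_texts)         # reuse the same vocabulary for the test set

model = MultinomialNB().fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))      # per-class precision, recall, F1, plus accuracy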


Prediction

After training and evaluating the model, we can use it to predict the sentiment of new, unseen movie reviews.
We preprocess the new text data, extract features using the same technique (e.g., BoW), and feed it to the trained model.
The model assigns a sentiment label (positive or negative) to each new review based on its learned patterns.
The predictions can be used for various applications, such as recommendation systems or sentiment analysis dashboards.
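Continuing the evaluation sketch above (vectorizer and model come from that snippet), prediction on new text is just a transform followed by predict:

# Sketch: classify new, unseen reviews with the already-trained model from the sketch above.
new_reviews = ["an amazing and heartfelt movie", "boring and predictable plot"]
X_new = vectorizer.transform(new_reviews)   # same Bag-of-Words vocabulary as in training
print(model.predict(X_new))                 # predicted sentiment label for each new review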



Types and Evaluation: Types

Types of classification

Binary classification - Two classes only
  Yes, No
  Positive, Negative
Multinomial classification - More than two classes
  Example - High, Low, Medium
  Example - Very Poor, Poor, Average, Good, Very Good, Excellent
Multi-value classification - A document can belong to 0, 1, or > 1 classes


Naïve Bayes: More than Two Classes

Multi-value classification
A document can belong to 0, 1 or > 1 classes

Handling Multi-value classification
For each class c ∈ C, build a classifier γ_c to distinguish c from all other classes c′ ∈ C.
Given a test document d, evaluate it for membership in each class using each γ_c.
d belongs to any class for which γ_c returns true.
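A minimal sketch of this one-classifier-per-class scheme using scikit-learn’s OneVsRestClassifier (the documents and tags are illustrative assumptions):

# Sketch: one binary classifier per class; a document may receive 0, 1, or several labels.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB

docs = ["the team won the match",            # sports
        "new phone released this week",      # tech
        "athlete endorses new smartwatch"]   # sports and tech
tags = [["sports"], ["tech"], ["sports", "tech"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(tags)                  # one 0/1 indicator column per class

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

clf = OneVsRestClassifier(MultinomialNB()).fit(X, Y)        # builds one gamma_c per class
pred = clf.predict(vectorizer.transform(["smartwatch review after the match"]))
print(mlb.inverse_transform(pred))           # the classes whose binary classifier returned true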



Types and Evaluation: Evaluation

Evaluation of Text Classification

Accuracy measures the proportion of correctly classified instances out of all instances in the dataset.
Precision measures the proportion of true positive instances out of all instances classified as positive.
Recall (Sensitivity) measures the proportion of true positive instances out of all actual positive instances.


Confusion Matrix - Two Classes

The confusion matrix displays the number of true positive, true negative, false positive, and false negative predictions.
It provides a more detailed view of the model’s performance for each class.

Figure: Two-class Confusion Matrix (Image Courtesy: https://www.arxiv-vanity.com/papers/2008.05756/)

Precision

Precision
The Precision is the fraction of True Positive elements divided by
the total number of positively predicted units (column sum of the
predicted positives).
In particular, True Positive are the elements that have been
labeled as positive by the model and they are actually positive,
while False Positive are the elements that have been labeled as
positive by the model, but they are actually negative.

Let’s Calculate Precision

Precision = TP / (TP + FP) = 20 / (20 + 10) ≈ 0.67


Recall

Recall
The Recall is the fraction of True Positive elements divided by the total number of actual positive units (row sum of the actual positives).
In particular, False Negative are the elements that have been labeled as negative by the model, but they are actually positive.

Let’s Calculate Recall

Recall = TP / (TP + FN) = 20 / (20 + 5) = 0.80


Accuracy

Accuracy
Accuracy is one of the most popular metrics in multi-class classification
and it is directly computed from the confusion matrix.

Let’s Calculate Accuracy

Accuracy = (TP + TN) / (TP + FP + TN + FN)
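As a small sketch, the three metrics computed from confusion-matrix counts; TP = 20, FP = 10, and FN = 5 come from the two-class example above, while TN is not stated on the slides, so a placeholder value is assumed purely for illustration:

# Sketch: precision, recall, and accuracy from confusion-matrix counts.
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + fp + tn + fn)

tp, fp, fn = 20, 10, 5
tn = 100                            # illustrative assumption; the slide does not give TN
print(precision(tp, fp))            # ~0.67
print(recall(tp, fn))               # 0.8
print(accuracy(tp, tn, fp, fn))     # depends on the assumed TN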


Confusion Matrix - More than two classes

Figure: Multi-class Confusion Matrix (Image Courtesy: https://www.arxiv-vanity.com/papers/2008.05756/)

Confusion Matrix - More than two classes

Figure: Multi-class Confusion Matrix


Micro- vs. Macro-Average

If we have more than one class, how do we combine multiple performance measures into one quantity?
Macro-averaging: compute performance for each class, then average.
Micro-averaging: collect decisions for all the classes, compute the contingency table, and evaluate.
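A minimal sketch of the difference using scikit-learn’s averaging options (the label arrays are illustrative assumptions):

# Sketch: macro- vs. micro-averaged precision on illustrative multi-class labels.
from sklearn.metrics import precision_score

y_true = ["sports", "sports", "tech", "tech", "politics", "politics"]
y_pred = ["sports", "tech",   "tech", "tech", "politics", "sports"]

macro = precision_score(y_true, y_pred, average="macro")  # average of the per-class precisions
micro = precision_score(y_true, y_pred, average="micro")  # precision over the pooled decisions
print(macro, micro)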


Confusion Matrix - More than two classes

Figure: Macro vs. Micro-averaged precision



Questions

Questions?


Thank You All.......



References

References I
