Text Classification

The document outlines the content of Lecture 3 for COMP 3361 Natural Language Processing, focusing on text classification techniques such as Naive Bayes and logistic regression. It includes announcements about assignments, course resources, and a survey link, along with a lecture plan covering language modeling and text classification methods. The lecture also discusses the advantages and disadvantages of Naive Bayes compared to more complex models like ChatGPT.

COMP 3361 Natural Language Processing

Lecture 3: Text Classification

Spring 2024

Many materials from CS224n@Stanford and COS484@Princeton with special thanks!

Announcements
● Assignment 1 has been released (due in 4 weeks: 9:00 am, Feb 20)
● Once more, please sign up for the course's Slack workspace. This is included in your
class participation grade.
https://join.slack.com/t/slack-fdv4728/shared_invite/zt-2asgddr0h-6wIXbRndwKhBw2IX2~ZrJQ

● You should be able to access the course Moodle page now.

● The course page has updated details on the tentative schedule

Google form survey

https://forms.gle/FMQvFCuzUyJ3pB93A
Lecture plan
● Recap of language modeling
● Naive Bayes and sentiment classification
● Logistic Regression for text classification
Generating from language models
● Deterministic (greedy) approach: with temperature=0, the model always selects the word with the
highest probability at each step

How ChatGPT completes a sentence with temperature=0

https://www.atmosera.com/ai/understanding-chatgpt/
Generating from language models
● Probabilistic (stochastic) approach: e.g., with temperature=0.7, the next word is sampled
from the probability distribution over the possible words. More creative!

How ChatGPT completes a sentence with temperature=0.7

https://www.atmosera.com/ai/understanding-chatgpt/
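The two decoding modes above can be sketched in a few lines of Python. This is a minimal illustration only: the four-word vocabulary, the scores, and the helper names `softmax_with_temperature` and `next_word` are assumptions made up for the example, not part of any real ChatGPT API.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def next_word(vocab, logits, temperature, rng=random):
    """Pick the next word: greedy if temperature == 0, otherwise sample."""
    if temperature == 0:
        # Deterministic: always take the highest-scoring word.
        return vocab[max(range(len(logits)), key=lambda i: logits[i])]
    probs = softmax_with_temperature(logits, temperature)
    # Stochastic: draw one word according to the probabilities.
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical vocabulary and scores, for illustration only.
vocab = ["mat", "sofa", "roof", "moon"]
logits = [2.0, 1.5, 0.5, -1.0]

greedy = next_word(vocab, logits, temperature=0)
sampled = next_word(vocab, logits, temperature=0.7, rng=random.Random(0))
```

With temperature=0 the same word comes back on every call, whereas with temperature=0.7 repeated calls can return different words, with higher-scoring words being more likely.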
Why text classification?

● Spam email detection
● Sentiment analysis

Q: any other examples?

Text classification
Prompting ChatGPT for text classification

Parse ChatGPT’s output
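One simple way to parse a free-form model reply is to look for the first known label it mentions. The sketch below is only an assumption about how this could be done: the label set and the helper `parse_label` are hypothetical, and no model API is called here.

```python
import re

# Hypothetical label set for a sentiment task.
LABELS = ("positive", "negative")

def parse_label(model_output, labels=LABELS):
    """Return the first known label mentioned in free-form model output, else None."""
    text = model_output.lower()
    hits = []
    for label in labels:
        # Match the label as a whole word, case-insensitively.
        m = re.search(r"\b" + re.escape(label) + r"\b", text)
        if m:
            hits.append((m.start(), label))
    # The earliest mention wins; None signals that the reply could not be parsed.
    return min(hits)[1] if hits else None
```

Returning None for unparseable replies lets the caller decide whether to retry the prompt or fall back to a default class.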

Rule-based text classification
IF there exists a word w in document d such that w is in [good, great, extraordinary, …],
THEN output Positive
IF email address ends in [ithelpdesk.com, makemoney.com, spinthewheel.com, …]
THEN output SPAM
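The two rules above translate almost directly into code. This is a minimal sketch: the function names `rule_sentiment` and `rule_spam` are illustrative, and the word and domain lists contain only entries named in the rules (the elided entries are left out).

```python
POSITIVE_WORDS = {"good", "great"}  # the rule's list continues with "…"
SPAM_DOMAINS = ("ithelpdesk.com", "makemoney.com", "spinthewheel.com")

def rule_sentiment(document):
    """Return 'Positive' if any word of the document is in the positive word list."""
    words = [w.strip(".,!?") for w in document.lower().split()]
    if any(w in POSITIVE_WORDS for w in words):
        return "Positive"
    return None  # no rule fired

def rule_spam(sender):
    """Return 'SPAM' if the sender address ends in one of the listed domains."""
    if sender.lower().endswith(SPAM_DOMAINS):
        return "SPAM"
    return None  # no rule fired
```

Note that a document such as "not good at all" would still be labeled Positive, which illustrates why hand-written rules are hard to get right.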

● + Can be very accurate (if the rules are carefully refined by experts)

● - Rules may be hard to define (and some even unknown to us!)
● - Labor intensive and expensive
● - Hard to generalize and keep up-to-date
Supervised Learning: Let’s use statistics!
Let the machine figure out the best patterns using data

Key questions:
● What is the form of F?
● How do we learn F?
Types of supervised classifiers

● Logistic regression
● Naive Bayes
● Support vector machines
● Neural networks

Naive Bayes
Naive Bayes classifier
Simple classification model making use of Bayes rule
● Bayes rule: $P(c \mid d) = \frac{P(d \mid c)\,P(c)}{P(d)}$, where $d$ is the document and $c$ is the class
How to represent $P(d \mid c)$?
● Option 1: represent the entire sequence of words
○ Too many sequences!
● Option 2: Bag of words

○ Assume position of each word doesn’t matter

○ Probability of each word is conditionally independent of the other words given
class c
Bag of words (BoW)
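A bag of words can be sketched as a plain word-count table. The helper name `bag_of_words` and the two example sentences are assumptions chosen for illustration; tokenization here is simple whitespace splitting.

```python
from collections import Counter

def bag_of_words(text):
    """Map a text to word counts, discarding word order."""
    return Counter(text.lower().split())

# Two sentences with opposite meanings but the same words.
a = bag_of_words("the movie was good not bad")
b = bag_of_words("the movie was bad not good")
```

Because position is ignored, `a` and `b` are identical even though the sentences mean opposite things. That is the price of the simplifying assumption.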
Predicting with Naive Bayes
How to estimate probabilities?
Data sparsity problem
This sounds familiar…
Solution: Smoothing!
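Estimation, smoothing, and prediction can be put together in a short sketch. It assumes the standard multinomial Naive Bayes formulation with add-one (Laplace) smoothing, which may differ in detail from the lecture's notation. The function names and the toy training set are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_docs, alpha=1.0):
    """Estimate log P(c) and log P(w | c) with add-alpha smoothing."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in labeled_docs:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    n_docs = sum(class_counts.values())
    log_prior = {c: math.log(n / n_docs) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        # Smoothed denominator: class word total plus alpha per vocabulary entry.
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_likelihood, vocab

def predict(words, log_prior, log_likelihood, vocab):
    """Return the class maximizing log P(c) + sum of log P(w | c)."""
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in words if w in vocab  # skip unseen words
        )
    return max(scores, key=scores.get)

# Hypothetical toy training set, for illustration only.
train = [
    ("great fun film".split(), "pos"),
    ("great acting".split(), "pos"),
    ("boring film".split(), "neg"),
    ("dull boring plot".split(), "neg"),
]
model = train_naive_bayes(train)
label = predict("great plot".split(), *model)
```

Without smoothing, "plot" never appears in a positive document, so its probability given the positive class would be zero and would wipe out the whole product. With add-one smoothing it gets a small nonzero probability instead. Working in log space avoids numerical underflow when many small probabilities are multiplied.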
Overall process
A worked example for sentiment analysis
Naive Bayes vs. language models
Naive Bayes: pros and cons
Naive Bayes can use any features!

● In general, Naive Bayes can use

any set of features, not just
words:
○ URLs, email addresses,
Capitalization, …
○ Domain knowledge crucial
to performance

Top features for spam detection

Wait, we already have ChatGPT, why still NB?

Naive Bayes vs. Transformers, neural networks, and many others (e.g., ChatGPT)
● Computational efficiency, cost
● Simplicity and interpretability
● Small data performance
● Out of domain
○ Requires domain experts to design features
● …
Logistic regression

Study this on your own!

https://machine-learning.paperspace.com/wiki/logistic-regression
Generative vs. discriminative models
Generative classifiers
Discriminative classifiers
Overall process: Discriminative classifiers
1. Feature representation

Bag of words
Example: Sentiment classification
2. Classification function
Example: Sentiment classification
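For binary sentiment classification, the classification function of logistic regression is a sigmoid applied to a weighted sum of the features. The sketch below assumes that standard form. The two count features, the weights, and the bias are hypothetical values picked for illustration.

```python
import math

def sigmoid(z):
    """Squash a real-valued score into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """P(y = 1 | x) = sigmoid(w . x + b)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# Hypothetical features: [count of positive words, count of negative words]
x = [3, 1]
w = [1.2, -1.5]  # hypothetical weights
b = 0.1          # hypothetical bias
p_pos = predict_proba(x, w, b)
label = "Positive" if p_pos > 0.5 else "Negative"
```

The 0.5 threshold corresponds to a score of zero, so the decision boundary is the set of inputs where w . x + b = 0.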
3. Loss function
Example: Computing CE loss
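The cross-entropy (CE) loss for one binary example can be computed as follows. This is a minimal sketch assuming the standard binary form; the clipping constant `eps` and the probability values are assumptions added for the example.

```python
import math

def cross_entropy(y_true, p_pred, eps=1e-12):
    """Binary CE: -[y log p + (1 - y) log(1 - p)]."""
    # Clip p away from 0 and 1 so that log() is always defined.
    p = min(max(p_pred, eps), 1 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

# The model says P(positive) = 0.9 in both cases; only the true label differs.
loss_confident_right = cross_entropy(1, 0.9)
loss_confident_wrong = cross_entropy(0, 0.9)
```

A confident correct prediction gives a loss near zero, while a confident wrong prediction is penalized much more heavily.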
Properties of CE loss
4. Optimization
Gradient for logistic regression
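For logistic regression with cross-entropy loss, the gradient with respect to weight j is (p - y) * x_j, where p is the predicted probability, and the gradient with respect to the bias is (p - y). A single stochastic gradient descent step can be sketched as below. The learning rate, the starting weights, and the example features are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(weights, bias, features, y_true, lr=0.1):
    """One stochastic gradient step on the binary cross-entropy loss."""
    p = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
    error = p - y_true  # shared factor of every partial derivative
    new_weights = [w - lr * error * x for w, x in zip(weights, features)]
    new_bias = bias - lr * error
    return new_weights, new_bias

# Start from all-zero parameters and take one step on a positive example.
w, b = [0.0, 0.0], 0.0
x, y = [3, 1], 1
w1, b1 = sgd_step(w, b, x, y)
```

After the step, the predicted probability for this positive example moves from 0.5 toward 1, which is the direction that lowers the loss.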
Regularization
Multinomial logistic regression
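With more than two classes, each class gets its own weight vector and the sigmoid is replaced by a softmax over the class scores. The sketch below assumes that standard formulation. The three labels, the weight rows, and the biases are hypothetical.

```python
import math

def softmax(scores):
    """Normalize real-valued scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_class(features, weight_rows, biases, labels):
    """Score each class with its own weight vector, then normalize with softmax."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weight_rows, biases)
    ]
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda k: probs[k])
    return labels[best], probs

# Hypothetical three-class sentiment setup; features are
# [count of positive words, count of negative words].
labels = ["positive", "negative", "neutral"]
W = [[1.0, -1.0], [-1.0, 1.0], [0.0, 0.0]]
b = [0.0, 0.0, 0.5]
label, probs = predict_class([3, 1], W, b, labels)
```

With two classes, softmax reduces to the sigmoid used in binary logistic regression.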
Features in multinomial LR
Learning
Next lecture: word embeddings
