What is Text Classification?
Text classification is a Natural Language Processing (NLP) technique used to automatically
assign categories or tags to textual data. It's widely used in:
● Spam detection (spam or not spam)
● Sentiment analysis (positive, neutral, negative)
● Topic labeling (e.g., sports, politics, tech)
● Language detection
Text classification transforms text into numerical features, and then a machine learning model is
trained to learn patterns and make predictions.
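For example, a bag-of-words vectorizer maps each document to a row of word counts. A minimal sketch, assuming a recent version of scikit-learn and a tiny made-up corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Three made-up documents, purely for illustration.
docs = ["free prize inside", "meeting at noon", "claim your free prize"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)         # sparse matrix: one row per document

print(vectorizer.get_feature_names_out())  # the learned vocabulary
print(X.toarray())                         # word counts for each document
```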
What is Naive Bayes?
Naive Bayes is a probabilistic classifier based on Bayes' Theorem, and it assumes:
1. Features (words in the text) are independent of each other given the class (the naive assumption).
2. Each word contributes equally and independently to the probability of a class.
Despite its simplicity, it works surprisingly well in many text classification problems.
Bayes' Theorem Refresher:
P(C∣X) = [P(X∣C) · P(C)] / P(X)
Where:
● P(C∣X): Posterior probability of class C given features X (e.g., probability it's spam
given the words)
● P(X∣C): Likelihood of features X given class C
● P(C): Prior probability of class C
● P(X): Probability of features X
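As a quick worked example (all numbers below are hypothetical, chosen only to illustrate the formula): suppose 40% of emails are spam, and the word "free" appears in 60% of spam emails but only 5% of non-spam emails. Then the probability that an email containing "free" is spam works out as follows:

```python
# Hypothetical figures, for illustration only.
p_spam = 0.40                    # P(C): prior probability of spam
p_free_given_spam = 0.60         # P(X|C): likelihood of "free" given spam
p_free_given_ham = 0.05          # likelihood of "free" given non-spam

# P(X): total probability of seeing "free" in any email
p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)

# Bayes' theorem: P(C|X) = P(X|C) * P(C) / P(X)
p_spam_given_free = p_free_given_spam * p_spam / p_free
print(round(p_spam_given_free, 3))  # 0.889 -- seeing "free" raises the spam probability
```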
How Naive Bayes Works in Text Classification:
1. Training Phase:
○ Go through the training text documents and count how often each word appears in
each category.
○ Estimate probabilities for each word in each class.
○ Use smoothing techniques like Laplace Smoothing to handle rare or unseen
words.
2. Prediction Phase:
○ For a new document, calculate the probability of each class given the words in the
document.
○ The class with the highest probability is the predicted category.
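To make the two phases concrete, here is a minimal from-scratch sketch; the four-message training set is invented purely for illustration, and a real system would add proper preprocessing:

```python
import math
from collections import Counter, defaultdict

# Tiny made-up training set of (text, class) pairs.
train = [("win free prize now", "spam"),
         ("free cash offer", "spam"),
         ("meeting agenda tomorrow", "ham"),
         ("project lunch with team", "ham")]

# Training phase: count how often each word appears in each class.
word_counts = defaultdict(Counter)   # class -> word -> count
class_counts = Counter()
vocab = set()
for text, label in train:
    words = text.split()
    class_counts[label] += 1
    word_counts[label].update(words)
    vocab.update(words)

def log_likelihood(word, label):
    # Laplace (add-one) smoothing: unseen words never get a zero probability.
    count = word_counts[label][word] + 1
    total = sum(word_counts[label].values()) + len(vocab)
    return math.log(count / total)

# Prediction phase: pick the class with the highest (log) posterior.
def predict(text):
    scores = {}
    for label in class_counts:
        score = math.log(class_counts[label] / sum(class_counts.values()))  # prior
        for word in text.split():
            score += log_likelihood(word, label)                            # likelihoods
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("free prize offer"))   # expected: spam
```

Log probabilities are used because multiplying many small probabilities directly would underflow; adding their logarithms is the standard workaround.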
Naive Bayes for Text Classification
Here's how it works (an end-to-end code sketch follows these steps):
1. Text Preprocessing:
○ Tokenize the text (split into words).
○ Remove stopwords, punctuation.
○ Lowercase everything.
○ Optional: Stemming/Lemmatization.
2. Feature Extraction:
○ Convert text into numerical features using:
■ Bag of Words (BoW): Counts word occurrences.
■ TF-IDF: Weights words by their frequency in a document relative to how
common they are across all documents.
3. Model Training:
○ Count word frequencies per class (e.g., word “free” appears 40 times in spam
emails and 2 times in non-spam).
○ Calculate:
■ Prior probabilities (e.g., % of emails that are spam).
■ Likelihoods (probability of a word given a class).
○ Apply Laplace smoothing to handle words not seen in training data.
4. Prediction:
○ For a new email/document, calculate the probability it belongs to each class.
○ Choose the class with the highest probability.
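In practice these four steps are usually wired together with a library. A minimal sketch, assuming scikit-learn is installed and using a tiny made-up corpus:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up corpus, purely for illustration.
emails = ["Win a FREE prize, claim now!",
          "Free cash offer, limited time",
          "Meeting agenda for tomorrow morning",
          "Lunch with the project team at noon"]
labels = ["spam", "spam", "ham", "ham"]

pipeline = Pipeline([
    # Steps 1-2: lowercasing, tokenization, stopword removal, TF-IDF weighting
    ("features", TfidfVectorizer(lowercase=True, stop_words="english")),
    # Steps 3-4: Multinomial Naive Bayes with Laplace smoothing (alpha=1.0)
    ("classifier", MultinomialNB(alpha=1.0)),
])

pipeline.fit(emails, labels)                       # training phase
print(pipeline.predict(["free prize offer now"]))  # prediction phase, likely ['spam']
```

Here TfidfVectorizer covers steps 1 and 2 (apart from optional stemming/lemmatization), and MultinomialNB's alpha parameter is the Laplace smoothing from step 3.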
Pros of Naive Bayes:
● Simple and fast to train and predict.
● Scales well to large datasets.
● Performs well even with a small amount of training data.
● Robust to noisy and irrelevant features (e.g., common words like "the", "is").
Cons:
● Assumes independence among features (not true in real language).
● If a word in a new document was never seen during training, its estimated probability is
zero, which wipes out the entire class score (this is why smoothing is applied).
Applications:
● Email spam detection
● News categorization
● Twitter sentiment analysis
● Document tagging (legal, medical, academic domains)
Variants of Naive Bayes in Text:
1. Multinomial Naive Bayes (most common for text) – Uses word frequencies.
2. Bernoulli Naive Bayes – Uses binary features (whether a word appears or not).
3. Gaussian Naive Bayes – Used for continuous data (not common in NLP).
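The difference between the first two variants is easiest to see in code. A short sketch, assuming scikit-learn and a deliberately tiny made-up corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB, BernoulliNB

docs = ["free prize free prize claim now", "team meeting agenda tomorrow"]
labels = ["spam", "ham"]

# Multinomial NB models how often each word occurs (raw counts).
count_features = CountVectorizer()
X_counts = count_features.fit_transform(docs)
multinomial = MultinomialNB().fit(X_counts, labels)

# Bernoulli NB models only whether each word occurs at all (0/1 features).
binary_features = CountVectorizer(binary=True)
X_binary = binary_features.fit_transform(docs)
bernoulli = BernoulliNB().fit(X_binary, labels)

test = ["free prize meeting"]
print(multinomial.predict(count_features.transform(test)))
print(bernoulli.predict(binary_features.transform(test)))
```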