SMS Spam Classifier Guide

The document describes developing a machine learning model to classify SMS messages as spam or ham. It outlines preprocessing text data, extracting TF-IDF features, training a model like Naive Bayes or SVM, and evaluating the model's performance on test data using metrics like accuracy, precision and recall.

SMS spam classifier

1. Introduction
❖ Brief introduction to the project.
❖ Statement of the problem (identifying and classifying spam messages in SMS).

2. Objectives
❖ Clearly defined project objectives.
❖ What you aim to achieve with the SMS spam classifier.

3. Dataset Description
❖ Source of the dataset (e.g., Kaggle).
❖ Dataset size and characteristics.
❖ Description of columns/features (e.g., 'text' and 'label').

4. Data Preprocessing
❖ Data loading and exploration.
❖ Data cleaning (dealing with missing values or duplicates).
❖ Text preprocessing steps, including lowercasing, tokenization, punctuation removal, stopword removal, and stemming/lemmatization.
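The data cleaning step above (missing values and duplicates) could look roughly like the sketch below. The small frame is made-up illustration data, not the real dataset, and the helper name clean_messages is hypothetical; the column names 'Label' and 'Text' follow the code at the end of this document.

```python
import pandas as pd

# Made-up rows standing in for the real dataset (assumption for illustration).
raw = pd.DataFrame({
    "Label": ["ham", "spam", "spam", "ham", None, "ham"],
    "Text": ["See you at 5", "WIN a free prize now", "WIN a free prize now",
             None, "Hello", "  "],
})

def clean_messages(df):
    """Drop rows with missing fields, blank messages, and exact duplicates."""
    df = df.dropna(subset=["Label", "Text"])           # missing label or text
    df = df[df["Text"].str.strip() != ""]              # whitespace-only messages
    df = df.drop_duplicates(subset=["Label", "Text"])  # repeated messages
    return df.reset_index(drop=True)

cleaned = clean_messages(raw)
print(cleaned)
```

Removing duplicates before the train/test split keeps the same message from landing in both sets, which would otherwise inflate the test scores.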

5. Feature Extraction
❖ Explanation of the feature extraction method used (e.g., TF-IDF).
❖ How text data was converted into numerical features.
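To make the conversion from text to numbers concrete, here is a minimal sketch of the textbook TF-IDF weighting, tf(t, d) * ln(N / df(t)), in plain Python. The function name tfidf and the three sample messages are assumptions for illustration. scikit-learn's TfidfVectorizer uses a smoothed IDF and L2-normalizes each row by default, so its numbers will differ from these.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return (vocabulary, matrix) using tf = count / doc length, idf = ln(N / df)."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many messages each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vocab = sorted(df)
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        row = [
            (counts[term] / total) * math.log(n_docs / df[term]) if total else 0.0
            for term in vocab
        ]
        matrix.append(row)
    return vocab, matrix

docs = ["free prize now", "call me now", "free free entry"]
vocab, matrix = tfidf(docs)
for doc, row in zip(docs, matrix):
    print(doc, [round(weight, 3) for weight in row])
```

Terms that occur in few messages (such as "prize") get a higher weight than terms shared by several messages (such as "now"), which is why the representation helps separate spam vocabulary from everyday wording.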

6. Model Development
❖ Choice of machine learning algorithm (e.g., Naive Bayes, SVM).
❖ Model training on the preprocessed data.
❖ Hyperparameter tuning (if applicable).
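One possible way to combine training and hyperparameter tuning is a scikit-learn pipeline with a small grid search, sketched below. Multinomial Naive Bayes is only one of the algorithms mentioned above; the eight toy messages, the alpha grid, and cv=2 are assumptions chosen so the example runs quickly, not recommended settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy messages for illustration only (not the real dataset).
texts = [
    "win a free prize now", "free entry claim your cash",
    "urgent call now to claim prize", "cash prize waiting text win",
    "are we still meeting for lunch", "see you at home tonight",
    "can you call me later", "thanks for the lift today",
]
labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]

# Vectorizer and classifier in one pipeline, so each cross-validation fold
# fits TF-IDF on its own training part only.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("nb", MultinomialNB()),
])

# Try a few smoothing strengths for Naive Bayes.
param_grid = {"nb__alpha": [0.1, 0.5, 1.0]}
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="f1_macro")
search.fit(texts, labels)

print("Best parameters:", search.best_params_)
predictions = search.predict(["free prize call now", "see you at lunch"])
print(predictions)
```

On the real dataset a larger cv value (for example 5) and a wider grid would be more appropriate; swapping MultinomialNB for an SVM such as LinearSVC only requires changing the last pipeline step and the grid keys.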

7. Evaluation
❖ Performance metrics used (e.g., accuracy, precision, recall, F1-score).
❖ Splitting the dataset into training and testing sets.
❖ Model evaluation on the test set.
❖ Confusion matrix and other relevant visualizations.
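The metrics listed above can be computed with sklearn.metrics as in the sketch below. The true and predicted labels are hand-written for illustration and are not results from any model; "spam" is treated as the positive class.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hand-written labels for illustration only (not real model output).
y_true = ["spam", "ham", "ham", "spam", "ham", "spam", "ham", "ham"]
y_pred = ["spam", "ham", "ham", "ham", "ham", "spam", "spam", "ham"]

accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred, pos_label="spam")
recall = recall_score(y_true, y_pred, pos_label="spam")
f1 = f1_score(y_true, y_pred, pos_label="spam")

# Rows are true classes, columns are predicted classes, in the order given.
cm = confusion_matrix(y_true, y_pred, labels=["ham", "spam"])

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
print(cm)
```

Here two of three spam messages are caught and one ham message is flagged by mistake, so precision and recall are both 2/3 while accuracy is 0.75. Spam datasets usually contain far more ham than spam, so precision and recall tend to say more about the classifier than accuracy alone.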

8. Results
❖ Summary of the model's performance.
❖ Insights from the evaluation.
❖ Any challenges faced during model development and evaluation.

9. Conclusion
❖ Recap of the project objectives and what was achieved.
❖ Discussion of the practical implications of the SMS spam classifier.
❖ Suggestions for future improvements or extensions of the project.

10. References

Code:
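import re

import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import SnowballStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# The NLTK stopword list must be downloaded once beforehand:
#   import nltk; nltk.download('stopwords')

# Raw string keeps the backslash in the Windows path literal.
data = pd.read_csv(r'D:\spam.csv', encoding='latin-1')

# Quick look at the data and the class balance.
print(data.head())
print(data['Label'].value_counts())

# Build the stemmer and the stopword set once, not once per word.
stemmer = SnowballStemmer("english")
stop_words = set(stopwords.words('english'))


def preprocess_text(text):
    """Lowercase, keep letters only, drop stopwords, and stem the rest."""
    text = text.lower()
    text = re.sub(r'[^a-z]', ' ', text)
    words = text.split()
    words = [stemmer.stem(word) for word in words if word not in stop_words]
    return ' '.join(words)


data['processed_text'] = data['Text'].apply(preprocess_text)
X = data['processed_text']
y = data['Label']

# Hold out 20% of the messages for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the vocabulary on the training split only, then reuse it for the test split.
tfidf_vectorizer = TfidfVectorizer()
X_train_tfidf = tfidf_vectorizer.fit_transform(X_train)
X_test_tfidf = tfidf_vectorizer.transform(X_test)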

Output:
