Thanks to visit codestin.com
Credit goes to github.com

Skip to content

A machine learning algorithm which uses Naive Bayes to create a model that classifies dataset SMS messages as spam or not spam with %95 accuracy

Notifications You must be signed in to change notification settings

imehrdadmahdavi/spam-detection

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Spam Detection

Description

This project uses the Naive Bayes Machine Learning algorithm to create a model that can classify dataset SMS messages as spam or not spam, based on the training we give to the model. It is important to have some level of intuition as to what a spammy text message might look like.

What are spammy messages?

Usually they have words like 'free', 'win', 'winner', 'cash', 'prize', or similar words in them, as these texts are designed to catch your eye and tempt you to open them. Also, spam messages tend to have words written in all capitals and also tend to use a lot of exclamation marks. To the recipient, it is usually pretty straightforward to identify a spam text and our objective here is to train a model to do that for us!

Being able to identify spam messages is a binary classification problem as messages are classified as either 'Spam' or 'Not Spam' and nothing else. We will be feeding a labelled dataset into the model, that it can learn from, to make future predictions.

This projecs uses the sklearn implementation of the Naive Bayes and also discuss approaches to build Naive Baye from scratch.

Install

This project requires Python 3.7 and the following Python libraries installed:

Code

The code is provided in the spam-detection.ipynb notebook file.

Run

In a terminal or command window, navigate to the top-level project directory spam-detection/ (that contains this README) and run one of the following commands:

ipython notebook spam-detection.ipynb

or

jupyter notebook spam-detection.ipynb

This will open the Jupyter Notebook software and project file in your browser.

Data

This project uses a dataset from the UCI Machine Learning repository which has a very good collection of datasets for experimental research purposes. The direct data link is here.

Here's a preview of the data:

SMS

The columns in the data set are currently not named and as you can see, there are 2 columns.

The first column takes two values, 'ham' which signifies that the message is not spam, and 'spam' which signifies that the message is spam.

The second column is the text content of the SMS message that is being classified.

About

A machine learning algorithm which uses Naive Bayes to create a model that classifies dataset SMS messages as spam or not spam with %95 accuracy

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published