Project Synopsis
on
“Crime Prediction and Prevention”
Submitted in partial fulfillment of the requirement for the degree of
Bachelor of Engineering by
Sejal Deshmukh
Swati Thakur
Sanskruti Pawar
Atharva Gavankar
Under the guidance of
Prof. Shraddha Kunkunkar
LOKMANYA TILAK COLLEGE OF ENGINEERING
An Autonomous Institute Affiliated to
UNIVERSITY OF MUMBAI
Department of Computer Science and Engineering (Data Science)
Academic Year – 2024-2025
CERTIFICATE
This is to certify that the mini project entitled “CRIME PREDICTION
AND PREVENTION” is a bonafide work of Sanskruti Pawar
(DS138), Swati Thakur (DS155), Sejal Deshmukh (DS168) and
Atharv Gavankar (DS129) submitted to Lokmanya Tilak College of
Engineering, An Autonomous Institute affiliated to the University of
Mumbai, in partial fulfillment of the requirement for the degree of
“Bachelor of Engineering” in “Computer Science and Engineering (Data
Science)”.
Prof. Shraddha Kunkunkar Dr. Nandini C Nag Dr. Subhash K Shinde
(Project Guide) (Head of Department) (Principal)
External Examiner
Place: Lokmanya Tilak College of Engineering
Date:
MINI PROJECT APPROVAL
This Mini Project entitled “CRIME PREDICTION AND PREVENTION” by
Sanskruti Pawar (DS138), Swati Thakur (DS155), Sejal Deshmukh (DS168) and
Atharv Gavankar (DS129) is approved for the degree of Bachelor of Engineering
in Computer Science and Engineering (Data Science).
Examiner
1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..............
(Internal Examiner Name Sign)
2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..............
(External Examiner Name Sign)
Date:
Place:
ACKNOWLEDGEMENT
We would like to acknowledge and extend our heartfelt gratitude to all those
who have been associated with this project and have helped us with it, thus making
it a worthwhile experience.
Firstly, we extend our thanks to our project guide Prof. Shraddha Kunkunkar,
who shared her opinions and experiences, through which we received the
information crucial for our project synopsis. We are also thankful to the Head of
the Department, Dr. Nandini C Nag, and all the staff members of Computer Science
and Engineering (Data Science) for their highly co-operative and encouraging
attitudes, which have always boosted us.
We also take this opportunity with great pleasure to thank our esteemed Principal
Dr. Subhash K. Shinde, whose timely support and encouragement have helped
us succeed in our venture.
Name of Candidate Signature
1.Sanskruti Pawar
2.Swati Thakur
3.Sejal Deshmukh
4.Atharva Gavankar
ABSTRACT
This report examines crime prediction and prevention through the use of machine
learning techniques, addressing the growing concern of criminal activity in our
society. As crime continues to evolve with advancements in technology and
globalization, it has become increasingly challenging to manage and predict.
Traditional methods of crime analysis often struggle to keep up with the
complexities of modern crime patterns. In this context, machine learning offers a
promising solution by analyzing large datasets, including historical crime records,
demographic information, and environmental factors, to identify patterns that may
indicate future criminal activities. By developing predictive models, we can assess
the likelihood of specific crimes occurring in certain areas or time frames, which
helps law enforcement agencies allocate resources more effectively and implement
targeted prevention strategies. Additionally, these insights can deepen our
understanding of criminal behavior, allowing for a more proactive approach to
community safety. Ultimately, this report aims to show how machine learning can
transform crime prevention efforts, empowering law enforcement and communities
to work together to create safer environments and improve the quality of life for
everyone. Through this exploration, we hope to contribute to the ongoing
conversation about the potential of technology to enhance public safety initiatives.
Contents
Acknowledgement
Abstract
List of Figures
1 INTRODUCTION
1.1 Motivation
1.2 Objectives
1.3 Organisation of the Report
2 LITERATURE SURVEY
2.1 Survey of Existing System
2.2 Limitation in Existing System
3 PROPOSED METHODOLOGY
3.1 Introduction
3.2 Methodology
4 IMPLEMENTATION
4.1 Dataset
4.2 Software and Hardware Used
4.2.1 Hardware
4.2.2 Software
5 RESULTS AND DISCUSSION
5.1 Results and Discussion
5.2 Model Evaluation & Performance Comparison
6 CONCLUSIONS
6.1 Project Highlights
6.2 Future Scope
References
List of Figures
3.1 System Architecture
5.1 Result
5.2 Accuracy Results
Chapter 1
INTRODUCTION
Crime is an act that is prohibited by law and is punishable by a fine, imprisonment,
or other legal action. Every day, reports of criminal activity fill our news outlets
and social media platforms, painting a picture of a world in which crime is an
ever-present concern. From robberies and violent assaults to cybercrimes and
white-collar fraud, there is seemingly no end to the number of ways in which
criminals can cause harm. Crime has been a part of human civilization since
time immemorial. It has become increasingly prominent in today’s world. The
rise of technology has created a variety of new crimes, while the emergence of
globalization has made the world a smaller place, allowing criminals to move and
operate in different countries.
Crime is uncertain and difficult to predict. Even so, crime prediction is important
for determining whether the crime rate is rising or falling relative to preceding
years. A huge number of crimes happen every second in different places, in
different patterns and at different times, and the number is increasing with each
passing day. A good prediction technique enables faster evaluation of criminal
data sets. It helps in predicting the likely place of crime and criminal activity,
and aids in keeping track of resources pertaining to the analysis of crime.
Crime prediction using machine learning is an emerging field of study that uses
sophisticated algorithms and data-driven methods to detect and predict criminal
activities. Machine learning algorithms can be used to identify patterns in data
that may indicate a future crime, such as past criminal activities, demographic in-
formation, and environmental factors. By leveraging such data, machine learning
can be used to create predictive models that identify the likelihood of a certain
crime occurring in a particular area or time frame. Additionally, machine learning
can be used to develop insights into the behavior of criminals, helping law enforce-
ment professionals better understand and address criminal activity.
1.1 Motivation
The motivation for this report comes from a deep concern about the rising rates of
crime and the impact it has on our communities. Every day, we hear stories about
crime that affect people’s lives, safety, and sense of security. As our world becomes
more interconnected and technology advances, criminals are finding new ways to
exploit vulnerabilities, making it even more challenging for law enforcement to keep
up. We believe that there is a better way to tackle this issue. Traditional methods
of analyzing crime often leave us reacting to problems rather than preventing
them. This is where machine learning comes in. By using advanced technology to
analyze large amounts of data—like past crime reports, demographic information,
and environmental factors—we can uncover patterns that might help us predict
where and when crimes are likely to occur.
1.2 Objectives
The main aim of this project is to develop a system that can accurately predict
crime rates and identify potential future crime trends. This information can then
be used by officials to devise strategies to reduce crime rates and create a safer
environment. To predict the crime rate (dependent variable) based on the year,
location, and type of crime (independent variables), various machine learning
algorithms will be applied. The system will examine how to frame the crime
information as a regression problem, thus helping officials to solve crimes
faster. Crime analysis uses the available information to extract patterns of crime.
Based on the territorial distribution of existing data and the recognition of crimes,
various multi-linear regression techniques can be used to predict the frequency of
crimes.
1.3 Organisation of the Report
The report on Crime Prediction and Prevention is organized into six chapters.
Following the list of figures, Chapter 1 serves as the introduction, providing an
overview of the topic and stating the purpose of the report. This chapter also
outlines the motivation for selecting this particular subject, defines the problem
at hand, and presents the objectives of the study. Chapter 2 is dedicated to the
literature survey, summarizing past research related to crime prediction and
prevention. It offers insights into previous studies, highlighting the
methodologies used and the gaps identified in existing work. This section sets
the stage for our contributions to the field. In Chapter 3, we present the proposed
system, detailing the key elements of our project. This includes the architecture
of the system, a flowchart illustrating how the project operates, and a description
of the design process. Chapter 4 focuses on the implementation of the project. It
describes the dataset, the prediction process, and the hardware and software
utilized. Chapter 5 is dedicated to the Results and Discussion, where we analyze
the findings and their implications for crime prediction and prevention. This
chapter aims to distill the complexities of the topic into clear, actionable
insights. Finally, Chapter 6 highlights the key takeaways from our project and
discusses the future scope of crime prediction and prevention efforts. This
concluding chapter emphasizes the potential for further research and development
in this critical area, paving the way for enhanced public safety initiatives.
Chapter 2
LITERATURE SURVEY
2.1 Survey of Existing System
“Prediction of Crime Rate in Banjarmasin City Using RNN-GRU Model”, proposed
by Muhammad Alkaff et al., describes a model that predicts the crime rate using a
Recurrent Neural Network (RNN) with the Gated Recurrent Unit (GRU)
architecture. The model takes the inflation rate and discretionary income into
consideration. GRU is a modified RNN that is simpler than the Long Short-Term
Memory (LSTM) neural network and is more effective in adapting to different
timescales and dealing with the vanishing gradient problem. It consists of two
gates, the update gate (z_t) and the reset gate (r_t), and needs less data than
LSTM, achieving good results even with fewer samples. After collecting and
normalizing the data, the best model achieved the lowest MAE and RMSE values
of 1.7368 and 2.21, respectively, and an R-squared value of 0.84, indicating good
model performance. [1]
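For reference, the two gates mentioned above are commonly written as follows. This is the standard textbook GRU formulation, shown only as a sketch: the weight matrices W and U and the biases b are generic symbols, and the exact variant used in [1] may differ (some texts swap the roles of z_t and 1 - z_t in the last line).

```latex
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
\tilde{h}_t &= \tanh\bigl(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\bigr) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```

Here x_t is the input at time step t, h_t is the hidden state, σ is the logistic sigmoid, and ⊙ denotes element-wise multiplication.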
“Empirical Analysis for Crime Prediction and Forecasting Using Machine Learning
and Deep Learning Techniques”, proposed by Wajiha Safat et al., analyzes crime
prediction on the Chicago and Los Angeles datasets. It aims to improve predictive
accuracy with the Logistic Regression, SVM, Naïve Bayes, KNN, Decision Tree,
MLP, Random Forest, and XGBoost algorithms, and combines this with time-series
analysis using LSTM, exploratory data analysis for visual summary, and forecasting
of the crime rate and high-intensity crime areas for subsequent years with an
ARIMA model. The paper investigated the predictive accuracy of eight different
algorithms on the Chicago and Los Angeles datasets, with XGBoost performing best
with accuracies of 94 percent and 88 percent, respectively. An LSTM model was
also implemented, with the RMSE and MAE metrics used to measure its
scale-dependent error. In addition, an ARIMA model was used to forecast future
crime density areas, indicating that crime in Chicago will continue to increase
moderately, followed by a stable decline, while crime in Los Angeles will decline
sharply. [2]
Sakib Mahmud and Musfika Nuha examined the relationship between crime and
different features in the criminology literature. To reduce crimes and detect
criminal activity, the authors used Z-Crime Tools and Advanced ID3 algorithms with
data mining technology, K-Means Clustering and deep learning algorithms, random
forest and Naïve Bayes algorithms, and multi-linear regression. Additionally,
the authors used Apriori and Naïve Bayes algorithms to identify and predict
criminal trends and patterns. For classification, algorithms such as Naïve Bayes
were used to classify objects into predefined groups and classes. The accuracy of
the different algorithms was evaluated, with K-nearest neighbour (KNN) providing
the most precise crime rate forecast. The linear model, Naïve Bayes and KNN had
accuracy scores of 73.6 percent, 69.5 percent and 76.9 percent respectively. [3]
2.2 Limitation in Existing System
Existing systems for crime prediction and prevention face several significant lim-
itations that hinder their effectiveness. One major challenge is the quality and
availability of data; many models rely on historical crime data that can be incom-
plete, biased, or outdated, leading to flawed predictions and ineffective resource al-
location. Additionally, these systems may inadvertently perpetuate biases present
in historical data, resulting in unfair targeting of certain communities and erod-
ing trust between law enforcement and the public. The dynamic nature of crime
further complicates matters, as crime patterns can change rapidly due to social,
economic, and environmental factors, making it difficult for existing models to
adapt quickly enough to reflect current realities. Moreover, the complexity of
criminal behavior, influenced by a multitude of variables, is often oversimplified in
predictive models, which may fail to capture the nuanced interactions that con-
tribute to crime.
Chapter 3
PROPOSED METHODOLOGY
3.1 Introduction
Crime is defined as an act that is forbidden by law and can result in penalties such
as fines, imprisonment, or other legal consequences. Each day, we are bombarded
with reports of criminal activities through news outlets and social media, creating
a sense of unease about the prevalence of crime in our society. From thefts and
violent attacks to cybercrimes and corporate fraud, the ways in which criminals
can inflict harm seem endless. Crime has been a part of human history for as
long as we can remember, but it has become increasingly visible in our modern
world. The advancement of technology has introduced new forms of crime, while
globalization has made it easier for criminals to operate across borders.
The unpredictable nature of crime makes it challenging to anticipate. However,
crime prediction plays a crucial role in understanding trends in crime rates over
time. With countless crimes occurring every second in various locations and pat-
terns, the numbers continue to rise. Effective prediction techniques can enhance
the analysis of criminal data, allowing for quicker responses and better resource
management in addressing crime.
The field of crime prediction using machine learning is gaining traction as it em-
ploys advanced algorithms and data-driven approaches to identify and forecast
criminal activities. By analyzing data such as historical crime reports, demo-
graphic details, and environmental factors, machine learning algorithms can un-
cover patterns that may signal future criminal behavior. This technology enables
the creation of predictive models that assess the likelihood of specific crimes hap-
pening in certain areas or during particular time frames. Furthermore, machine
learning can provide valuable insights into criminal behavior, equipping law en-
forcement with a deeper understanding of crime dynamics and helping them to
tackle criminal activity more effectively.
3.2 Methodology
A. Architecture
Figure 3.1: System Architecture
B. Framework
Crime prediction plays a crucial role in identifying potential criminal activities and
allocating resources effectively to enhance public safety. By leveraging machine
learning algorithms and analyzing various data points such as historical crime
data, socio-economic factors, environmental conditions, and community reports,
law enforcement agencies can detect patterns that indicate the likelihood of future
crimes.
Data Collection - The data collection process for crime prediction involves gath-
ering diverse datasets, including historical crime records, demographic information,
socio-economic factors, and data from IoT surveillance systems. This data can be
sourced from police reports, community surveys, social media insights, and en-
vironmental sensors. By systematically collecting these parameters, agencies can
create a comprehensive dataset that serves as the foundation for predictive mod-
eling.
Data Preprocessing - Data preprocessing involves cleansing the dataset by han-
dling missing values, outliers, and inconsistencies. This process includes removing
duplicates and correcting inaccuracies in the collected data, ensuring that the his-
torical crime records and socio-economic factors are reliable for analysis.
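The preprocessing step described above could look roughly like the sketch below. The column names, the sample values, the median fill and the 1.5 × IQR clipping rule are all illustrative assumptions, not the project's actual schema or cleaning rules.

```python
import numpy as np
import pandas as pd

# Hypothetical raw records; column names and values are illustrative only.
raw = pd.DataFrame({
    "state": ["Maharashtra", "Maharashtra", "Goa", "Goa", "Kerala"],
    "year": [2010, 2010, 2010, 2011, 2011],
    "theft": [1200.0, 1200.0, np.nan, 90.0, 100000.0],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Remove duplicate rows, fill missing counts, and clip extreme outliers."""
    out = df.drop_duplicates().copy()
    # Fill missing counts with the column median (one of several possible choices).
    out["theft"] = out["theft"].fillna(out["theft"].median())
    # Clip values lying more than 1.5 * IQR beyond the quartiles.
    q1, q3 = out["theft"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out["theft"] = out["theft"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)
    return out.reset_index(drop=True)


clean = preprocess(raw)
print(clean)
```

Whether outliers should be clipped, removed, or kept depends on the data: a genuinely large state may legitimately report far higher counts than a small one.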
Exploratory Data Analysis (EDA) - EDA is conducted to understand the
distribution of variables, correlations, and patterns within the data. This step
helps visualize crime trends, identify hotspots, and uncover temporal patterns
that may be significant for prediction.
Feature Engineering - Feature engineering techniques are employed to select the
most relevant features (variables) from the dataset that are likely to be predictive
of crime. This includes demographic details, socio-economic indicators, environ-
mental factors, and historical crime patterns.
Model Selection - Various machine learning algorithms, including decision trees,
random forests, and support vector machines (SVM), are evaluated for crime pre-
diction. This selection process helps identify the most effective algorithm for the
specific dataset and prediction task.
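One common way to compare the candidate algorithms named above is k-fold cross-validation. The sketch below uses scikit-learn with synthetic stand-in data; the data, the hyperparameters and the choice of 5 folds are assumptions made for illustration, not the settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the project's real features are not reproduced here.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

candidates = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "svm": SVC(),
}

# Mean 5-fold cross-validated accuracy for each candidate model.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)
print(scores)
print("Best candidate:", best_name)
```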
Model Training - The dataset is split into training and validation sets, with
models trained on the training data, which typically constitutes 70-80 percent of
the dataset. This allows the model to learn from a substantial amount of data
while reserving a portion for validation.
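A minimal sketch of such a split, assuming an 80/20 ratio and using scikit-learn's `train_test_split` on placeholder arrays:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 samples with 2 features each.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# Reserve 20 percent of the samples for validation; fix the seed for repeatability.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_val.shape)
```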
Model Evaluation - The model’s performance is assessed using metrics like ac-
curacy, precision, recall, and the confusion matrix. These metrics provide insights
into how well the model is performing and its ability to correctly predict criminal
activities.
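The metrics named above can be computed with `sklearn.metrics`. The labels below are made-up values for a binary task, chosen only to show how the numbers relate to the confusion matrix.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Made-up ground truth and predictions (1 = positive class, 0 = negative class).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)
accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / all samples
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)

print(cm)
print(accuracy, precision, recall)
```

In this toy example there are 3 true positives, 3 true negatives, 1 false positive and 1 false negative, so all three scores equal 0.75.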
Deployment - The chosen model is deployed into production systems for real-time
crime prediction. This step involves integrating the model into law enforcement
workflows to assist in resource allocation and proactive policing strategies.
Monitoring and Maintenance - Ongoing monitoring of model performance in
production is conducted. The model is retrained and updated periodically with
new data to ensure its accuracy and relevance. Continuous evaluation of the
model’s effectiveness is performed, with adjustments made as needed to maintain
optimal performance.
Chapter 4
IMPLEMENTATION
4.1 Dataset
This project uses historical crime data to predict the total number of IPC
(Indian Penal Code) crimes in each state of India. The model takes input features
for different categories of crime, and users enter values for these categories to
generate predictions.
Prediction Process:
User Input: The user selects a state and inputs values for various crime categories,
including:
• Murder
• Rape
• Kidnapping and Abduction
• Robbery
• Burglary
• Theft
• Auto Theft
• Dowry Deaths
• Assault on Women with Intent to Outrage Her Modesty
Feature Calculation: The system automatically calculates combined features based
on the user inputs:
• VIOLENT CRIMES = Murder + Rape + Kidnapping and Abduction + Robbery
• PROPERTY CRIMES = Burglary + Theft + Auto Theft
• CRIMES AGAINST WOMEN = Rape + Dowry Deaths + Assault on Women with
Intent to Outrage Her Modesty
• YEAR NORMALIZED: A normalized year value to account for trends over time.
Model Prediction: The model uses all the calculated features to predict the total
number of IPC crimes for the selected state.
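The feature calculation described above can be sketched in a few lines of Python. The dictionary keys, the function name and the year range are hypothetical, and min-max scaling is only one plausible way to normalize the year; the report does not state which normalization was used.

```python
def combined_features(counts, year, min_year=2001, max_year=2012):
    """Derive the combined features from per-category crime counts.

    `min_year` and `max_year` are assumed bounds for min-max scaling.
    """
    violent = (
        counts["murder"]
        + counts["rape"]
        + counts["kidnapping_abduction"]
        + counts["robbery"]
    )
    property_crimes = counts["burglary"] + counts["theft"] + counts["auto_theft"]
    against_women = (
        counts["rape"] + counts["dowry_deaths"] + counts["assault_on_women"]
    )
    year_normalized = (year - min_year) / (max_year - min_year)
    return {
        "VIOLENT_CRIMES": violent,
        "PROPERTY_CRIMES": property_crimes,
        "CRIMES_AGAINST_WOMEN": against_women,
        "YEAR_NORMALIZED": year_normalized,
    }


# Made-up input values for a single state and year.
example = {
    "murder": 10, "rape": 5, "kidnapping_abduction": 7, "robbery": 8,
    "burglary": 20, "theft": 100, "auto_theft": 30,
    "dowry_deaths": 2, "assault_on_women": 6,
}
features = combined_features(example, year=2012)
print(features)
```

Note that Rape contributes to both VIOLENT CRIMES and CRIMES AGAINST WOMEN, so the combined features overlap and should not be summed to obtain a total.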
Combined Features:
• VIOLENT CRIMES: Represents the total number of violent offenses, which is
crucial for understanding the severity of crime in a state.
• PROPERTY CRIMES: Captures the total number of property-related offenses,
providing insights into theft and burglary trends.
• CRIMES AGAINST WOMEN: Focuses on offenses specifically targeting women,
highlighting societal issues and safety concerns.
Feature Importance Analysis: From the analysis of feature importance, the following
factors were identified as the most significant contributors to the model’s
predictions:
• HURT/GRIEVOUS HURT: 83.98 percent importance, indicating that this category
has a substantial impact on the total crime prediction.
• PROPERTY CRIMES: 11.14 percent importance, showing that property-related
offenses are also significant but less impactful than grievous hurt.
• THEFT: 1.08 percent importance, suggesting that while theft is a relevant factor,
it has a minimal effect on the overall prediction compared to the other categories.
Model Output: The model predicts the TOTAL IPC CRIMES based on the input
features and calculated combined features, providing valuable insights for law
enforcement and policymakers to address crime effectively in each state.
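Importance percentages of this kind are typically read from a fitted tree-based model. The sketch below assumes the impurity-based `feature_importances_` attribute of scikit-learn's `RandomForestRegressor`; the report does not state how its importances were computed, and the data here is synthetic, so the printed numbers will not match the figures listed above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
feature_names = ["hurt", "property_crimes", "theft"]  # illustrative names

# Synthetic data in which the first feature dominates the target.
X = rng.uniform(0, 1000, size=(200, 3))
y = 5.0 * X[:, 0] + 1.0 * X[:, 1] + 0.1 * X[:, 2]

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances are non-negative and sum to 1.
importances = dict(zip(feature_names, model.feature_importances_))
for name, value in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value * 100:.2f} percent")
```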
4.2 Software and Hardware Used
4.2.1 Hardware
• Processor : Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz 2.11 GHz
• System Type : 64-bit operating system, x64-based processor
• RAM : 8.00 GB (7.84 GB usable)
4.2.2 Software
• Operating system : Windows 11 Home Single Language
• Version : 22H2
• OS Build : 22621.3155
• Experience : Windows Feature Experience Pack 1000.22684.1000.0
• Tech Stack : Next.js, Python
• Libraries : Flask, flask_cors, pickle, numpy
• IDE : VS Code
Chapter 5
RESULTS AND DISCUSSION
5.1 Results and Discussion
The figure shows the “Crime Rate Prediction Dashboard”, which is designed to
analyze and predict crime rates across Indian states. Key features include:
1. Header: Blue gradient with a shield icon, title, and subtitle.
2. State Selection: Dropdown menu to select a state and a “Generate Prediction”
button.
3. Predicted Crime Rate: Displays the overall crime rate and specific rates for
murder, rape, robbery, and kidnapping in individual cards.
4. Map Visualization: A heatmap showing crime rates across India.
The design is clean and modern, using a blue and white color scheme, clear
typography, and a well-organized layout. The dashboard is interactive and likely
responsive, making it user-friendly and informative.
5.2 Model Evaluation & Performance Comparison
The Random Forest Regression model demonstrates the best accuracy in predict-
ing test data among the five selected models.
The model predicts the crime rate with an accuracy of 93.20% on the testing data.
The accuracy results obtained after testing are listed below:
Figure 5.1: Result
Error Metrics:
The Mean Absolute Error represents the average of the absolute difference between
the actual and predicted values in the dataset. It measures the average of the
residuals in the dataset.
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \lvert y_i - \hat{y}_i \rvert$$
Figure 5.2: Accuracy Results
Mean Squared Error represents the average of the squared difference between the
original and predicted values in the dataset. It penalizes large residuals more
heavily than the Mean Absolute Error does.
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$
The coefficient of determination, or R-squared, represents the proportion of the
variance in the dependent variable that is explained by the regression model.
It is a scale-free score, i.e., irrespective of whether the values are small or
large, the value of R-squared is at most one.
$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$$
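As a quick check of the three error metrics defined above, the following sketch computes them with NumPy on a small made-up example. The function name and the sample values are illustrative only.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R-squared) for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))
    mse = np.mean(residuals ** 2)
    # R-squared: 1 minus residual sum of squares over total sum of squares.
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return mae, mse, r2


# Made-up actual and predicted crime counts.
mae, mse, r2 = regression_metrics([100, 200, 300, 400], [110, 190, 310, 390])
print(mae, mse, r2)
```

Every prediction here is off by exactly 10, so MAE is 10 and MSE is 100; the residual sum of squares is 400 against a total sum of squares of 50000, giving an R-squared of 0.992.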
Chapter 6
CONCLUSIONS
In this project, we have developed a predictive model aimed at forecasting crime
rates based on historical data and user-inputted features. After selecting a state
and entering relevant crime category values, such as violent crimes, property
crimes, and crimes against women, our model processes this information to predict
the total number of IPC (Indian Penal Code) crimes in that area. As demon-
strated in our analysis, the model effectively identifies high-risk areas by calcu-
lating combined features like VIOLENT CRIMES, PROPERTY CRIMES, and
CRIMES AGAINST WOMEN. This predictive capability allows law enforcement
agencies to allocate resources more efficiently and implement targeted interven-
tions. By focusing on the most significant factors influencing crime rates, such as
grievous hurt and property crimes, our model provides valuable insights for proac-
tive policing strategies. Ultimately, this project underscores the importance of
data-driven approaches in crime prevention, enabling safer communities through
informed decision-making and strategic resource deployment.
6.1 Project Highlights
Advantages:
• Multimodal Data Integration: Combines various data sources for a comprehen-
sive analysis of crime patterns.
• Proactive Resource Allocation: Enables law enforcement to allocate resources
efficiently to enhance public safety.
Limitations:
• Dataset Diversity: May not represent all demographic groups, affecting general-
izability.
• Potential Biases: Existing datasets may introduce biases due to missing variables
or underrepresented populations.
Features:
1) Advanced Machine Learning Techniques: Utilizes state-of-the-art algorithms to
improve predictive accuracy and robustness.
2) Focus on Crime Prevention: Identifies high-risk areas and trends for timely
interventions.
6.2 Future Scope
The future scope of our crime prediction model includes establishing partnerships
with law enforcement agencies for practical implementation and real-time feed-
back. We aim to develop more complex machine learning algorithms to enhance
prediction efficiency and accuracy, along with regular calibration of models to
adapt to evolving crime patterns. Additionally, expanding our datasets to include
diverse demographic and geographic information will help mitigate biases and im-
prove generalizability. These initiatives will create a robust tool that significantly
contributes to crime prevention and public safety.
REFERENCES
[1] Alkaff, Muhammad & Mustamin, Nurul Fathanah & Firdaus, Gusti. (2023). Prediction of
Crime Rate in Banjarmasin City Using RNN-GRU Model. International Journal of Intelligent
Systems and Applications in Engineering. 10. 1-09.
[2] Safat, Wajiha, et al. “Empirical Analysis for Crime Prediction and Forecasting Using
Machine Learning and Deep Learning Techniques.” IEEE Access, vol. 9, 2021, pp. 70080–94.
DOI.org (Crossref), https://doi.org/10.1109/ACCESS.2021.3078117.
[3] Mahmud, Sakib, et al. “Crime Rate Prediction Using Machine Learning and Data Mining.”
Soft Computing Techniques and Applications, edited by Samarjeet Borah et al., vol. 1248,
Springer Singapore, 2021, pp. 59–69. DOI.org (Crossref),
https://doi.org/10.1007/978-981-15-7394-1_5.
Experiment 8
Program and Output :
# Perceptron classifier on the Iris dataset: Iris-setosa versus the other species
import numpy as np
import pandas as pd
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the raw Iris data (no header row: 4 feature columns, then the class label)
URL_ = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
df = pd.read_csv(URL_, header=None)
X = df.iloc[:, 0:4].values
y = df.iloc[:, 4].values
print(X[0:5])
print(y[0:5])

# Hold out 25% of the samples for testing
train_data, test_data, train_labels, test_labels = train_test_split(X, y, test_size=0.25)

# Binary target: +1 for Iris-setosa, -1 for the other two species
train_labels = np.where(train_labels == 'Iris-setosa', 1, -1)
test_labels = np.where(test_labels == 'Iris-setosa', 1, -1)
print('Train data:', train_data[0:2])
print('Train labels:', train_labels[0:5])
print('Test data:', test_data[0:2])
print('Test labels:', test_labels[0:5])

# Train the perceptron and predict on the held-out samples
perceptron = Perceptron(random_state=42, max_iter=20, tol=0.001)
perceptron.fit(train_data, train_labels)
test_preds = perceptron.predict(test_data)
print(test_preds)

# accuracy_score expects (y_true, y_pred); scale to percent before rounding
test_accuracy = accuracy_score(test_labels, test_preds)
print("Accuracy on test data: ", round(test_accuracy * 100, 2), "%")