Crime (Paper 5)
Available online at www.sciencedirect.com
ScienceDirect
Procedia Computer Science 00 (2018) 000–000
www.elsevier.com/locate/procedia
International Conference on Computational Intelligence and Data Science (ICCIDS 2018)
Abstract
Crimes are a treacherous and common social problem faced worldwide. Crimes affect the quality of life, economic
growth, and reputation of a nation. There has been an enormous increase in the crime rate in the last few years. To
reduce the crime rate, law enforcement needs to take preventive measures. With the aim of securing society from
crimes, there is a need for advanced systems and new approaches for improving crime analytics to protect
communities. Accurate real-time crime prediction helps to reduce the crime rate but remains a challenging problem
for the scientific community, as crime occurrences depend on many complex factors. In this work, various
visualization techniques and machine learning algorithms are adopted for predicting the crime distribution over an
area. In the first step, the raw datasets were processed and visualized based on the need. Afterwards, machine
learning algorithms were used to extract knowledge from these large datasets and discover the hidden relationships
among the data, which are further used to report and discover crime patterns. This is valuable for crime analysts,
who can analyse these crime networks by means of various interactive visualizations for crime prediction, and hence
is supportive in the prevention of crimes.
© 2018 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/3.0/)
Peer-review under responsibility of the scientific committee of the International Conference on Computational
Intelligence and Data Science (ICCIDS 2018).
* Corresponding author. Tel.: +91-7726054801.
E-mail address: [email protected]
1877-0509 © 2018 The Authors. Published by Elsevier B.V.
1. Introduction
Crimes are common social problems that affect the quality of life, economic growth and reputation of a country.
Crime is one of the major factors that influence important decisions in an individual’s life, such as moving to a
new place, going out at the right time, and avoiding risky areas. Crimes also damage the image of a community and
affect the economy of a nation by placing a financial burden on the government due to the need for additional
police forces, courts, etc. As crimes are increasing drastically, there is an urgent need to reduce them at an even
faster rate. The latest figures show a 13% increase in all police-recorded offences across England and Wales,
and even greater rises for violent offences including knife crime, sexual offences, and violence against the person
[9]. The crime figures show an underlying 8% rise in the murder rate, an increase of 46 victims, with 629 homicides
recorded in the 12 months to June, excluding the 35 people killed in the London and Manchester terrorist attacks
[9]. These figures can be reduced if we are able to analyze and predict crime occurrences and their locations and
take preventive measures in advance. Crime rates can be significantly reduced by real-time crime forecasting and
mass surveillance, which help to save lives, the most valuable thing of all. Proper analysis of previous crime data
helps in predicting crimes and thus supports reducing the crime rate. The analysis process includes studying crime
reports and identifying emerging patterns, series, and trends as quickly as possible. This analysis helps in
preparing statistics, queries, and maps on demand. It also helps to see whether a crime fits a known pattern or
whether a new pattern is necessary.
Crimes can be predicted because criminals are active and operate in their comfort zones. Once successful, they try
to replicate the crime under similar circumstances [16]. The occurrence of crime depends on several factors, such as
the intelligence of the criminal and the security of the location. Criminals generally look for a similar location
and time when attempting the next crime. Although this may not be true in all cases, studies indicate that the
possibility of repetition is high, and this makes crimes predictable.
This paper proposes a web mapping and visualization-based crime prediction tool built in R [1] using libraries such
as RgoogleMaps [3] and googleVis [5]. The proposed framework uses different visualization techniques to show the
trend of crimes, and machine learning algorithms to predict crimes. The work follows the steps used in data
analysis [15], in which the important phases are data collection, data pre-processing, data visualization and model
building; these are discussed in more detail in the following sections. In brief, in the data collection phase the
data is obtained from the official site of the U.K. police department [7]. The data pre-processing phase consists of
cleaning and transforming the data. The visualization phase generates various reports and maps for diagnosis and
analysis. Finally, in the model building phase, various machine learning algorithms are used to classify the type of
crime that can happen in a particular location.
2. Related work:
Analysis and prediction of crime is an important activity that can be optimized using various techniques and
processes. A lot of research has been done in this domain. The existing work is limited to using the datasets to
identify the locations of crime, and none of it considers the type of crime or the date of crime as a factor.
Yu et al. [8] provide static maps with no interactive features. To overcome these limitations, the proposed
framework provides visualization techniques that consider the type of crime to identify crime hotspots (shown in
Fig.3 and Fig.4) and helps to inspect these locations through interactive features using Google Maps (shown in
Fig.2).
A few papers focused on the use of decision trees for crime prediction [4] [13] [14]. Ahishakiye et al. [4] and
Iqbal et al. [14] used attributes such as the population of the country, median household income, the percentage of
people older than 16 who are unemployed, and the type of crime, which only predict whether an area will have a high,
medium or low percentage of violent crimes in future. The methods they proposed did not predict the type of crime
that can happen. Nasridinov et al. [13] also proposed a method for classifying the crime rate as high, medium or
low. None of them classified the type of crime that can happen or its probability of happening.
698 Hitesh Kumar Reddy ToppiReddy et al. / Procedia Computer Science 132 (2018) 696–705
Further, all of them used decision trees, which indicate which parameters of the dataset are important for the
study. A decision tree can also predict the crime at a location only if information about that location is available
in the dataset. For example, if a crime happened at a location with latitude=-2.44 and longitude=50.35, the previous
related work provides information about future crime at this location only, but cannot predict crime at a location
with latitude=-2.3 and longitude=50.4. By using the nearest neighbour approach (k-NN), this paper overcomes this
problem.
3. Methodology:
For optimum analysis and prediction of crime incidents, a Crime Prediction & Monitoring Framework Based on
Spatial Analysis is introduced. In this framework, various visualization techniques are used to analyze the data in a
better way. This framework is implemented in a GUI based tool using R programming and its various libraries. The
methodology and various phases are described as follows.
The dataset used for this work is real and authentic, as it was acquired from the official site of the U.K. police
department [7]. The dataset contains a total of 11 attributes, of which 5 were considered for the study: crime type,
location, date, latitude, and longitude. The history of crimes from 2015 to 2017 was used as the training dataset.
In the pre-processing phase, inconsistent data (such as missing values and redundant information) are removed and
the data are transformed into the form required for predicting crime in the following modules.
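The pre-processing step described above can be sketched as follows. This is a minimal Python illustration rather than the R code of the tool; the column names (`Crime type`, `Location`, `Month`, `Latitude`, `Longitude`), the helper `clean_records` and the sample rows are assumptions made for the example and are not confirmed by the paper.

```python
import csv
import io

# Assumed column names for the five attributes kept for the study.
KEPT = ["Crime type", "Location", "Month", "Latitude", "Longitude"]


def clean_records(rows):
    """Keep the five attributes, drop incomplete rows and exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        values = [(row.get(name) or "").strip() for name in KEPT]
        if any(v == "" for v in values):
            continue  # missing value: discard the record
        try:
            lat, lon = float(values[3]), float(values[4])
        except ValueError:
            continue  # non-numeric coordinates: discard the record
        record = (values[0], values[1], values[2], lat, lon)
        if record in seen:
            continue  # redundant (duplicate) record
        seen.add(record)
        cleaned.append(record)
    return cleaned


# Tiny made-up sample, for illustration only.
SAMPLE = """Crime type,Location,Month,Latitude,Longitude,Context
Burglary,On or near High Street,2017-01,50.35,-2.44,
Burglary,On or near High Street,2017-01,50.35,-2.44,
Anti-social behaviour,On or near Park Road,2017-01,,-2.30,
Vehicle crime,On or near Mill Lane,2017-02,50.40,-2.30,
"""

records = clean_records(csv.DictReader(io.StringIO(SAMPLE)))
print(records)
```

Whether identical rows should really be treated as redundant depends on the data source, since two distinct crimes can share the same type, place and month; the sketch drops them only to mirror the "redundant information" step named in the text.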
Data visualization is an art and science. It is a form of visual communication. It involves creation and study of the
visual representation of data. The primary goal of data visualization is to communicate data clearly and effectively
via statistical graphics and plots. The effective visualization helps us to analyze and reason about data and evidence.
This work generates crime density maps, which help crime analysts to analyze crime patterns. Understanding patterns
of criminal activity is important for law enforcement and intelligence agencies to investigate and prevent crimes.
As crimes occur at specific places, analyzing them through locations and maps greatly aids understanding. This paper
provides a novel tool for visualizing previous crime data on maps and predicting the crimes that can happen in
future. The interactive and visual features can be helpful in discovering and analyzing crime networks. Crime map
plots can help investigators to explore relationships between criminals in a social network. Because visualization
provides a better understanding than textual data, we have developed a tool that provides various visualization
modules for exploring the dataset. The modules are developed in R [1] using various R libraries, mainly
RgoogleMaps [3], googleVis [5], ggplot2 [6] and ggmap [6]. The modules are described in the following sections.
The first module extracts the recent crime data from the dataset and, based on longitude and latitude, tags the
specific location in the city. The tag also displays the name of the crime location and the type of crime that
happened. This information helps individuals to identify dangerous and risky areas and thus to avoid them. The map
can also help law enforcement to improve security in these areas. Fig.1 shows that the locations where crimes
occurred are very near to each other. From this, we can infer that if a location is feasible for a criminal attack,
then the nearby locations are also feasible for crime. This module also allows the user to enquire about a specific
location to see what type of crime is likely to happen there.
The second module visualizes the exact area where the crime happened. This helps law enforcement to analyze the
security measures of an area. The module provides an interactive image that uses Google Maps to navigate around the
crime location. It can help the analyst to assess the security of an area and to identify which locations could be
the target of the next attack. It also gives the police a clear understanding of the cause of the crime and lets
them investigate the location without visiting it again and again. Clicking the tag in Fig.1 brings up a realistic
3D interactive image of the location and allows navigation around it, as shown in Fig.2.
The type of crime is also an important factor, as safety measures are mainly chosen based on the type of crime. The
third module visualizes the crimes that happened in different areas by category, as shown in Fig.3. This helps law
enforcement to analyze which types of crime happen frequently in an area and to improve security measures
accordingly.
The number of crimes that happened in an area indicates how dangerous the area is. The fourth module visualizes the
crime hotspots, as shown in Fig.4. The areas on the map that have a high crime density are called crime hotspots
[10]. Maps that contain hotspots are becoming a critical and influential tool for policing. Researchers and analysts
use them to examine the occurrence of hotspots in certain areas and why they arise, and to build theories. They also
allow researchers to explain why crime occurs in certain places and not in others. Crime analysts can use these maps
to make better decisions, target resources, formulate strategies and help law enforcement agencies.
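The idea of a hotspot as an area of high crime density can be illustrated with a simple grid count. This is a hedged Python sketch, not the method used to draw Fig.4: the function name `hotspot_cells`, the cell size of 0.01 degrees and the coordinates are all assumptions chosen for the example, and plain grid binning is substituted here for whatever density estimate the R mapping libraries apply.

```python
from collections import Counter
import math


def hotspot_cells(points, cell_size=0.01, top=3):
    """Count crimes per square grid cell and return the densest cells.

    points    : iterable of (latitude, longitude) pairs
    cell_size : cell edge length in degrees (an assumed, tunable value)
    """
    counts = Counter(
        (math.floor(lat / cell_size), math.floor(lon / cell_size))
        for lat, lon in points
    )
    return counts.most_common(top)


# Made-up coordinates, for illustration only.
crimes = [
    (50.351, -2.441), (50.352, -2.442), (50.353, -2.443),
    (50.401, -2.301), (50.402, -2.302),
    (50.455, -2.255),
]
cells = hotspot_cells(crimes, cell_size=0.01, top=2)
print(cells)
```

Each returned entry pairs a grid cell index with the number of crimes that fell inside it, so the first entries are the candidate hotspots. A smaller `cell_size` gives finer hotspots at the cost of noisier counts.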
The fifth module generates a crime report based on the number of crimes that happened in each month and on the
different categories of crime, as shown in Fig.5. This can help the public to take safety measures and helps crime
analysts to check which types of crime have increased or decreased.
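A report of this kind reduces to counting records per month and category. The following Python sketch shows one way to do it under stated assumptions: the event list is made up, and the helper `monthly_change` is a hypothetical name introduced only for this illustration.

```python
from collections import Counter

# (month, crime type) pairs; made-up values for illustration only.
events = [
    ("2017-01", "Anti-social behaviour"),
    ("2017-01", "Anti-social behaviour"),
    ("2017-01", "Burglary"),
    ("2017-02", "Anti-social behaviour"),
    ("2017-02", "Burglary"),
    ("2017-02", "Burglary"),
]

# Number of crimes for every (month, crime type) combination.
report = Counter(events)


def monthly_change(report, crime_type, earlier, later):
    """Difference in the count of one crime type between two months."""
    return report[(later, crime_type)] - report[(earlier, crime_type)]


for (month, crime_type), n in sorted(report.items()):
    print(month, crime_type, n)
print(monthly_change(report, "Burglary", "2017-01", "2017-02"))
```

A positive result from `monthly_change` means the category increased between the two months and a negative result means it decreased, which is the comparison the report is meant to support.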
3.2.6 Module 6: Interactive Crime Frequency Report Using Graph and Bar chart:
This module generates a video representation of the monthly trend of each type of crime extracted from the datasets.
This helps to visualize which types of crime have increased or decreased compared to previous months. Fig.6 shows
the bar chart; when the video is played, it shows how the frequency of each crime changes from month to month (the
bars move up and down depending on whether the crimes have increased or decreased). Fig.6 shows that 3523
Anti-social behaviour crimes and around 1000 Burglary crimes were reported on 13-April-2017. Fig.7 shows the same
data as a line graph. It shows that 3539 Anti-social behaviour crimes (the blue line) were reported on
1-January-2017, and that Anti-social behaviour is the most frequent crime in every month. The Burglary cases (the
red line) increased markedly every month, while the remaining crimes tend to stay constant. This module helps
analysts to understand the trend of every crime that has happened in an area.
Data mining involves exploring datasets, extracting useful information and transforming it into an understandable
form for further use. Data mining techniques were applied to the crime data for crime prediction based on theories
from criminology, mainly the Rational Choice Theory [12] and the Routine Activity Theory [11]. The Rational Choice
Theory seeks to understand crime from the offender’s perspective; it is directly concerned with the thinking process
of offenders and how they evaluate their opportunities.
The Routine Activity Theory states that for a crime to occur, a likely offender must find a suitable target in the
absence of capable guardians, and holds that crimes are unaffected by social causes such as poverty, inequality, and
unemployment. Criminals repeat their activities by choosing targets that are under similar conditions. Based on
these theories, this work uses the following algorithms.
K-NN is a method used for classification. In K-NN classification, the output is a class membership. An object is
classified by a majority vote of its neighbours, with the object being assigned to the class most common among its
k nearest neighbours. This algorithm can be applied to the crime dataset. Suppose a theft has happened in a house;
then the house next to it is also vulnerable to theft, as the criminal may judge the security there to be weak and
attempt a theft in the same area again. Hence, the areas near a previous crime location are more likely to see
crime. Therefore, location is one factor to be considered. The date can also be considered as a factor.
Classification relies on a distance factor, so the distances between the testing areas and the training areas are
computed. For this, the latitude and longitude are taken as the coordinates and the distance factor is computed as
the Euclidean distance between the two locations.
If the date is also considered as a factor, the number of days between the two crimes needs to be computed and
included as an additional term in the distance factor.
The problem with K-NN is the computation. Every query requires computing the Euclidean distance to every training
sample, which involves squaring and a square root. The distance computations are independent of each other, so they
can be parallelized, for example with OpenMP. To avoid the squaring and square root, the Manhattan distance was
computed instead, i.e. the sum of the absolute differences of the coordinates. This can also be computed in
parallel. After computing the distances, the nearest training samples were identified using efficient sorting
techniques, and the test location was assigned the crime type with the most votes among its k nearest neighbours.
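The distance factors referred to above can be written out as follows. This is a sketch under stated assumptions: latitude and longitude are treated as planar coordinates, the symbols $(x_1, y_1)$ and $(x_2, y_2)$ for the test and training coordinates and $\Delta t$ for the difference in days are introduced here for illustration, and giving the day difference the same weight as the coordinates is an assumption rather than something the text confirms.

```latex
% Euclidean distance between a test point (x_1, y_1) and a training point (x_2, y_2)
d_{E} = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}

% With the date included as a third coordinate (assumed unweighted)
d_{E,t} = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (\Delta t)^2}

% Manhattan distance: no squaring and no square root
d_{M} = |x_1 - x_2| + |y_1 - y_2|
```

Since the square root is monotonic, ranking neighbours by the squared Euclidean distance yields the same order as ranking by $d_{E}$, so the square root alone can be dropped without changing the result. The Manhattan distance removes the squaring as well but can order neighbours differently.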
Fig.8 shows the data to be tested, i.e. the location for which the likely crime is to be found.
Fig.9 shows the output of k-NN: the crime that can happen in an area and its probability of happening.
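The classification procedure described in this section can be sketched in a few lines of Python. This is an illustration under assumptions, not the tool's R implementation: the names `manhattan` and `knn_predict` and the training points are made up, the distance uses latitude and longitude only, and reporting the share of votes as the probability is an assumed reading of how the output probability is obtained.

```python
from collections import Counter


def manhattan(a, b):
    """Sum of absolute coordinate differences (no squaring, no square root)."""
    return sum(abs(p - q) for p, q in zip(a, b))


def knn_predict(training, query, k=3):
    """Return (crime type, vote share) pairs for the k nearest training points.

    training : list of ((latitude, longitude), crime_type)
    query    : (latitude, longitude) of the location to classify
    """
    nearest = sorted(training, key=lambda item: manhattan(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return [(label, count / len(nearest)) for label, count in votes.most_common()]


# Made-up training data, for illustration only.
training = [
    ((50.35, -2.44), "Burglary"),
    ((50.36, -2.45), "Burglary"),
    ((50.34, -2.43), "Anti-social behaviour"),
    ((50.90, -2.00), "Vehicle crime"),
    ((50.91, -2.01), "Vehicle crime"),
]

# The query location does not appear in the training data.
prediction = knn_predict(training, (50.352, -2.442), k=3)
print(prediction)
```

Because the query is compared with every training point, a location that never appeared in the dataset still receives a prediction from its neighbours. Sorting the full list keeps the sketch short; selecting only the k smallest distances would avoid the full sort on large datasets.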
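The Naïve Bayes classifier is based on Bayes’ theorem, which describes the probability of an event based on prior
knowledge of conditions that might be related to the event. Mathematically it can be stated as
$$P(Y \mid X) = \frac{P(X \mid Y)\, P(Y)}{P(X)}$$
The Naïve Bayes classifier classifies a new instance $X = (x_1, \dots, x_n)$ by assigning the most probable target
value, i.e. the value with the maximum posterior probability:
$$Y = \arg\max_{y} P(y \mid x_1, \dots, x_n) = \arg\max_{y} P(y) \prod_{i=1}^{n} P(x_i \mid y)$$
(since Naïve Bayes assumes the independence of the attributes).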
With the data available in the datasets Naïve Bayes classifier can be applied to the Latitude, Longitude (or location),
Date attributes to classify the crime type that can occur.
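A minimal Python sketch of such a classifier over categorical attributes is given below. It is an illustration under assumptions rather than the tool's implementation: the function names, the sample records and the use of add-one (Laplace) smoothing are choices made for the example, and location and month are used as the attributes because raw latitude and longitude are continuous and would first need to be binned or modelled with a density.

```python
from collections import Counter, defaultdict


def train_naive_bayes(samples):
    """Count class frequencies and per-attribute value frequencies.

    samples : list of (attributes, crime_type), attributes being a tuple of
              categorical values such as (location, month)
    """
    class_counts = Counter(label for _, label in samples)
    value_counts = defaultdict(Counter)  # (attribute index, label) -> value counts
    vocab = defaultdict(set)             # attribute index -> distinct values seen
    for attrs, label in samples:
        for i, value in enumerate(attrs):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def posterior(model, attrs):
    """Normalised P(y) * prod_i P(x_i | y), with add-one (Laplace) smoothing."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total
        for i, value in enumerate(attrs):
            score *= (value_counts[(i, label)][value] + 1) / (n + len(vocab[i]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


# Made-up records, for illustration only.
samples = [
    (("High Street", "Jan"), "Anti-social behaviour"),
    (("High Street", "Feb"), "Anti-social behaviour"),
    (("Park Road", "Jan"), "Anti-social behaviour"),
    (("Park Road", "Feb"), "Burglary"),
    (("Mill Lane", "Feb"), "Burglary"),
]

model = train_naive_bayes(samples)
probs = posterior(model, ("High Street", "Jan"))
print(max(probs, key=probs.get), probs)
```

The smoothing term keeps a crime type from receiving zero probability merely because one attribute value never co-occurred with it in the training data. The normalised scores give one probability per crime type, which is the form of output a per-category probability chart needs.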
Figs. 10 and 11 show the computed probabilities and their graphical representation. They show that there is a 37.5%
chance of an Anti-social behaviour case being reported and a 10.7% chance of a Burglary.
4. Conclusion:
The tool we have developed provides a framework for visualizing crime networks and analyzing them with various
machine learning algorithms, using Google Maps and various R packages. It helps crime analysts to analyze these
crime networks by means of various interactive visualizations. The interactive and visual features will be helpful
in reporting and discovering crime patterns. Many classification models can be considered and compared in the
analysis. Law enforcement agencies can take great advantage of machine learning algorithms to fight crime and
protect society. For better results, the data need to be updated as early as possible using current channels such
as the web and mobile apps.
5. Future Scope:
This paper presents visualization techniques and classification algorithms that can be used for predicting crimes
and helping law enforcement agencies. In future, we plan to apply other classification algorithms to the crime data
and to improve the accuracy of prediction. In another direction, we will try to build an Android app for the live
capture of real data and for frequently updating the results with this new data, which will help in making better
predictions and in providing the public with general information about trends in crime.
References:
[1] Ihaka, R. (1998). R: Past and future history. Computing Science and Statistics, 392–396.
[2] Wang, B., Zhang, D., Zhang, D., Brantingham, P. J., & Bertozzi, A. L. (2017). Deep Learning for Real Time Crime Forecasting. arXiv
preprint arXiv:1707.03340.
[3] Loecher, M. (2014). RgoogleMaps: overlays on Google map tiles in R. See
http://cran.r-project.org/web/packages/RgoogleMaps/index.html.
[4] Ahishakiye, E., Taremwa, D., Omulo, E. O., Nairobi-Kenya, G. P. O., & Niyonzima, I. (2017). Crime Prediction Using Decision Tree
(J48) Classification Algorithm. analysis, 6(03).
[5] Gesmann, M., & de Castillo, D. (2011). Using the Google visualisation API with R. The R Journal, 3(2), 40-44.
[6] Kahle, D., & Wickham, H. (2013). ggmap: Spatial Visualization with ggplot2. R Journal, 5(1).
[7] U.K. Crime data, https://data.police.uk/data/
[8] Yu, R., Song, M., & Cui, E. San Francisco Crime Analysis and Classification.
[9] https://www.theguardian.com/uk-news/2017/oct/19/rising-at-increasing-rate-in-england-and-wales-police-figures-show.
[10] Crime Petrol https://en.wikipedia.org/wiki/Crime_hotspots
[11] Routine Activity Theory https://en.wikipedia.org/wiki/Routine_activity_theory
[12] Rational Choice Theory https://en.wikipedia.org/wiki/Rational_choice_theory_(criminology)
[13] Nasridinov, A., Ihm, S. Y., & Park, Y. H. (2013). A decision tree-based classification model for crime prediction. In Information
Technology Convergence (pp. 531-538). Springer, Dordrecht.
[14] Iqbal, R., Murad, M. A. A., Mustapha, A., Panahy, P. H. S., & Khanahmadliravi, N. (2013). An experimental study of classification
algorithms for crime prediction. Indian Journal of Science and Technology, 6(3), 4219-4225.
[15] https://en.wikipedia.org/wiki/Data_analysis
[16] https://www.slideshare.net/socialmediadna/predictive-policing-the-role-of-crime-forecasting-in-law-enforcement-operations