Detecting Citizen Problems and Their Locations Using Twitter Data


Gizem Abalı
Department of Computer Engineering
Muğla Sıtkı Koçman University
Mugla, Turkey
[email protected]

Enis Karaarslan
Department of Computer Engineering
Muğla Sıtkı Koçman University
Mugla, Turkey
[email protected]

Ali Hürriyetoğlu
Centre for Language Studies
Radboud University
Nijmegen, Netherlands
[email protected]

Feriştah Dalkılıç
Department of Computer Engineering
Dokuz Eylul University
Izmir, Turkey
[email protected]

2018 6th International Istanbul Smart Grids and Cities Congress and Fair (ICSG)
978-1-5386-4478-2/18/$31.00 ©2018 IEEE

Abstract—Twitter is a social network that carries information about city events (concerts, festivals, etc.), city problems (traffic, collisions, road incidents), news, people's feelings, and more. For this reason, many studies use tweet data to extract information that supports smart city management. This paper discusses ways of finding citizen problems and their locations using tweet data. Tweets in Turkish from the Aegean Region of Turkey were used for the study. The aim is to build a smart system that detects citizens' problems and extracts their exact locations from tweet texts. First, the collected data was analyzed for information about city events and for citizens' complaints or requests about a problem. Once it was established that tweets reporting a city problem could be detected, two datasets were created. The first consists of tweets that contain event information or a problem; the second consists of tweets that carry other information not related to our study. A Naive Bayes classifier was then trained on the annotated tweets and tested on a separate set of tweets. The accuracy, precision, recall, and F-measure of the classifier are given. A location recognizer, which finds Turkish place names in a text, was created and applied to the tweets marked as information-containing by the classifier in order to locate the problem precisely. The first findings of the project are promising. The high accuracy obtained by the classifier shows that it is suitable for our study. We plan to improve the location recognizer and to detect place names in real-time tweet data.

Index Terms—Data Analysis, Machine Learning, Smart Cities, Social Media Analysis, Text Mining.

I. INTRODUCTION

In recent years, technology has developed and brought big changes to people's lives. Many people have migrated to cities. The growth in the number of city residents brought along many city problems such as pollution, accidents, and traffic. While these changes were occurring, many social networking services (such as Facebook and Snapchat) that people use to communicate were created and grew rapidly. Users have started using them not only to communicate with others but also to report the problems of the cities where they live. Because of this use of social platforms, social media analysis became a popular research area in urban projects, and research teams were formed to work on creating better urban areas. To get information about an area, people tend to use many social network platforms. One of these platforms is Twitter. Twitter had 330 million monthly active users as of the third quarter of 2017, according to the Statista Statistics Portal. Twitter differs from other social networks in that its main purpose is not interacting with friends; it is designed for sharing and seeking information [1]. Because people use Twitter to voice their wishes or to announce problems, it becomes a research tool that can be used to analyze the situation of an area and find its problems.

II. RELATED WORKS

A review of past studies using Twitter data shows that many of them analyze traffic tweets [2-4], while others aim to understand happiness in a city [5], find the locations from which a user tweets [6], analyze disaster tweets [7, 8], or detect place names mentioned in a tweet text [9].

Some studies [10, 11] show that Twitter is the most important social network that people use to communicate with other users and give them information during a disaster or a critical event. What's more, it is observed that tweet texts
contain information about traffic, transportation in a city, security, and environmental factors in these studies [10, 11]. Taken together with our aim of building a smart system, these studies suggest that Twitter data in Turkish can also be used for our work.

In the next section, the basic concepts (the smart city concept, tweet data, and the use of machine learning in tweet data analysis) are discussed. Then the methodology and the implementation are given together with the results. The last section gives the conclusion and discusses possible future work.

III. BASIC CONCEPTS

A. Smart City

In recent decades, the numbers of people living in cities and in rural areas have changed, and a large increase has been observed in the population of cities. The urban population grew from 746 million in 1950 to 3.9 billion in 2014 [12], and it is predicted to reach 69% of the world population by 2050 [13].

A smart city is still a fuzzy concept. The features of smart cities can be listed as follows [14]:
• A city that uses smart computing technologies to build all of its infrastructure and services (including health care, education, and transportation).
• A city that connects all of its infrastructures together to enhance its intelligence.
• A city that monitors all of its critical transportation, communication, energy, and water infrastructures, as well as major buildings.
• A city that is more effective, liveable, fair, and sustainable.

A smart city can be described as an area that uses intelligence functions to collect data and synthesize it to improve the efficiency of services, equity, sustainability, and quality of life [15].

B. Tweet Data

Tweet data provides the details of tweets (including tweet text, images, videos, retweet counts, favorite counts, location information, date, time, user name, user screen name, user picture, font color, ids, the reply id if the tweet is a reply to another tweet, etc.). The data can be obtained by creating a Twitter application and using a token and access ids.

C. Machine Learning

Machine learning is a field of computer science that allows computers to learn without being explicitly programmed. It has been used in many research areas (including spam detection, search engines, and optical character recognition). Machine learning is also a popular technique in tweet analysis studies [16-18], including Twitter sentiment classification and opinion finding.

IV. IMPLEMENTATION

A. Using The Tweet Data For Smart Cities

In this project, we aim to detect tweets that report a problem in a part of a city and to find the exact location of the problem. Figure 1 shows some tweets related to our work. In the first tweet, the user says, “Küçükbakkalköy Beyaz Street, Nergis, Çiğdem and Yenidoğan Avenues are closed. We have a hard time. Please immediately help us”. In the second tweet, the user says, “Geothermals in Aydın Germencik and İmamköy harm the agriculture and the ecology. We wait for you to come to Aydın”. The last tweet contains the text, “It is said that the road is closed around Aydın, Muğla, Yatağan tunnel. Please be careful. We have neither snow tire nor chain in our cars. Don't take a risk”. In these texts, Muğla, Aydın, Germencik, İmamköy, Yatağan, Küçükbakkalköy, Nergis, Çiğdem and Yenidoğan are the place names. As the tweets show, users talk about their needs or requests. Moreover, the common feature of these tweets is that all of them contain location information. After detecting tweets that may carry critical information, a process for finding location information can be applied, and the most problematic parts of a city, and sometimes the suggestions of citizens, might be found.

Figure 1. Example tweets that contain critical information.

B. Dataset

In this study, we used tweet data collected from a selected area (the Aegean region) in Turkey. The Twitter API was used to collect the data. It provides nearly 1% of the tweet data that streams from the selected area, which is the amount that Twitter offers for free. MongoDB is used to store the data in JSON format. A Linux script keeps the data collection running continuously.
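To make the storage step above more concrete, the following Python sketch shows how one tweet in JSON format might be reduced to the fields needed by the later classification and location steps. It is only an illustration under stated assumptions: the sample payload is made up, the field names are assumed to follow the Twitter v1.1 tweet object, the helper name `to_record` is hypothetical, and the MongoDB insert itself is left out.

```python
import json

# Made-up sample payload. Field names are assumed to follow the
# Twitter v1.1 tweet object; the study does not state the API version.
RAW_TWEET = """
{
  "id_str": "1",
  "created_at": "Mon Jan 01 10:00:00 +0000 2018",
  "text": "Efemçukuru Altın Madeni bütün bir İzmir'in suyunu kirletiyor",
  "user": {"screen_name": "example_user"},
  "coordinates": {"type": "Point", "coordinates": [27.14, 38.42]},
  "retweet_count": 0
}
"""


def to_record(raw_json):
    """Reduce one tweet JSON string to the fields used downstream."""
    tweet = json.loads(raw_json)
    # "coordinates" may be missing or null when the tweet is not geotagged.
    point = tweet.get("coordinates") or {}
    lon_lat = point.get("coordinates")  # GeoJSON order: [longitude, latitude]
    return {
        "id": tweet["id_str"],
        "created_at": tweet["created_at"],
        "text": tweet["text"],
        "screen_name": tweet["user"]["screen_name"],
        "lon_lat": tuple(lon_lat) if lon_lat else None,
    }


record = to_record(RAW_TWEET)
print(record["text"], record["lon_lat"])
```

Keeping only a handful of fields is a design choice for the sketch; the original documents could equally be stored whole and projected at query time.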

C. Methods

The aim of this project is to detect tweets that contain critical information and citizens' requests about the city. First, we accumulated tweets that carry information about the intended area as well as tweets that carry no information useful to us. Then we created two separate datasets, one of tweets related to our work and one of unrelated tweets. We trained the system with a Naive Bayes classifier to detect information-containing tweets about the area. We put the tweets that the classifier flagged into a collection separate from the related dataset, and inserted those that truly contained information into the related dataset. Figure 2 explains the methodology for the classification of tweets. After obtaining tweets truly related to our study, we applied the location recognizer, which we wrote in Python, to them to find the related location.

[Flow diagram; box labels as extracted: Start; Classify the training data manually; Collect a new tweet by using Twitter API; Insert the tweet into real_time_tweet dataset; decision: Is the tweet information-containing? (Classify with Naïve Bayes), YES/NO; decision: Does the tweet contain any place name? (Apply location recognizer), YES/NO; Insert the tweet into not_related dataset; Insert the tweet into related dataset; Display the obtained situation or problem.]

Figure 2. Flow diagram of the tweet classification methodology.

D. Training Data

To train the classifier, we used two datasets, labeled related and not related. The related dataset contains tweets that report a problem about an area. The dataset named “Not Related” consists of tweets that are not related to our study. We applied the Naive Bayes classifier to 100 example tweets that we had marked as related (77 tweets) and not related (23 tweets). The results of the test (TP, FP, FN, TN) are determined according to the confusion matrix in Table I.

TABLE I. CONFUSION MATRIX

                                Actual Class
Predicted Class      Related                 Not Related
Related              True Positive (TP)      False Positive (FP)
Not Related          False Negative (FN)     True Negative (TN)

In the confusion matrix, “True Positive” stands for correctly predicted related values, “False Positive” stands for incorrectly predicted related values, “True Negative” stands for correctly predicted not related values, and “False Negative” stands for incorrectly predicted not related values. We found that the true positive count is 71, the false positive count is 8, the number of true negatives is 15, and the number of false negatives is 6. We calculated the accuracy, precision, recall, and F-measure values using equations (1)-(4). The accuracy, precision, recall, and F-measure values were 0.86, 0.90, 0.92, and 0.91, respectively.

Accuracy = (TP + TN) / (TP + FP + TN + FN)    (1)

Precision = TP / (TP + FP)    (2)

Recall = TP / (TP + FN)    (3)

F-measure = (2 × Precision × Recall) / (Precision + Recall)    (4)

With these counts, for example, Precision = 71 / (71 + 8) ≈ 0.90 and Recall = 71 / (71 + 6) ≈ 0.92.

E. Location Analysis

A location recognizer that finds Turkish place names in given texts was created and applied to tweet texts. Consider the example tweet “Efemçukuru Altın Madeni bütün bir İzmir'in suyunu kirletiyor” (“Efemçukuru Gold Mine pollutes the water of İzmir”). Here “İzmir” and “Efemçukuru” are place names (İzmir is a city in Turkey, and Efemçukuru is a street that is connected to a town of İzmir). The location recognizer uses a GeoNames file¹ that contains all place names from Turkey (including city names, town names, street names, etc.). To detect place names, language structures were created using the pyparsing [19] and regular expression libraries of the Python programming language. The source code for our location recognizer can be found in the Bitbucket repository² [9].

¹ http://www.geonames.org/
² https://bitbucket.org/hurrial/placenames/branch/turkish_location_recognizer
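As a rough illustration of the place-name lookup described above, the following Python sketch matches the words of a text against a small place-name set and drops the apostrophe-attached case suffix that Turkish adds to proper nouns, as in “İzmir'in”. It is a minimal sketch under stated assumptions, not the authors' pyparsing-based recognizer: the four-entry set is a hypothetical stand-in for the GeoNames file, the function name `find_place_names` is invented for the example, and multi-word place names are not handled.

```python
import re
import unicodedata


def _nfc(text):
    """Normalize to NFC so that dotted capital I compares consistently."""
    return unicodedata.normalize("NFC", text)


# Hypothetical stand-in for the GeoNames place-name list (assumption:
# the real recognizer loads the Turkish place names from that file).
PLACE_NAMES = {_nfc(name) for name in ("İzmir", "Efemçukuru", "Muğla", "Aydın")}

# A token is a run of letters, optionally followed by an apostrophe and
# a suffix made of letters, as in "İzmir'in".
TOKEN_RE = re.compile(r"([^\W\d_]+)(?:['’][^\W\d_]+)?")


def find_place_names(text, place_names=PLACE_NAMES):
    """Return the known place names in order of first appearance."""
    found = []
    for match in TOKEN_RE.finditer(_nfc(text)):
        stem = match.group(1)  # the word without any apostrophe suffix
        if stem in place_names and stem not in found:
            found.append(stem)
    return found


tweet = "Efemçukuru Altın Madeni bütün bir İzmir'in suyunu kirletiyor"
print(find_place_names(tweet))
```

Matching on the part before the apostrophe keeps the lookup exact, which avoids partial matches inside longer words; the cost is that suffixes written without an apostrophe are missed.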

While creating the location recognizer, we encountered a few problems caused by Turkish word structure. The problems we faced are the following:

• Some place names are homonyms of common words in Turkish (for example, tahta (wood), yağmur (rain), siyah (black), sandık (box), etc.).
• Some place names are also used as surnames in Turkish (for example, Akalın).

To solve the first problem, we removed the place names that coincide with common words from the place name list. For the second problem, we formed a language grammar with regular expressions to separate true location names from people's surnames. For example, in “Metin Akçapınar değerli bir oyuncuydu.” (“Metin Akçapınar was a precious actor.”), “Akçapınar” is the surname of the person whose first name is Metin, whereas in “Akçapınar, Muğla ilinin Ula ilçesine bağlı bir mahalledir.” (“Akçapınar is a neighborhood that is connected to Ula District of Muğla Province.”), “Akçapınar” is a neighborhood in the province of Muğla.

After separating the tweets that were marked as information-containing by the classifier, we applied the location recognizer to these tweets to find the precise related locations.

V. CONCLUSION

In this study, we discussed how to detect city problems or citizen requests by analyzing tweet messages sent from within the city coordinates. Detecting city problems using tweet data appears feasible, as the results of the classifier are promising. In addition, the tweet messages related to our work were analyzed, and it was observed that all of them contain location information. This common feature shows that tweet data can be used to find the locations in a city where problems occur, which can be useful in supporting city management. The importance of detecting location names became evident, and a location recognizer was built. In future work, we plan to apply the tools to real-time tweet data. We also aim to find the topics that citizens talk about by means of hashtags, and we believe that these will help in city management. In addition, a web application will be created to show the results.

ACKNOWLEDGMENT

This work is supported by the Scientific Research Project Fund of Muğla Sıtkı Koçman University under the project number 16/160.

REFERENCES

[1] I. L. B. Liu, C. M. K. Cheung, and M. K. O. Lee, "Understanding Twitter usage: What drives people to continue to tweet," in Proc. 2010 Pacific Asia Conference on Information Systems (PACIS), pp. 928-939.
[2] N. Wanichayapong, W. Pruthipunyaskul, W. Pattara-atikom, and P. Chaovalit, "Social-based traffic information extraction and classification," in Proc. 2011 International Conference on ITS Telecommunication (ITST), pp. 107-112.
[3] M. Hasby and M. L. Kodra, "Optimal path finding based on traffic information extraction from Twitter social-based traffic information," in Proc. 2013 International Conference on ICT for Smart Society (ICISS), pp. 1-5.
[4] S. B. Marupudi, "Framework for semantic integration and scalable processing of city traffic events," M.Sc. Thesis, Wright State University, 2016.
[5] W. Guo, N. Gupta, G. Pogrebna, and S. Jarvis, "Understanding happiness in cities using Twitter: Jobs, children and transport," in Proc. 2016 IEEE International Smart Cities Conference, pp. 1-7.
[6] Z. Cheng, J. Caverlee, and K. Lee, "You are where you tweet: a content-based approach to geo-locating Twitter users," in Proc. 2010 ACM International Conference on Information and Knowledge Management, pp. 759-768.
[7] T. Sakaki, M. Okazaki, and Y. Matsuo, "Earthquake shakes Twitter users: Real-time event detection by social sensors," in Proc. 2010 International Conference on World Wide Web, pp. 851-860.
[8] A. Acar and Y. Muraki, "Twitter for crisis communication: Lessons learned from Japan's tsunami disaster," International Journal of Web Based Communities, vol. 7(3), pp. 392-402, 2011.
[9] G. Abalı, A. Hürriyetoğlu, and E. Karaarslan, "Event information based location name analysis in the Twitter data: A preliminary study," in Proc. 2016 International Conference on Computer Science and Engineering (UBMK).
[10] B. D. M. Peary, R. Shaw, and Y. Takeuchi, "Utilization of social media in the east Japan earthquake and tsunami and its effectiveness," Journal of Natural Disaster Science, vol. 34(1), pp. 3-18, 2012.
[11] P. Anantharam, P. Barnaghi, K. Thirunarayan, and A. Sheth, "Extracting city traffic events from social streams," ACM Transactions on Intelligent Systems and Technology, vol. 6(4), pp. 43:1–27, July 2015.
[12] United Nations, "World Urbanization Prospects: The 2014 Revision, Highlights," Department of Economic and Social Affairs, Population Division, 2014.
[13] A. Barresi and G. Pultrone, "European strategies for smarter cities," Tema, Journal of Land Use, Mobility and Environment, vol. 6(1), pp. 61-72, 2013.
[14] H. Chourabi et al., "Understanding smart cities: An integrative framework," in Proc. 2012 Hawaii International Conference on System Science (HICSS), pp. 2289-2297.
[15] M. Batty et al., "Smart cities of the future," The European Physical Journal Special Topics, vol. 214(1), pp. 481-518, 2012.
[16] M. Pennacchiotti and A. M. Popescu, "A machine learning approach to Twitter user classification," in Proc. 2011 International AAAI Conference on Weblogs and Social Media, pp. 281-288.
[17] A. Z. H. Khan, M. Atique, and V. M. Thakare, "Combining lexicon-based and learning-based methods for Twitter sentiment analysis," International Journal of Electronics, Communication and Soft Computing Science & Engineering, vol. 4(4), pp. 89-91, 2015.
[18] G. Sidorov et al., "Empirical study of machine learning based approach for opinion mining in tweets," in Proc. 2012 Mexican International Conference on Artificial Intelligence, Springer Berlin Heidelberg, pp. 1-14.
[19] P. McGuire, Getting started with pyparsing, California: O'Reilly Media Inc., 2007, p. 65.
