Indjcse21 12 05 018

The paper conducts sentiment analysis and topic modeling on Twitter data related to the 'Clean India Mission' using Latent Dirichlet Allocation (LDA) to identify trending topics and sentiments. It analyzes 2209 tweets collected from various hashtags, employing clustering techniques to reveal relationships among topics and using lexicon-based classification for sentiment evaluation. The study aims to uncover public opinions and enhance understanding of the mission's impact through data-driven insights.

e-ISSN : 0976-5166
p-ISSN : 2231-3850
Sangeeta Rani et al. / Indian Journal of Computer Science and Engineering (IJCSE)

SENTIMENT ANALYSIS AND TOPIC MODELLING ON TWITTER FOR CLEAN INDIA MISSION
Sangeeta Rani
Research Scholar
Department of Computer Science and Application
Maharishi Dayanand University, Rohtak, Haryana, India
[email protected]
Nasib Singh Gill
Professor
Department of Computer Science and Application
Maharishi Dayanand University, Rohtak, Haryana, India
[email protected]
Preeti Gulia
Assistant Professor
Department of Computer Science and Application
Maharishi Dayanand University, Rohtak, Haryana, India
[email protected]
Abstract
Twitter is an important source of information, but analyzing its data to draw meaningful
inferences is challenging. The present paper uses topic modelling and sentiment analysis to extract useful
context from a Twitter data set related to the ‘Clean India Mission’. Latent Dirichlet Allocation is used
to identify the twenty most trending topics and the top seven terms for each of them. Coherence
and prevalence values are used to indicate model efficiency. Topic clustering is also used to identify
how strongly the topics are related to each other. Five clusters are created from the top trending
topics, reflecting different aspects of the corpus. The average silhouette width is employed to determine the
optimal number of clusters. Lexicon-based classification using the ‘nrc’ sentiment dictionary is also used to
reflect people’s sentiment towards the mission at ten different sentiment levels. Twitter data for the research is
collected from seven different hashtags, including the official page of the Clean India campaign. The most
relevant subject segments are identified by evaluating the trending topics using their topic coherence
values.
Keywords: Twitter sentiment analysis; Topic Modelling; LDA; Opinion Mining; Clean India Mission; Topic
Clustering.
1. Introduction
Twitter is one of the most widely used social media platforms, with millions of users across the world sharing
their opinions on diverse issues and on products and services related to health care, politics, news, sports,
education, government, public policies, natural disasters and many more. This creates a huge data repository, and it is
quite challenging to extract useful context from such heterogeneous data. Twitter opinion mining is a way to transform
semi-structured Twitter data into a more structured form in order to find the sentiment attached to the tweets [Agarwal et al.,
(2011) and Maheshwari et al., (2019)]. It aids in determining what people think about a specific product or issue,
which in turn helps the relevant company or organization improve its service or product based on the input
obtained. The perception of individuals is computed using a variety of automated machine learning and sentiment
analysis techniques. Sentiment analysis is carried out using various supervised and unsupervised classification
techniques [Ray (2017) and Kurnaz et al., (2019)].
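As an illustration of the unsupervised, lexicon-based end of this spectrum, the following is a minimal sketch in Python. It assumes a tiny hand-made lexicon (TOY_LEXICON) that only mimics the word-to-category layout of lexicons such as ‘nrc’; the entries, the function names and the polarity rule are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter

# Hypothetical toy lexicon mapping a word to sentiment/emotion categories.
# It is NOT the NRC lexicon; the entries are invented for illustration.
TOY_LEXICON = {
    "clean": {"positive", "trust"},
    "proud": {"positive", "joy"},
    "hope": {"positive", "anticipation"},
    "dirty": {"negative", "disgust"},
    "garbage": {"negative", "disgust"},
}


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())


def score_tweet(text, lexicon=TOY_LEXICON):
    """Count how often each lexicon category is triggered by the tweet."""
    counts = Counter()
    for token in tokenize(text):
        for category in lexicon.get(token, ()):
            counts[category] += 1
    return counts


def polarity(counts):
    """Assumed rule: compare positive and negative counts."""
    diff = counts["positive"] - counts["negative"]
    if diff > 0:
        return "positive"
    if diff < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    tweet = "Proud to see a clean street, no garbage!"
    counts = score_tweet(tweet)
    print(dict(counts), polarity(counts))
```

No labelled training data is needed for this kind of scoring, which is what separates it from the supervised classifiers mentioned above. Its quality depends entirely on the coverage of the lexicon.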
Topic modelling is also a useful way to uncover the hidden semantic context in tweets and to support detailed analysis.
A topic is a group of words that co-occur in multiple documents and relate to the same context. Topic modelling is a rapidly
developing branch of text mining that can be applied to Twitter data for more elaborate text analysis. It
finds the groups of words that are expected to appear in the corpus and that best reflect the
context of the corpus. In topic modelling we are more concerned with long-range context than with the local
dependencies captured by n-grams. It aids in the discovery of latent semantic structure [Kherwa et al., (2018) and