Synopsis
BHAI PARMANAND DSEU
SHAKARPUR CAMPUS-II
Tweet Sentimental
Analysis
SEMESTER 2
(MCA 2022-2024)
SUBMITTED TO: DEEPAK SHARMA
SUBMITTED BY: Sarthak Bhardwaj
ROLL NO: 50122028
“Tweet Sentimental Analysis”.
Problem statement
1.1 Brief Description of the System under Study:
This proposal describes a web application for analyzing tweets. It will perform sentiment analysis on
tweets and determine whether each one is positive, negative or neutral. The web application can be used
by any organization to review its work, by political leaders, or by any company to review opinion about
its products or brands. Its main feature is that it helps determine people's opinions on products,
government work, politics or any other topic. By analyzing tweets, the system can classify new tweets
with reference to the data it was previously trained on. The analyzed data will be represented in
various diagrams, such as pie charts and word clouds.
1.2 About the proposed System:
The objective of the proposed project, “Tweet Sentimental Analysis”, is to analyze the enormous
amount of data easily available from social media. This project will be helpful to companies, political
parties and the general public. Political parties can use it to review programs they are planning or
have already carried out. Similarly, companies can get reviews of their new products or newly released
hardware or software, and movie makers can get reviews of a currently running movie. By analyzing the
tweets, the analyzer shows how positive, negative or neutral people are about a given subject.
1.3 Characteristics of the proposed system:
Sentiment analysis can be defined as a process that automates mining of attitudes, opinions,
views and emotions from text, speech, tweets and database sources through Natural Language
Processing (NLP).
Sentiment analysis involves classifying opinions in text into categories like “positive”, “negative” or
“neutral”. It is also referred to as subjectivity analysis, opinion mining, and appraisal extraction. The
words opinion, sentiment, view and belief are used interchangeably, but there are differences between them.
▪ Opinion: A conclusion open to dispute (because different experts have different opinions)
▪ View: Subjective opinion
▪ Belief: Deliberate acceptance and intellectual assent
▪ Sentiment: Opinion representing one’s feelings
Sentiment analysis is a term that includes many tasks, such as sentiment extraction,
sentiment classification, subjectivity classification, summarization of opinions and
opinion spam detection, among others.
It aims to analyze people’s sentiments, attitudes, opinions and emotions towards
elements such as products, individuals, topics, organizations and services.
2. Feasibility Study Report
2.1 Technical:
Hardware availability:
Minimum Requirements
Processor: Intel(R) Core(TM) i5 CPU, 2 GHz
Memory: 512 MB
Hard Disk: 16 GB
Input Device: Keyboard, Mouse
Software availability:
Minimum Requirements
Language: Python, ML
Operating System: Linux, Windows
Tools: PyCharm, Python 3.8.0, Command Prompt
3. DFD (DATA FLOW DIAGRAM)
Figure 3.1: Block Diagram
Physical design:
Figure 3 : Flow of Working
1. Problem Definition:
Depression is classified as a mood disorder. It can be defined as feelings of
sadness, loss or anger that interfere with a person’s everyday activities. It has
become a common disorder nowadays. People with depression can also suffer from
chronic health conditions.
Social media is a platform where people communicate and share their
feelings with others. Social media platforms like Facebook, Twitter and Instagram
are not only a source of multimedia content but also help people express
themselves, their sentiments and emotions through posts and comments.
They also give people the opportunity to discuss and freely contribute to any
topic online, which can help people in the health sector gain insight into
someone's mental state from their reaction to a particular topic.
2. Objectives & Scope:
The objective of the proposed project, “Tweet Sentimental Analysis”, is to
analyze the enormous amount of data easily available from social media.
This project will be helpful to companies, political parties and the general
public. Political parties can use it to review programs they are planning or
have already carried out. Similarly, companies can get reviews of their new
products or newly released hardware or software, and movie makers can get
reviews of a currently running movie. By analyzing the tweets, the analyzer
shows how positive, negative or neutral people are about a given subject.
3. Methodology:
3.1. Methodology for data collection:
First, the data will be explored and cleaned (Exploratory Data Analysis)
to improve the accuracy of the model, and then visualized to better
understand the data.
From the cleaned data, a model will be trained using the Named Entity
Recognition (NER) technique, a subtask of Natural Language Processing
(NLP), to extract the common words or phrases of a whole comment/tweet
(like good, happy, enjoying for positive tweets, or fear, bad, worries for
negative tweets) that help determine whether the message carries a positive
or a negative response.
During this process we will encounter function words such as articles (‘a’, ‘an’,
‘the’), prepositions (‘for’, ‘from’, ‘in’, ‘of’), personal pronouns (‘I’, ‘them’, ‘her’),
impersonal pronouns (‘it’, ‘its’), auxiliary verbs (‘do’, ‘have’), conjunctions (‘and’,
‘but’) and negations (‘not’, ‘never’). These words are common to every type of
sentence and are treated here as showing neither positive nor negative
sentiment, so they are not essential for the model and can decrease its
accuracy. They will therefore be removed.
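The word-removal step described above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the STOP_WORDS set is a hypothetical list loosely based on the examples in the text, and the function name remove_stop_words is an assumption.

```python
import re
from typing import List

# Hypothetical stop-word list loosely based on the examples in the text;
# a real project would likely use a fuller list.
STOP_WORDS = {
    "a", "an", "the",            # articles
    "for", "from", "in", "of",   # prepositions
    "i", "them", "her",          # personal pronouns
    "it", "its",                 # impersonal pronouns
    "do", "have",                # auxiliary verbs
    "and", "but",                # conjunctions
    "not", "never",              # negations (removed here as the text proposes)
}


def remove_stop_words(tweet: str) -> List[str]:
    """Lowercase the tweet, split it into words, and drop stop words."""
    words = re.findall(r"[a-z']+", tweet.lower())
    return [w for w in words if w not in STOP_WORDS]


print(remove_stop_words("I have never enjoyed the rain and the cold"))
# ['enjoyed', 'rain', 'cold']
```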
3.2. The techniques & tools proposed to be used for systems analysis, design, testing
and development of software.
Python libraries and approaches used in the project are:
3.2.1. NumPy:
NumPy is a Python library used for working with arrays. It also has
functions for working with linear algebra, Fourier transforms and matrices.
NumPy stands for Numerical Python. In Python, we have lists that serve the
purpose of arrays, but they're slow to process. NumPy aims to provide an
array object that is up to 50x faster than traditional Python lists.
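As a small illustration of the array object, the sketch below applies arithmetic to a whole array at once and computes a matrix product. The numbers are made up for the example and do not come from the project's data.

```python
import numpy as np

# Hypothetical per-tweet word counts, used only for illustration.
word_counts = np.array([12, 7, 20, 15])

# Arithmetic applies to the whole array at once, with no explicit loop.
doubled = word_counts * 2
mean_length = word_counts.mean()

# A small matrix-vector product, as an example of the linear algebra support.
matrix = np.array([[1, 2], [3, 4]])
product = matrix @ np.array([1, 1])

print(doubled.tolist())   # [24, 14, 40, 30]
print(mean_length)        # 13.5
print(product.tolist())   # [3, 7]
```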
3.2.2. Pandas:
Pandas is an open-source Python library providing high-performance data
manipulation and analysis tools built on its powerful data structures.
Prior to Pandas, Python was mainly used for data mining and
preparation. It had very little contribution towards data analysis. Pandas
solved this problem. Using Pandas, we can accomplish five typical steps in the
processing and analysis of data, regardless of the origin of the data: load,
prepare, manipulate, model, and analyze.
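A few of these steps can be sketched on a tiny in-memory table. The tweets and the column names "text" and "sentiment" are assumptions made for the example; the project's real data set may be laid out differently.

```python
import pandas as pd

# Load: hypothetical tweets built in memory instead of read from a file.
tweets = pd.DataFrame({
    "text": ["Love this phone", "Worst service ever", None, "It is okay"],
    "sentiment": ["positive", "negative", "neutral", "neutral"],
})

# Prepare: drop rows with missing text.
tweets = tweets.dropna(subset=["text"])

# Manipulate: add a derived column with the number of words per tweet.
tweets["word_count"] = tweets["text"].str.split().str.len()

# Analyze: count tweets per sentiment class.
counts = tweets["sentiment"].value_counts().to_dict()
print(counts)
```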
3.2.3. Seaborn:
Seaborn is a library for making statistical graphics in Python. It is built on
top of matplotlib and closely integrated with Pandas data structures.
Seaborn aims to make visualization a central part of exploring and
understanding data. Its dataset-oriented plotting functions operate on data
frames and arrays containing whole datasets and internally perform the
necessary semantic mapping and statistical aggregation to produce
informative plots.
3.2.4. Data Cleaning:
Data cleaning is the process of preparing data for analysis by removing or
modifying data that is incorrect, incomplete, irrelevant, duplicated, or
improperly formatted. Such data is usually not necessary or helpful when it comes
to analysis, because it may hinder the process or produce inaccurate results.
Most importantly, the goal of data cleaning is to create data sets that are
standardized and uniform, so that data analytics tools can easily access and find
the right data for each query.
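For tweets, cleaning might look like the sketch below. The specific rules (stripping links and @mentions, keeping hashtag words, dropping duplicates) are assumptions chosen for illustration, as are the function names clean_tweet and clean_dataset.

```python
import re
from typing import List


def clean_tweet(text: str) -> str:
    """Strip URLs, @mentions and extra whitespace; keep hashtag words."""
    text = re.sub(r"https?://\S+", "", text)   # remove links
    text = re.sub(r"@\w+", "", text)           # remove user mentions
    text = text.replace("#", "")               # keep the hashtag word itself
    text = re.sub(r"\s+", " ", text)           # collapse repeated whitespace
    return text.strip().lower()


def clean_dataset(tweets: List[str]) -> List[str]:
    """Clean every tweet, then drop empty results and exact duplicates."""
    seen = set()
    cleaned = []
    for tweet in tweets:
        result = clean_tweet(tweet)
        if result and result not in seen:
            seen.add(result)
            cleaned.append(result)
    return cleaned


raw = [
    "Loving the new phone! https://example.com #happy",
    "@shop   worst delivery ever",
    "Loving the new phone! #happy",
    "@someone https://example.com",
]
print(clean_dataset(raw))
# ['loving the new phone! happy', 'worst delivery ever']
```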
3.2.5. Data Preprocessing:
In any Machine Learning process, Data Preprocessing is the first step, in
which the data gets transformed, or encoded, to bring it to a state that the
machine can easily parse. In other words, the features of the data can then
be easily interpreted by the algorithm.
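One common way to encode text is a bag-of-words count vector, sketched below. This encoding is an assumption chosen for illustration; the text does not say which encoding the project uses, and the function names are hypothetical.

```python
from typing import Dict, List


def build_vocabulary(tweets: List[str]) -> Dict[str, int]:
    """Assign each distinct word an index, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for tweet in tweets:
        for word in tweet.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(tweet: str, vocab: Dict[str, int]) -> List[int]:
    """Return a bag-of-words count vector; unknown words are ignored."""
    vector = [0] * len(vocab)
    for word in tweet.lower().split():
        if word in vocab:
            vector[vocab[word]] += 1
    return vector


tweets = ["good good day", "bad day"]
vocab = build_vocabulary(tweets)
print(vocab)                               # {'good': 0, 'day': 1, 'bad': 2}
print([encode(t, vocab) for t in tweets])  # [[2, 1, 0], [0, 1, 1]]
```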
3.2.6. Data Visualization:
Data visualization is the graphical representation of information and data.
By using visual elements like charts, graphs, and maps, data visualization tools
provide an accessible way to see and understand trends, outliers, and patterns
in data. In the world of Big Data, data visualization tools and technologies are
essential to analyze the massive amount of information and make data-driven
decisions.
3.2.7. Jaccard Similarity Score:
The Jaccard similarity score, defined as the size of the intersection divided
by the size of the union of two label sets, is used to compare the sets of
values of two vectors.
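For two sets A and B the score is J(A, B) = |A ∩ B| / |A ∪ B|. A minimal sketch on word sets follows; comparing whitespace-separated words, and returning 1.0 when both texts are empty, are conventions assumed here rather than stated in the text.

```python
def jaccard(text_a: str, text_b: str) -> float:
    """Jaccard similarity between the word sets of two strings."""
    a = set(text_a.lower().split())
    b = set(text_b.lower().split())
    if not a and not b:
        return 1.0  # assumed convention: two empty texts count as identical
    return len(a & b) / len(a | b)


# Two shared words out of three distinct words gives 2/3.
print(jaccard("so happy today", "happy today"))
```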
3.2.8. Named Entity Recognition (NER):
Named entity recognition (NER) (also known as (named) entity identification,
entity chunking and entity extraction) is a subtask of information extraction that
seeks to locate and classify named entities mentioned in unstructured text into
predefined categories such as person names, organizations, locations, medical
codes, time expressions, quantities, monetary values, percentages, etc.
Most research on NER systems has been structured as taking in a non-annotated
block of text. We will be using spaCy to create our own customized
NER model or models (a separate one for each sentiment).
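Training a custom NER model needs annotated examples. The sketch below builds one in the (text, {"entities": [(start, end, label)]}) tuple layout commonly used for spaCy training data, using only the standard library. The helper make_example, the label "POSITIVE" and the sample phrase are hypothetical, and the actual training loop is not shown.

```python
from typing import Dict, List, Optional, Tuple

Annotation = Tuple[str, Dict[str, List[Tuple[int, int, str]]]]


def make_example(text: str, phrase: str, label: str) -> Optional[Annotation]:
    """Mark the first occurrence of `phrase` in `text` as one entity span."""
    start = text.find(phrase)
    if start == -1 or not phrase:
        return None  # phrase missing or empty, so no annotation can be built
    end = start + len(phrase)  # end offset is exclusive
    return (text, {"entities": [(start, end, label)]})


example = make_example("I am so happy with this phone", "so happy", "POSITIVE")
print(example)
# ('I am so happy with this phone', {'entities': [(5, 13, 'POSITIVE')]})
```

With one model per sentiment, each model would be trained only on examples carrying that sentiment's label.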