Majorprojectdoc

The document summarizes a major project on multilingual sentiment analysis submitted by 5 students to fulfill the requirements for a Bachelor of Technology degree. It includes an abstract, introduction discussing existing and proposed systems, literature survey, modules and UML diagrams, implementation details, test cases, conclusion and future scope. The project aims to accurately analyze sentiments expressed in Hindi and English texts using natural language processing techniques and machine learning models trained on annotated datasets in both languages. It also considers the use of emojis in expressing emotions online.

A Major Project with Seminar On

MULTILINGUAL SENTIMENT ANALYSIS


Submitted in partial fulfillment of the requirements for the award of the

Bachelor of Technology
in

Computer Science and Engineering


by
Ashley Edgar Dcunha 20241A05O6
Chelmela Sai Harshith 20241A05P2
Rachapally Pavan Kumar 20241A05S1
Sasidhara Kashyap Chaturvedula 20241A05S5
Vellore Anand Kumar Sahil 20241A05T5

Under the Esteemed guidance of

Ms. K. Anusha
Asst. Professor

Department of Computer Science and Engineering

GOKARAJU RANGARAJU INSTITUTE OF ENGINEERING AND TECHNOLOGY

(Approved by AICTE, Autonomous under JNTUH, Hyderabad)

Bachupally, Kukatpally, Hyderabad-500090

GOKARAJU RANGARAJU INSTITUTE OF ENGINEERING AND
TECHNOLOGY
(Autonomous)

Hyderabad-500090

CERTIFICATE

This is to certify that the mini project entitled “Multilingual Sentiment Analysis” is submitted
by Ashley Edgar Dcunha (20241A05O6), Sai Harshith Chelmela (20241A05P2),
Rachapally Pavan Kumar (20241A05S1), Sasidhara Kashyap Chaturvedula (20241A05S5), and
Vellore Anand Kumar Sahil (20241A05T5) in partial fulfillment of the requirements for the
award of the degree of BACHELOR OF TECHNOLOGY in Computer Science and Engineering during
the Academic year 2023-2024.

Internal Guide: Ms. K. Anusha

Head of the Department: Dr. B. Sankara Babu

External Examiner

ACKNOWLEDGEMENT

Many people helped us, directly and indirectly, to complete our project successfully.
We would like to take this opportunity to thank one and all. First, we would like to
express our deep gratitude towards our internal guide Ms. K. Anusha, Department of
Computer Science and Engineering, for her support in the completion of our
dissertation. We wish to express our sincere thanks to Dr. B. Sankara Babu, Head
of the Department, and our principal Dr. J. PRAVEEN, for providing the facilities
to complete the dissertation. We would like to thank all our faculty and friends for
their help and constructive criticism during the project period. Finally, we are very
much indebted to our parents for their moral support and encouragement to achieve
our goals.

Ashley Edgar Dcunha (20241A05O6)
Sai Harshith Chelmela (20241A05P2)
Rachapally Pavan Kumar (20241A05S1)
Sasidhara Kashyap Chaturvedula (20241A05S5)
Vellore Anand Kumar Sahil (20241A05T5)

DECLARATION

We hereby declare that the mini project titled “Multilingual Sentiment Analysis”
is the work done during the period from 18th July 2023 to 27th December 2023
and is submitted in partial fulfillment of the requirements for the award of the
degree of Bachelor of Technology in Computer Science and Engineering from
Gokaraju Rangaraju Institute of Engineering and Technology (Autonomous under
Jawaharlal Nehru Technological University, Hyderabad). The results embodied in
this project have not been submitted to any other University or Institution for
the award of any degree or diploma.

Ashley Edgar Dcunha (20241A05O6)
Sai Harshith Chelmela (20241A05P2)
Rachapally Pavan Kumar (20241A05S1)
Sasidhara Kashyap Chaturvedula (20241A05S5)
Vellore Anand Kumar Sahil (20241A05T5)

Table of Contents

Abstract
1. Introduction
   1.1 Existing System
   1.2 Proposed System
2. Literature Survey
3. Modules and UML Diagrams
4. Implementation
5. Test Cases
6. Conclusion and Future Scope
Appendix
   i) Review Paper
   ii) Results/Snapshots

Abstract

User-generated content on social media has made opinion mining an arduous job.
Sentiment analysis is a technique used to analyze the attitudes, emotions, and opinions of
different people towards any subject, and it can be carried out on text in English or Hindi to
analyze public opinion on news, policies, social movements, and personalities. By employing
Machine Learning models, opinion mining can be performed without manually reading the text
in English or Hindi. The results could assist governments and businesses in rolling out policies,
products, and events. Seven Machine Learning models are implemented for emotion
recognition by classifying text in English or Hindi as happy or unhappy. To further validate
the stability of the proposed approach, it was evaluated on two more datasets, one binary and
the other multi-class, and achieved robust results.

1. Introduction
1.1 Existing System
Social media analysis plays a key role in understanding the public’s opinion regarding
recent events and decisions, in designing and managing advertising campaigns, and in
planning next steps and mitigation actions for public relations initiatives.
Significant effort has recently been dedicated to developing data analysis algorithms
that automatically perform sentiment analysis over publicly available text. Most
of the available research focuses on binary categorization of text as positive or negative,
without further investigating the emotions leading to that categorization. However, the
current need for in-depth analysis of the available content, combined with the complexity and
multidimensional nature of human emotions and opinions, has rendered such solutions
obsolete. As a result, research now focuses on specifying the emotions, and not
only the sentiment, expressed in a given text. This is a very challenging effort, due both to
the lack of annotated datasets that can be used for emotion detection in text and to
the subjectivity infused in datasets created through manual annotation. A
hybrid rule-based algorithm is presented in this paper that supports the creation of a fully
annotated dataset over Plutchik’s eight basic emotions. The presented algorithm takes into
consideration the emoji available in the text and uses them as objective indicators of the
expressed emotion, thus efficiently tackling both identified challenges.

1.2 Proposed System

In the proposed system for multilingual sentiment analysis in Hindi and English,
we envision a platform that can accurately analyze sentiments expressed in both
languages. This system would employ advanced natural language processing
(NLP) techniques tailored to the linguistic nuances of Hindi and English. It would
leverage large datasets of annotated texts in both languages to train machine
learning models capable of understanding and categorizing sentiments expressed
in diverse contexts. Additionally, the system would take into account the use of
emojis, which play a significant role in expressing emotions in online
communication. By integrating these capabilities, the proposed system aims to
provide businesses and researchers with a powerful tool for gaining insights into
the sentiments of users across different linguistic backgrounds, thereby enabling
more effective communication and decision-making.

2. Literature Survey
Sentiment analysis helps corporations identify clients’ preferences about products, services,
and brands. It also plays an important role in interpreting information about industries and
corporations to support them in making entity reviews. Sarlan et al. built a sentiment
analysis prototype that extracted a number of tweets in English or Hindi and organized the
customers’ views expressed in them into positive and negative. Their research was divided
into two phases. The first phase was a literature study of the sentiment analysis techniques
and methods in current use. The second phase described the application requirements and
operations prior to its development.

In another study, Alsaeedi and Zubair Khan analyzed various kinds of sentiment analysis
applied to Twitter datasets and their conclusions, comparing the distinct approaches and the
reported algorithm performance. The methods they covered included Twitter sentiment
analysis using supervised ML approaches, ensemble approaches, and lexicon-based approaches.

Lexicon-based approaches have been explored by many researchers for emotion classification.
Bandhakavi et al. performed emotion-based feature extraction using domain-specific lexicon
generation. They captured the association between words and emotions using a unigram mixture
model and used weakly labelled tweets in English or Hindi to classify emotions. Their
proposed architecture outperformed other state-of-the-art approaches such as Latent Dirichlet
Allocation and Pointwise Mutual Information. Other researchers identified event-related
tweets among geo-related tweets in English or Hindi, using tweets about local festivities
collected over one year, and also identified different parameters that helped in event
discovery. Alsinet et al. [6] analyzed tweets in English or Hindi from political domains and
claimed that accepted tweets are stronger than rejected tweets. Rumor detection in tweets has
been performed by using an encoder to analyze human behavior in comments.
Hakh et al. used the SMOTE method to address the class-imbalance challenges of a Twitter
dataset. In addition, they applied different feature selection techniques to speed up the
sentiment analysis method. The proposed methodology was evaluated on the dataset and showed
favorable results on all evaluation metrics used. After applying pre-processing steps to
their dataset, they used TF-IDF features to measure the importance weight of terms. They then
applied classification methods (AdaBoost, Linear SVM, Kernel SVM, Random Forest, Decision
Tree, Naïve Bayes, and K-NN) and finally compared the classifiers’ effectiveness using the
accuracy and F1-score measures.
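
The TF-IDF weighting mentioned above can be illustrated with a minimal sketch. It assumes one common variant (term frequency normalized by document length, multiplied by log(N/df)); the exact formulation used in the cited work is not given in this report, and the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Illustrative input only: two tiny tokenized documents.
example = tf_idf([["good", "movie"], ["bad", "movie"]])
```

A term that occurs in every document ("movie" here) receives weight zero under this variant, while terms specific to one document receive a positive weight.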
Xia et al. carried out a comparative study of the effectiveness of ensemble methods for
sentiment classification. Their study covered three aspects. First, two types of feature
sets were designed for sentiment analysis: one based on part of speech and one based on word
relations. Second, three well-known text classification algorithms were used: maximum
entropy, support vector machines, and naive Bayes. Third, three ensemble strategies were
applied: fixed combination, weighted combination, and meta-classifier combination. They used
five document-level datasets widely used in the field of sentiment classification. Their
experiments showed that ensemble techniques are more effective than the individual
classifiers. This agrees with our own finding that an ensemble of two classifiers, logistic
regression and a stochastic gradient descent classifier, gives better results than other
classifiers.
Deep learning has been utilized by many researchers for image classification and for
classifying tweets in English or Hindi. Rustam et al. presented a tweet classification
approach for US airline company sentiments. The researchers applied pre-processing to the
dataset and examined the influence of feature extraction methods, including TF, TF-IDF, and
word2vec, on classification accuracy. In addition, the performance of long short-term memory
(LSTM) was studied on the selected dataset. The paper proposes a Voting Classifier (VC) to
support sentiment analysis for such companies. The Voting Classifier is based on logistic
regression (LR) and a Stochastic Gradient Descent classifier (SGDC), combined with a simple
ensemble method to produce the final results. Various ML classifiers were tested using
accuracy, precision, recall, and F1-score as performance metrics. The results indicate that
the proposed VC performs better than the individual classifiers. The experiments also
demonstrated that the performance of machine learning classifiers improves when TF-IDF is
used as the feature input.
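
As an illustration of how a voting classifier can combine two models, the following minimal sketch averages per-class probabilities (soft voting). Soft voting is an assumption chosen for illustration; the report only says a simple ensemble method is used. The function name `soft_vote` and the probability values are hypothetical.

```python
def soft_vote(probabilities):
    """Average per-class probabilities from several classifiers.

    `probabilities` is a list of {class_label: probability} dicts, one per
    classifier, all sharing the same class labels.
    Returns (winning_label, averaged_probabilities).
    """
    classes = list(probabilities[0].keys())
    averaged = {
        label: sum(p[label] for p in probabilities) / len(probabilities)
        for label in classes
    }
    winner = max(averaged, key=averaged.get)
    return winner, averaged


# Made-up outputs of two classifiers for a single tweet.
lr_probs = {"positive": 0.6, "negative": 0.4}
sgd_probs = {"positive": 0.3, "negative": 0.7}
label, avg = soft_vote([lr_probs, sgd_probs])
```

Here the two classifiers disagree, and the averaged probabilities decide the final label.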

3. Modules and UML Diagrams
Data Collection and Preprocessing: This module would be responsible for gathering a large
dataset of text samples in both Hindi and English from various sources such as social media,
news articles, and product reviews. It would also involve preprocessing the data to clean and
standardize the text, including tasks like tokenization, stemming, and removing stop words.
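
A minimal sketch of the cleaning step described above, assuming a regular-expression tokenizer and a tiny hand-written stop-word list. The names `STOP_WORDS`, `WORD_RE`, and `preprocess` are hypothetical, stemming is omitted, and a real system would use curated stop-word lists for both languages.

```python
from __future__ import annotations

import re

# Hypothetical, deliberately tiny stop-word list covering English and
# romanized Hindi; not taken from the project.
STOP_WORDS = {"i", "am", "the", "a", "an", "is", "and", "to", "hai", "yeh", "ka", "ki"}

# A word is a run of Latin letters/apostrophes or a run of Devanagari characters.
WORD_RE = re.compile(r"[A-Za-z']+|[\u0900-\u097F]+")


def preprocess(text: str) -> list[str]:
    """Lowercase the text, tokenize it, and drop stop words."""
    tokens = WORD_RE.findall(text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]


tokens = preprocess("I am feeling bahut khush today. 😄")
```

Punctuation and emojis are dropped by this tokenizer, so emojis would have to be extracted from the raw text before this step.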

Language Identification: Since the system needs to handle both Hindi and English texts, a
language identification module would be essential to determine the language of each input text.
This module would help route the text to the appropriate analysis pipeline for further
processing.
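
One possible way to sketch language identification at the token level is shown below. It assumes two heuristics that are not described in the report: Devanagari characters mark a token as Hindi, and a small hypothetical word list (`ROMAN_HINDI`) catches Hindi written in Latin script. A production system would more likely use a trained language identifier.

```python
import re

DEVANAGARI_RE = re.compile(r"[\u0900-\u097F]")

# Hypothetical mini-lexicon of romanized Hindi words, for illustration only.
ROMAN_HINDI = {"bahut", "khush", "mujhe", "yeh", "pasand", "hai", "maine", "kaam", "kiya"}


def identify_language(token):
    """Return "hi" for Hindi tokens and "en" otherwise (heuristic)."""
    if DEVANAGARI_RE.search(token):
        return "hi"
    if token.lower() in ROMAN_HINDI:
        return "hi"
    return "en"


def split_by_language(tokens):
    """Route each token to the bucket of its detected language."""
    routed = {"hi": [], "en": []}
    for tok in tokens:
        routed[identify_language(tok)].append(tok)
    return routed


routed = split_by_language(["I", "am", "feeling", "bahut", "khush", "today"])
```

Routing per token rather than per sentence is a design choice made here because code-mixed inputs combine both languages in one sentence.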

Sentiment Analysis Model Training: This module would involve training machine learning
models for sentiment analysis in both Hindi and English. It would require a large labeled
dataset for each language to train accurate models. Techniques like word embeddings and deep
learning architectures such as recurrent neural networks (RNNs) or transformers could be used
for this purpose.
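
The module above mentions word embeddings and deep architectures. As a much simpler stand-in that runs without external libraries, the sketch below trains a multinomial Naive Bayes classifier (one of the classifiers named in the literature survey) with add-one smoothing. The class name, the training sentences, and the labels are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Tiny made-up training set mixing English and romanized Hindi.
train_docs = [["bahut", "khush"], ["love", "this"], ["pasand", "hai"],
              ["not", "good"], ["bahut", "bura"], ["hate", "this"]]
train_labels = ["positive"] * 3 + ["negative"] * 3
model = NaiveBayes().fit(train_docs, train_labels)
```

With a training set this small the model is only a demonstration of the training and prediction steps, not an indication of achievable accuracy.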

Emoji Analysis: Emojis play a crucial role in expressing sentiment, especially in online
communication. A module dedicated to emoji analysis would be responsible for identifying
and interpreting emojis in the text and incorporating them into the sentiment analysis process.

Integration and Deployment: Once the individual modules are developed, they need to be
integrated into a cohesive system. This module would handle the integration of the different
components and the deployment of the system, ensuring that it can handle real-time analysis
of texts in both Hindi and English.

Evaluation and Improvement: Continuous evaluation of the system's performance is crucial
for improvement. This module would involve monitoring the system's accuracy, identifying
areas for improvement, and updating the models and algorithms based on new
data and feedback.
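
The monitoring described above relies on evaluation metrics. The following minimal sketch computes accuracy, precision, recall, and F1-score (the metrics named in the literature survey) for a binary task; the function name `evaluate` and the sample labels are hypothetical.

```python
def evaluate(y_true, y_pred, positive="positive"):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard every ratio against an empty denominator.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up gold labels and predictions.
gold = ["positive", "positive", "negative", "negative", "positive"]
pred = ["positive", "negative", "negative", "positive", "positive"]
metrics = evaluate(gold, pred)
```

Tracking these values on newly labelled data over time is one way to detect when the models need to be updated.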

USE CASE DIAGRAM

A use case diagram in the Unified Modeling Language (UML) is a type of behavioral diagram
defined by and created from a Use-case analysis. Its purpose is to present a graphical overview
of the functionality provided by a system in terms of actors, their goals (represented as use
cases), and any dependencies between those use cases. The main purpose of a use case diagram
is to show what system functions are performed for which actor. Roles of the actors in the
system can be depicted.

ACTIVITY DIAGRAM

Activity diagrams are graphical representations of work flows of stepwise activities and
actions with support for choice, iteration and concurrency. In the Unified Modeling Language,
activity diagrams can be used to describe the business and operational step-by-step work flows
of components in a system. An activity diagram shows the overall flow of control.

SEQUENCE DIAGRAM

A sequence diagram in Unified Modeling Language (UML) is a kind of interaction diagram
that shows how processes operate with one another and in what order. It is a construct of a
Message Sequence Chart. Sequence diagrams are sometimes called event diagrams, event
scenarios, and timing diagrams.

CLASS DIAGRAM

In software engineering, a class diagram in the Unified Modeling Language (UML) is a type
of static structure diagram that describes the structure of a system by showing the system's
classes, their attributes, operations (or methods), and the relationships among the classes. It
explains which class contains information.

4. Implementation

5. Test Cases

Input: "I am feeling bahut khush today. 😄"

Expected Output: Positive sentiment (indicating that the user is feeling very happy), with the
😄 emoji reinforcing the positive sentiment.

Input: "Mujhe yeh pasand hai, but I don't like the ending. 😕"

Expected Output: Mixed sentiment with a positive sentiment for the Hindi part indicating that
the user likes something, and negative sentiment for the English part indicating that the user
doesn't like something, further emphasized by the 😕 emoji indicating slight disappointment.

Input: "Today maine bahut kaam kiya, and now I'm exhausted. 😫"

Expected Output: Mixed sentiment with potentially positive sentiment for the Hindi part
indicating that the user did a lot of work, and negative sentiment for the English part indicating
that the user is exhausted, reinforced by the 😫 emoji indicating tiredness or exhaustion.
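
The test cases above can be exercised with a toy, clause-level analyzer. This is only a sketch under strong assumptions: the lexicon, the negation handling, the emoji polarities, and the rule of splitting clauses on "but"/"and" are all hypothetical and are not the project's implementation.

```python
import re

# Hypothetical toy resources, for illustration only.
LEXICON = {"khush": 1, "pasand": 1, "like": 1, "exhausted": -1}
NEGATORS = {"don't", "not"}
EMOJI = {"😄": 1, "😕": -1, "😫": -1}


def clause_score(clause):
    """Sum lexicon polarities in a clause; a negator flips the next hit."""
    score, negate = 0, False
    for tok in re.findall(r"[a-z']+", clause.lower()):
        if tok in NEGATORS:
            negate = True
        elif tok in LEXICON:
            score += -LEXICON[tok] if negate else LEXICON[tok]
            negate = False
    return score


def analyze(text):
    """Label text as Positive, Negative, Mixed, or Neutral."""
    clauses = re.split(r",?\s+(?:but|and)\s+", text)
    scores = [clause_score(c) for c in clauses]
    # Emojis contribute one extra score for the whole text.
    scores.append(sum(EMOJI.get(ch, 0) for ch in text))
    has_pos = any(s > 0 for s in scores)
    has_neg = any(s < 0 for s in scores)
    if has_pos and has_neg:
        return "Mixed"
    if has_pos:
        return "Positive"
    if has_neg:
        return "Negative"
    return "Neutral"


case1 = analyze("I am feeling bahut khush today. 😄")
case2 = analyze("Mujhe yeh pasand hai, but I don't like the ending. 😕")
case3 = analyze("Today maine bahut kaam kiya, and now I'm exhausted. 😫")
```

With this toy lexicon the first input is labelled Positive and the second Mixed, matching the expected outputs. The third is labelled Negative rather than Mixed, because the lexicon assigns no polarity to the Hindi clause; the expected output treats that clause as only potentially positive, which a learned model would have to capture.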

6. Conclusion and Future Scope

Conclusion

In conclusion, the proposed system for multilingual sentiment analysis in Hindi and English
aims to create a powerful tool for understanding people's feelings expressed in these languages.
By using advanced technology like machine learning and natural language processing, the
system can analyze large amounts of text from sources like social media and news articles. This
analysis can help businesses understand their customers better and make decisions based on
people's opinions and emotions. The system's ability to handle both languages and interpret
emojis makes it a valuable resource for understanding sentiments across different cultures and
languages. With continuous improvement and updates, this system can become even more
accurate and useful in the future.

Future Scope

The future of multilingual sentiment analysis looks promising as technology advances. More
people are using the internet and social media in different languages, making it important to
understand sentiments expressed in various languages. In the future, we can expect improved
algorithms and models that can understand and analyze sentiments across languages more
accurately. This could help businesses understand their global customers better, improve
customer service, and even aid in cross-cultural communication. Additionally, there might be
advancements in analyzing sentiments expressed through emojis, which are used widely across
different languages and cultures.

Appendix:
i) Review Paper:
