Sentiment Analysis Application Using Flask and XGBoost
Aaditya Patil (MITU22BTCS0003)
Onkar Gidde (MITU22BTCS0518)
Harsh Bhadre (MITU22BTCS0309)
Abstract—This paper details the development of a sentiment
analysis application that uses a machine learning algorithm,
XGBoost, integrated with a web stack comprising Flask and
Streamlit. The app classifies Amazon reviews as positive,
negative, or neutral. The architecture focuses on ease of use,
scalability, and accuracy. Results demonstrate high accuracy
and user engagement, highlighting the app’s potential for
real-world deployment. This research also discusses challenges
faced during development, such as preprocessing noisy data and
deploying machine learning models on web platforms.
Index Terms—Sentiment Analysis, Machine Learning, Natural
Language Processing
I. INTRODUCTION
Sentiment analysis is a key application of natural language
processing (NLP) aimed at interpreting the sentiment conveyed
in textual data. It is extensively used in e-commerce, social
media analysis, and customer feedback systems. This paper
introduces a novel application that classifies the sentiment of
Amazon reviews using an XGBoost model, with a user-friendly
interface powered by Flask and Streamlit. The aim is to provide
an intuitive tool for real-time sentiment analysis, enabling
better decision-making for businesses.
A. Problem Statement
Analyzing sentiment manually across large datasets is
labor-intensive and prone to errors. Automating this process
with machine learning reduces the time required and yields
more consistent results.
B. Objectives
• Develop a scalable sentiment analysis model.
• Integrate the model into an interactive web-based platform.
• Evaluate model performance and identify areas for
improvement.
II. RELATED WORK
This section reviews various approaches to sentiment analysis,
from rule-based systems to modern machine learning and deep
learning methods. A comparative analysis of these techniques
highlights the advantages of XGBoost for handling structured
data efficiently. Table I summarizes the comparison.
TABLE I
COMPARISON OF SENTIMENT ANALYSIS TECHNIQUES
Method         Accuracy   Complexity  Scalability
Rule-based     Moderate   Low         Low
SVM            High       Moderate    Moderate
XGBoost        Very High  High        High
Deep Learning  Very High  Very High   High
III. PROPOSED METHODOLOGY
A. System Architecture
The application is divided into three layers:
1) Frontend: Built with HTML, CSS, and JavaScript,
providing an intuitive user interface.
2) Backend: Flask handles API calls and serves the machine
learning model.
3) Model: XGBoost classifier trained on an Amazon reviews
dataset.
Figure 1 illustrates the architecture of the system.
Fig. 1. System Architecture
B. Data Preprocessing
• Text cleaning: Removal of punctuation, special characters,
and stop words.
• Tokenization: Breaking text into tokens for vectorization.
• Feature extraction: Using CountVectorizer for numerical
representation.
• Scaling: Applying a scaler to normalize features for optimal
model performance.
ACKNOWLEDGMENT
The authors thank the contributors to the open-source tools
used in this project.
C. Model Training
The XGBoost model was trained using grid search for
hyperparameter tuning. Algorithm 1 describes the training
process.
Algorithm 1 XGBoost Training Algorithm
Input: Training data (X, y)
Initialize parameters: learning rate, max depth, etc.
for each boosting round do
    Fit a new tree (base learner) to the gradients of the loss
    at the current ensemble's predictions
    Add the tree to the ensemble, scaled by the learning rate
end for
Output: Trained XGBoost model
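The preprocessing steps and the grid search described in this section could be wired together roughly as follows. This is a minimal sketch, not the authors' code: the function names, the search space in PARAM_GRID, the choice of MaxAbsScaler (the paper only says a scaler is applied), and the macro-F1 scoring are all assumptions made for illustration.

```python
import re

# Hypothetical search space; the paper does not list the values it tried.
PARAM_GRID = {
    "clf__max_depth": [3, 6],
    "clf__learning_rate": [0.1, 0.3],
    "clf__n_estimators": [100, 300],
}


def clean_text(text: str) -> str:
    """Lowercase, replace non-letters with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def build_search(cv: int = 3):
    """Return an unfitted grid search over a vectorize -> scale -> XGBoost pipeline."""
    # Imported lazily so clean_text can be used without these packages.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import MaxAbsScaler
    from xgboost import XGBClassifier

    pipeline = Pipeline([
        # Cleans each review, tokenizes it, drops English stop words,
        # and counts the remaining tokens.
        ("vec", CountVectorizer(preprocessor=clean_text, stop_words="english")),
        # Assumed scaler: MaxAbsScaler keeps the count matrix sparse.
        ("scale", MaxAbsScaler()),
        ("clf", XGBClassifier(eval_metric="mlogloss", random_state=0)),
    ])
    # Every combination in PARAM_GRID is scored with cv-fold cross-validation.
    return GridSearchCV(pipeline, PARAM_GRID, scoring="f1_macro", cv=cv)
```

Calling build_search().fit(texts, labels) on a list of review strings and integer labels (for example 0 = negative, 1 = neutral, 2 = positive) would run the search; the best configuration is then available as best_params_ on the fitted object.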
IV. IMPLEMENTATION DETAILS
A. Frontend
Streamlit is used for its simplicity and ability to render
results dynamically.
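A Streamlit page for this kind of frontend can be very small. The sketch below is illustrative only: the backend address, the JSON field names ("text", "sentiment", "error"), and the function names are assumptions, since the paper does not describe the interface between the frontend and the API.

```python
API_URL = "http://localhost:5000/predict"  # hypothetical backend address


def describe(response_json: dict) -> str:
    """Turn the backend's JSON reply into one line of display text."""
    if "sentiment" in response_json:
        return f"Predicted sentiment: {response_json['sentiment']}"
    return f"Error: {response_json.get('error', 'unexpected response')}"


def main() -> None:
    # Imported lazily; both packages are third-party dependencies.
    import requests
    import streamlit as st

    st.title("Amazon Review Sentiment")
    review = st.text_area("Paste a review")
    # Streamlit reruns the script on each interaction, so the result
    # appears as soon as the button is pressed.
    if st.button("Analyze") and review.strip():
        reply = requests.post(API_URL, json={"text": review}, timeout=10)
        st.write(describe(reply.json()))
```

To try it, add a call to main() at the bottom of the file and launch it with the streamlit run command.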
B. Backend
Flask provides REST API endpoints for model inference.
The system is containerized using Docker for deployment.
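An inference endpoint of the kind described here might look like the sketch below. The route name, the request and response fields, the label order, and the helper names are hypothetical; the paper does not specify the API. The predict argument stands for any callable that wraps the trained model and returns a class index.

```python
from typing import Callable, Optional, Tuple

LABELS = ("negative", "neutral", "positive")  # assumed class order


def parse_review(payload: object) -> Tuple[Optional[str], Optional[str]]:
    """Return (text, error); exactly one of the two is None."""
    if not isinstance(payload, dict):
        return None, "request body must be a JSON object"
    text = payload.get("text")
    if not isinstance(text, str) or not text.strip():
        return None, "field 'text' must be a non-empty string"
    return text.strip(), None


def create_app(predict: Callable[[str], int]):
    """Build the Flask app; predict maps review text to an index into LABELS."""
    from flask import Flask, jsonify, request  # imported lazily; requires Flask

    app = Flask(__name__)

    @app.post("/predict")  # hypothetical route name
    def predict_route():
        # silent=True yields None instead of raising on malformed JSON.
        text, error = parse_review(request.get_json(silent=True))
        if error:
            return jsonify(error=error), 400
        return jsonify(sentiment=LABELS[predict(text)])

    return app
```

Keeping the validation in parse_review, separate from the route, lets it be unit-tested without starting a server.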
C. Deployment
The app is deployed on a cloud service (e.g., AWS or
Heroku). The deployment process involves:
1) Packaging the application using Docker.
2) Setting up a virtual server.
3) Configuring environment variables for seamless API
interaction.
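For step 3, one common pattern is to read all deployment settings from environment variables in a single place. The variable names and defaults below are assumptions chosen for illustration (PORT follows the convention used by platforms such as Heroku); the paper does not list the variables it configures.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    model_path: str  # where the serialized model is stored
    host: str        # interface the API binds to
    port: int        # port the API listens on


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment, falling back to local defaults."""
    return Settings(
        model_path=env.get("MODEL_PATH", "model.joblib"),  # hypothetical name
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "5000")),
    )
```

The same container image can then run locally and in the cloud, with only the environment differing between the two.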
V. RESULTS AND ANALYSIS
Table II presents the performance metrics of the XGBoost
model.
TABLE II
MODEL PERFORMANCE METRICS
Metric Value
Accuracy 92%
Precision 91%
Recall 93%
F1 Score 92%
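As a quick consistency check on Table II, the F1 score can be recomputed from the reported precision and recall. The sketch assumes F1 is the harmonic mean of those two aggregate values; the paper does not say how the metrics were averaged over the three classes, and a per-class average could differ slightly.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in Table II.
f1 = f1_from(0.91, 0.93)
print(f"F1 = {f1:.4f}")
```

The result rounds to 92%, which agrees with the reported F1 score.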
VI. CONCLUSION
This paper presented the design and implementation of a
sentiment analysis app. The results demonstrate its potential
as a reliable tool for real-time sentiment analysis. Future work
includes:
• Incorporating deep learning models like BERT.
• Expanding support for multi-language sentiment analysis.
• Improving scalability for large datasets.