COURSE NO: CSE 4114
Course Title: Pattern Recognition and Machine Learning
A Comprehensive Study to Sentiment Analysis of Bangla
Cricket-Related Social Media Comments Using ML and LSTM
Models
Research Participants
Adibul Haque (ID: 20200204029)
Yousuf Ali (ID: 20200204037)
Miftahul Sheikh (ID: 20200204038)
Slide 02
Research Paper Presentation
Outline
01 Abstract
02 Introduction
03 LITERATURE REVIEW
04 DATASETS
05 PRE-PROCESSING TECHNIQUES
06 MODELS
07 EVALUATION METRICS
08 RESEARCH GAP
09 CONCLUSION
10 CONTRIBUTION OF GROUP MEMBERS
11 REFERENCES
Slide 03
ABSTRACT
• Sentiment analysis of Bangla cricket-related
social media comments.
• Logistic Regression, KNN, and LSTM models
applied.
• Text normalization, tokenization, and word
embeddings on Facebook and YouTube
comments.
• KNN: 72.1%, Logistic Regression: 70.1%, LSTM:
77.6%.
• Combining ML and DL improves Bangla sentiment
analysis; future work: expand the dataset and
explore hybrid models.
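The text normalization and tokenization steps named in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the slides do not give the actual cleaning rules, so the function names `normalize` and `tokenize`, the choice to keep only characters from the Bengali Unicode block, and the whitespace tokenizer are all hypothetical.

```python
import re
import unicodedata

# Assumption: keep only characters in the Bengali Unicode block (U+0980 to U+09FF)
# plus whitespace. The slides do not state the real filtering rules.
_NON_BANGLA = re.compile(r"[^\u0980-\u09FF\s]")
_URL = re.compile(r"https?://\S+")
_WHITESPACE = re.compile(r"\s+")


def normalize(comment):
    """Hypothetical cleaning step for one social media comment."""
    text = unicodedata.normalize("NFC", comment)  # canonical Unicode form
    text = _URL.sub(" ", text)                    # drop links
    text = _NON_BANGLA.sub(" ", text)             # drop Latin text, emoji, punctuation
    return _WHITESPACE.sub(" ", text).strip()     # collapse repeated spaces


def tokenize(comment):
    """Whitespace tokenization of the normalized comment."""
    return normalize(comment).split()


if __name__ == "__main__":
    # Made-up example comment, not taken from the dataset.
    print(tokenize("বাংলাদেশ আজ দারুণ খেলেছে!!! 🔥 https://example.com/x"))
```

A word embedding layer, as mentioned in the same bullet, would then map each token to a vector; that step is omitted here because the slides do not say which embedding was used.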
Slide 04
INTRODUCTION
• Increased social media use boosts cricket
discussions in Bangladesh.
• Essential to understand public sentiment on
cricket in Bangladeshi culture.
• Lack of Bangla sentiment analysis in cricket
context.
• Aims to bridge the gap in Bangla sentiment
analysis for cricket-related social media
comments.
• Combines traditional and deep learning
techniques to enhance sentiment analysis
accuracy.
Slide 05
Motivation
• Aims to accurately interpret fan sentiments.
• Employs NLP and ML to handle Bangla language
specifics.
• Addresses unique Bangla linguistic challenges.
• Applies advanced methods for deeper analysis.
• Enhances understanding of Bangladeshi cricket
fans' opinions.
Slide 04
LITERATURE REVIEW
EVALUATION OF NAÏVE BAYES AND SUPPORT VECTOR MACHINES ON BANGLA TEXTUAL MOVIE REVIEWS.
• The paper analyzes Bangla movie reviews for sentiment.
• It uses Naive Bayes (NB) and Support Vector Machines (SVM) for polarity detection.
• SVM, with stemmed unigram features, achieved a precision of 0.86.
A DEEP LEARNING APPROACH TO DETECT ABUSIVE BENGALI TEXT.
• 82.20% for abusive Bengali text detection.
• Outperformed ANN (81.10%), LinearSVC (75.70%), Logit (75.20%), MNB (73.90%), and RF (70.50%).
• LSTM > other models.
Slide 06
LITERATURE REVIEW
A STUDY TOWARDS BANGLA FAKE NEWS DETECTION USING MACHINE LEARNING AND DEEP LEARNING.
• The study used 57,000 Bangla news items to identify fake news.
• Bi-LSTM models with GloVe and FastText achieved up to 96% accuracy.
• GRU model accuracy was 77%.
CRICKET SENTIMENT ANALYSIS FROM BANGLA TEXT USING RECURRENT NEURAL NETWORK WITH LONG SHORT TERM MEMORY MODEL.
• RNN with LSTM for Bangla cricket sentiment analysis.
• The LSTM model achieves an accuracy of 95%.
• LSTM outperforms the Support Vector Machine (SVM), which has an accuracy of 71.03%.
Slide 07
DATASETS
• Paper [1]: Utilized phishing dataset with 11,000 URLs and 30 features.
• Paper [2]: Real-world phishing data used, unspecified source.
• Paper [3]: CIC-Bell-DNS 2021 with 400,000 benign and 13,011 malicious samples; UCI Phishing Domains and 3,000 URLs.
• Paper [4]: Real-world website details, no specific dataset provided.
• Paper [5]: 10,000 URLs from Kaggle, balanced phishing and non-phishing.
Slide 09
PRE-PROCESSING TECHNIQUES
01 DATA CLEANING
02 FEATURE EXTRACTION
03 FEATURE SELECTION
04 NORMALIZATION AND SCALING
05 DATA ENCODING
06 BALANCING DATA
07 DATA SPLITTING
08 ADVANCED TECHNIQUES
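Two of the listed steps, data encoding and data splitting, can be illustrated with a small standard-library sketch. The slide gives no parameters, so the helper names `encode_labels` and `stratified_split`, the 80/20 ratio, the fixed seed, and the "positive"/"negative" label names are assumptions made only for this example.

```python
import random
from collections import defaultdict


def encode_labels(labels):
    """Map each distinct label string to an integer id (sorted for determinism)."""
    mapping = {name: idx for idx, name in enumerate(sorted(set(labels)))}
    return [mapping[name] for name in labels], mapping


def stratified_split(samples, labels, test_ratio=0.2, seed=42):
    """Split so each class keeps roughly the same share in train and test."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_ratio))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    train = [(samples[i], labels[i]) for i in sorted(train_idx)]
    test = [(samples[i], labels[i]) for i in sorted(test_idx)]
    return train, test


if __name__ == "__main__":
    # Placeholder comments and labels, not real data.
    comments = [f"comment_{i}" for i in range(15)]
    raw_labels = ["positive"] * 10 + ["negative"] * 5
    encoded, mapping = encode_labels(raw_labels)
    train, test = stratified_split(comments, encoded)
    print(mapping, len(train), len(test))
```

Splitting per class rather than over the whole list is one simple way to keep an imbalanced sentiment class from disappearing from the test set; dedicated balancing techniques are a separate step and are not shown.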
Slide 10
MODELS USED
CLASSIFIER ACCURACY PRECISION RECALL
LINEAR REGRESSION GFG STANDARD PROFESSIONAL
SVM 0.7214 0.6852 0.7215
RANDOM FOREST 0.7065 0.6754 0.7012
KNN 0.7114 0.6814 0.7114
XGBOOST 0.7449 0.7350 0.7450
Slide 11
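As a rough illustration of how the KNN classifier listed in the MODELS USED table operates on text, the sketch below votes among the most similar training comments. The slides do not specify the features, the distance measure, or the value of k, so the bag-of-words vectors, cosine similarity, k=3, and the transliterated toy tokens are all assumptions.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Represent a tokenized comment as a term-frequency vector."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_predict(train, query_tokens, k=3):
    """Majority vote among the k training comments most similar to the query."""
    query = bag_of_words(query_tokens)
    ranked = sorted(train, key=lambda item: cosine(bag_of_words(item[0]), query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy, made-up training comments (transliterated placeholders).
    train = [
        (["darun", "khela"], "positive"),
        (["darun", "joy"], "positive"),
        (["baje", "khela"], "negative"),
        (["baje", "har"], "negative"),
    ]
    print(knn_predict(train, ["darun", "joy", "khela"]))
```

In practice a library implementation with tuned k and TF-IDF or embedding features would likely be used; this version only shows the voting idea.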
Evaluation Metrics
• Accuracy
• Precision
• Recall/Sensitivity
• F-measure
• Error Rate (ERR)
• False Positive Rate (FPR)
• Specificity
• Detection Speed (DS)
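Most of the metrics listed above follow directly from the four counts of a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name `binary_metrics` and the example counts are hypothetical and do not come from the presented results. Detection speed is a timing measurement and is not derivable from these counts, so it is left out.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts.

    tp/fp/tn/fn are true positives, false positives, true negatives,
    and false negatives. Ratios with an empty denominator are reported as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0            # also called sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    error_rate = (fp + fn) / total if total else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "error_rate": error_rate,
        "false_positive_rate": false_positive_rate,
        "specificity": specificity,
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    for name, value in binary_metrics(tp=40, fp=10, tn=45, fn=5).items():
        print(f"{name}: {value:.4f}")
```

Note that error rate equals 1 minus accuracy and false positive rate equals 1 minus specificity, so the list contains two complementary pairs.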
Slide 12
RESEARCH GAP
• Paper [1]: The HEFS method is slow for real-time detection in resource-limited environments and lacks thorough testing against different phishing types and false alarms.
• Paper [2]: The study does not test the model against various phishing types or discuss real-world deployment challenges, and its embedding techniques may not capture all phishing variations.
• Paper [3]: The GNN models need better accuracy and adaptation to new phishing tactics, focusing mainly on URL structures and requiring significant computing power.
• Paper [4]: The system struggles with new phishing techniques and targeted attacks, has privacy concerns, and requires more research and teamwork to improve accuracy with feedback and context.
• Paper [5]: The model’s effectiveness depends heavily on data, may miss some phishing threats, and does not show significant benefits of using ANN and AdaBoost together over ANN alone.
• Paper [6]: The study relies on outdated data and lacks comprehensive testing against various phishing techniques, potentially limiting its practical applicability and effectiveness.
Slide 13
CONCLUSION
1 Extensive research of ML techniques.
2 Random Forest and Neural Networks are highly accurate.
3 Feature engineering and preprocessing are crucial.
4 Larger datasets and real-world tests are needed.
5 The study suggests future cybersecurity improvements.
Slide 14
Related Papers
01 Nayan Banik and Md Hasan Hafizur Rahman. Evaluation of naïve bayes and support vector machines on bangla textual movie reviews. In 2018 International Conference on Bangla Speech and Language Processing (ICBSLP), pages 1–6. IEEE, 2018.
02 Estiak Ahmed Emon, Shihab Rahman, Joti Banarjee, Amit Kumar Das, and Tanni Mittra. A deep learning approach to detect abusive bengali text. In 2019 7th International Conference on Smart Computing & Communications (ICSCC), pages 1–5. IEEE, 2019.
03 Elias Hossain, Md Nadim Kaysar, Abu Zahid Md Jalal Uddin Joy, Md Mizanur Rahman, and Wahidur Rahman. A study towards bangla fake news detection using machine learning and deep learning. In Sentimental Analysis and Deep Learning: Proceedings of ICSADL 2021, pages 79–95. Springer, 2022.
04 Md Ferdous Wahid, Md Jahid Hasan, and Md Shahin Alom. Cricket sentiment analysis from bangla text using recurrent neural network with long short term memory model. In 2019 International Conference on Bangla Speech and Language Processing (ICBSLP), pages 1–4. IEEE, 2019.
Slide 15
CONTRIBUTION OF GROUP MEMBERS
Group Member | Writing Report Paper | Preparing Presentation
Yousuf Ali | Abstract, Introduction, Conclusion | Adib
Adibul Haque | Literature Review, Datasets, References | Mifta
Miftahul Sheikh | Pre-processing Techniques, Models, Evaluation Metrics | Nafisa
Slide 16
THANK YOU