Article
Instituto Politécnico Nacional (IPN), Centro de Investigación en Computación (CIC), Av. Juan de Dios Batiz, s/n,
Mexico City 07320, Mexico; [email protected] (A.N.); [email protected] (M.Z.);
[email protected] (N.H.); [email protected] (A.Q.); [email protected] (F.A.);
[email protected] (A.G.)
* Correspondence: [email protected]; Tel.: +52-55-9188-7293
† These authors contributed equally to this work.
Abstract: Hate speech is a complex topic that transcends language, culture, and even
social spheres. Recently, the spread of hate speech on social media sites like Facebook
has added a new layer of complexity to the issue of online safety and content moderation.
This study seeks to mitigate this problem by developing a tool for automatically
detecting hate speech in Roman Urdu, an informal Latin-script rendering of Urdu
widely used in South Asian digital communication. Roman Urdu is relatively complex as
there are no standardized spellings, leading to syntactic variations, which increases the
difficulty of hate speech detection. To tackle this problem, we adopt a holistic strategy
that combines six machine learning (ML) and four deep learning (DL) models with a
dataset of Facebook comments, which was preprocessed (tokenization, stopword
removal, etc.) and vectorized (TF-IDF, word embeddings). The ML algorithms
used in this study are LR, SVM, RF, NB, KNN, and GBM. We also use deep learning
architectures like CNN, RNN, LSTM, and GRU to increase the accuracy of the classification
further. The experimental results show that deep learning models outperform
the traditional ML approaches by a significant margin, with CNN and LSTM achieving
accuracies of 95.1% and 96.2%, respectively. As far as we are aware, this is the first work
that investigates QLoRA for fine-tuning large models for the task of offensive language
detection in Roman Urdu.

Keywords: deep learning; machine learning; support vector machine

Academic Editors: Rafal Rzepka, Michal Ptaszynski and Pawel Dybala
Received: 28 April 2025; Revised: 25 May 2025; Accepted: 26 May 2025; Published: 1 June 2025
to recognize and filter harmful content [2]. While substantial progress has been made in
English-language hate speech detection, low-resource languages such as Roman Urdu
remain largely underexplored due to the lack of annotated datasets, linguistic complexity,
and informal writing styles [3].
Roman Urdu, a Latin-script representation of the Urdu language, presents unique
challenges in hate speech detection. Unlike standard languages with well-defined grammar
and structure, Roman Urdu lacks orthographic norms, meaning that the same word can
be spelled in multiple ways (e.g., “mujhe” vs. “mujay” for “me”) [4]. Additionally, code-
mixing with English, phonetic variations, and informal syntax add complexity to text
classification models [5]. Traditional Natural Language Processing (NLP) techniques
struggle to handle such variations, necessitating more robust approaches using ML and DL.
Several ML-based hate speech detection systems have been developed using classifiers such
as Random Forest (RF), Naïve Bayes (NB), Support Vector Machines (SVMs), k-Nearest
Neighbors (KNNs), Logistic Regression (LR), and Gradient Boosting Machines (GBMs) [6].
These models rely on handcrafted features, including n-grams, TF-IDF (Term Frequency-
Inverse Document Frequency), and word embeddings to recognize hate speech. While
effective in some cases, these approaches often fail to capture semantic meaning, sarcasm,
and implicit hate speech [7].
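To make this feature-based setup concrete, the following sketch shows how such a pipeline could be assembled with scikit-learn; the toy comments, the bigram range, and the feature cap are illustrative assumptions rather than the exact configuration used in this study.

```python
# Minimal sketch of a feature-based ML pipeline (illustrative settings, not the
# exact configuration used in this study).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical Roman Urdu comments: 1 = hate speech, 0 = not hate speech
comments = ["tum bohat bure ho", "ye video bohat achi hai",
            "tum pagal ho", "bohat acha kaam hai"]
labels = [1, 0, 1, 0]

pipeline = Pipeline([
    # Word-level unigrams and bigrams weighted by TF-IDF
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), max_features=50000)),
    # Any of the six classifiers (LR, SVM, RF, NB, KNN, GBM) can be swapped in here
    ("clf", LogisticRegression(max_iter=1000)),
])

pipeline.fit(comments, labels)
print(pipeline.predict(["tum bure ho"]))  # expected to lean toward the hate-speech class
```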
Deep learning models have demonstrated superior performance in hate speech classifi-
cation due to their ability to learn complex linguistic patterns. Gated Recurrent Units
(GRUs), Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs),
and Long Short-Term Memory (LSTM) networks are among the most commonly used architec-
tures for text classification tasks [8]. These models leverage word embeddings (Word2Vec,
FastText, and GloVe) to capture contextual relationships, making them more effective than
traditional ML approaches in understanding hate speech nuances [9].
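As a concrete illustration, the sketch below builds a small embedding-plus-LSTM classifier in Keras; the vocabulary size, sequence length, and layer sizes are assumed values, and in practice the embedding layer would be initialized with pretrained Word2Vec or FastText vectors.

```python
# Sketch of an embedding-based LSTM text classifier (hyperparameters are assumed).
import numpy as np
from tensorflow.keras import layers, models

vocab_size, embed_dim, max_len = 20000, 300, 50   # illustrative values

model = models.Sequential([
    # In a real setup this layer is initialized with pretrained Word2Vec/FastText weights
    layers.Embedding(input_dim=vocab_size, output_dim=embed_dim),
    layers.LSTM(128),
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),   # binary: hate speech vs. not hate speech
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Dummy batch of tokenized, padded comment sequences, just to show the expected shapes
X = np.random.randint(0, vocab_size, size=(8, max_len))
y = np.random.randint(0, 2, size=(8,))
model.fit(X, y, epochs=1, verbose=0)
```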
While ML models are computationally efficient and interpretable, they struggle to
capture deep contextual information. DL models, on the other hand, excel at learning
complex patterns but require large annotated datasets and high computational power.
Studies have shown that CNNs and LSTM models outperform traditional ML models,
achieving accuracy scores of 95% and 96%, respectively, in hate speech detection tasks [10].
However, a hybrid approach combining both ML and DL models has shown promising
results in recent research [11].
Hate speech detection is inherently subjective, as definitions of offensive content
vary across cultures and contexts [12]. Automated systems must be carefully designed
to avoid biases, false positives, and over-censorship while ensuring fair and accurate
classification [13]. Ensuring transparency in AI-based hate speech detection models is
crucial for building user trust and compliance with ethical guidelines [14].
A major limitation in Roman Urdu hate speech detection research is the lack of publicly
available annotated datasets [15]. Most existing datasets are either small, imbalanced, or
insufficiently labeled, affecting model performance. In this study, we address this gap
by collecting a large-scale, annotated dataset of Roman Urdu Facebook comments and
utilizing it for training and evaluation of various ML and DL models [16].
Before applying ML and DL models, raw data must undergo preprocessing, including
tokenization, stopword removal, stemming, lemmatization, and text vectorization [17].
Since Roman Urdu text contains spelling variations and non-standard expressions, word
embeddings such as FastText and Word2Vec are used to improve feature representation [18].
Proper data preprocessing is essential for improving classification accuracy and reducing
noise in textual data.
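The snippet below sketches what such a cleaning step might look like for a single Roman Urdu comment; the stopword list and normalization rules are hypothetical placeholders, not the exact resources used in this study.

```python
# Illustrative preprocessing for a Roman Urdu comment (assumed stopword list and rules).
import re

ROMAN_URDU_STOPWORDS = {"ka", "ki", "ke", "hai", "ho", "aur", "to"}   # assumed subset

def preprocess(comment: str) -> list[str]:
    text = comment.lower()
    text = re.sub(r"http\S+|www\.\S+", " ", text)      # drop URLs
    text = re.sub(r"[^a-z\s]", " ", text)              # keep Latin letters only
    tokens = text.split()                              # whitespace tokenization
    return [t for t in tokens if t not in ROMAN_URDU_STOPWORDS]

print(preprocess("Tum bohat bure ho aur ye link dekho http://example.com"))
# -> ['tum', 'bohat', 'bure', 'ye', 'link', 'dekho']
```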
Transfer learning, where models pre-trained on large corpora are fine-tuned for specific
tasks, has been widely used in hate speech detection [19]. Although transformer-based
architectures like BERT have shown state-of-the-art results, this study focuses on CNNs,
LSTM models, and GRUs due to their computational efficiency and interpretability [20].
Future research could explore the integration of transformer-based models with existing
approaches for enhanced hate speech detection in Roman Urdu.
Given the challenges described above, we pose the following research questions:
• RQ1: Can the machine learning and deep learning models successfully identify hate
speech in Roman Urdu language despite spelling variations and code-mixing?
• RQ2: Which feature representation technique and classifier (i.e., ML/DL or both) com-
bination can achieve the highest performance for Roman Urdu hate speech detection?
• RQ3: How does the use of deep contextual embeddings (FastText and Word2Vec)
influence the classification accuracy on different architectures?
These questions frame our hypothesis that hybrid systems exploiting both the in-
terpretability of ML and the contextual learning capabilities of DL will outperform
single-paradigm approaches, especially in low-resource and linguistically informal settings.
This research contributes to the field of Roman Urdu hate speech detection by
• Developing a large-scale annotated dataset from Facebook comments;
• Comparing six ML models (LR, SVM, RF, NB, KNN, GBM) and four DL models (CNN,
RNN, LSTM, GRU);
• Demonstrating that CNNs and LSTM models outperform other models, achieving
95.1% and 96.2% accuracy, respectively;
• Providing insights into preprocessing techniques and feature selection for non-
standardized languages;
• Discussing ethical considerations and challenges in hate speech detection.
The rest of the paper is organized as follows: Section 2 includes a literature review
related to existing hate speech detection techniques and their challenges in Roman Urdu.
Section 3 describes the methodology, such as dataset collection, preprocessing methods,
and model implementation. Experimental results, analysis of model performance, and error
evaluation are provided in Section 4. Finally, Section 5 concludes the paper and provides
some potential future research directions.
2. Literature Review
The rising incidence of hate speech on social media has attracted considerable research
interest in the area of Natural Language Processing (NLP) and artificial intelligence (AI).
A number of hate speech detection and mitigation methods using machine learning
(ML) and deep learning (DL) models have been proposed [21]. Detecting hate speech has
been widely studied in high-resource languages such as English but remains a challenging
area for low-resource languages such as Roman Urdu due to the scarcity of both large
annotated datasets and language processing tools [22].
Roman Urdu presents several linguistic challenges for hate speech detection. Unlike
standardized languages, Roman Urdu lacks a fixed grammatical structure and standardized
spellings, making it difficult for traditional NLP techniques to effectively process it [23].
Moreover, code-mixing between Roman Urdu and English further complicates detection
efforts, as many hate speech expressions involve bilingual mixing. The limited availability
of annotated datasets for Roman Urdu also restricts the application of advanced machine
learning models in this domain [24].
ML-based approaches for hate speech detection have been developed for various
languages, since large language models (LLMs) are still trained predominantly on English
and offer limited coverage of low-resource languages such as Roman Urdu. Commonly
used ML models are Random Forest (RF), Decision Trees (DTs),
Naïve Bayes (NB), Support Vector Machines (SVMs), and Logistic Regression (LR) [25].
Most of these models utilize feature extraction methods like TF-IDF (Term Frequency-
Inverse Document Frequency), n-grams, and Bag-of-Words (BoW) for text representation [26].
However, such methods have limitations, particularly in handling context-dependent hate
speech expressions and implicit invective.
With the advancement of deep learning (DL), several models have been developed to
improve hate speech classification. Gated Recurrent Units (GRUs), Long Short-Term
Memory (LSTM) networks, Convolutional Neural Networks (CNNs), and Recurrent Neural
Networks (RNNs) have been widely used to classify hate speech with higher accuracy [27]. CNN is
particularly effective in detecting word patterns and short phrases associated with hate
speech, while LSTM captures sequential dependencies and contextual information [28].
These deep learning models outperform traditional ML approaches due to their ability to
automatically learn features without requiring extensive manual preprocessing.
Recent studies have suggested hybrid approaches that integrate ML and DL models for
improved hate speech detection [29]. Hybrid models combine feature-based ML classifiers
with deep learning architectures to leverage the strengths of both techniques. For example,
a model might use TF-IDF for feature extraction and an LSTM network for classification,
thereby improving overall performance in detecting complex hate speech expressions [30].
Such hybrid techniques have shown promising results in multilingual hate speech detection.
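A minimal version of such a two-branch hybrid, assuming a Keras functional model with a TF-IDF input alongside a token-sequence LSTM branch, could look as follows (all dimensions are illustrative):

```python
# Sketch of a hybrid architecture combining TF-IDF features with a sequence branch
# (sizes are illustrative assumptions).
import numpy as np
from tensorflow.keras import layers, models

tfidf_dim, vocab_size, max_len = 5000, 20000, 50

tfidf_in = layers.Input(shape=(tfidf_dim,), name="tfidf_features")
seq_in = layers.Input(shape=(max_len,), name="token_ids")

dense_branch = layers.Dense(64, activation="relu")(tfidf_in)                 # feature-based branch
lstm_branch = layers.LSTM(64)(layers.Embedding(vocab_size, 100)(seq_in))     # sequence branch

merged = layers.concatenate([dense_branch, lstm_branch])
output = layers.Dense(1, activation="sigmoid")(merged)

model = models.Model(inputs=[tfidf_in, seq_in], outputs=output)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Dummy batch illustrating the two expected inputs
X_tfidf = np.random.rand(4, tfidf_dim).astype("float32")
X_seq = np.random.randint(0, vocab_size, size=(4, max_len))
y = np.random.randint(0, 2, size=(4,))
model.fit([X_tfidf, X_seq], y, epochs=1, verbose=0)
```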
One of the most significant advancements in hate speech detection is the use of word
embeddings, which capture semantic relationships between words. Pre-trained word
embedding models such as Word2Vec, GloVe, and FastText have been employed to enhance
hate speech classification [19]. FastText in particular is effective for Roman Urdu because
it can handle out-of-vocabulary (OOV) words by breaking them into character-level n-
grams [9]. These embeddings improve feature representation and help in detecting implicit
and context-dependent hate speech.
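The following toy example, using gensim's FastText implementation, illustrates this subword behavior: a spelling variant absent from the training corpus still receives a vector composed from its character n-grams. The corpus and parameters are illustrative, not the model trained in this study.

```python
# Sketch of FastText composing vectors for unseen spelling variants from character n-grams.
from gensim.models import FastText

sentences = [
    ["mujhe", "ye", "pasand", "nahi"],
    ["mujay", "ye", "acha", "laga"],
]
model = FastText(sentences, vector_size=50, window=3, min_count=1,
                 min_n=2, max_n=5, epochs=10)

# "mujhay" never appears in the corpus, but a vector is still built from its n-grams
print(model.wv["mujhay"][:5])
print(model.wv.similarity("mujhe", "mujhay"))
```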
One of the primary challenges in hate speech detection for Roman Urdu is the scarcity
of publicly available datasets [23]. While large-scale datasets exist for English-language hate
speech detection, there are very few labeled datasets for Roman Urdu. Some researchers
have attempted to crowdsource annotations for Roman Urdu datasets, but bias in labeling
and subjective interpretation of hate speech remain major concerns [27]. Developing stan-
dardized and diverse datasets is crucial for improving the effectiveness of machine learning
models. To detect hate speech, numerous studies have investigated the performance of
ML and DL models in different scenarios. Findings indicate that deep learning models,
particularly LSTM and CNN, consistently outperform ML classifiers [31]. For instance, a
recent study showed that LSTM achieved 96% accuracy, outperforming SVM and Random
Forest models [30]. However, DL models require large amounts of labeled data and signif-
icant computational resources, which limits their widespread adoption for low-resource
languages such as Roman Urdu.
Hate speech detection models must balance accuracy with ethical considerations [32].
Automated systems are prone to biases, particularly when trained on imbalanced datasets
or biased labeling practices [30]. Some studies have highlighted the risk of over-censorship,
where AI models incorrectly classify non-hate speech as offensive. Ensuring fairness,
transparency, and unbiased model training is critical in the development of effective hate
speech detection systems.
Hate speech detection has garnered much attention in the past few years, thanks
to the spread of offensive content on social media. In the early days, the majority of
the approaches were based on classic machine learning classifiers (such as SVM, Naïve
Bayes, and Logistic Regression) with surface-level features like TF-IDF or Bag-of-Words.
Since the introduction of deep learning, models such as CNN and LSTM have shown
better performance because their structure can capture the contextual semantics. More
recently, transformer-based models such as BERT, RoBERTa, and XLM-R have exceeded
prior art by leveraging contextual embeddings and large-scale pretraining. Several
studies [33] have investigated multilingual
and cross-lingual hate speech detection, most of which, however, are restricted to high-
resource languages, such as English, Arabic, or Spanish. Although these models are highly
generalizable, they do not cope well with the informal, noisy style of user-generated content
in low-resource languages.
A limited number of studies have been conducted for Roman Urdu because it does not
have a standard orthography, has linguistic inconsistencies, and has hardly any annotated
datasets available. In [34], BiLSTM and BiGRU models with FastText embeddings were
proposed for hate speech detection in Roman Urdu, but they were trained on a small,
single-domain dataset. The authors employed BERT+CNN-gram and handcrafted features
but did not investigate large language models or explainable AI techniques. Moreover,
most of the existing work does not consider the intricacies of code-mixing and the
cultural connotations of Roman Urdu phrases.
We aim to address this gap by employing QLoRA-optimized LLMs (e.g., LLaMA3, Mistral)
on translated Roman Urdu data.
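As a rough sketch of what QLoRA-style fine-tuning for binary offensive-language classification can look like with the Hugging Face transformers and peft libraries, consider the following; the base model name, target modules, and hyperparameters are illustrative assumptions rather than the settings of our experiments.

```python
# Sketch of QLoRA-style fine-tuning for binary offensive-language classification
# (model name, target modules, and hyperparameters are illustrative assumptions).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "mistralai/Mistral-7B-v0.1"   # hypothetical choice among the LLMs mentioned above

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit NF4 quantization, the core of QLoRA
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(
    base_model, num_labels=2, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # adapters on the attention projections only
    task_type="SEQ_CLS",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # only the low-rank adapters are trainable
```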
The focus of future research should be on developing more robust datasets, improv-
ing model interpretability, and integrating explainable AI (XAI) techniques for better
decision-making [4]. Additionally, multilingual and cross-lingual hate speech detection
approaches can help address the challenges of low-resource languages. Exploring the
role of transformer-based architectures such as BERT and RoBERTa could further enhance
performance, although these models require substantial computational power [35].
Despite significant advancements in hate speech detection, research on Roman Urdu
remains underdeveloped due to linguistic variability, limited datasets, and the lack of
standard NLP tools. Existing studies have primarily focused on either traditional ML
classifiers or deep learning models in isolation, often overlooking the potential of hybrid
approaches that combine both techniques for enhanced performance.
Moreover, previous works have struggled with imbalanced datasets and contextual
ambiguity, limiting their real-world applicability. To address these gaps, this study collects
and annotates a large-scale dataset of Roman Urdu hate speech from Facebook comments, a
resource currently lacking in the field. We employ six ML models (Logistic Regression, SVM,
Naïve Bayes, Random Forest, KNN, Gradient Boosting) and four DL models (CNN, RNN,
LSTM, GRU) to evaluate their effectiveness in hate speech detection. Our experimental
results reveal that CNN and LSTM outperform all other models, achieving 95.1% and
96.2% accuracy, respectively. Furthermore, our work introduces improved preprocessing
techniques, including phonetic normalization and optimized word embeddings (FastText
and Word2Vec) to better handle Roman Urdu’s spelling variations and code-mixing issues.
By integrating state-of-the-art deep learning methods with ML feature engineering, we
provide a more robust, scalable, and linguistically informed approach to Roman Urdu hate
speech detection, setting the foundation for future research in low-resource languages.
3. Methodology
In this section, we discuss the proposed methodology followed for the detection
of hate speech in Roman Urdu, covering areas such as data gathering, preprocessing,
feature extraction, various models, hybrid approaches, and performance evaluation metrics.
Figure 1 presents an overview of the complete pipeline of the workflow, flowing from the
raw comment to classification through ML and DL models.
comments. The mean comment length is around 18.7 words, with a standard deviation of
6.5 words; the shortest comment contains 3 words and the longest 47 words. Regarding class
distribution, 22,314 comments were annotated as “Hate Speech” and 23,712 as “Not Hate
Speech”, indicating a balanced dataset. These statistics reflect the lexical variety and the
variation in comment length in the dataset, which increases the difficulty of classification.
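Statistics of this kind can be recomputed directly from the annotated corpus; the sketch below assumes a hypothetical CSV layout with comment and label columns, which may differ from the actual file used here.

```python
# Sketch of computing the reported corpus statistics (hypothetical file and column names).
import pandas as pd

df = pd.read_csv("roman_urdu_comments.csv")          # assumed columns: "comment", "label"
lengths = df["comment"].str.split().str.len()

print(f"mean length: {lengths.mean():.1f} words, std: {lengths.std():.1f}")
print(f"min length: {lengths.min()}, max length: {lengths.max()}")
print(df["label"].value_counts())                    # e.g., Hate Speech vs. Not Hate Speech counts
```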
Despite this, SVM and Naive Bayes saw a decline in performance, as these models
rely on feature independence assumptions, which do not align well with dense word
embeddings like Word2Vec. On the other hand, KNN improved slightly compared to its
performance with TF-IDF, as Word2Vec embeddings allow for more meaningful similarity
calculations between text instances. The results suggest that ML models, particularly
GBM and RF, benefit from richer contextual information, making Word2Vec an effective
embedding choice for hate speech classification.
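A minimal sketch of this setup, averaging Word2Vec vectors into a dense comment representation and feeding it to ensemble classifiers, is shown below; the toy corpus and parameters are illustrative, not those of our experiments.

```python
# Sketch: averaged Word2Vec document vectors fed to ensemble classifiers (toy data).
import numpy as np
from gensim.models import Word2Vec
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

tokenized = [
    ["tum", "bohat", "bure", "ho"],
    ["ye", "video", "bohat", "achi", "hai"],
    ["tum", "pagal", "ho"],
    ["bohat", "acha", "kaam", "hai"],
]
labels = [1, 0, 1, 0]

w2v = Word2Vec(tokenized, vector_size=50, window=3, min_count=1, epochs=50)

def doc_vector(tokens):
    # Average the vectors of in-vocabulary tokens into one dense comment representation
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

X = np.vstack([doc_vector(t) for t in tokenized])
for clf in (RandomForestClassifier(n_estimators=100), GradientBoostingClassifier()):
    clf.fit(X, labels)
    print(type(clf).__name__, clf.score(X, labels))
```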
The relatively lower performance of CNN can be attributed to the lack of contextual
information in TF-IDF representations, which limits its ability to capture relationships
between words. Gated Recurrent Units (GRUs) also struggled, likely due to the sparsity
of TF-IDF embeddings, which do not provide the continuous flow of information needed
for recurrent architectures. This table confirms that TF-IDF is not ideal for deep learning
models and should be used primarily for ML classifiers.
LSTM outperformed other models, achieving a notably higher recall score, suggest-
ing that it was able to capture long-range dependencies in Roman Urdu text. GRU also
performed well but slightly lagged behind LSTM, as its simpler gating mechanism some-
times loses context in longer sentences. This table reinforces that DL models benefit from
embeddings like Word2Vec, as they offer better generalization and capture linguistic nu-
ances effectively.
Among the embedding techniques, FastText yielded the best results for most of the models,
owing to its ability to handle non-standard spellings and morphological variations.
The best result was obtained by combining FastText embeddings and LSTM with a
96.2% F-score. Results also indicated the superiority of CNN with FastText as the second
best-performing model (F-score 95.1%). In contrast, traditional TF-IDF representations,
which work well for some ML algorithms (e.g., SVM), proved insufficient for DL models,
underscoring the need for context-rich embeddings in deep learning applications. The most
effective strategy for hate speech detection in Roman Urdu is therefore to use deep learning
techniques, particularly LSTM combined with strong embeddings such as FastText.
Incorporating deep contextual embeddings significantly improved the classification
performance in both traditional machine learning and deep learning models. Performance
was greatly enhanced by Word2Vec embeddings, which were able to capture semantic
similarity, especially for ensemble techniques including the Random Forest and Gradient
Boosting Machines. These embeddings enabled machine learning models to outperform
conventional TF-IDF baselines by exploiting the semantic and syntactic distances between
words, although their dense structure remains a poor fit for many conventional classifiers.
Author Contributions: Conceptualization, A.N., A.Q., N.H., G.S., M.Z. and A.G.; methodology,
A.N., A.Q., N.H., G.S. and A.G.; validation, A.N., A.Q., N.H., M.Z. and F.A.; formal analysis, A.N.,
A.Q., N.H., M.Z. and F.A.; data curation, A.N., A.Q., N.H., M.Z. and F.A.; writing, A.N. and A.Q.;
funding acquisition, G.S. and A.G. All authors have read and agreed to the published version of
the manuscript.
Funding: This work was partially supported by the Mexican Government through the grant A1-
S-47854 of CONACYT, Mexico, and the grants 20254236, 20253468, and 20254341 provided by the
Secretaría de Investigación y Posgrado of the Instituto Politécnico Nacional, Mexico. The authors
thank CONACYT for the computing resources provided through the Plataforma de Apren-
dizaje Profundo para Tecnologías del Lenguaje of the Laboratorio de Supercómputo of the INAOE,
Mexico, and acknowledge the support of Microsoft through the Microsoft Latin America PhD Award.
References
1. Hashmi, E.; Yayilgan, S.Y.; Hameed, I.A.; Yamin, M.M. Enhancing multilingual hate speech detection: From language-specific
insights to cross-linguistic integration. IEEE Trans. Comput. Intell. AI Games 2024, 12, 121507–121537. [CrossRef]
2. Ashiq, W.; Kanwal, S.; Rafique, A.; Waqas, M. Roman Urdu hate speech detection using hybrid machine learning models and
hyperparameter optimization. Sci. Rep. 2024, 14, 28590. [CrossRef] [PubMed]
3. Sujatha, R.; Chatterjee, J.M.; Pathy, B.; Hu, Y.C. Automatic emotion recognition using deep neural networks. Multimed. Tools Appl.
2025. [CrossRef]
4. Khan, M.M.; Shahzad, K.; Malik, M.K. Hate speech detection in Roman Urdu. ACM Trans. Asian Low-Resour. Lang. Inf. Process.
2021, 20, 1–19. [CrossRef]
5. Aziz, S.; Sarfraz, M.S.; Usman, M.; Aftab, M.U.; Rauf, H.T. Geo-spatial mapping of hate speech prediction in Roman Urdu.
Mathematics 2023, 11, 969. [CrossRef]
6. Al-Hassan, A.; Al-Dossari, H. Detection of hate speech in social networks: A survey on multilingual corpus. In Proceedings of the
6th International Conference on Computer Science and Information Technology, Dubai, United Arab Emirates, 23–24 February
2019; ACM: New York, NY, USA, 2019; Volume 10, pp. 10–5121.
7. Madukwe, J.K. The Detection of Online Textual Hate Speech. Ph.D. Thesis, Open Access Te Herenga Waka—Victoria University
of Wellington, Wellington, New Zealand, 2025.
8. Hussain, N.; Qasim, A.; Mehak, G.; Kolesnikova, O.; Gelbukh, A.; Sidorov, G. Hybrid Machine Learning and Deep Learning
Approaches for Insult Detection in Roman Urdu Text. AI 2025, 6, 33. [CrossRef]
9. Sharma, D.; Nath, T.; Gupta, V.; Singh, V.K. Hate Speech Detection Research in South Asian Languages: A Survey of Tasks,
Datasets and Methods. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 2025, 24, 1–44. [CrossRef]
10. Ayo, F.E.; Folorunso, O.; Ibharalu, F.T.; Osinuga, I.A.; Abayomi-Alli, A. A probabilistic clustering model for hate speech
classification in Twitter. Expert Syst. Appl. 2021, 173, 114762. [CrossRef]
11. Daouadi, K.E.; Boualleg, Y.; Guehairia, O. Comparing Pre-Trained Language Model for Arabic Hate Speech Detection. Computación
y Sistemas 2024, 28, 681–693. [CrossRef]
12. Mehmood, F.; Shahzadi, R.; Ghafoor, H.; Asim, M.N.; Ghani, M.U.; Mahmood, W.; Dengel, A. ENML: Multi-label ensemble
learning for Urdu text classification. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 2023, 22, 1–31. [CrossRef]
13. Hussain, N.; Qasim, A.; Akhtar, Z.U.D.; Qasim, A.; Mehak, G.; del Socorro Espindola Ulibarri, L.; Gelbukh, A. Stock Market
Performance Analytics Using XGBoost. In Advances in Computational Intelligence, Proceedings of the Mexican International Conference
on Artificial Intelligence, Yucatán, Mexico, 13–18 November 2023; Springer Nature: Cham, Switzerland, 2023; pp. 3–16.
14. Arshad, M.U.; Shahzad, W. Understanding hate speech: The HateInsights dataset and model interpretability. PeerJ Comput. Sci.
2024, 10, e2372. [CrossRef] [PubMed]
15. Sharif, W.; Abdullah, S.; Iftikhar, S.; Al-Madani, D.; Mumtaz, S. Enhancing Hate Speech Detection in the Digital Age: A Novel
Model Fusion Approach Leveraging a Comprehensive Dataset. IEEE Access 2024, 12, 27225–27236. [CrossRef]
16. Dhanya, L.K.; Balakrishnan, K. Hate speech detection in Asian languages: A survey. In Proceedings of the 2021 International
Conference on Communication, Control and Information Sciences (ICCISc), Idukki, India, 16–18 June 2021; IEEE: New York, NY,
USA, 2021; Volume 1, pp. 1–5.
17. Aklouche, B.; Bazine, Y.; Ghalia-Bououchma, Z. Offensive Language and Hate Speech Detection Using Transformers and
Ensemble Learning Approaches. Comput. Sist. 2024, 28, 1031–1039. [CrossRef]
18. Cruz, R.M.; de Sousa, W.V.; Cavalcanti, G.D. Selecting and combining complementary feature representations and classifiers for
hate speech detection. Online Soc. Netw. Media 2022, 28, 100194. [CrossRef]
19. Alatawi, H.S.; Alhothali, A.M.; Moria, K.M. Detecting white supremacist hate speech using domain-specific word embedding
with deep learning and BERT. IEEE Access 2021, 9, 106363–106374. [CrossRef]
20. Qureshi, K.A.; Sabih, M. Un-compromised credibility: Social media based multi-class hate speech classification for text. IEEE
Access 2021, 9, 109465–109477. [CrossRef]
21. Abdrakhmanov, R.; Kenesbayev, S.M.; Berkimbayev, K.; Toikenov, G.; Abdrashova, E.; Alchinbayeva, O.; Ydyrys, A. Offensive
Language Detection on Social Media using machine learning. Int. J. Adv. Comput. Sci. Appl. 2024, 15, 575. [CrossRef]
22. Jha, V.K.; Hrudya, P.; Vinu, P.N.; Vijayan, V.; Prabaharan, P. DHOT-repository and classification of offensive tweets in the Hindi
language. Procedia Comput. Sci. 2020, 171, 2324–2333. [CrossRef]
23. Nasir, S.; Seerat, A.; Wasim, M. Hate speech detection in Roman Urdu using machine learning techniques. In Proceedings of the
2024 5th International Conference on Advancements in Computational Sciences (ICACS), Lahore, Pakistan, 19–20 February 2024;
IEEE: New York, NY, USA, 2024; pp. 1–7.
24. Arshad, M.U.; Ali, R.; Beg, M.O.; Shahzad, W. UHated: Hate speech detection in Urdu language using transfer learning. Lang.
Resour. Eval. 2023, 57, 713–732. [CrossRef]
25. Hussain, N.; Qasim, A.; Mehak, G.; Kolesnikova, O.; Gelbukh, A.; Sidorov, G. ORUD-Detect: A Comprehensive Approach
to Offensive Language Detection in Roman Urdu Using Hybrid Machine Learning–Deep Learning Models with Embedding
Techniques. Information 2025, 16, 139. [CrossRef]
26. Jadon, P.; Bhatia, D.; Mishra, D.K. Social Media Text Classification for Hate Speech Detection Using Different Feature Selection
Techniques. In Proceedings of the 2024 IEEE 4th Int. Conf. on ICT in Business Industry & Government (ICTBIG), Indore, India,
13–14 December 2024; IEEE: New York, NY, USA, 2024; pp. 1–8.
27. Bilal, M.; Khan, A.; Jan, S.; Musa, S. Context-aware deep learning model for detection of Roman Urdu hate speech on social media
platform. IEEE Access 2022, 10, 121133–121151. [CrossRef]
28. Riyadi, S.; Andriyani, A.D.; Sulaiman, S.N. Improving Hate Speech Detection Using Double-Layers Hybrid CNN-RNN Model on
Imbalanced Dataset. IEEE Access 2024, 12, 159660–159668. [CrossRef]
29. Toktarova, A.; Syrlybay, D.; Myrzakhmetova, B.; Anuarbekova, G.; Rakhimbayeva, G.; Zhylanbaeva, B.; Kerimbekov, M. Hate
speech detection in social networks using machine learning and deep learning methods. Int. J. Adv. Comput. Sci. Appl. 2023, 14,
396–406. [CrossRef]
30. Ahmed, Z.; Vidgen, B.; Hale, S.A. Tackling racial bias in automated online hate detection: Towards fair and accurate detection of
hateful users with geometric deep learning. EPJ Data Sci. 2022, 11, 8. [CrossRef]
31. Abdellaoui, I.; Ibrahimi, A.; El Bouni, M.A.; Mourhir, A.; Driouech, S.; Aghzal, M. Investigating Offensive Language Detection in
a Low-Resource Setting with a Robustness Perspective. Big Data Cogn. Comput. 2024, 8, 170. [CrossRef]
32. Meque, A.G.M.; Hussain, N.; Sidorov, G.; Gelbukh, A. Machine Learning-Based Guilt Detection in Text. Sci. Rep. 2023, 13, 11441.
[CrossRef] [PubMed]
33. Aluru, S.S.; Mathew, B.; Saha, P.; Mukherjee, A. A deep dive into multilingual hate speech classification. In Machine Learning and
Knowledge Discovery in Databases. Applied Data Science and Demo Track: European Conference, ECML PKDD 2020, Ghent, Belgium,
September 14–18, 2020, Proceedings, Part V; Springer International Publishing: Cham, Switzerland, 2021; pp. 423–439.
34. Rizwan, H.; Shakeel, M.H.; Karim, A. Hate-speech and offensive language detection in Roman Urdu. In Proceedings of the 2020
Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; Association for
Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 2512–2522.
35. Alsafari, S.; Sadaoui, S. Semi-supervised self-training of hate and offensive speech from social media. Appl. Artif. Intell. 2021, 35,
1621–1645. [CrossRef]
36. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space. arXiv 2013,
arXiv:1301.3781.
37. Cox, D.R. The regression analysis of binary sequences. J. R. Stat. Soc. Ser. B Stat. Methodol. 1958, 20, 215–232. [CrossRef]
38. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [CrossRef]
39. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [CrossRef]
40. Lewis, D.D. Naive (Bayes) at forty: The independence assumption in information retrieval. In Machine Learning: ECML-98,
Proceedings of the European Conference on Machine Learning, Chemnitz, Germany, 21–23 April 1998; Springer: Berlin/Heidelberg,
Germany, 1998; pp. 4–15.
41. Cover, T.; Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27. [CrossRef]
42. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 1189–1232. [CrossRef]
43. Zhang, X.; Zhao, J.; LeCun, Y. Character-level convolutional networks for text classification. arXiv 2015, arXiv:1509.01626.
44. Elman, J.L. Finding structure in time. Cogn. Sci. 1990, 14, 179–211. [CrossRef]
45. Qasim, A.; Mehak, G.; Hussain, N.; Gelbukh, A.; Sidorov, G. Detection of Depression Severity in Social Media Text Using
Transformer-Based Models. Information 2025, 16, 114. [CrossRef]
46. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations
using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078.
47. Powers, D.M. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv 2020,
arXiv:2010.16061.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.