Algorithms 18 00396 v2
Article
1 Centro de Investigación en Computación (CIC), Instituto Politécnico Nacional (IPN), Av. Juan de Dios Batiz,
s/n, Mexico City 07320, Mexico; [email protected] (M.Z.); [email protected] (N.H.);
[email protected] (A.Q.); [email protected] (G.M.); [email protected] (A.G.)
2 Department of Computer Science, University of Central Punjab, Punjab 54810, Pakistan;
[email protected]
† These authors contributed equally to this work.
Abstract

The detection of abusive language in Roman Urdu is important for secure digital interaction. This work investigates machine learning (ML), deep learning (DL), and transformer-based methods for detecting offensive language in Roman Urdu comments collected from YouTube news channels. Features are extracted with TF-IDF and Count Vectorizer over unigrams, bigrams, and trigrams. Of the ML models—Random Forest (RF), Logistic Regression (LR), Support Vector Machine (SVM), and Naïve Bayes (NB)—the best performance was achieved by the SVM. Among the DL models evaluated, Bi-LSTM and CNN, the CNN outperformed the other approaches. Moreover, transformer variants such as LLaMA 2 and ModernBERT (MBERT) were instantiated and fine-tuned with LoRA (Low-Rank Adaptation) for better efficiency. LoRA adapts large language models (LLMs) by training only a small number of additional parameters, making fine-tuning effective at extremely low computational cost. According to the experimental results, LLaMA 2 with LoRA attained the highest F1-score of 96.58%, greatly exceeding the performance of the other approaches. LoRA-optimized transformers capture detailed linguistic nuances well, lending themselves to Roman Urdu offensive language detection. The study compares the performance of conventional and contemporary NLP methods, highlighting the relevance of effective fine-tuning methods. Our findings pave the way for scalable and accurate automated moderation systems for online platforms supporting multiple languages.

Keywords: deep learning; machine learning; support vector machine; large language model

Academic Editors: James Jianqiao Yu, Affan Yasin, Javed Ali Khan and Lijie Wen

Received: 3 June 2025; Revised: 17 June 2025; Accepted: 24 June 2025; Published: 28 June 2025

Citation: Zain, M.; Hussain, N.; Qasim, A.; Mehak, G.; Ahmad, F.; Sidorov, G.; Gelbukh, A. RU-OLD: A Comprehensive Analysis of Offensive Language Detection in Roman Urdu Using Hybrid Machine Learning, Deep Learning, and Transformer Models. Algorithms 2025, 18, 396. https://doi.org/10.3390/a18070396
Copyright: © 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Despite the expansion of platforms for public expression, the internet is seemingly rife with prejudice, and freedom of speech often serves as a veil for insidiousness on social media. The increase in online toxicity has created the need for stronger detection methods to build a safer (and less toxic) internet. The best approaches to offensive language detection are heavily based on machine learning (ML), deep learning (DL), and transformer-based models. Fine-tuning pre-trained language models, such as BERT, has been proven to effectively improve the ability to detect abusive content [1]. However, ensuring that these models are explainable and reliable is still a challenge. Some recent approaches have suggested incorporating logical rule-based methods into neural frameworks to provide explainability [2], and others have used data augmentations based on assumed symmetries of the data to improve the models' overall generalization and explainability [3].
In marginalized communities, the effects of offensive speech can be profound, es-
pecially on adolescents with autism, where AI-enabled virtual companions have been
developed to help users become aware of and combat cyberbullying [4]. Moreover, cross-
lingual learning methods have also been investigated to enhance hate speech detection in
multiple languages, revealing the impact of transfer learning techniques on multilingual
toxicity [5]. For specific languages such as Spanish, linguistic features as well as transformer-based architectures have been studied [6], and the combination of multiple features has been identified as a crucial way to improve the models. Overall, techniques such as zero-shot and
few-shot learning have been applied to increase the adaptability of these models in a
multilingual setting, alleviating the extensive requirements for labeled data [7].
More recent literature has explored hybrid methodologies, e.g., bidirectional encoder–decoder architectures, which have yielded promising results in this domain [8]. Following suit, optimization-driven approaches,
including hybrid CNN-based classifiers, have also shown enhancement in classification
performance for abusive comments [9]. The efficacy of transformer-based models for
trait-based classification tasks has been further solidified through extensive analysis of
multimodal classification for cyberbullying detection [10]. To create synthetic but real-
istic training samples, data augmentation strategies and, most notably, contrastive self-
supervised learning have been suggested to improve cyberbullying detection [11]. Transfer
learning methods were additionally validated and beneficial to datasets pertaining to
Twitter, increasing the classification of hate speech in social media [12].
These advancements aside, the detection of offensive language in Roman Urdu is
a relatively unexplored area. Roman Urdu, a widely adopted script for Urdu on digital
channels, poses a challenge with its varying standards of writing, code-mixing, and blurred
grammar. This study tackles these challenges by applying and evaluating traditional ML
models (Random Forest, Support Vector Machines, Naïve Bayes, and Logistic Regression),
deep learning models (CNN and Bi-LSTM), and state-of-the-art transformer-based models
(LLaMA 2 and ModernBERT). Moreover, we utilize Low-Rank Adaptation (LoRA) for
the efficient tuning of transformer models, ensuring the best performance with minimal
computational resources. In our comparative analysis, we examine how well these models perform in identifying offensive language in Roman Urdu social media comments, providing insights into their portability and real-world usability. This work thus presents an important advancement in the processing of low-resource languages. Our contributions can be summarized as follows.
2. Literature Review
The identification of profane and hateful messages has become a key research topic in
the domain of natural language processing (NLP). Multiple approaches and methods (tra-
ditional machine learning, deep learning, transformer models, etc.) have been examined to
solve such challenges in different languages. The introduction of transformer architectures
has shown tremendous progress in detection accuracy, even for resource-limited languages
that suffer from a lack of computational resources and labeled datasets.
In the early works of Arabic hate speech detection, BERT-based models were utilized,
which achieved good results and illustrated how fine-tuned smaller models such as ABMM
(Arabic BERT-Mini Model) can increase detection efficiency and decrease computational
costs [13]. A newer transformer model was created using an improved RoBERTa-based
model coupled with GloVe embeddings, within which cyberbullying detection results
significantly improved [14]. Building upon this, researchers have examined how the
inclusion of emojis and sentiment analysis in specific Arabic Twitter datasets can enhance
classification performance [15].
Hybrid models have also been studied in multilingual hate speech detection. To achieve better detection in Turkish social media content, researchers proposed SO-HATRED, a hybrid approach combining ensemble deep learning models based on BERT and clustered-graph networks [16]. A similar study developed HateDetector, a cross-lingual approach to hate speech analysis in multilingual online social networks using deep feature extraction methods [17].
Although research related to Urdu hate speech detection is still limited, progress has
been made. A transfer learning model, UHATED, also utilizes various pre-trained models
to effectively classify hate speech in Urdu-based datasets, showcasing the adaptability of
pre-trained models, especially for low-resource languages [18]. In a parallel line of work on deep feature extraction, [19] proposed using graph convolutional networks (GCNs) to extract deep features from social media users flagged as trolls, with the potential to improve future multi-task learning models. Another hybrid method, combining semantic compression and Support Vector Machines (SVMs), focuses on filtering troll threat sentences, highlighting the role of feature selection in enhancing detection capabilities [20].
Transformer-based models have also improved the performance of hate speech classification for Roman Urdu. [21] utilized transformer-based architectures fine-tuned for cybersecurity tasks and reported a notable enhancement in classification accuracy for offensive language in Roman Urdu datasets. Another study concentrated on cross-lingual learning methods, with implications for leveraging multilingual models to detect hate speech across linguistic communities [22].
Beyond just model performance, previous research has explored the broader psy-
chological and societal impacts of online hate speech. Meta-analyses on cyber victim-
ization of adolescents indicate a strong relationship between online violence and inter-
nalizing/externalizing behavioral problems [23]. More recently, extensive surveys on
methodologies for hate speech detection have highlighted the progress of automatic tech-
niques to classify text as hate speech, acknowledging the significance of dataset quality,
feature engineering, and model interpretability [24].
Preprocessing techniques are also an important factor in offensive language detection. Previous work has shown that preprocessing Arabic text, including practical measures such as removing diacritics and normalizing the script, enhances model performance in hate speech and offensive content classification [25]. Models like G-BERT,
which are transformer-based and specialized for classifying Bengali text, are more efficient
in identifying offensive speech on platforms like Facebook and Twitter [26]. Hierarchical
attention mechanisms also show a significant improvement when combined with BiLSTM
and deep CNNs in detecting hate speech [27].
To contextualize our contributions within the current research landscape, we present
a summary of recent studies on offensive language detection in Table 1. This compara-
tive overview highlights key advancements in language coverage, feature engineering
techniques, model architectures, and targeted platforms. Notably, these works explore
low-resource and multilingual settings using a wide range of traditional, deep learning,
and transformer-based approaches. The table underscores the growing trend of leveraging
hybrid models, ensemble frameworks, and task-specific datasets to improve classification
performance across languages like Urdu, Roman Urdu, Arabic, and other South Asian
languages. Our work builds on these developments by introducing a comprehensive bench-
mark for Roman Urdu offensive language detection using LoRA-optimized transformer
models, offering both high accuracy and computational efficiency.
Table 1. Recent studies on offensive language detection in low-resource and multilingual contexts.
3. Methodology
In this work, we present an extensive set of methods for offensive language detection in Roman Urdu using ML, DL, and transformer-based models. We describe our multi-step approach to dataset collection, data preprocessing, model training, hyperparameter tuning, and performance evaluation. Each step has been tuned to deliver strong performance in classifying offensive text while following best-practice NLP approaches. A graphical representation of our method is shown in Figure 1.
At the core of these transformer models is the scaled dot-product attention

Attention(Q, K, V) = softmax(QK⊤/√dk)V (1)

where Q, K, V represent the query, key, and value matrices, respectively, and dk is the dimension of the key vectors. This formula enables the model to assign varying levels of importance to different tokens in a sequence.
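For concreteness, the scaled dot-product attention just described can be sketched in plain Python (the matrices below are toy values for illustration; real models operate on batched tensors via a library such as PyTorch):

```python
import math

def matmul(X, Y):
    """Plain-Python matrix product of X (m x n) and Y (n x p)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    dk = len(K[0])
    KT = [list(col) for col in zip(*K)]                      # transpose K
    scores = [[v / math.sqrt(dk) for v in row] for row in matmul(Q, KT)]
    weights = [softmax(row) for row in scores]               # one row per query
    return matmul(weights, V)

Q = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out = attention(Q, Q, V)  # each row is a weighted mix of the rows of V
```

Each output row is a convex combination of the value rows, with weights given by the softmaxed, scaled query–key similarities.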
For LoRA (Low-Rank Adaptation), instead of updating the full weight matrix W, LoRA
introduces two low-rank matrices A ∈ Rr×d and B ∈ Rd×r , and the update is defined as
W ′ = W + ∆W, ∆W = BA (2)
where r ≪ d, making this update computationally efficient while enabling effective fine-
tuning. The base model weights W remain frozen, and only the low-rank matrices A and B
are trained.
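Equation (2) can be illustrated with a minimal plain-Python sketch; the dimensions and values below are hypothetical, and a real implementation would train A and B by gradient descent in a tensor library:

```python
# Minimal LoRA update sketch: W' = W + BA, with B (d x r) and A (r x d),
# as in Equation (2). All shapes and values here are illustrative only.
def matmul(X, Y):
    """Plain-Python matrix product of X (m x n) and Y (n x p)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_update(W, A, B):
    """Return W' = W + B @ A; the base W stays frozen, only A and B train."""
    delta = matmul(B, A)
    return [[W[i][j] + delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

d, r = 4, 1  # r << d keeps the trainable parameters at 2*d*r instead of d*d
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen base
B = [[0.5] for _ in range(d)]   # d x r
A = [[0.1, 0.2, 0.3, 0.4]]      # r x d
W_prime = lora_update(W, A, B)
```

With r = 1 and d = 4, the update trains 2·d·r = 8 values instead of d² = 16; at realistic model sizes (d in the thousands, r in the tens) this gap is what makes LoRA fine-tuning cheap.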
• Precision: Determines how many predicted offensive comments were actually offensive.
• Recall: Measures how well the model identifies offensive comments.
• F1-Score: Provides a balanced measure of precision and recall.
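These three metrics can be computed directly from confusion-matrix counts; the sketch below uses toy binary labels (1 = offensive, 0 = non-offensive; illustrative data only, not from the paper's dataset):

```python
# Precision, recall, and F1 for a binary task (1 = offensive, 0 = not).
def precision_recall_f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

y_true = [1, 1, 0, 1, 0, 0]  # toy gold labels
y_pred = [1, 0, 0, 1, 1, 0]  # toy model predictions
p, r, f = precision_recall_f1(y_true, y_pred)
```

The F1-score is the harmonic mean of precision and recall, so it is high only when both are high.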
embedded within Urdu text. A likely explanation is that many offensive words can be identified on their own, without needing three-word context.
Table 2. Results of ML models with different Count Vectorizer n-grams on the proposed dataset.
4.5. F-Measure Values of Various Word-Level n-Grams Across Different ML and DL Models
The bar chart of F-measure values for different word-level n-grams (uni-, bi-, and tri-grams alone, or combined uni + bi + tri) across the machine learning (ML) and deep learning (DL) models employed for offensive language detection on the Roman Urdu dataset is shown in Figure 2. The F-measure (or F1-score) is a performance metric that combines precision and recall, capturing how well a model identifies offensive language while accounting for both false positives and false negatives.
Figure 2. F-measure values of various word-level n-grams across different ML and DL models.
ML Models. Across the n-gram-based configurations, the overall F-measure scores are highest for the combined uni- + bi- + tri-gram setting, with Logistic Regression, SVM, and Random Forest consistently beating the other models. Among the ML models, Naïve Bayes performs worst, especially on tri-grams.
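The combined uni- + bi- + tri-gram representation used in these configurations can be illustrated with a short counting sketch (the Roman Urdu comments below are invented for illustration and are not from the collected dataset):

```python
# Word-level n-gram extraction combining uni-, bi-, and tri-grams.
from collections import Counter

def word_ngrams(text, n_values=(1, 2, 3)):
    """Return all word n-grams of the given orders for one comment."""
    tokens = text.lower().split()
    grams = []
    for n in n_values:
        grams += [" ".join(tokens[i:i + n])
                  for i in range(len(tokens) - n + 1)]
    return grams

# Hypothetical Roman Urdu comments, for illustration only.
corpus = ["yeh video acha hai", "yeh comment bura hai"]
counts = Counter(g for doc in corpus for g in word_ngrams(doc))
```

A Count Vectorizer builds its vocabulary from exactly these n-gram counts; TF-IDF additionally reweights each n-gram by its inverse document frequency.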
DL Models. The CNN and Bi-LSTM models achieve high F-measure values and are relatively stable across the various n-grams, which suggests that both models generalize well to the complexity of the language (the gap between each model's best and worst F-measure across settings is quite small). As with the ML models, the combined uni- + bi- + tri-gram configuration achieves the highest scores. This contrast indicates that although the combined n-gram setting enhances the performance of all models to some degree, the deep learning models (CNN and Bi-LSTM) consistently outperform the traditional ML models, with robust performance in the combined settings.
Table 3 indicates that the performance of the CNN model is superior to that of the Bi-LSTM on all metrics. On the test data, the CNN attains a precision of 97.25%, accurately distinguishing offensive from non-offensive labels with few false positives. Its recall is 94.67%, indicating that it captures the large majority of genuinely offensive instances in the dataset. The F1-score of 95.43% reflects this balance of precision and recall, confirming the CNN's reliability for offensive language detection. Further, the CNN achieves an accuracy of 95.19%, showing its ability to generalize across all samples of the dataset.
The Bi-LSTM model yields broadly similar performance, with a precision of 92.38%, a recall of 91.74%, and an F1-score of 92.19%. These metrics are somewhat lower than the CNN's but still indicate good overall performance. Trailing the CNN on the accuracy scale, the Bi-LSTM scores a reasonable 92.31%. This performance gap can be explained by the CNN's stronger ability to extract local (spatial) features from short text data, in particular YouTube comments.
Conversely, ModernBERT (M-BERT) also produces good performance, with an F1-score of 94.10% and an accuracy of 94.67%, but its effectiveness does not match that of LLaMA 2. While M-BERT works well for offensive language detection, it does not generalize as well as LLaMA 2 fine-tuned with LoRA. Likewise, the small decreases in M-BERT's recall and precision suggest that it had difficulty with the range of linguistic variation in Roman Urdu, especially in cases of code-mixing and informal spelling. Collectively, these results underline the suitability of fine-tuned transformer architectures for capturing the nuances of offensive language in Roman Urdu, which further supports real-world content moderation tasks.
Error Analysis
To validate the generalization behavior of our models and gain deeper insight into their
performance, we performed error analysis using confusion matrices (Figures 3–5). These
matrices demonstrate the benefits and drawbacks of using ML, DL, and LLM-based classi-
fiers. The SVM and Logistic Regression ML models provide high true-positive and true-
negative counts, along with balanced sensitivity and specificity. Conversely, Naïve Bayes
had more false positives and false negatives, consistent with its lower performance metrics.
The deep learning models (CNN and Bi-LSTM), in turn, showed much better class separation, with the CNN producing the fewest misclassifications. Bi-LSTM, however, showed slightly more false positives, suggesting some over-classification of neutral content as offensive. This knowledge is instrumental in interpreting the contextual misinterpretation behavior of the DL models.
Among the LLMs, LLaMA 2 performed best: its classification was almost symmetric, with high precision and the least confusion between classes. ModernBERT, with slightly lower accuracy and a somewhat higher false-negative rate but low overall error, still performed well. These results validate that LLMs fine-tuned on task-specific datasets are able to capture the nuanced semantics of Roman Urdu–English code-mixed content.
5. Conclusions
In this paper, we presented an exhaustive comparison of machine learning (ML), deep learning (DL), and transformer-based approaches for offensive language detection in Roman Urdu text. Utilizing traditional ML classifiers (SVM, Naïve Bayes, Logistic Regression, and Random Forest), deep learning architectures (CNN and Bi-LSTM), and transformer-based models (LLaMA 2 and ModernBERT), we performed a comprehensive comparison for text classification to ascertain the most promising approach. We showed that fine-tuned transformer models greatly improve offensive language detection in Roman Urdu; among them, LLaMA 2 fine-tuned with LoRA performed best, with an F1-score of 96.58%, making it the optimal solution for this task. The CNN also stood out, excelling at learning local patterns in the text, though it underperformed relative to LLaMA 2 owing to its inability to model longer dependencies or capture contextual variation. ModernBERT also performed competitively, demonstrating the relevance of transformer-based models for low-resource and transliterated text processing. These results highlight that, when effectively fine-tuned, large language models offer best-in-class results for offensive language detection in complex linguistic environments.
6. Future Directions
Future research can include data from other social media platforms, rather than limiting the study to YouTube, which will help achieve better generalization.
Author Contributions: Conceptualization, M.Z., N.H., A.Q., G.M., G.S. and A.G.; methodology,
M.Z., N.H., A.Q., G.M., G.S. and A.G.; validation, M.Z., N.H., A.Q., G.M. and F.A.; formal analysis,
M.Z., N.H., A.Q., G.M. and F.A.; data curation, M.Z., N.H., A.Q., G.M. and F.A.; writing, M.Z., N.H.
and A.Q.; funding acquisition, G.S. and A.G. All authors have read and agreed to the published
version of the manuscript.
Funding: This work was partially supported by the Mexican Government through the grant
A1-S-47854 of CONACYT, Mexico, and the grants 20254236, 20253468, and 20254341 provided
by the Secretaría de Investigación y Posgrado of the Instituto Politécnico Nacional, Mexico. The
authors thank CONACYT for the computing resources provided through the Plataforma de
Aprendizaje Profundo para Tecnologías del Lenguaje of the Laboratorio de Supercómputo of the
INAOE, Mexico, and acknowledge the support of Microsoft through the Microsoft Latin America
PhD Award.
References
1. Caselli, T.; Basile, V.; Mitrović, J.; Granitzer, M. HateBERT: Retraining BERT for abusive language detection in English. arXiv 2020,
arXiv:2010.12472.
2. Clarke, C.; Hall, M.; Mittal, G.; Yu, Y.; Sajeev, S.; Mars, J.; Chen, M. Rule by example: Harnessing logical rules for explainable hate
speech detection. arXiv 2023, arXiv:2307.12935.
3. Ansari, G.; Kaur, P.; Saxena, C. Data augmentation for improving explainability of hate speech detection. Arab. J. Sci. Eng. 2024,
49, 3609–3621. [CrossRef]
4. Ferrer, R.; Ali, K.; Hughes, C. Using AI-based virtual companions to assist adolescents with autism in recognizing and addressing
cyberbullying. Sensors 2024, 24, 3875. [CrossRef]
5. Hussain, N.; Qasim, A.; Mehak, G.; Kolesnikova, O.; Gelbukh, A.; Sidorov, G. ORUD-Detect: A Comprehensive Approach
to Offensive Language Detection in Roman Urdu Using Hybrid Machine Learning–Deep Learning Models with Embedding
Techniques. Information 2025, 16, 139. [CrossRef]
6. García-Díaz, J.A.; Jiménez-Zafra, S.M.; García-Cumbreras, M.A.; Valencia-García, R. Evaluating feature combination strategies for
hate-speech detection in Spanish using linguistic features and transformers. Complex Intell. Syst. 2023, 9, 2893–2914. [CrossRef]
7. García-Díaz, J.A.; Pan, R.; Valencia-García, R. Leveraging zero and few-shot learning for enhanced model generality in hate
speech detection in Spanish and English. Mathematics 2023, 11, 5004. [CrossRef]
8. Hussain, N.; Qasim, A.; Mehak, G.; Kolesnikova, O.; Gelbukh, A.; Sidorov, G. Hybrid Machine Learning and Deep Learning
Approaches for Insult Detection in Roman Urdu Text. AI 2025, 6, 33. [CrossRef]
9. Aarthi, B.; Chelliah, B.J. Hatdo: Hybrid Archimedes Tasmanian Devil Optimization CNN for classifying offensive comments and
non-offensive comments. Neural Comput. Appl. 2023, 35, 18395–18415. [CrossRef]
10. Hussain, N.; Anees, T.; Faheem, M.R.; Shaheen, M.; Manzoor, M.I.; Anum, A. Development of a novel approach to search
resources in IoT. Development 2018, 9, 9. [CrossRef]
11. Al-Harigy, L.M.; Al-Nuaim, H.A.; Moradpoor, N.; Tan, Z. Towards a cyberbullying detection approach: Fine-tuned contrastive
self-supervised learning for data augmentation. Int. J. Data Sci. Anal. 2024, 19, 469–490. [CrossRef]
12. Shaheen, M.; Awan, S.M.; Hussain, N.; Gondal, Z.A. Sentiment analysis on mobile phone reviews using supervised learning
techniques. Int. J. Mod. Educ. Comput. Sci. 2019, 10, 32. [CrossRef]
13. Almaliki, M.; Almars, A.M.; Gad, I.; Atlam, E.-S. ABMM: Arabic BERT-mini model for hate-speech detection on social media.
Electronics 2023, 12, 1048. [CrossRef]
14. Aklouche, B.; Bazine, Y.; Ghalia-Bououchma, Z. Offensive Language and Hate Speech Detection Using Transformers and
Ensemble Learning Approaches. Comput. Sist. 2024, 28, 1031–1039. [CrossRef]
15. Althobaiti, M.J. BERT-based approach to Arabic hate speech and offensive language detection in Twitter: Exploiting emojis and
sentiment analysis. Int. J. Adv. Comput. Sci. Appl. 2022, 13, 972–980. [CrossRef]
16. Altinel, A.B.; Sahin, S.; Gurbuz, M.Z.; Baydogmus, G.K. SO-Hatred: A novel hybrid system for Turkish hate speech detection in
social media with ensemble deep learning improved by BERT and clustered-graph networks. IEEE Access 2024, 12, 86252–86270.
[CrossRef]
17. Qasim, A.; Mehak, G.; Hussain, N.; Gelbukh, A.; Sidorov, G. Detection of Depression Severity in Social Media Text Using
Transformer-Based Models. Information 2025, 16, 114. [CrossRef]
18. Arshad, M.U.; Ali, R.; Beg, M.O.; Shahzad, W. UHateD: Hate speech detection in Urdu language using transfer learning. Lang.
Resour. Eval. 2023, 57, 713–732. [CrossRef]
19. Asif, M.; Al-Razgan, M.; Ali, Y.A.; Yunrong, L. Graph convolution networks for social media trolls detection using deep feature
extraction. J. Cloud Comput. 2024, 13, 33. [CrossRef]
20. Meque, A.G.M.; Hussain, N.; Sidorov, G.; Gelbukh, A. Machine Learning-Based Guilt Detection in Text. Sci. Rep. 2023, 13, 11441. [CrossRef]
21. Bilal, M.; Khan, A.; Jan, S.; Musa, S.; Ali, S. Roman Urdu hate speech detection using transformer-based model for cyber security
applications. Sensors 2023, 23, 3909. [CrossRef] [PubMed]
22. Daouadi, K.E.; Boualleg, Y.; Guehairia, O. Comparing Pre-Trained Language Model for Arabic Hate Speech Detection. Comput.
Sist. 2024, 28, 681–693. [CrossRef]
23. Fisher, B.W.; Gardella, J.H.; Teurbe-Tolon, A.R. Peer cybervictimization among adolescents and the associated internalizing and
externalizing problems: A meta-analysis. J. Youth Adolesc. 2016, 45, 1727–1743. [CrossRef] [PubMed]
24. Fortuna, P.; Nunes, S. A survey on automatic detection of hate speech in text. ACM Comput. Surv. (CSUR) 2018, 51, 1–30.
[CrossRef]
25. Husain, F.; Uzuner, O. Investigating the effect of preprocessing Arabic text on offensive language and hate speech detection.
Trans. Asian Low-Resour. Lang. Inf. Process. 2022, 21, 1–20. [CrossRef]
26. Keya, A.J.; Kabir, M.M.; Shammey, N.J.; Mridha, M.F.; Islam, M.R.; Watanobe, Y. G-BERT: An efficient method for identifying hate
speech in Bengali texts on social media. IEEE Access 2023, 11, 79697–79709. [CrossRef]
27. Mehak, G.; Qasim, A.; Meque, A.G.M.; Hussain, N.; Sidorov, G.; Gelbukh, A. TechExperts (IPN) at GenAI Detection Task 1:
Detecting AI-Generated Text in English and Multilingual Contexts. In Proceedings of the 1st Workshop on GenAI Content
Detection (GenAIDetect), Abu Dhabi, United Arab Emirates, 19 January 2025; pp. 161–165.
28. Din, S.U.; Khusro, S.; Khan, F.A.; Ahmad, M.; Ali, O.; Ghazal, T.M. An automatic approach for the identification of offensive
language in Perso-Arabic Urdu Language: Dataset Creation and Evaluation. IEEE Access 2025, 13, 19755–19769. [CrossRef]
29. Rajput, V.; Sikarwar, S.S. Detection of Abusive Language for YouTube Comments in Urdu and Roman Urdu using CLSTM Model.
Procedia Comput. Sci. 2025, 260, 382–389. [CrossRef]
30. Ullah, K.; Aslam, M.; Khan, M.U.G.; Alamri, F.S.; Khan, A.R. UEF-HOCUrdu: Unified Embeddings Ensemble Framework for
Hate and Offensive Text Classification in Urdu. IEEE Access 2025, 13, 21853–21869. [CrossRef]
31. Alvi, M.; Alvi, M.B.; Fatima, N. A Framework for Sarcasm Detection Incorporating Roman Sindhi and Roman Urdu Scripts in
Multilingual Dataset Analysis. J. Comput. Biomed. Inform. 2025, 8. [CrossRef]
32. Hussain, N.; Qasim, A.; Akhtar, Z.U.D.; Qasim, A.; Mehak, G.; del Socorro Espindola Ulibarri, L.; Gelbukh, A. Stock Market
Performance Analytics Using XGBoost. In Proceedings of the Mexican International Conference on Artificial Intelligence; Springer
Nature: Cham, Switzerland, 2023; pp. 3–16.
33. Saeed, H.H.; Khalil, T.; Kamiran, F. Urdu Toxic Comment Classification with PURUTT Corpus Development. IEEE Access 2025,
13, 21635–21651. [CrossRef]
34. Naseeb, A.; Zain, M.; Hussain, N.; Qasim, A.; Ahmad, F.; Sidorov, G.; Gelbukh, A. Machine Learning- and Deep Learning-Based
Multi-Model System for Hate Speech Detection on Facebook. Algorithms 2025, 18, 331. [CrossRef]
35. Islam, M.; Khan, J.A.; Abaker, M.; Daud, A.; Irshad, A. Unified Large Language Models for Misinformation Detection in
Low-Resource Linguistic Settings. arXiv 2025, arXiv:2506.01587.
36. Sharma, D.; Nath, T.; Gupta, V.; Singh, V.K. Hate Speech Detection Research in South Asian Languages: A Survey of Tasks,
Datasets and Methods. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 2025, 24, 1–44. [CrossRef]
37. Alansari, A.; Luqman, H. Multi-task Learning with Active Learning for Arabic Offensive Speech Detection. arXiv 2025,
arXiv:2506.02753.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.