This repository collects Natural Language Processing (NLP) projects that explore techniques and applications in computational linguistics and text processing.
One of these projects detects hate speech and offensive content in Vietnamese text. It uses the pretrained Vietnamese language models PhoBERT and ViSoBERT for text classification.
Key Features:
- Text preprocessing and normalization for Vietnamese
- Data augmentation with EDA (Easy Data Augmentation) and LLM-based techniques
- Fine-tuning of pretrained Vietnamese language models
- Class imbalance handling with Focal Loss
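
To illustrate the last feature, here is a minimal sketch of Focal Loss for a single example, written in plain Python so it runs without any deep learning framework. It is not the project's actual training code: the names `softmax` and `focal_loss`, the `gamma` default of 2.0, and the optional per-class `alpha` weights are assumptions made for this example.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Focal Loss for one example: -alpha_t * (1 - p_t)^gamma * log(p_t).

    logits: list of raw class scores (hypothetical model output)
    target: index of the true class
    gamma:  focusing parameter; larger values down-weight easy examples more
    alpha:  optional list of per-class weights (assumed, e.g. higher for rare classes)
    """
    probs = softmax(logits)
    # Clamp to avoid log(0) if the probability underflows.
    p_t = max(probs[target], 1e-12)
    weight = 1.0 if alpha is None else alpha[target]
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    logits = [4.0, 0.0, 0.0]
    # Well-classified example (true class has the highest score).
    print("easy:", focal_loss(logits, target=0))
    # Misclassified example (true class has a low score).
    print("hard:", focal_loss(logits, target=1))
```

With `gamma=0` and no `alpha`, the expression reduces to ordinary cross-entropy. Raising `gamma` shrinks the loss on confidently correct examples, so the rarer and harder classes contribute relatively more to the total.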