A curated collection of NLP research papers, models, datasets, and tools covering fundamentals, advanced techniques, and real-world applications. 🚀
Updated Oct 22, 2025
word, analytics
Soe Vinorm: An Effective Text Normalization Toolkit for converting Vietnamese text to its spoken form.
This project builds and visualizes a Vector Space Model (VSM) for Vietnamese text using TF-IDF, Word2Vec, and FastText.
Underthesea - Vietnamese NLP Toolkit
SemViQA: A Semantic Question Answering System for Vietnamese Information Fact-Checking
Flask-based microservice for aspect-based sentiment analysis of Vietnamese learner feedback. Uses a fine-tuned PhoBERT model on the ViLearn-ABSA dataset to classify sentiment across seven educational aspects, powering Nova Learn’s analytics dashboard.
Vietnamese hate speech detector
Fine-tuned BARTpho-syllable for restoring punctuation and capitalization in Vietnamese.
A Vietnamese Benchmark Dataset for Multimodal Review Helpfulness Prediction via Human-AI Collaborative Annotation.
[VLSP 2025] ViDRILL is a Vietnamese document retrieval system for VLSP 2025. It combines dense and sparse retrieval, reranking, and optional LLM-based query rewriting and reasoning to support high-accuracy information retrieval and future LLM-enhanced pipelines.
Vietnamese Text-to-Speech API
🔎 Vietnamese Voice Search Engine - Vietnamese news search app with voice recognition and text-to-speech, built with Streamlit. Users speak queries in Vietnamese to find news and hear AI-summarized results, for a hands-free news browsing experience.
Natural Language Processing projects implementing various techniques
Rule-based NLP project for Vietnamese sentiment detection. Utilizes a custom lexicon and basic logic to classify text as positive or negative.
PhoMT: A High-Quality and Large-Scale Benchmark Dataset for Vietnamese-English Machine Translation (EMNLP 2021)
Summarization of Vietnamese sports transfer news; the dataset covers articles published on or before December 30, 2023.
Fine-tuning Vietnamese large language models for several natural language processing tasks.
This project aims to explore and evaluate various fine-tuning techniques for the task of Vietnamese sentiment analysis.
[AAAI 2025] Code for ViFactCheck: A New Benchmark Dataset and Methods for Multi-domain News Fact-Checking in Vietnamese