An Automated Pipeline for Constructing a Vietnamese VQA-NLE Dataset

📄 Official Paper: https://link.springer.com/chapter/10.1007/978-981-95-1746-6_18

This repository contains the code and resources for the paper "An Automated Pipeline for Constructing a Vietnamese VQA-NLE Dataset".

Introduction

This project introduces ViVQA-X, the first Vietnamese dataset for Visual Question Answering with Natural Language Explanations (VQA-NLE). ViVQA-X was built with a novel automated pipeline and provides a key resource for research on multimodal AI and explainability in Vietnamese. ViVQA-X features:

- 32,886 question-answer pairs with detailed explanations
- 41,817 high-quality natural language explanations
- Multi-stage automated pipeline for translation and quality control
- Comprehensive evaluation using multiple state-of-the-art models

This project facilitates research in Vietnamese visual question answering and supports the development of explainable AI systems for Vietnamese language understanding.

Dataset

🔗 Access Points

| Resource | Description |
|---|---|
| Dataset | ViVQA-X Dataset on Hugging Face |
| Model Weights | Pre-trained LSTM-Generative Model |
| Demo | Interactive Demo Space |

📈 Dataset Statistics

- QA pairs: 32,886 pairs across the train/validation/test splits
- Explanations: 41,817 high-quality explanations
- Average length: 10 words per explanation
- Vocabulary size: 4,232 unique words in explanations
- Images: COCO 2014 images with Vietnamese annotations

The dataset is stored as three JSON files in the data/final directory (ViVQA-X_train.json, ViVQA-X_val.json and ViVQA-X_test.json). These files contain the questions, answers and explanations associated with images from the COCO dataset.
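Here is a minimal sketch for loading the split files. This README does not describe the JSON schema, so the code makes two assumptions. First, each file is either a list of records or a dict keyed by question ID. Second, explanations are stored under a field named "explanation", as either a list or a single string. If the files use a different key, rename the field.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

SPLITS = ("train", "val", "test")


def load_split(path: Path) -> List[Dict[str, Any]]:
    """Load one ViVQA-X split as a list of records (schema is an assumption)."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Accept either a list of records or a dict keyed by question ID.
    return list(data.values()) if isinstance(data, dict) else list(data)


def count_explanations(records: List[Dict[str, Any]], field: str = "explanation") -> int:
    """Count explanations: a list counts as several, a non-empty string as one."""
    total = 0
    for record in records:
        value = record.get(field)
        if isinstance(value, list):
            total += len(value)
        elif value:
            total += 1
    return total


if __name__ == "__main__":
    for split in SPLITS:
        path = Path("data/final") / f"ViVQA-X_{split}.json"
        if path.exists():
            records = load_split(path)
            print(split, len(records), "QA pairs,", count_explanations(records), "explanations")
```

If you run it from the repository root, it prints per-split counts that you can compare against the statistics above.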

Quick Start

```
# Clone the repository
git clone https://github.com/duongtruongbinh/ViVQA-X.git
cd ViVQA-X

# Install dependencies
pip install -r requirements.txt

# Download the original VQA-X dataset
bash scripts/download_vqax.sh

# Run the complete pipeline
bash scripts/pipeline.sh
```

Installation

Prerequisites

- Python 3.8+
- CUDA 11.2+ (for GPU support)
- 8 GB+ RAM recommended
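You can confirm the environment before running anything with a short check like the one below. This is a sketch that assumes the models run on PyTorch. The device: "cuda:0" setting in the baseline config suggests this, but requirements.txt is the authority. The check does not measure RAM.

```python
import importlib.util
import sys
from typing import Dict, Tuple


def check_prerequisites(min_python: Tuple[int, int] = (3, 8)) -> Dict[str, object]:
    """Report whether the Python version and (if PyTorch is installed) CUDA look usable."""
    report: Dict[str, object] = {
        "python_ok": sys.version_info[:2] >= min_python,
        "torch_installed": False,
        "cuda_available": False,
        "cuda_version": None,
    }
    # Assumption: the models use PyTorch; skip the GPU check if it is not installed.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch_installed"] = True
        report["cuda_available"] = torch.cuda.is_available()
        report["cuda_version"] = torch.version.cuda  # None for CPU-only builds
    return report


if __name__ == "__main__":
    for key, value in check_prerequisites().items():
        print(f"{key}: {value}")
```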

Setup Instructions

Clone the repository

```
git clone https://github.com/duongtruongbinh/ViVQA-X.git
cd ViVQA-X
```

Create and activate virtual environment

```
# Using conda (recommended)
conda create -n vivqa-x python=3.8
conda activate vivqa-x

# Or using venv
python -m venv vivqa-x
source vivqa-x/bin/activate  # On Windows: vivqa-x\Scripts\activate
```

Install dependencies
```
pip install -r requirements.txt
```

Set up environment variables (for pipeline)

```
# Copy example environment file
cp .env.example .env

# Edit .env file and add your API keys:
# OPENAI_API_KEY=your_openai_api_key_here
# GEMINI_APIKEYS=your_gemini_api_key_1,your_gemini_api_key_2
```
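GEMINI_APIKEYS holds several comma-separated keys. This README does not document how the pipeline reads or uses them. The sketch below illustrates one plausible approach: a minimal .env parser, a helper that splits the key list, and a round-robin rotation. The rotation helper is hypothetical. In practice, a library such as python-dotenv is a common way to load .env files.

```python
import os
from itertools import cycle
from typing import Dict, Iterator, List, Mapping, Optional


def parse_env_text(text: str) -> Dict[str, str]:
    """Minimal KEY=VALUE parser: skips blank lines and # comments, no quoting rules."""
    env: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


def gemini_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Split the comma-separated GEMINI_APIKEYS value into individual keys."""
    env = os.environ if env is None else env
    raw = env.get("GEMINI_APIKEYS", "")
    return [key.strip() for key in raw.split(",") if key.strip()]


def key_rotation(keys: List[str]) -> Iterator[str]:
    """Hypothetical round-robin over several keys, e.g. to spread requests across quotas."""
    if not keys:
        raise ValueError("GEMINI_APIKEYS is empty")
    return cycle(keys)


if __name__ == "__main__":
    if os.path.exists(".env"):
        with open(".env", encoding="utf-8") as f:
            keys = gemini_keys(parse_env_text(f.read()))
        print(f"Loaded {len(keys)} Gemini key(s)")  # never print the keys themselves
```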

Download the original VQA-X dataset
```
bash scripts/download_vqax.sh
```

Download the COCO dataset

The ViVQA-X dataset uses images from the COCO 2014 dataset, so you need to download the train2014 and val2014 image sets.

```
# Create directory for COCO data
mkdir -p data/coco

# Download and unzip Train 2014 images (~13GB)
wget http://images.cocodataset.org/zips/train2014.zip -P data/coco/
unzip data/coco/train2014.zip -d data/coco/
rm data/coco/train2014.zip

# Download and unzip Validation 2014 images (~6GB)
wget http://images.cocodataset.org/zips/val2014.zip -P data/coco/
unzip data/coco/val2014.zip -d data/coco/
rm data/coco/val2014.zip
```

After this step, the images should be in data/coco/train2014 and data/coco/val2014.
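The helper below builds image paths using the standard COCO 2014 file-name convention: COCO_<split>_ followed by a 12-digit, zero-padded image ID. It assumes the ViVQA-X JSON stores numeric image IDs rather than full file names. Note that the model configs also point the test split at val2014. The directory check simply confirms that the two expected folders exist.

```python
from pathlib import Path
from typing import List

COCO_ROOT = Path("data/coco")
COCO_SPLITS = ("train2014", "val2014")


def coco_image_path(image_id: int, split: str, root: Path = COCO_ROOT) -> Path:
    """Build the standard COCO 2014 file name, e.g. COCO_val2014_000000000042.jpg."""
    if split not in COCO_SPLITS:
        raise ValueError(f"unknown COCO split: {split!r}")
    return root / split / f"COCO_{split}_{image_id:012d}.jpg"


def missing_image_dirs(root: Path = COCO_ROOT) -> List[str]:
    """Return the expected image folders that are not present under root."""
    return [split for split in COCO_SPLITS if not (root / split).is_dir()]


if __name__ == "__main__":
    missing = missing_image_dirs()
    print("COCO images ready" if not missing else f"Missing folders: {missing}")
```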

Usage

Pipeline

Run the complete translation and processing pipeline:

```
bash scripts/pipeline.sh
```

This will:

1. Translate English VQA-X to Vietnamese
2. Apply quality selection mechanisms
3. Post-process the results
4. Generate the final ViVQA-X dataset
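The sketch below illustrates the translation, selection and post-processing stages listed above. It is not the repository's implementation. The candidate dict stands in for the outputs of the modules under src/pipeline/translation/translators/, and the scorer functions stand in for the LLM evaluators under src/pipeline/selection/evaluators/. Picking the candidate with the highest mean score is an assumed selection rule. The post-processing step (Unicode NFC normalization plus whitespace cleanup) is likewise only an example.

```python
import unicodedata
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical scorer signature: (English source, Vietnamese candidate) -> quality score.
Scorer = Callable[[str, str], float]


def select_translation(
    source: str, candidates: Dict[str, str], evaluators: List[Scorer]
) -> Tuple[str, str]:
    """Return (translator_name, text) for the candidate with the highest mean score."""
    if not candidates or not evaluators:
        raise ValueError("need at least one candidate and one evaluator")

    def mean_score(name: str) -> float:
        return mean(evaluate(source, candidates[name]) for evaluate in evaluators)

    best = max(candidates, key=mean_score)
    return best, candidates[best]


def post_process(text: str) -> str:
    """Example clean-up: NFC-normalize Vietnamese diacritics and collapse whitespace."""
    return " ".join(unicodedata.normalize("NFC", text).split())


if __name__ == "__main__":
    # Toy evaluator: prefer candidates whose word count is close to the source's.
    def length_agreement(src: str, cand: str) -> float:
        return -abs(len(src.split()) - len(cand.split()))

    name, text = select_translation(
        "a dog is running",
        {"translator_a": "con chó", "translator_b": "một con chó đang chạy"},
        [length_agreement],
    )
    print(name, post_process(text))
```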

Benchmark

We benchmark ViVQA-X with a rule-based heuristic, an LSTM-Generative baseline and several state-of-the-art VQA-NLE models:

| Model | Repository |
|---|---|
| Heuristic Model | Included |
| LSTM-Generative | Included |
| NLX-GPT | GitHub |
| OFA-X | GitHub |
| ReRe | GitHub |

Heuristic Model

A rule-based approach requiring no training:

Configure the model

```
# src/models/heuristic_model/config/config.yaml
data:
  train_path: "data/final/ViVQA-X_train.json"
  val_path: "data/final/ViVQA-X_val.json"
  test_path: "data/final/ViVQA-X_test.json"
  train_image_dir: "data/coco/train2014"
  val_image_dir: "data/coco/val2014"
  test_image_dir: "data/coco/val2014"
```

Run evaluation

```
python src/models/heuristic_model/run_heuristic.py
```
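This README does not describe the heuristic's rules. The following is therefore a hypothetical example of the general idea behind a rule-based VQA-NLE baseline with no learned weights, only counts over the training split. It predicts the most frequent answer for a question "type" and returns the explanation most often paired with that answer. Using the first few words as the type is a crude proxy, since Vietnamese question words often come at the end of the sentence. For that reason the type function is pluggable.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, Tuple


def prefix_type(question: str, n_words: int = 2) -> str:
    """Crude question 'type': the first n lower-cased words."""
    return " ".join(question.lower().split()[:n_words])


class PriorBaseline:
    """Hypothetical rule-based VQA-NLE baseline built only from frequency counts."""

    def __init__(self, type_fn: Callable[[str], str] = prefix_type):
        self.type_fn = type_fn
        self.answers_by_type: Dict[str, Counter] = defaultdict(Counter)
        self.all_answers: Counter = Counter()
        self.explanations_by_answer: Dict[str, Counter] = defaultdict(Counter)

    def fit(self, examples: Iterable[Tuple[str, str, str]]) -> "PriorBaseline":
        """examples: (question, answer, explanation) triples from the training split."""
        for question, answer, explanation in examples:
            self.answers_by_type[self.type_fn(question)][answer] += 1
            self.all_answers[answer] += 1
            self.explanations_by_answer[answer][explanation] += 1
        return self

    def predict(self, question: str) -> Tuple[str, str]:
        """Most frequent answer for the question type, plus its most frequent explanation."""
        if not self.all_answers:
            raise RuntimeError("call fit() with at least one example first")
        # .get() avoids creating empty entries; unseen types fall back to global counts.
        counts = self.answers_by_type.get(self.type_fn(question)) or self.all_answers
        answer = counts.most_common(1)[0][0]
        explanation = self.explanations_by_answer[answer].most_common(1)[0][0]
        return answer, explanation
```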

Baseline Model

LSTM-Generative model with attention mechanism:

Configure the model

```
# src/models/baseline_model/config/config.yaml
data:
  train_path: "data/final/ViVQA-X_train.json"
  val_path: "data/final/ViVQA-X_val.json"
  test_path: "data/final/ViVQA-X_test.json"
  train_image_dir: 'data/coco/train2014'
  val_image_dir: 'data/coco/val2014'
  test_image_dir: 'data/coco/val2014'

model:
  device: "cuda:0"  # Adjust based on GPU availability
  embed_size: 400
  hidden_size: 2048
  num_layers: 2
  max_explanation_length: 15

training:
  learning_rate: 0.0001
  num_epochs: 50
  batch_size: 128
  num_workers: 4
  save_dir: "weights/baseline"
```
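The max_explanation_length setting caps the length of generated explanations (the dataset averages about 10 words per explanation). The sketch below shows one common way to apply such a cap to training targets. Several details are assumptions rather than details taken from the repository's data loaders: the special tokens, whitespace tokenization (which splits Vietnamese into syllables), and counting <bos>/<eos> toward the limit.

```python
from collections import Counter
from typing import Dict, Iterable, List

PAD, BOS, EOS, UNK = "<pad>", "<bos>", "<eos>", "<unk>"  # hypothetical special tokens


def build_vocab(explanations: Iterable[str], min_freq: int = 1) -> Dict[str, int]:
    """Map lower-cased whitespace tokens to ids; special tokens take ids 0-3."""
    counts = Counter(tok for text in explanations for tok in text.lower().split())
    vocab = {tok: i for i, tok in enumerate((PAD, BOS, EOS, UNK))}
    for tok, freq in sorted(counts.items()):  # sorted for deterministic ids
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab


def encode_explanation(text: str, vocab: Dict[str, int], max_len: int = 15) -> List[int]:
    """Truncate/pad to exactly max_len ids, counting <bos> and <eos> inside the budget."""
    body = text.lower().split()[: max_len - 2]
    ids = [vocab[BOS]] + [vocab.get(tok, vocab[UNK]) for tok in body] + [vocab[EOS]]
    return ids + [vocab[PAD]] * (max_len - len(ids))
```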

Train the model

```
# Using script (recommended)
bash scripts/train.sh

# Or direct command
python src/models/baseline_model/train.py --config src/models/baseline_model/config/config.yaml
```

Evaluate the model

```
# Using script
bash scripts/evaluate.sh

# Or direct command
python src/models/baseline_model/evaluate.py --model_path weights/baseline/best_model.pth
```

Use pre-trained weights

The pre-trained LSTM-Generative weights are available on Hugging Face. Follow the instructions in that model repository to download and use them.

Evaluation Metrics

Both the heuristic and the LSTM-Generative models are evaluated with the following metrics:

| Metric | Description |
|---|---|
| Answer Accuracy | Exact-match accuracy for answers |
| BLEU-1/2/3/4 | N-gram precision for explanations |
| BERTScore | Contextual similarity score |
| METEOR | Semantic similarity with WordNet |
| ROUGE-L | Longest common subsequence |
| CIDEr | Consensus-based evaluation |
| SPICE | Semantic propositional evaluation |
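To make two of these metrics concrete, the sketch below computes exact-match answer accuracy and the clipped n-gram precision that BLEU-n is built from. Precision is measured against several reference explanations, because the dataset has more explanations than QA pairs and some questions therefore have multiple references. Lower-casing and whitespace tokenization are assumed normalizations. Full BLEU also combines several n-gram orders with a brevity penalty, so reported scores should come from a standard implementation rather than this illustration.

```python
from collections import Counter
from typing import List, Sequence


def answer_accuracy(predictions: Sequence[str], golds: Sequence[str]) -> float:
    """Exact-match accuracy after stripping whitespace and lower-casing (assumed normalization)."""
    if len(predictions) != len(golds):
        raise ValueError("predictions and golds must have the same length")
    if not predictions:
        return 0.0
    hits = sum(p.strip().lower() == g.strip().lower() for p, g in zip(predictions, golds))
    return hits / len(predictions)


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate: str, references: Sequence[str], n: int) -> float:
    """BLEU-style modified n-gram precision: each candidate n-gram is credited at most
    as many times as it appears in any single reference."""
    cand = ngram_counts(candidate.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    max_ref: Counter = Counter()
    for ref in references:
        for gram, count in ngram_counts(ref.lower().split(), n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / total
```

For example, the candidate "con chó đang chạy trên cỏ" scored against the reference "con chó chạy trên cỏ" has a unigram precision of 5/6 and a bigram precision of 3/5.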

Directory Structure

```
ViVQA-X/
├── data/                          # Dataset files
│   ├── vqax/                         # Original VQA-X dataset
│   ├── translation/                  # Translation intermediate files
│   ├── selection/                    # Quality selection files
│   └── final/                        # Final ViVQA-X dataset
│       ├── ViVQA-X_train.json
│       ├── ViVQA-X_val.json
│       └── ViVQA-X_test.json
├── notebooks/                     # Jupyter notebooks for analysis
├── scripts/                       # Utility scripts
│   ├── download_vqax.sh              # Download original dataset
│   ├── pipeline.sh                   # Run complete pipeline
│   ├── train.sh                      # Train baseline model
│   └── evaluate.sh                   # Evaluate models
├── src/                           # Source code
│   ├── models/                       # Model implementations
│   │   ├── baseline_model/           # LSTM-Generative model
│   │   │   ├── config/               # Configuration files
│   │   │   ├── dataloaders/          # Data loading utilities
│   │   │   ├── metrics/              # Evaluation metrics
│   │   │   ├── utils/                # Helper utilities
│   │   │   ├── weights/              # Model checkpoints
│   │   │   ├── train.py              # Training script
│   │   │   ├── evaluate.py           # Evaluation script
│   │   │   └── vivqax_model.py       # Model architecture
│   │   └── heuristic_model/          # Rule-based baseline
│   │       ├── config/               # Configuration files
│   │       ├── dataloaders/          # Data loading utilities
│   │       ├── metrics/              # Evaluation metrics
│   │       ├── utils/                # Helper utilities
│   │       ├── run_heuristic.py      # Main evaluation script
│   │       └── heuristic_baseline.py # Model implementation
│   └── pipeline/                     # Data processing pipeline
│       ├── translation/              # Translation modules
│       │   ├── translators/          # Various translator implementations
│       │   └── translation.py        # Translation pipeline
│       ├── selection/                # Quality selection modules
│       │   ├── evaluators/           # LLM evaluators
│       │   └── selection.py          # Selection pipeline
│       ├── post_processing/          # Post-processing modules
│       └── pipeline.py               # Main pipeline script
├── requirements.txt               # Python dependencies
├── LICENSE
└── README.md                      # This file
```

Citation

If you use this dataset or code in your research, please cite our paper:

```
@InProceedings{duong2026vivqax,
  author    = {Truong-Binh Duong and Hoang-Minh Tran and Binh-Nam Le-Nguyen and Dinh-Thang Duong},
  title     = {An Automated Pipeline for Constructing a Vietnamese VQA-NLE Dataset},
  booktitle = {Proceedings of the Fifth International Conference on Intelligent Systems and Networks},
  series    = {Lecture Notes in Networks and Systems},
  year      = {2026},
  publisher = {Springer Nature Singapore},
  pages     = {164--173},
  isbn      = {978-981-95-1746-6},
  doi       = {10.1007/978-981-95-1746-6_18}
}
```

🤗 Dataset • 🤗 Model • 🤗 Demo • 📧 Contact
