duongtruongbinh/ViVQA-X

An Automated Pipeline for Constructing a Vietnamese VQA-NLE Dataset

📄 Official Paper: https://link.springer.com/chapter/10.1007/978-981-95-1746-6_18

This repository contains the code and resources for the paper "An Automated Pipeline for Constructing a Vietnamese VQA-NLE Dataset".

[Figure: Pipeline]

Introduction

This project introduces ViVQA-X, the first Vietnamese dataset for Visual Question Answering with Natural Language Explanations (VQA-NLE). The dataset is built with an automated translation and quality-control pipeline and is intended as a resource for research on multimodal AI and explainability in Vietnamese. ViVQA-X features:

  • 32,886 question-answer pairs with detailed explanations
  • 41,817 high-quality natural language explanations
  • Multi-stage automated pipeline for translation and quality control
  • Comprehensive evaluation using multiple state-of-the-art models

This project facilitates research in Vietnamese visual question answering and supports the development of explainable AI systems for Vietnamese language understanding.

Dataset

🔗 Access Points

| Resource | Description | Link |
| --- | --- | --- |
| Dataset | ViVQA-X Dataset on Hugging Face | Dataset |
| Model Weights | Pre-trained LSTM-Generative Model | Model |
| Demo | Interactive Demo Space | Demo |

📈 Dataset Statistics

  • QA Pairs: 32,886 pairs across Train/Validation/Test splits
  • Explanations: 41,817 high-quality explanations
  • Average Words: 10 words per explanation
  • Vocabulary Size: 4,232 unique words in explanations
  • Images: COCO dataset images with Vietnamese annotations

The dataset is organized into JSON files located in the data/final directory, containing questions, answers, and explanations associated with images from the COCO dataset.
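The statistics above can be re-derived from the split files with a few lines of Python. The sketch below is illustrative only: it assumes each record stores its explanations under a field named `explanation` (either one string or a list of strings), which should be checked against the actual JSON, and it counts whitespace-separated tokens, which may differ from the tokenization used for the reported numbers.

```python
import json
from pathlib import Path


def load_split(path):
    """Load one split file and return its records as a list of dicts."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    # Accept either a list of records or a dict keyed by question id.
    return list(data.values()) if isinstance(data, dict) else list(data)


def explanation_stats(records, field="explanation"):
    """Return (num_explanations, avg_words, vocab_size) from whitespace tokens.

    The field name is an assumption; adjust it to the real schema.
    """
    explanations = []
    for record in records:
        value = record.get(field, [])
        # A record may carry one explanation (str) or several (list of str).
        explanations.extend([value] if isinstance(value, str) else value)
    tokenized = [text.lower().split() for text in explanations]
    total_tokens = sum(len(tokens) for tokens in tokenized)
    vocab = {token for tokens in tokenized for token in tokens}
    avg_words = total_tokens / len(explanations) if explanations else 0.0
    return len(explanations), avg_words, len(vocab)
```

For example, `explanation_stats(load_split("data/final/ViVQA-X_train.json"))` would report the counts for the training split once the dataset files are in place.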

Quick Start

# Clone the repository
git clone https://github.com/duongtruongbinh/ViVQA-X.git
cd ViVQA-X

# Install dependencies
pip install -r requirements.txt

# Download the original VQA-X dataset
bash scripts/download_vqax.sh

# Run the complete pipeline
bash scripts/pipeline.sh

Installation

Prerequisites

  • Python 3.8+
  • CUDA 11.2+ (for GPU support)
  • 8GB+ RAM recommended

Setup Instructions

  1. Clone the repository

    git clone https://github.com/duongtruongbinh/ViVQA-X.git
    cd ViVQA-X
  2. Create and activate virtual environment

    # Using conda (recommended)
    conda create -n vivqa-x python=3.8
    conda activate vivqa-x
    
    # Or using venv
    python -m venv vivqa-x
    source vivqa-x/bin/activate  # On Windows: vivqa-x\Scripts\activate
  3. Install dependencies

    pip install -r requirements.txt
  4. Set up environment variables (for pipeline)

    # Copy example environment file
    cp .env.example .env
    
    # Edit .env file and add your API keys:
    # OPENAI_API_KEY=your_openai_api_key_here
    # GEMINI_APIKEYS=your_gemini_api_key_1,your_gemini_api_key_2
  5. Download the original VQA-X dataset

    bash scripts/download_vqax.sh
  6. Download the COCO dataset

    The ViVQA-X dataset uses images from the COCO 2014 dataset. Download the train2014 and val2014 image sets:

    # Create directory for COCO data
    mkdir -p data/coco
    
    # Download and unzip Train 2014 images (~13GB)
    wget http://images.cocodataset.org/zips/train2014.zip -P data/coco/
    unzip data/coco/train2014.zip -d data/coco/
    rm data/coco/train2014.zip
    
    # Download and unzip Validation 2014 images (~6GB)
    wget http://images.cocodataset.org/zips/val2014.zip -P data/coco/
    unzip data/coco/val2014.zip -d data/coco/
    rm data/coco/val2014.zip

    After this step, you should have the following directory structure: data/coco/train2014 and data/coco/val2014.
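Before running the pipeline, it can help to confirm that the setup steps above took effect. The following is a minimal sketch, not part of the repository: `preflight` and `parse_key_list` are hypothetical helper names, the check reads variables from the process environment (so the `.env` file must already be loaded), and whether both API keys are actually required depends on which translators and evaluators are enabled.

```python
import os
from pathlib import Path


def parse_key_list(raw):
    """Split a comma-separated key string, dropping blanks and whitespace."""
    return [key.strip() for key in (raw or "").split(",") if key.strip()]


def preflight(root=".", env=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    # Image directories expected after the COCO download step.
    for sub in ("data/coco/train2014", "data/coco/val2014"):
        if not (Path(root) / sub).is_dir():
            problems.append(f"missing directory: {sub}")
    # Variable names as listed in the .env example above.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    if not parse_key_list(env.get("GEMINI_APIKEYS")):
        problems.append("GEMINI_APIKEYS has no usable keys")
    return problems
```

Calling `preflight()` from the repository root returns an empty list when both image directories exist and both variables are set.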

Usage

Pipeline

Run the complete translation and processing pipeline:

bash scripts/pipeline.sh

This will:

  • Translate English VQA-X to Vietnamese
  • Apply quality selection mechanisms
  • Post-process the results
  • Generate the final ViVQA-X dataset
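The quality selection step can be pictured as choosing, for each English sentence, the best of several candidate translations. The sketch below is an assumption about how such a step might look, not the logic in `src/pipeline/selection/`: `select_best` and `post_process` are hypothetical names, the scorers stand in for LLM evaluators, and averaging the scores is just one possible aggregation rule.

```python
from statistics import mean


def select_best(source, candidates, scorers):
    """Pick the candidate with the highest mean score across scorers.

    candidates: dict mapping a translator name to its Vietnamese output.
    scorers: non-empty list of callables (source, candidate) -> float,
             hypothetical stand-ins for LLM evaluators.
    Returns (translator_name, text, mean_score); ties go to the first entry.
    """
    scored = {
        name: mean([scorer(source, text) for scorer in scorers])
        for name, text in candidates.items()
    }
    best = max(scored, key=scored.get)
    return best, candidates[best], scored[best]


def post_process(text):
    """Illustrative clean-up: collapse whitespace and strip stray quotes."""
    return " ".join(text.split()).strip("\"'")
```

In this sketch a sentence flows through `select_best` and then `post_process`; the real pipeline may score, filter, and clean the data differently.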

Benchmark

We provide comprehensive benchmarks using multiple state-of-the-art models:

| Model | Repository |
| --- | --- |
| Heuristic Model | Included |
| LSTM-Generative | Included |
| NLX-GPT | GitHub |
| OFA-X | GitHub |
| ReRe | GitHub |

Heuristic Model

A rule-based approach requiring no training:

  1. Configure the model

    # src/models/heuristic_model/config/config.yaml
    data:
      train_path: "data/final/ViVQA-X_train.json"
      val_path: "data/final/ViVQA-X_val.json"
      test_path: "data/final/ViVQA-X_test.json"
      train_image_dir: "data/coco/train2014"
      val_image_dir: "data/coco/val2014"
      test_image_dir: "data/coco/val2014"
  2. Run evaluation

    python src/models/heuristic_model/run_heuristic.py

Baseline Model

LSTM-Generative model with attention mechanism:

  1. Configure the model

    # src/models/baseline_model/config/config.yaml
    data:
      train_path: "data/final/ViVQA-X_train.json"
      val_path: "data/final/ViVQA-X_val.json"
      test_path: "data/final/ViVQA-X_test.json"
      train_image_dir: 'data/coco/train2014'
      val_image_dir: 'data/coco/val2014'
      test_image_dir: 'data/coco/val2014'
    
    model:
      device: "cuda:0"  # Adjust based on GPU availability
      embed_size: 400
      hidden_size: 2048
      num_layers: 2
      max_explanation_length: 15
    
    training:
      learning_rate: 0.0001
      num_epochs: 50
      batch_size: 128
      num_workers: 4
      save_dir: "weights/baseline"
  2. Train the model

    # Using script (recommended)
    bash scripts/train.sh
    
    # Or direct command
    python src/models/baseline_model/train.py --config src/models/baseline_model/config/config.yaml
  3. Evaluate the model

    # Using script
    bash scripts/evaluate.sh
    
    # Or direct command  
    python src/models/baseline_model/evaluate.py --model_path weights/baseline/best_model.pth
  4. Use pre-trained weights

    Download the weights from the Model link listed under Access Points:

    # The model weights are available on Hugging Face
    # Follow the repository instructions to download and use

Evaluation Metrics

Both included models (Heuristic and LSTM-Generative) are evaluated with the following metrics:

| Metric | Description |
| --- | --- |
| Answer Accuracy | Exact match accuracy for answers |
| BLEU-1/2/3/4 | N-gram precision for explanations |
| BERTScore | Contextual similarity score |
| METEOR | Semantic similarity with WordNet |
| ROUGE-L | Longest common subsequence |
| CIDEr | Consensus-based evaluation |
| SPICE | Semantic propositional evaluation |
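To make the simpler metrics concrete, here is a minimal sketch of exact-match answer accuracy, the clipped n-gram precision that underlies BLEU, and an LCS-based ROUGE-L score. It is not the code in the `metrics/` directories: it assumes whitespace tokenization and a single reference per prediction, and it omits BLEU's brevity penalty and its geometric mean over n-gram orders. BERTScore, METEOR, CIDEr, and SPICE need external models or tools and are not covered.

```python
from collections import Counter


def answer_accuracy(predictions, references):
    """Fraction of predictions that match exactly after trimming and lowercasing."""
    if not references:
        return 0.0
    hits = sum(p.strip().lower() == r.strip().lower()
               for p, r in zip(predictions, references))
    return hits / len(references)


def ngram_precision(candidate, reference, n=1):
    """Clipped n-gram precision, the core quantity behind BLEU-n."""
    cand, ref = candidate.split(), reference.split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(cand_ngrams.values())
    if total == 0:
        return 0.0
    # Clip each candidate count by how often the n-gram occurs in the reference.
    overlap = sum(min(count, ref_ngrams[gram]) for gram, count in cand_ngrams.items())
    return overlap / total


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """LCS-based F-score; beta weights recall relative to precision."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)
```

Scores from this sketch will not necessarily match the numbers reported by the repository's evaluation scripts, since tokenization and aggregation details may differ.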

Directory Structure

ViVQA-X/
├── data/                             # Dataset files
│   ├── vqax/                         # Original VQA-X dataset
│   ├── translation/                  # Translation intermediate files
│   ├── selection/                    # Quality selection files
│   └── final/                        # Final ViVQA-X dataset
│       ├── ViVQA-X_train.json
│       ├── ViVQA-X_val.json
│       └── ViVQA-X_test.json
├── notebooks/                        # Jupyter notebooks for analysis
├── scripts/                          # Utility scripts
│   ├── download_vqax.sh              # Download original dataset
│   ├── pipeline.sh                   # Run complete pipeline
│   ├── train.sh                      # Train baseline model
│   └── evaluate.sh                   # Evaluate models
├── src/                              # Source code
│   ├── models/                       # Model implementations
│   │   ├── baseline_model/           # LSTM-Generative model
│   │   │   ├── config/               # Configuration files
│   │   │   ├── dataloaders/          # Data loading utilities
│   │   │   ├── metrics/              # Evaluation metrics
│   │   │   ├── utils/                # Helper utilities
│   │   │   ├── weights/              # Model checkpoints
│   │   │   ├── train.py              # Training script
│   │   │   ├── evaluate.py           # Evaluation script
│   │   │   └── vivqax_model.py       # Model architecture
│   │   └── heuristic_model/          # Rule-based baseline
│   │       ├── config/               # Configuration files
│   │       ├── dataloaders/          # Data loading utilities
│   │       ├── metrics/              # Evaluation metrics
│   │       ├── utils/                # Helper utilities
│   │       ├── run_heuristic.py      # Main evaluation script
│   │       └── heuristic_baseline.py # Model implementation
│   └── pipeline/                     # Data processing pipeline
│       ├── translation/              # Translation modules
│       │   ├── translators/          # Various translator implementations
│       │   └── translation.py        # Translation pipeline
│       ├── selection/                # Quality selection modules
│       │   ├── evaluators/           # LLM evaluators
│       │   └── selection.py          # Selection pipeline
│       ├── post_processing/          # Post-processing modules
│       └── pipeline.py               # Main pipeline script
├── requirements.txt                  # Python dependencies
├── LICENSE
└── README.md                         # This file

Citation

If you use this dataset or code in your research, please cite our paper:

@InProceedings{duong2026vivqax,
  author    = {Truong-Binh Duong and Hoang-Minh Tran and Binh-Nam Le-Nguyen and Dinh-Thang Duong},
  title     = {An Automated Pipeline for Constructing a Vietnamese VQA-NLE Dataset},
  booktitle = {Proceedings of the Fifth International Conference on Intelligent Systems and Networks},
  series    = {Lecture Notes in Networks and Systems},
  year      = {2026},
  publisher = {Springer Nature Singapore},
  pages     = {164--173},
  isbn      = {978-981-95-1746-6},
  doi       = {10.1007/978-981-95-1746-6_18}
}

About

[ICISN 2025] An Automated Pipeline for Constructing a Vietnamese VQA-NLE Dataset
