vl2g/MPA

Official Implementation of our EMNLP 2025 Paper "When Big Models Train Small Ones: Label-Free Model Parity Alignment for Efficient Visual Question Answering using Small VLMs"


This repository contains the official code for training, inference, and evaluation of Model Parity Aligner (MPA).

To set up the environment:

# create a new Docker container (using the mentioned Docker image)
$ docker run -it -d --name MPA --gpus=all -v <path-to-your-directory>:/workspace pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel

# create a new conda env named MPA
$ conda create -n MPA python=3.13.5

# activate MPA
$ conda activate MPA

# install dependencies
$ pip install -r requirements.txt
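After installing, a quick sanity check can confirm the key dependencies are importable. This is a minimal sketch; the package list below is an assumption, so adjust it to match requirements.txt:

```python
import importlib.util

def missing_packages(names):
    """Return the subset of package names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Hypothetical package list -- adjust to match requirements.txt.
required = ["torch", "transformers", "PIL"]
print("missing:", missing_packages(required))
```

If the printed list is non-empty, re-run `pip install -r requirements.txt` inside the `MPA` environment.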

Dataset

We report results of MPA on four datasets: TextVQA, STVQA, ChartQA, and OKVQA. Follow the instructions below to create the splits used for each dataset.

First, for TextVQA you can download the images and respective annotations from their official website. You can access the train, val, and test splits at the following paths:

train-split: /data/TextVQA/qwenTrainFormat_train.json
val-split: /data/TextVQA/qwenTrainFormat_eval.json
test-split: /data/TextVQA/TextVQA_0.5.1_val.json

Second, for STVQA you can download the images and respective annotations from their official website. You can access the train, val, and test splits at the following paths:

train-split: /data/STVQA/QwenTrainFormat_train_task_1_onePerImage_train.json
val-split: /data/STVQA/QwenTrainFormat_train_task_1_onePerImage_eval.json
test-split: /data/STVQA/train_task_1_onePerImage_val.json

Third, for ChartQA you can download the images and respective annotations from their official GitHub repo. You can access the train, val, and test splits at the following paths:

train-split: /data/ChartVQA/train_onePerImage_QwenFormat_train.json
val-split: /data/ChartVQA/train_onePerImage_QwenFormat_eval.json
test-split: /data/ChartVQA/test_combined.json

Fourth, for OK-VQA you can download the images and respective annotations from their official website. You can access the train, val, and test splits at the following paths:

train-split: /data/OKVQA/okvqa_QwenFormat_train.json
val-split: /data/OKVQA/okvqa_QwenFormat_eval.json
test-split: /data/OKVQA/okvqa_val_combine.json
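Once the split files are in place, a small sanity check can confirm each one loads and count its samples. This sketch assumes each split file holds a JSON list with one record per QA sample, which may not match the exact on-disk format:

```python
import json

def split_size(path):
    """Load a split file and return the number of records.

    Assumes the file holds a JSON list, one record per QA sample.
    """
    with open(path) as f:
        data = json.load(f)
    return len(data)

# Example (paths as listed above):
# print(split_size("/data/TextVQA/qwenTrainFormat_train.json"))
```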

Pseudo Annotator (PA)

To generate Pseudo Annotations of unlabeled images for task 'T', run the following commands. This creates a new directory (if one does not already exist) inside the scripts directory and writes the PA JSON files into a subdirectory named after the date the experiment is run. Demo files are already present in the results directory for reference.

# change to scripts dir
$ cd scripts/

# run the bash script PA.sh
$ bash PA.sh
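The dated output layout described above can be sketched as follows; the function name and the `results` base directory are assumptions, not the exact logic in PA.sh:

```python
import os
from datetime import date

def pa_output_dir(base="results"):
    """Create (if needed) and return a subdirectory named after today's
    date, mirroring how the PA step organizes its pseudo-annotation dumps."""
    out = os.path.join(base, date.today().isoformat())
    os.makedirs(out, exist_ok=True)
    return out
```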

Parity Identifier (PI)

This module identifies samples that represent the knowledge gap between the S-VLM and the L-VLM. Note that you must pass the path of the PA output JSON file to the respective dataloader in PI.py.

# run the bash script PI.sh
$ bash PI.sh
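Conceptually, PI keeps the samples where the S-VLM's prediction disagrees with the L-VLM's pseudo-annotation, i.e., the candidate knowledge gaps. A minimal sketch of that filtering idea (the function and field names are hypothetical, and the real PI.py may use a more sophisticated comparison):

```python
def parity_samples(samples):
    """Keep samples where the small VLM's answer disagrees with the
    large VLM's pseudo-annotation -- i.e., candidate knowledge gaps.

    Each sample is a dict with (hypothetical) keys:
      'pseudo_answer' -- L-VLM pseudo-annotation from the PA step
      'svlm_answer'   -- S-VLM prediction
    """
    def norm(s):
        return s.strip().lower()

    return [s for s in samples
            if norm(s["pseudo_answer"]) != norm(s["svlm_answer"])]
```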

Parity Leveler (PL)

The parity samples obtained by the PI module are used to train the S-VLM and enhance it. Note that you must pass the train JSON file generated during the PI step to PL.sh to train on the parity samples. Run the following command:

# run the bash script PL.sh
$ bash PL/Qwen2-VL-Finetune/scripts/PL.sh

Note: we use the Qwen2-VL-Finetune GitHub repo to train the Qwen-family models.

Evaluate

To evaluate pre-trained and MPA-trained models, run the following command:

# run the bash script evaluate.sh
$ bash evaluate.sh
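For reference, VQA-style benchmarks such as TextVQA and OK-VQA are commonly scored with the soft VQA accuracy, min(#matching human answers / 3, 1). The sketch below shows that standard metric; it is not necessarily the exact implementation used in evaluate.sh:

```python
def vqa_accuracy(prediction, gt_answers):
    """Standard soft VQA accuracy: a prediction scores 1.0 if at least
    3 annotators gave that answer, and matches/3 otherwise."""
    pred = prediction.strip().lower()
    matches = sum(1 for a in gt_answers if a.strip().lower() == pred)
    return min(matches / 3.0, 1.0)
```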

License

This code and data are released under the MIT license.

Acknowledgements

  1. We use the codebase and pre-trained models of Qwen2-VL.
