Official repository for "Question-Aware Gaussian Experts for Audio-Visual Question Answering" (CVPR 2025).
Authors: Hongyeob Kim1*, Inyoung Jung1*, Dayoon Suh2, Youjia Zhang1, Sangmin Lee1, Sungeun Hong1†
1Sungkyunkwan University, 2Purdue University
Audio-Visual Question Answering (AVQA) requires not only question-based multimodal reasoning but also precise temporal grounding to capture subtle dynamics for accurate prediction. However, existing methods mainly use question information implicitly, limiting focus on question-specific details. Furthermore, most studies rely on uniform frame sampling, which can miss key question-relevant frames. Although recent Top-K frame selection methods aim to address this, their discrete nature still overlooks fine-grained temporal details. This paper proposes QA-TIGER, a novel framework that explicitly incorporates question information and models continuous temporal dynamics. Our key idea is to use Gaussian-based modeling to adaptively focus on both consecutive and non-consecutive frames based on the question, while explicitly injecting question information and applying progressive refinement. We leverage a Mixture of Experts (MoE) to flexibly implement multiple Gaussian models, activating temporal experts specifically tailored to the question. Extensive experiments on multiple AVQA benchmarks show that QA-TIGER consistently achieves state-of-the-art performance.
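To illustrate the core idea, the sketch below shows how a mixture of Gaussian "experts" over the frame timeline, mixed by question-conditioned gating weights, produces a continuous temporal attention curve. This is a minimal toy example, not the actual QA-TIGER implementation; all names and values are hypothetical.

```python
import numpy as np

def gaussian_temporal_weights(centers, widths, gate, num_frames):
    """Mix per-expert Gaussians over the normalized frame timeline
    into one continuous temporal attention curve (toy sketch)."""
    t = np.linspace(0.0, 1.0, num_frames)  # normalized frame positions
    # one Gaussian per expert, evaluated at every frame position
    experts = np.exp(-0.5 * ((t[None, :] - centers[:, None]) / widths[:, None]) ** 2)
    experts /= experts.sum(axis=1, keepdims=True)  # each expert sums to 1 over time
    # gate: question-conditioned mixture weights over the experts
    return gate @ experts

# toy example: 3 experts on a 10-frame clip; the gate favors the early expert
centers = np.array([0.2, 0.5, 0.8])   # hypothetical Gaussian means (normalized time)
widths = np.array([0.1, 0.1, 0.1])    # hypothetical Gaussian std devs
gate = np.array([0.7, 0.2, 0.1])      # softmax-style weights, sum to 1
w = gaussian_temporal_weights(centers, widths, gate, 10)
```

Because each Gaussian is continuous over time, the resulting weights can emphasize both consecutive and non-consecutive frames, unlike discrete Top-K selection.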
## Requirements

- Python 3.10+
- PyTorch 2.4.0
## Clone this repo

```shell
git clone https://github.com/AIM-SKKU/QA-TIGER.git
```
## Setting the environment

- with conda

```shell
conda create -n qa-tiger python=3.10
conda activate qa-tiger
pip install -e .
```

- with pip

```shell
pip install -e .
```

- with uv

```shell
uv sync
source .venv/bin/activate
```
## Prepare data

- You can find the annotations in `./data/annots/`.
  - Notice: for MUSIC-AVQA-v2.0, we asked the authors about the original split and pre-divided the dataset accordingly.
- Additionally, the following links provide access to the original annotations and data:
  - MUSIC-AVQA: https://gewu-lab.github.io/MUSIC-AVQA/
  - MUSIC-AVQA-R: https://github.com/reml-group/MUSIC-AVQA-R
  - MUSIC-AVQA-v2.0: https://github.com/DragonLiu1995/MUSIC-AVQA-v2.0
## Feature extraction

- We follow the same protocol as TSPM for feature extraction. Please refer to TSPM.
- Put the extracted features under `./data/feats/`:

```
data
 ┣ annots
 ┃ ┣ music_avqa
 ┃ ┣ music_avqa_r
 ┃ ┗ music_avqa_v2
 ┣ feats
 ┃ ┣ frame_ViT-L14@336px
 ┃ ┣ visual_tome14
 ┃ ┣ ...
 ┗ ┗ vggish
```
## Training

```shell
bash scripts/train.sh <CONFIG> <GPU_IDX>
```
## Testing

```shell
bash scripts/test.sh <CONFIG> <GPU_IDX> <WEIGHT> <OUTPUT_LOG_PATH>
```
If you find this work useful, please consider citing it.
```bibtex
@inproceedings{kim2025qatiger,
  title={Question-Aware Gaussian Experts for Audio-Visual Question Answering},
  author={Hongyeob Kim and Inyoung Jung and Dayoon Suh and Youjia Zhang and Sangmin Lee and Sungeun Hong},
  booktitle={CVPR},
  year={2025}
}
```
We acknowledge the following code, which served as a reference for our implementation.