Question-Aware Gaussian Experts for Audio-Visual Question Answering (CVPR 2025, Highlight)

Official repository for "Question-Aware Gaussian Experts for Audio-Visual Question Answering" (CVPR 2025).

Authors: Hongyeob Kim1*, Inyoung Jung1*, Dayoon Suh2, Youjia Zhang1, Sangmin Lee1, Sungeun Hong1
1Sungkyunkwan University, 2Purdue University

Abstract

Audio-Visual Question Answering (AVQA) requires not only question-based multimodal reasoning but also precise temporal grounding to capture subtle dynamics for accurate prediction. However, existing methods mainly use question information implicitly, limiting focus on question-specific details. Furthermore, most studies rely on uniform frame sampling, which can miss key question-relevant frames. Although recent Top-K frame selection methods aim to address this, their discrete nature still overlooks fine-grained temporal details. This paper proposes QA-TIGER, a novel framework that explicitly incorporates question information and models continuous temporal dynamics. Our key idea is to use Gaussian-based modeling to adaptively focus on both consecutive and non-consecutive frames based on the question, while explicitly injecting question information and applying progressive refinement. We leverage a Mixture of Experts (MoE) to flexibly implement multiple Gaussian models, activating temporal experts specifically tailored to the question. Extensive experiments on multiple AVQA benchmarks show that QA-TIGER consistently achieves state-of-the-art performance.

Requirements

Python 3.10+
PyTorch 2.4.0

Usage

  1. Clone this repo

    git clone https://github.com/AIM-SKKU/QA-TIGER.git
  2. Set up the environment

    • with conda
    conda create -n qa-tiger python=3.10
    conda activate qa-tiger
    pip install -e .
    
    • with pip
    pip install -e .
    
    • with uv
    uv sync
    source .venv/bin/activate
    
  3. Prepare data

  4. Feature extraction

    • We follow the same protocol as TSPM for feature extraction; please refer to TSPM for details.

    • Put the extracted features under ./data/feats/

      data
      ┣ annots
      ┃ ┣ music_avqa
      ┃ ┣ music_avqa_r
      ┃ ┗ music_avqa_v2
      ┗ feats
        ┣ frame_ViT-L14@336px
        ┣ visual_tome14
        ┣ ...
        ┗ vggish
      
  5. Training

    bash scripts/train.sh <CONFIG> <GPU_IDX>
  6. Testing

    bash scripts/test.sh <CONFIG> <GPU_IDX> <WEIGHT> <OUTPUT_LOG_PATH>
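Before launching training, the data layout from step 4 can be sanity-checked with a short script. This is only a sketch, not part of the repository's tooling: the directory names are taken from the tree shown above, and additional feature folders (the elided "..." entries) may exist depending on your extraction setup.

```python
from pathlib import Path

# Expected sub-directories, copied from the layout tree in step 4.
# The "..." entries in that tree are omitted here -- this check is a
# sketch and only covers the directories the README names explicitly.
ANNOT_DIRS = ["music_avqa", "music_avqa_r", "music_avqa_v2"]
FEAT_DIRS = ["frame_ViT-L14@336px", "visual_tome14", "vggish"]

def missing_data_dirs(root="data"):
    """Return the expected sub-directories that are absent under `root`."""
    root = Path(root)
    expected = [root / "annots" / d for d in ANNOT_DIRS] + \
               [root / "feats" / d for d in FEAT_DIRS]
    return [str(p) for p in expected if not p.is_dir()]

if __name__ == "__main__":
    missing = missing_data_dirs()
    if missing:
        print("Missing directories:")
        for p in missing:
            print(" -", p)
    else:
        print("Data layout looks complete.")
```

Run it from the repository root; an empty report means the annotation and feature folders are in place.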

Checkpoints

Citation

If you find this work useful, please consider citing it.

@inproceedings{kim2025qatiger,
    title={Question-Aware Gaussian Experts for Audio-Visual Question Answering},
    author={Hongyeob Kim and Inyoung Jung and Dayoon Suh and Youjia Zhang and Sangmin Lee and Sungeun Hong},
    booktitle={CVPR},
    year={2025}
}

Acknowledgement

We acknowledge the following code, which served as a reference for our implementation.
