
Provable Dynamic Fusion for Low-Quality Multimodal Data

This repository provides the official implementation of the ICML 2023 paper "Provable Dynamic Fusion for Low-Quality Multimodal Data" by Qingyang Zhang, Haitao Wu, and collaborators.

Highlights

  • Theoretical Framework: The paper introduces a theoretical framework for understanding when dynamic multimodal fusion is robust.
  • Novel Method: A novel dynamic fusion method, Quality-aware Multimodal Fusion (QMF), is proposed and shown to have provably better generalization ability.

Environment Setup

To set up the environment, run the following command:

pip install -r requirements.txt
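
If you prefer an isolated environment, a minimal sketch (the virtual-environment step is an optional assumption, not part of the repo's instructions):

# Optional: create and activate a virtual environment, then install.
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt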

Dataset Preparation

This project uses multimodal datasets for two tasks: text-image classification and RGB-D scene recognition.

Text-Image Classification

  1. Download Datasets: Download the MVSA-Single and UPMC Food-101 datasets and place them in the datasets folder (a layout sketch follows this list).

  2. Prepare Splits: The train/dev/test splits (jsonl files) are prepared following the MMBT settings and are provided in the corresponding dataset folders.

  3. Optional: Pre-trained Models for Text Embeddings: A pre-trained BERT model is linked under "Trained Models" below.
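
A plausible on-disk layout, assuming the folder names match the --task values used in the usage examples below (the exact directory names are assumptions):

# Expected layout under ./datasets/ (names assumed from the --task values):
# datasets/
# ├── MVSA_Single/   # images plus train/dev/test jsonl splits
# └── food101/       # images plus train/dev/test jsonl splits
mkdir -p datasets/MVSA_Single datasets/food101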

RGBD Scene Recognition

  1. Download Datasets:
    • Download NYUD2
    • Download SUNRGBD
    • Place them in the datasets folder (see the sketch after this list).
    • (Baidu Netdisk links for convenience: NYUD2 (pwd: xhq3), SUNRGBD (pwd: pv6m))
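
The RGB-D datasets sit alongside the text-image ones (again, the directory names are assumptions):

# Place the extracted RGB-D datasets under ./datasets/:
# datasets/
# ├── NYUD2/
# └── SUNRGBD/
mkdir -p datasets/NYUD2 datasets/SUNRGBD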

Trained Models

We provide the trained models for download. Please ensure you have the necessary tools to access Baidu Netdisk if using those links.

  • Trained QMF Models: Baidu Netdisk (pwd: 8995)
  • Pre-trained BERT Model: Baidu Netdisk (pwd: zu13)
  • Pre-trained ResNet18 (for RGB-D tasks): the official ImageNet-pretrained resnet18 can be downloaded from the PyTorch model zoo via torchvision (see the snippet below).
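
A one-liner to fetch and cache the weights, assuming torchvision >= 0.13 (weights land in the default torch hub cache):

# Download the official ImageNet-pretrained ResNet-18 through torchvision.
python -c "from torchvision.models import resnet18, ResNet18_Weights; resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)"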

Usage Example: Text-Image Classification

To run our method (QMF) on benchmark datasets:

python train_qmf.py --alg qmf --noise_level 0.0 --noise_type Gaussian \
    ...   # remaining dataset/task flags; a hedged full sketch follows
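
A fuller invocation, assuming train_qmf.py accepts the same MMBT-style arguments as train_tmc.py shown further below (every flag after --noise_type here is an assumption mirrored from that example; check train_qmf.py --help):

# Hypothetical full command; flags after --noise_type are assumptions.
task="MVSA_Single"   # or "food101"
i=0                  # example seed
python train_qmf.py --alg qmf --noise_level 0.0 --noise_type Gaussian \
    --batch_sz 16 --gradient_accumulation_steps 40 \
    --savedir "./saved/${task}" --name "${task}_qmf_run_${i}" \
    --data_path "./datasets/" --task "${task}" --task_type classification \
    --max_epochs 100 --seed "${i}"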

To evaluate and reproduce the accuracy reported in our paper (Gaussian noise at level 5.0):

python train_qmf.py --alg qmf --epoch 0 --noise_level 5.0 --noise_type Gaussian \
    ...   # same dataset/task flags as above

To run TMC (Trusted Multi-View Classification, ICLR'21):

# Set parameters
task="MVSA_Single" # or "food101"
task_type="classification"
model="latefusion" # TMC often involves a fusion step, "latefusion" is used as an example base
i=0 # Example seed

name="${task}_tmc_model_run_${i}" # Naming convention for TMC runs

python train_tmc.py --batch_sz 16 --gradient_accumulation_steps 40 \
    --savedir "./saved/${task}" --name "${name}" --data_path "./datasets/" \
    --task "${task}" --task_type "${task_type}" --model "${model}" --num_image_embeds 3 \
    --freeze_txt 5 --freeze_img 3 --patience 5 --dropout 0.1 --lr 5e-05 --warmup 0.1 --max_epochs 100 --seed "${i}"
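
Note that the effective batch size is batch_sz × gradient_accumulation_steps = 16 × 40 = 640. Since the seed is parameterized, multiple runs can be launched with a simple loop (the seed range 0-2 is an assumption):

# Sweep a few seeds, storing each run under its own name.
for i in 0 1 2; do
    name="${task}_tmc_model_run_${i}"
    python train_tmc.py --batch_sz 16 --gradient_accumulation_steps 40 \
        --savedir "./saved/${task}" --name "${name}" --data_path "./datasets/" \
        --task "${task}" --task_type "${task_type}" --model "${model}" --num_image_embeds 3 \
        --freeze_txt 5 --freeze_img 3 --patience 5 --dropout 0.1 --lr 5e-05 --warmup 0.1 \
        --max_epochs 100 --seed "${i}"
done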

Citation

If our QMF method or the idea of dynamic multimodal fusion is helpful in your research, please consider citing our paper:

@inproceedings{zhang2023provable,
  title={Provable Dynamic Fusion for Low-Quality Multimodal Data},
  author={Zhang, Qingyang and Wu, Haitao and Zhang, Changqing and Hu, Qinghua and Fu, Huazhu and Zhou, Joey Tianyi and Peng, Xi},
  booktitle={International Conference on Machine Learning},
  year={2023}
}

Acknowledgement

The code implementation is inspired by excellent prior works, including MMBT and TMC (Trusted Multi-View Classification, ICLR'21), both referenced above.


Related Works

Here are some interesting works related to this paper:


Contact

For any additional questions, feel free to email [email protected].
