CVPR 2025 "Anyattack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models"
This repository provides the official implementation of the CVPR 2025 paper "AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-Language Models". Our method demonstrates high attack effectiveness across a wide range of commercial Vision-Language Models (VLMs).
arXiv | Project Page | HuggingFace
Figure: AnyAttack results on various commercial VLMs
- Large-scale: Our approach is designed to work effectively at the scale of LAION-400M (400 million image-text pairs).
- Self-supervised: AnyAttack uses self-supervised learning to generate adversarial examples without label supervision.
Environment Setup:
- Create Conda environment for LAVIS: Set up the LAVIS environment for BLIP, BLIP2, and InstructBLIP. Follow the instructions here. (A minimal environment sketch appears after this list.)
- Optional: Mini-GPT4 environment: If you plan to evaluate on Mini-GPT4 series models, set up an additional environment according to Mini-GPT4's installation guide.
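A minimal sketch of the LAVIS environment setup, following the upstream LAVIS instructions (the linked guide is authoritative; the Python and package versions there take precedence):

```bash
# Create and activate a dedicated environment, then install LAVIS.
conda create -n lavis python=3.8 -y
conda activate lavis
pip install salesforce-lavis
```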
Data Preparation:
- Required datasets:
  - MSCOCO and Flickr30K: Available here.
  - ImageNet: Download and prepare separately.
- Optional dataset:
  - LAION-400M: Only required if you plan to pretrain on LAION-400M. Use the img2dataset tool for downloading (see the sketch after this list).
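A download sketch for LAION-400M based on the img2dataset documentation; `laion400m-meta` is the parquet metadata you obtain separately, and the output folder and worker counts are illustrative:

```bash
# Install the downloader, then fetch LAION-400M as webdataset shards.
pip install img2dataset
img2dataset --url_list laion400m-meta --input_format "parquet" \
  --url_col "URL" --caption_col "TEXT" --output_format webdataset \
  --output_folder laion400m-data --processes_count 16 --thread_count 128 \
  --image_size 256
```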
Pretrained Models:
- Download the pretrained models and configuration files from OneDrive.
- Place the downloaded files in the project root directory.
- Note: If you are unsure which weight file to use for your task or dataset, we recommend starting with `coco_cos.pt`.
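For example (the download location and repo path are illustrative):

```bash
# Move the recommended checkpoint into the project root.
mv ~/Downloads/coco_cos.pt /path/to/AnyAttack/
```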
Training:
You can either use the pretrained weights downloaded above or train the models from scratch.
- Optional: Pretraining on LAION-400M: If you choose to pretrain on LAION-400M, run:

  ```bash
  ./scripts/main.sh
  ```

  Replace `YOUR_LAION_DATASET` with your LAION-400M dataset path.
- Fine-tuning on downstream datasets:

  ```bash
  ./scripts/finetune_ddp.sh
  ```

  Adjust the `dataset`, `criterion`, and `data_dir` parameters as needed (an illustrative configuration for both scripts follows this list).
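For concreteness, here is one way to wire up both scripts. The `YOUR_LAION_DATASET` placeholder comes from `main.sh` and the `dataset`/`criterion`/`data_dir` names from `finetune_ddp.sh`; the paths, example values, and the sed-based substitution are assumptions, so check the scripts themselves:

```bash
# Pretraining: point main.sh at your LAION-400M download
# (the sed substitution and the path are illustrative).
sed -i 's|YOUR_LAION_DATASET|/data/laion400m-data|' ./scripts/main.sh
./scripts/main.sh

# Fine-tuning: adjust these variables inside finetune_ddp.sh
# (example values only), then launch it:
#   dataset=coco
#   criterion=cos
#   data_dir=/data/coco
./scripts/finetune_ddp.sh
```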
Adversarial Image Generation:
Use the pretrained decoder to generate adversarial images:

```bash
./scripts/generate_adv_img.sh
```

Ensure that the datasets from the Data Preparation step are stored under the `DATASET_BASE_PATH` directory, and set `PROJECT_PATH` to the current project directory.
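A sketch of the expected path setup; whether `generate_adv_img.sh` reads these as environment variables or as variables edited inside the script is an assumption, so verify against the script:

```bash
# Illustrative paths; substitute your own locations.
export DATASET_BASE_PATH=/path/to/datasets   # MSCOCO, Flickr30K, ImageNet from Data Preparation
export PROJECT_PATH=/path/to/AnyAttack       # this repository's root
./scripts/generate_adv_img.sh
```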
Evaluation:
Evaluate the trained models on different tasks:
- Image-text retrieval: `./scripts/retrieval.sh`
- Multimodal classification: `python ./scripts/classification_shell.py`
- Image captioning: `python ./scripts/caption_shell.py`
Demo:
We provide a demo.py script for a quick demonstration of AnyAttack. It generates an adversarial example from a single clean image and a target image.

To run the demo:

```bash
python demo.py --decoder_path path/to/decoder.pth --clean_image_path path/to/clean_image.jpg --target_image_path path/to/target_image.jpg --output_path output.png
```

For more options and details, please refer to demo.py.
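For example, assuming the recommended `coco_cos.pt` checkpoint sits in the repo root (the image paths are placeholders you supply):

```bash
python demo.py \
  --decoder_path ./coco_cos.pt \
  --clean_image_path ./images/clean.jpg \
  --target_image_path ./images/target.jpg \
  --output_path ./adv_output.png
```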
Citation:
If you find this work useful for your research, please consider citing:
```bibtex
@inproceedings{zhang2025anyattack,
  title={AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models},
  author={Zhang, Jiaming and Ye, Junhong and Ma, Xingjun and Li, Yige and Yang, Yunfan and Chen, Yunhao and Sang, Jitao and Yeung, Dit-Yan},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={19900--19909},
  year={2025}
}
```

Contact:
For any questions or concerns, please open an issue in this repository or contact the authors directly.