
CVPR 2025 - AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-Language Models


This repository provides the official implementation of the paper "AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-Language Models" (CVPR 2025). Our method demonstrates high effectiveness across a wide range of commercial Vision-Language Models (VLMs).

arXiv | Project Page | HuggingFace

Figure: AnyAttack results on various commercial VLMs.

Key Features

  • Large-scale: Our approach is designed to scale to the LAION-400M dataset (400 million image-text pairs).
  • Self-supervised: AnyAttack uses self-supervised learning to generate adversarial examples, without requiring label supervision.

Installation

Step 1: Environment Setup

  1. Create Conda environment for LAVIS:
    Set up the LAVIS environment for BLIP, BLIP2, and InstructBLIP. Follow the instructions here.

  2. Optional: Mini-GPT4 environment: If you plan to evaluate on Mini-GPT4 series models, set up an additional environment according to Mini-GPT4's installation guide.

  3. Data Preparation:

    • Required Datasets:
      • MSCOCO and Flickr30K: Available here.
      • ImageNet: Download and prepare separately.
    • Optional Dataset:
      • LAION-400M: Required only if you plan to pretrain on LAION-400M. Use the img2dataset tool to download it (see the download sketch after this list).
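
A minimal download sketch with img2dataset is shown below. The metadata file name, output folder, and flag values are illustrative (adapted from img2dataset's own LAION-400M example) and should be adjusted to your parquet shards and hardware:

pip install img2dataset
img2dataset --url_list laion400m-meta.parquet --input_format parquet \
  --url_col URL --caption_col TEXT \
  --output_format webdataset --output_folder laion400m-data \
  --processes_count 16 --thread_count 64 --image_size 256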

Step 2: Download Checkpoints and JSON Files

  • Download pretrained models and configuration files from OneDrive.
  • Place the downloaded files in the project root directory.
  • Note: If you're unsure which weight file to use for your specific task or dataset, we recommend starting with coco_cos.pt.
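
As a quick sanity check (assuming the files are standard PyTorch checkpoints), you can verify that a downloaded weight file loads before moving on:

python -c "import torch; sd = torch.load('coco_cos.pt', map_location='cpu'); print(type(sd))"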

Step 3 (Optional): Training and Fine-tuning

You can either use the pretrained weights from Step 2 or train the models from scratch.

  1. Optional: Pretraining on LAION-400M:

    ./scripts/main.sh

    Replace "YOUR_LAION_DATASET" with your LAION-400M dataset path.

  2. Fine-tuning on downstream datasets:

    ./scripts/finetune_ddp.sh

    Adjust the dataset, criterion, and data_dir parameters as needed (see the sketch below).
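
The sketch below shows both stages end to end. The parameter values are assumptions for illustration (e.g., dataset=coco and criterion=cos are inferred from the checkpoint name coco_cos.pt); check the script headers for the exact variable names before editing:

# Pretraining: replace the YOUR_LAION_DATASET placeholder inside scripts/main.sh, e.g.
#   YOUR_LAION_DATASET=/data/laion400m
./scripts/main.sh

# Fine-tuning: set the parameters named above inside scripts/finetune_ddp.sh, e.g.
#   dataset=coco  criterion=cos  data_dir=/data/mscoco
./scripts/finetune_ddp.sh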

Step 4: Generate Adversarial Images

Use the pretrained decoder to generate adversarial images:

./scripts/generate_adv_img.sh

Ensure that the datasets from Step 1 are stored under the DATASET_BASE_PATH directory, and set PROJECT_PATH to the root of this project.
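
For example (the paths below are illustrative):

export DATASET_BASE_PATH=/data/datasets   # where the Step 1 datasets live
export PROJECT_PATH=$(pwd)                # root of this repository
./scripts/generate_adv_img.sh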

Step 5: Evaluation

Evaluate the trained models on different tasks:

  1. Image-text retrieval:
    ./scripts/retrieval.sh
  2. Multimodal classification:
    python ./scripts/classification_shell.py
  3. Image captioning:
    python ./scripts/caption_shell.py

Demo

We've added a demo.py script for easy demonstration of AnyAttack. This script allows users to generate adversarial examples using a single target image and a clean image.

To run the demo:

python demo.py --decoder_path path/to/decoder.pth \
  --clean_image_path path/to/clean_image.jpg \
  --target_image_path path/to/target_image.jpg \
  --output_path output.png

For more options and details, please refer to the demo.py file.
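
For example, to run the demo over a folder of clean images against one target, you can loop over the documented flags (a batching sketch; the paths are illustrative, and we assume the Step 2 checkpoint coco_cos.pt serves as the decoder weight):

mkdir -p outputs
for img in clean_images/*.jpg; do
  python demo.py --decoder_path coco_cos.pt \
    --clean_image_path "$img" \
    --target_image_path target.jpg \
    --output_path "outputs/$(basename "$img" .jpg)_adv.png"
done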

Citation

If you find this work useful for your research, please consider citing:

@inproceedings{zhang2025anyattack,
  title={Anyattack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models},
  author={Zhang, Jiaming and Ye, Junhong and Ma, Xingjun and Li, Yige and Yang, Yunfan and Chen, Yunhao and Sang, Jitao and Yeung, Dit-Yan},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={19900--19909},
  year={2025}
}

Contact

For any questions or concerns, please open an issue in this repository or contact the authors directly.
