Official implementation of "Doubly-Fused ViT: Fuse Information from Vision Transformer Doubly with Local Representation" (ECCV 2022). Please refer to the paper for details.
Authors: Li Gao, Dong Nie, Bo Li, Xiaofeng Ren.
ImageNet-1K classification results:

| model | Top-1 Acc.(%) | #Params(M) | FLOPs(G) | Link |
|---|---|---|---|---|
| DFvT-Tiny | 72.95 | 4.0 | 0.3 | model/log |
| DFvT-Small | 78.29 | 11.2 | 0.8 | model/log |
| DFvT-Base | 81.98 | 37.3 | 2.5 | model/log |
Create a new conda environment:

    conda create -n DFvT python=3.7 -y
    conda activate DFvT
    conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.1 -c pytorch
    pip install timm==0.3.2 opencv-python==4.4.0.46 termcolor==1.1.0 yacs==0.1.8
Install Apex:

    git clone https://github.com/NVIDIA/apex
    cd apex
    pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
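After the build finishes, it can be useful to confirm that Apex's `amp` module is importable before launching a long training job. This is a small sketch, not part of the original codebase; it simply falls back gracefully when Apex is absent:

```python
# Check whether the Apex build above succeeded; if it did not,
# mixed precision can still be disabled via --amp-opt-level O0.
try:
    from apex import amp  # noqa: F401
    apex_available = True
except ImportError:
    apex_available = False

print("Apex AMP available:", apex_available)
```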
Prepare the dataset

Download the ImageNet-1K dataset from http://image-net.org/, then organize the folders as follows:
    imagenet
    ├── train
    │   ├── class1
    │   │   ├── img1.jpeg
    │   │   ├── img2.jpeg
    │   │   └── ...
    │   ├── class2
    │   │   ├── img3.jpeg
    │   │   └── ...
    │   └── ...
    └── val
        ├── class1
        │   ├── img4.jpeg
        │   ├── img5.jpeg
        │   └── ...
        ├── class2
        │   ├── img6.jpeg
        │   └── ...
        └── ...
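With this layout, torchvision's `ImageFolder` assigns class indices by sorting the subdirectory names. The stdlib-only sketch below (hypothetical class and file names, for illustration) mirrors that behavior so you can predict label indices ahead of time:

```python
import os
import tempfile

# Build a tiny dataset tree matching the layout above.
root = tempfile.mkdtemp()
for split in ("train", "val"):
    for cls in ("class2", "class1"):
        d = os.path.join(root, split, cls)
        os.makedirs(d)
        with open(os.path.join(d, "img.jpeg"), "w"):
            pass

def find_classes(split_dir):
    # Sorted subdirectory names -> contiguous label indices,
    # the same rule torchvision's ImageFolder uses.
    classes = sorted(e.name for e in os.scandir(split_dir) if e.is_dir())
    return {name: idx for idx, name in enumerate(classes)}

print(find_classes(os.path.join(root, "train")))  # {'class1': 0, 'class2': 1}
```

Because the mapping is determined by sorted names, the train and val splits must contain the same class directories for labels to line up.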
To train a DFvT model on ImageNet from scratch, run:

    python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> --master_port 12345 main.py \
    --cfg <config-file> --data-path <imagenet-path> [--batch-size <batch-size-per-gpu> --output <output-directory> --tag <job-tag>]
For example, to train DFvT-Small with 4 GPUs:

    python -m torch.distributed.launch --nproc_per_node 4 --master_port 12345 main.py \
    --cfg configs/small.yaml --data-path <imagenet-path> --batch-size 256
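Note that the launcher spawns one process per GPU and `--batch-size` is per GPU, so the effective global batch size of the example above is the product of the two:

```python
# --batch-size is per GPU; the launcher runs nproc_per_node processes,
# so the effective global batch size is their product.
nproc_per_node = 4    # GPUs in the example above
batch_per_gpu = 256   # --batch-size
global_batch = nproc_per_node * batch_per_gpu
print(global_batch)  # 1024
```

Keep this in mind when changing the GPU count: halving the GPUs at the same `--batch-size` halves the global batch, which may call for a matching learning-rate adjustment.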
To evaluate a pre-trained DFvT on ImageNet val, run:
    python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> --master_port 12345 main.py --eval \
    --cfg <config-file> --resume <checkpoint> --data-path <imagenet-path>
For example, to evaluate DFvT-Small with a single GPU:

    python -m torch.distributed.launch --nproc_per_node 1 --master_port 12345 main.py --eval \
    --cfg configs/small.yaml --resume DFvT_S_7829.pth --data-path <imagenet-path>
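The evaluation run reports the Top-1 accuracy shown in the table above: the fraction of validation images whose highest-scoring class matches the ground-truth label. A minimal sketch with toy scores (illustration only, not the repo's metric code):

```python
def top1_accuracy(scores, labels):
    # A prediction is correct when the argmax of its score vector
    # equals the ground-truth label index.
    correct = sum(
        1 for s, y in zip(scores, labels)
        if max(range(len(s)), key=s.__getitem__) == y
    )
    return correct / len(labels)

scores = [[0.1, 0.7, 0.2],   # predicted class 1
          [0.5, 0.3, 0.2],   # predicted class 0
          [0.2, 0.2, 0.6]]   # predicted class 2
labels = [1, 0, 1]
print(top1_accuracy(scores, labels))  # 2 of 3 correct
```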
To measure the inference throughput, run:

    python -m torch.distributed.launch --nproc_per_node 1 --master_port 12345 main.py \
    --cfg <config-file> --data-path <imagenet-path> --batch-size 64 --throughput --amp-opt-level O0
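A throughput number of this kind is typically derived by timing a fixed number of forward passes and dividing the images processed by the elapsed time. The sketch below illustrates the arithmetic with a placeholder `fake_forward` standing in for the real model (an assumption, not the repo's benchmarking code):

```python
import time

def measure_throughput(forward, batch_size, iters=30):
    # images/sec = (images processed) / (wall-clock seconds)
    start = time.perf_counter()
    for _ in range(iters):
        forward(batch_size)
    elapsed = time.perf_counter() - start
    return batch_size * iters / elapsed

def fake_forward(batch_size):
    # Placeholder for model(images); does trivial work.
    sum(range(batch_size))

ips = measure_throughput(fake_forward, batch_size=64)
print(f"{ips:.0f} images/sec")
```

In a real benchmark you would also run a few warm-up iterations and synchronize the GPU before reading the clock, so the timing is not skewed by lazy initialization or queued kernels.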
The code is heavily based on Swin-Transformer.
If you use this code in your research, please consider citing:
    @InProceedings{10.1007/978-3-031-20050-2_43,
    author="Gao, Li
    and Nie, Dong
    and Li, Bo
    and Ren, Xiaofeng",
    title="Doubly-Fused ViT: Fuse Information from Vision Transformer Doubly with Local Representation",
    booktitle="Computer Vision -- ECCV 2022",
    year="2022",
    publisher="Springer Nature Switzerland",
    address="Cham",
    pages="744--761",
    isbn="978-3-031-20050-2"
    }
Object-Detection: See DFvT for Object Detection.