Official implementation of "Doubly-Fused ViT: Fuse Information from Vision Transformer Doubly with Local Representation" (ECCV 2022). Please refer to the paper for details.
Authors: Li Gao, Dong Nie, Bo Li, Xiaofeng Ren.
ImageNet-1K classification results:

| model | Top-1 Acc.(%) | #Params(M) | FLOPs(G) | Link |
|---|---|---|---|---|
| DFvT-Tiny | 72.95 | 4.0 | 0.3 | model/log |
| DFvT-Small | 78.29 | 11.2 | 0.8 | model/log |
| DFvT-Base | 81.98 | 37.3 | 2.5 | model/log |
Create a new conda environment:

    conda create -n DFvT python=3.7 -y
    conda activate DFvT
    conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.1 -c pytorch
    pip install timm==0.3.2 opencv-python==4.4.0.46 termcolor==1.1.0 yacs==0.1.8
Install Apex:

    git clone https://github.com/NVIDIA/apex
    cd apex
    pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
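After the build finishes, it can be useful to confirm that Apex's `amp` module is importable before launching a long training job. This is a small sketch, not part of the original codebase; it simply falls back gracefully when Apex is absent:

```python
# Check whether the Apex build above succeeded; if it did not,
# mixed precision can still be disabled via --amp-opt-level O0.
try:
    from apex import amp  # noqa: F401
    apex_available = True
except ImportError:
    apex_available = False

print("Apex AMP available:", apex_available)
```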
Prepare the dataset

Download the ImageNet-1K dataset from http://image-net.org/, then organize the folders as follows:
    imagenet
    ├── train
    │   ├── class1
    │   │   ├── img1.jpeg
    │   │   ├── img2.jpeg
    │   │   └── ...
    │   ├── class2
    │   │   ├── img3.jpeg
    │   │   └── ...
    │   └── ...
    └── val
        ├── class1
        │   ├── img4.jpeg
        │   ├── img5.jpeg
        │   └── ...
        ├── class2
        │   ├── img6.jpeg
        │   └── ...
        └── ...
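With this layout, torchvision's `ImageFolder` assigns class indices by sorting the subdirectory names. The stdlib-only sketch below (hypothetical class and file names, for illustration) mirrors that behavior so you can predict label indices ahead of time:

```python
import os
import tempfile

# Build a tiny dataset tree matching the layout above.
root = tempfile.mkdtemp()
for split in ("train", "val"):
    for cls in ("class2", "class1"):
        d = os.path.join(root, split, cls)
        os.makedirs(d)
        with open(os.path.join(d, "img.jpeg"), "w"):
            pass

def find_classes(split_dir):
    # Sorted subdirectory names -> contiguous label indices,
    # the same rule torchvision's ImageFolder uses.
    classes = sorted(e.name for e in os.scandir(split_dir) if e.is_dir())
    return {name: idx for idx, name in enumerate(classes)}

print(find_classes(os.path.join(root, "train")))  # {'class1': 0, 'class2': 1}
```

Because the mapping is determined by sorted names, the train and val splits must contain the same class directories for labels to line up.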
To train a DFvT model on ImageNet from scratch, run:

    python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> --master_port 12345 main.py \
    --cfg <config-file> --data-path <imagenet-path> [--batch-size <batch-size-per-gpu> --output <output-directory> --tag <job-tag>]
For example, to train DFvT-Small with 4 GPUs:

    python -m torch.distributed.launch --nproc_per_node 4 --master_port 12345 main.py \
    --cfg configs/small.yaml --data-path <imagenet-path> --batch-size 256
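Note that the launcher spawns one process per GPU and `--batch-size` is per GPU, so the effective global batch size of the example above is the product of the two:

```python
# --batch-size is per GPU; the launcher runs nproc_per_node processes,
# so the effective global batch size is their product.
nproc_per_node = 4    # GPUs in the example above
batch_per_gpu = 256   # --batch-size
global_batch = nproc_per_node * batch_per_gpu
print(global_batch)  # 1024
```

Keep this in mind when changing the GPU count: halving the GPUs at the same `--batch-size` halves the global batch, which may call for a matching learning-rate adjustment.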
To evaluate a pre-trained DFvT on ImageNet val, run:
    python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> --master_port 12345 main.py --eval \
    --cfg <config-file> --resume <checkpoint> --data-path <imagenet-path>
For example, to evaluate DFvT-Small with a single GPU:

    python -m torch.distributed.launch --nproc_per_node 1 --master_port 12345 main.py --eval \
    --cfg configs/small.yaml --resume DFvT_S_7829.pth --data-path <imagenet-path>
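The evaluation run reports the Top-1 accuracy shown in the table above: the fraction of validation images whose highest-scoring class matches the ground-truth label. A minimal sketch with toy scores (illustration only, not the repo's metric code):

```python
def top1_accuracy(scores, labels):
    # A prediction is correct when the argmax of its score vector
    # equals the ground-truth label index.
    correct = sum(
        1 for s, y in zip(scores, labels)
        if max(range(len(s)), key=s.__getitem__) == y
    )
    return correct / len(labels)

scores = [[0.1, 0.7, 0.2],   # predicted class 1
          [0.5, 0.3, 0.2],   # predicted class 0
          [0.2, 0.2, 0.6]]   # predicted class 2
labels = [1, 0, 1]
print(top1_accuracy(scores, labels))  # 2 of 3 correct
```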
To measure the inference throughput, run:

    python -m torch.distributed.launch --nproc_per_node 1 --master_port 12345 main.py \
    --cfg <config-file> --data-path <imagenet-path> --batch-size 64 --throughput --amp-opt-level O0
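A throughput number of this kind is typically derived by timing a fixed number of forward passes and dividing the images processed by the elapsed time. The sketch below illustrates the arithmetic with a placeholder `fake_forward` standing in for the real model (an assumption, not the repo's benchmarking code):

```python
import time

def measure_throughput(forward, batch_size, iters=30):
    # images/sec = (images processed) / (wall-clock seconds)
    start = time.perf_counter()
    for _ in range(iters):
        forward(batch_size)
    elapsed = time.perf_counter() - start
    return batch_size * iters / elapsed

def fake_forward(batch_size):
    # Placeholder for model(images); does trivial work.
    sum(range(batch_size))

ips = measure_throughput(fake_forward, batch_size=64)
print(f"{ips:.0f} images/sec")
```

In a real benchmark you would also run a few warm-up iterations and synchronize the GPU before reading the clock, so the timing is not skewed by lazy initialization or queued kernels.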
The code is heavily based on Swin-Transformer.
If you use this code in your research, please consider citing:
    @InProceedings{10.1007/978-3-031-20050-2_43,
    author="Gao, Li
    and Nie, Dong
    and Li, Bo
    and Ren, Xiaofeng",
    title="Doubly-Fused ViT: Fuse Information from Vision Transformer Doubly with Local Representation",
    booktitle="Computer Vision -- ECCV 2022",
    year="2022",
    publisher="Springer Nature Switzerland",
    address="Cham",
    pages="744--761",
    isbn="978-3-031-20050-2"
    }
Object-Detection: See DFvT for Object Detection.