
The official repo for [CVPR'23] "DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting" and [ArXiv'23] "DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Text Spotting"


DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting

News | Main Results | Usage | Citation | Acknowledgement

This is the official repo for the paper:

DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting

Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Tongliang Liu, Bo Du, Dacheng Tao


News

2023.05.31 The extension paper (DeepSolo++) is submitted to ArXiv. The code and models will be released soon.

2023.02.28 DeepSolo is accepted by CVPR 2023. 🎉🎉

Relevant Project:

DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer | Code

Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Bo Du, Dacheng Tao

Other applications of ViTAE include: ViTPose | Remote Sensing | Matting | VSA | Video Object Segmentation

Main Results

Total-Text

| Backbone | External Data | Det-P | Det-R | Det-F1 | E2E-None | E2E-Full | Weights |
|---|---|---|---|---|---|---|---|
| Res-50 | Synth150K | 93.9 | 82.1 | 87.6 | 78.8 | 86.2 | OneDrive |
| Res-50 | Synth150K+MLT17+IC13+IC15 | 93.1 | 82.1 | 87.3 | 79.7 | 87.0 | OneDrive |
| Res-50 | Synth150K+MLT17+IC13+IC15+TextOCR | 93.2 | 84.6 | 88.7 | $\underline{\text{82.5}}$ | $\underline{\text{88.7}}$ | OneDrive |
| Res-101 | Synth150K+MLT17+IC13+IC15 | 93.2 | 83.5 | 88.1 | 80.1 | 87.1 | OneDrive |
| Swin-T | Synth150K+MLT17+IC13+IC15 | 92.8 | 83.5 | 87.9 | 79.7 | 87.1 | OneDrive |
| Swin-S | Synth150K+MLT17+IC13+IC15 | 93.7 | 84.2 | 88.7 | 81.3 | 87.8 | OneDrive |
| ViTAEv2-S | Synth150K+MLT17+IC13+IC15 | 92.6 | 85.5 | $\underline{\text{88.9}}$ | 81.8 | 88.4 | OneDrive |
| ViTAEv2-S | Synth150K+MLT17+IC13+IC15+TextOCR | 92.9 | 87.4 | 90.0 | 83.6 | 89.6 | OneDrive |
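
The table does not define its columns. Assuming Det-P, Det-R and Det-F1 are detection precision, recall and their harmonic mean (the usual F-measure, which is an assumption here and not something the README states), a reported row can be sanity-checked with a few lines of Python. The three rows below are copied from the table; `f_measure` is a helper name introduced only for this sketch.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (Det-P, Det-R, reported Det-F1) copied from three Total-Text rows.
rows = [
    (93.9, 82.1, 87.6),
    (93.1, 82.1, 87.3),
    (92.9, 87.4, 90.0),
]

for p, r, reported in rows:
    print(f"P={p} R={r} computed={f_measure(p, r):.2f} reported={reported}")
```

Because the published precision and recall are themselves rounded to one decimal, a recomputed value can differ from the reported F1 by up to about 0.1.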

ICDAR 2015 (IC15)

| Backbone | External Data | Det-P | Det-R | Det-F1 | E2E-S | E2E-W | E2E-G | Weights |
|---|---|---|---|---|---|---|---|---|
| Res-50 | Synth150K+Total-Text+MLT17+IC13 | 92.8 | 87.4 | 90.0 | 86.8 | 81.9 | 76.9 | OneDrive |
| Res-50 | Synth150K+Total-Text+MLT17+IC13+TextOCR | 92.5 | 87.2 | 89.8 | $\underline{\text{88.0}}$ | $\underline{\text{83.5}}$ | $\underline{\text{79.1}}$ | OneDrive |
| ViTAEv2-S | Synth150K+Total-Text+MLT17+IC13 | 93.7 | 87.3 | 90.4 | 87.5 | 82.8 | 77.7 | OneDrive |
| ViTAEv2-S | Synth150K+Total-Text+MLT17+IC13+TextOCR | 92.4 | 87.9 | $\underline{\text{90.1}}$ | 88.1 | 83.9 | 79.5 | OneDrive |

CTW1500

| Backbone | External Data | Det-P | Det-R | Det-F1 | E2E-None | E2E-Full | Weights |
|---|---|---|---|---|---|---|---|
| Res-50 | Synth150K+Total-Text+MLT17+IC13+IC15 | 93.2 | 85.0 | 88.9 | 64.2 | 81.4 | OneDrive |

Pre-trained Models for Total-Text & ICDAR 2015

| Backbone | Training Data | Weights |
|---|---|---|
| Res-50 | Synth150K+Total-Text | OneDrive |
| Res-50 | Synth150K+Total-Text+MLT17+IC13+IC15 | OneDrive |
| Res-50 | Synth150K+Total-Text+MLT17+IC13+IC15+TextOCR | OneDrive |
| Res-101 | Synth150K+Total-Text+MLT17+IC13+IC15 | OneDrive |
| Swin-T | Synth150K+Total-Text+MLT17+IC13+IC15 | OneDrive |
| Swin-S | Synth150K+Total-Text+MLT17+IC13+IC15 | OneDrive |
| ViTAEv2-S | Synth150K+Total-Text+MLT17+IC13+IC15 | OneDrive |
| ViTAEv2-S | Synth150K+Total-Text+MLT17+IC13+IC15+TextOCR | OneDrive |

Pre-trained Models for CTW1500

| Backbone | Training Data | Weights |
|---|---|---|
| Res-50 | Synth150K+Total-Text+MLT17+IC13+IC15 | OneDrive |

Usage

  • Installation

Python 3.8 + PyTorch 1.9.0 + CUDA 11.1 + Detectron2 (v0.6)

git clone https://github.com/ViTAE-Transformer/DeepSolo.git
cd DeepSolo
conda create -n deepsolo python=3.8 -y
conda activate deepsolo
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.9/index.html
python setup.py build develop
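
After running the commands above, it can help to confirm that the environment actually holds the pinned versions. The following is a minimal sketch that uses only the standard library; the version pins come from the installation commands, while `EXPECTED`, `base_version`, `find_mismatches` and `installed_versions` are names introduced here for illustration and are not part of the DeepSolo code base.

```python
from importlib import metadata

# Version pins copied from the installation commands above.
EXPECTED = {
    "torch": "1.9.0",
    "torchvision": "0.10.0",
    "detectron2": "0.6",
}


def base_version(version):
    """Drop a local build tag such as '+cu111', so '1.9.0+cu111' compares as '1.9.0'."""
    return version.split("+", 1)[0]


def find_mismatches(installed, expected=EXPECTED):
    """Return readable problems; an empty list means every pin is satisfied."""
    problems = []
    for name, wanted in expected.items():
        found = installed.get(name)
        if found is None:
            problems.append(f"{name}: not installed (expected {wanted})")
        elif base_version(found) != wanted:
            problems.append(f"{name}: found {found}, expected {wanted}")
    return problems


def installed_versions(names):
    """Look up installed distribution versions without importing the packages."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    issues = find_mismatches(installed_versions(EXPECTED))
    print("\n".join(issues) if issues else "Environment matches the pinned versions.")
```

The check reads package metadata only, so it does not verify that CUDA 11.1 is usable at runtime.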
  • Preparation

Datasets

[SynthText150K (CurvedSynText150K)] images | annotations(Part1) | annotations(Part2)

[MLT] images | annotations

[ICDAR2013] images | annotations

[ICDAR2015] images | annotations

[Total-Text] images | annotations

[CTW1500] images | annotations

[TextOCR] images | annotations

[Inverse-Text] images | annotations

[Evaluation ground-truth] Link

Some files need to be renamed. Organize them as follows (lexicon files are not listed here):

|- ./datasets
   |- syntext1
   |  |- train_images
   |  └  annotations
   |       |- train_37voc.json
   |       └  train_96voc.json
   |- syntext2
   |  |- train_images
   |  └  annotations
   |       |- train_37voc.json
   |       └  train_96voc.json
   |- mlt2017
   |  |- train_images
   |  |- train_37voc.json
   |  └  train_96voc.json
   |- totaltext
   |  |- train_images
   |  |- test_images
   |  |- train_37voc.json
   |  |- train_96voc.json
   |  └  test.json
   |- ic13
   |  |- train_images
   |  |- train_37voc.json
   |  └  train_96voc.json
   |- ic15
   |  |- train_images
   |  |- test_images
   |  |- train_37voc.json
   |  |- train_96voc.json
   |  └  test.json
   |- ctw1500
   |  |- train_images
   |  |- test_images
   |  |- train_96voc.json
   |  └  test.json
   |- textocr
   |  |- train_images
   |  |- train_37voc_1.json
   |  └  train_37voc_2.json
   |- inversetext
   |  |- test_images
   |  └  test.json
   |- evaluation
   |  |- gt_*.zip
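
Since several files have to be renamed by hand, a quick check of the layout before training can save a failed run. The sketch below transcribes the tree above into a dictionary and reports which entries are missing under a given root; `EXPECTED_LAYOUT` and `missing_entries` are illustrative names and not part of the repository, and lexicon files are left out just as they are in the tree.

```python
from pathlib import Path

# Expected entries per dataset folder, transcribed from the tree above.
_SYNTEXT = ["train_images", "annotations/train_37voc.json", "annotations/train_96voc.json"]
_TRAIN_ONLY = ["train_images", "train_37voc.json", "train_96voc.json"]
_TRAIN_TEST = ["train_images", "test_images", "train_37voc.json", "train_96voc.json", "test.json"]

EXPECTED_LAYOUT = {
    "syntext1": _SYNTEXT,
    "syntext2": _SYNTEXT,
    "mlt2017": _TRAIN_ONLY,
    "totaltext": _TRAIN_TEST,
    "ic13": _TRAIN_ONLY,
    "ic15": _TRAIN_TEST,
    "ctw1500": ["train_images", "test_images", "train_96voc.json", "test.json"],
    "textocr": ["train_images", "train_37voc_1.json", "train_37voc_2.json"],
    "inversetext": ["test_images", "test.json"],
}


def missing_entries(root, layout=EXPECTED_LAYOUT):
    """Return paths (relative to root, POSIX style) that the tree expects but are absent."""
    root = Path(root)
    missing = []
    for dataset, entries in layout.items():
        for entry in entries:
            path = root / dataset / entry
            if not path.exists():
                missing.append(path.relative_to(root).as_posix())
    # The evaluation folder should hold at least one gt_*.zip archive.
    eval_dir = root / "evaluation"
    if not (eval_dir.is_dir() and any(eval_dir.glob("gt_*.zip"))):
        missing.append("evaluation/gt_*.zip")
    return missing


if __name__ == "__main__":
    for entry in missing_entries("./datasets"):
        print("missing:", entry)
```

The script only tests for existence; it does not open the JSON annotations or count images.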
ImageNet Pre-trained Backbone

If you want to pre-train DeepSolo with ResNet-101, ViTAEv2-S, or Swin Transformer, download the converted backbone weights and put them under pretrained_backbone for initialization: Swin-T | ViTAEv2-S | Res101 | Swin-S. You can also refer to the Python files in pretrained_backbone and convert the backbones yourself.

  • Training

Total-Text & ICDAR2015

1. Pre-train

For example, pre-train DeepSolo with Synth150K+Total-Text+MLT17+IC13+IC15:

python tools/train_net.py --config-file configs/R_50/pretrain/150k_tt_mlt_13_15.yaml --num-gpus 4

2. Fine-tune

Fine-tune on Total-Text or ICDAR2015:

python tools/train_net.py --config-file configs/R_50/TotalText/finetune_150k_tt_mlt_13_15.yaml --num-gpus 4
python tools/train_net.py --config-file configs/R_50/IC15/finetune_150k_tt_mlt_13_15.yaml --num-gpus 4
CTW1500

1. Pre-train

python tools/train_net.py --config-file configs/R_50/CTW1500/pretrain_96voc_50maxlen.yaml --num-gpus 4

2. Fine-tune

python tools/train_net.py --config-file configs/R_50/CTW1500/finetune_96voc_50maxlen.yaml --num-gpus 4
  • Evaluation

python tools/train_net.py --config-file ${CONFIG_FILE} --eval-only MODEL.WEIGHTS ${MODEL_PATH}
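
The training and evaluation commands share one shape: a config file, an optional GPU count, an optional `--eval-only` flag, and trailing `KEY VALUE` overrides such as `MODEL.WEIGHTS`. When launching several runs from a script, that shape can be captured in a small helper. This is a sketch under the assumption that only the flags shown in this README are needed; `build_train_net_command` is a hypothetical name, and the weights path in the usage lines is a placeholder.

```python
import shlex


def build_train_net_command(config_file, num_gpus=None, eval_only=False, weights=None):
    """Assemble the argv list for tools/train_net.py using only flags shown in this README."""
    cmd = ["python", "tools/train_net.py", "--config-file", config_file]
    if num_gpus is not None:
        cmd += ["--num-gpus", str(num_gpus)]
    if eval_only:
        cmd.append("--eval-only")
    if weights is not None:
        # Trailing KEY VALUE pairs override config entries, as in the evaluation command.
        cmd += ["MODEL.WEIGHTS", weights]
    return cmd


if __name__ == "__main__":
    pretrain = build_train_net_command(
        "configs/R_50/pretrain/150k_tt_mlt_13_15.yaml", num_gpus=4
    )
    evaluate = build_train_net_command(
        "configs/R_50/TotalText/finetune_150k_tt_mlt_13_15.yaml",
        eval_only=True,
        weights="path/to/model.pth",  # placeholder path, not a file shipped with the repo
    )
    print(shlex.join(pretrain))
    print(shlex.join(evaluate))
```

Returning a list instead of a string keeps the result safe to pass to `subprocess.run` without shell quoting issues; `shlex.join` is used only for display.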
  • Visualization Demo

python demo/demo.py --config-file ${CONFIG_FILE} --input ${IMAGES_FOLDER_OR_ONE_IMAGE_PATH} --output ${OUTPUT_PATH} --opts MODEL.WEIGHTS ${MODEL_PATH}

Citation

If you find DeepSolo helpful, please consider giving this repo a star ⭐ and citing:

@inproceedings{ye2022deepsolo,
  title={DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting},
  author={Ye, Maoyuan and Zhang, Jing and Zhao, Shanshan and Liu, Juhua and Liu, Tongliang and Du, Bo and Tao, Dacheng},
  booktitle={CVPR},
  year={2023}
}

Acknowledgement

This project is based on AdelaiDet. For academic use, this project is licensed under the 2-clause BSD License.
