For ease of implementation, I have not reproduced the paper exactly. The following parts are implemented differently from the paper:

- Backbone network (used Xception instead of the network described in the paper)
- Learning rate schedule (used `tf.keras.optimizers.schedules.ExponentialDecay`)
- Data augmentations
- Hyperparameters
- And so on ...
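As a reference for the schedule swap, here is a pure-Python sketch of what `tf.keras.optimizers.schedules.ExponentialDecay` computes with `staircase=False`. The numeric values below are illustrative only, not this repo's actual hyperparameters (those live in `./configs/configs.py`).

```python
def exponential_decay(step, init_lr, decay_rate, decay_steps):
    """Learning rate at `step`, mirroring ExponentialDecay (staircase=False)."""
    return init_lr * decay_rate ** (step / decay_steps)

# After exactly `decay_steps` steps, the rate has decayed by `decay_rate`.
lr = exponential_decay(step=1000, init_lr=1e-3, decay_rate=0.9, decay_steps=1000)
print(lr)  # 1e-3 * 0.9 = 9e-4
```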
```
docker build -t ${ImageName}:${ImageTag} .
```

Example:

```
docker run -d -it --gpus all --shm-size=${ShmSize} ${ImageName}:${ImageTag} /bin/bash
```

Training Dataset: Pascal VOC Dataset (Link)
Pascal VOC Dataset with TFDS
| | Train | Validation | Test |
|---|---|---|---|
| Pascal VOC 2007 | 2501 | 2510 | 4952 (Used Validation) |
| Pascal VOC 2012 | 5717 | 5823 | 10991 (No labels) |
- Training Set: VOC2007 trainval + VOC2012 trainval (Total: 16551)
- Validation Set: VOC2007 test (Total: 4952)
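The split totals above can be sanity-checked against the table:

```python
# Split sizes from the TFDS Pascal VOC table above.
voc2007 = {"train": 2501, "validation": 2510, "test": 4952}
voc2012 = {"train": 5717, "validation": 5823}

# Training set: VOC2007 trainval + VOC2012 trainval.
train_total = sum(voc2007[s] + voc2012[s] for s in ("train", "validation"))
# Validation set: VOC2007 test.
val_total = voc2007["test"]

print(train_total, val_total)  # 16551 4952
```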
Trained with the default values of this repo (total epochs: 105).
- Download pb file: <Google Drive Link>
The pb file is uploaded as a tar.gz archive, so you have to decompress it like below.

```
tar -zxvf yolo_voc_448x448.tar.gz
```

If you want to run inference with this pb file, refer to inference_tutorial.ipynb.
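A minimal input-preprocessing sketch for feeding the exported model. The assumptions here (a batched 448x448 RGB float32 input scaled to [0, 1]) are inferred from the model name `yolo_voc_448x448` and are not confirmed by this repo; see inference_tutorial.ipynb for the actual pipeline.

```python
import numpy as np

def preprocess(image: np.ndarray) -> np.ndarray:
    """uint8 HxWx3 image -> 1x448x448x3 float32 batch in [0, 1] (assumed layout)."""
    assert image.shape == (448, 448, 3), "resize to 448x448 before this step"
    x = image.astype(np.float32) / 255.0
    return x[np.newaxis, ...]  # add batch dimension

dummy = np.zeros((448, 448, 3), dtype=np.uint8)
batch = preprocess(dummy)
print(batch.shape, batch.dtype)  # (1, 448, 448, 3) float32

# The decompressed SavedModel directory can then be loaded with, e.g.:
#   model = tf.saved_model.load("./yolo_voc_448x448")
#   preds = model(batch)
```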
| class name | AP |
|---|---|
| dog | 0.7464 |
| pottedplant | 0.2250 |
| car | 0.5021 |
| person | 0.4482 |
| tvmonitor | 0.5213 |
| diningtable | 0.4564 |
| bicycle | 0.5927 |
| chair | 0.2041 |
| motorbike | 0.5595 |
| sofa | 0.4801 |
| bus | 0.6215 |
| boat | 0.3274 |
| horse | 0.7049 |
| aeroplane | 0.5872 |
| sheep | 0.4223 |
| bottle | 0.1312 |
| train | 0.7917 |
| cat | 0.8044 |
| bird | 0.4824 |
| cow | 0.4548 |
| mAP | 0.5032 |
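The reported mAP is the unweighted mean of the 20 per-class APs above:

```python
# Per-class APs from the table above.
aps = {
    "dog": 0.7464, "pottedplant": 0.2250, "car": 0.5021, "person": 0.4482,
    "tvmonitor": 0.5213, "diningtable": 0.4564, "bicycle": 0.5927,
    "chair": 0.2041, "motorbike": 0.5595, "sofa": 0.4801, "bus": 0.6215,
    "boat": 0.3274, "horse": 0.7049, "aeroplane": 0.5872, "sheep": 0.4223,
    "bottle": 0.1312, "train": 0.7917, "cat": 0.8044, "bird": 0.4824,
    "cow": 0.4548,
}
mAP = sum(aps.values()) / len(aps)
print(round(mAP, 4))  # 0.5032
```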
- GPU(GeForce RTX 3090): About 8 FPS
- CPU(AMD Ryzen 5 5600X 6-Core Processor): About 4 FPS
```
python train_voc.py
```

Options

Default option values are in `./configs/configs.py`. If options are given, the default config values are overridden.

- `--epochs`: Number of training epochs
- `--init_lr`: Initial learning rate
- `--lr_decay_rate`: Learning rate decay rate
- `--lr_decay_steps`: Learning rate decay steps
- `--batch_size`: Training batch size
- `--val_step`: Validation interval during training
- `--tb_img_max_outputs`: Number of visualized prediction images in TensorBoard
- `--train_ds_sample_ratio`: Training dataset sampling ratio
- `--val_ds_sample_ratio`: Validation dataset sampling ratio
Evaluate the pretrained model with the VOC2007 test dataset
```
python eval_voc.py
```

Options

- `--batch_size`: Evaluation batch size (default: `batch_size` of `./configs/configs.py`)
- `--pb_dir`: Saved pb directory path (default: `./ckpts/voc_ckpts/yolo_voc_448x448`)
You Only Look Once: Unified, Real-Time Object Detection <arxiv link>
@misc{redmon2016look,
title={You Only Look Once: Unified, Real-Time Object Detection},
author={Joseph Redmon and Santosh Divvala and Ross Girshick and Ali Farhadi},
year={2016},
eprint={1506.02640},
archivePrefix={arXiv},
primaryClass={cs.CV}
}