This repo contains PyTorch implementations of deep person re-identification models.
We support
- multi-GPU training.
- both image-based and video-based reid.
- unified interface for different reid models.
- end-to-end training and evaluation.
- standard splits used by most papers.
- download of trained models.
- May 2018: Added support for MSMT17 and DukeMTMC-VideoReID; added Inception-v4, Inception-ResNet-v2, DPN, ResNeXt and SE-ResNe(X)t (trained models coming later).
- Apr 2018: Added DukeMTMC-reID; Added SqueezeNet, MobileNetV2 (CVPR'18), ShuffleNet (CVPR'18) and Xception (CVPR'17).
- Apr 2018: Added Harmonious Attention CNN (CVPR'18). We achieved Rank-1 42.4% (vs. 41.7% in the paper) on CUHK03 (Detected) by training from scratch. The result can be reproduced by
`python train_img_model_xent.py -d cuhk03 -a hacnn --save-dir log/hacnn-xent-cuhk03 --height 160 --width 64 --max-epoch 500 --stepsize -1 --eval-step 50`.
- Apr 2018: Code upgraded to PyTorch 0.4.0.
- Apr 2018: Added CUHK03. Models are available.
- Apr 2018: Added iLIDS-VID and PRID-2011. Models are available.
- Mar 2018: Added argument `--htri-only` to `train_img_model_xent_htri.py` and `train_vid_model_xent_htri.py`. If this argument is true, only `htri` [4] is used for training. See here for detailed changes.
- Mar 2018: Added Multi-scale Deep CNN (ICCV'17) [10] with slight modifications: (a) input size is (256, 128) instead of (160, 60); (b) we add an average pooling layer after the last conv feature maps; (c) we train the network with our strategy. The model trained from scratch on Market1501 is available.
- Mar 2018: Added center loss (ECCV'16) [9] and the trained model weights.
- PyTorch (0.4.0)
- torchvision (0.2.1)
Python 2 is recommended for the current version.
- `cd` to the folder where you want to download this repo.
- Run `git clone https://github.com/KaiyangZhou/deep-person-reid`.
Create a directory to store reid datasets under this repo via
cd deep-person-reid/
mkdir data/
If you want to store datasets in another directory, you need to specify `--root path_to_your/data` when running the training code. Please follow the instructions below to prepare each dataset. After that, you can simply pass `-d the_dataset` when running the training code.
Please do not use an image dataset when running video reid scripts (and vice versa); otherwise an error will occur.
Market1501 [7]:
- Download the dataset to `data/` from http://www.liangzheng.org/Project/project_reid.html.
- Extract the dataset and rename it to `market1501`. The data structure would look like:
market1501/
bounding_box_test/
bounding_box_train/
...
- Use `-d market1501` when running the training code.
CUHK03 [13]:
- Create a folder named `cuhk03/` under `data/`.
- Download the dataset to `data/cuhk03/` from http://www.ee.cuhk.edu.hk/~xgwang/CUHK_identification.html and extract `cuhk03_release.zip`, so you will have `data/cuhk03/cuhk03_release`.
- Download the new split [14] from person-re-ranking. What you need are `cuhk03_new_protocol_config_detected.mat` and `cuhk03_new_protocol_config_labeled.mat`. Put these two mat files under `data/cuhk03`. Finally, the data structure would look like:
cuhk03/
cuhk03_release/
cuhk03_new_protocol_config_detected.mat
cuhk03_new_protocol_config_labeled.mat
...
- Use `-d cuhk03` when running the training code. In default mode, we use the new split (767/700). If you want to use the original split (1367/100) created by [13], specify `--cuhk03-classic-split`. As [13] computes CMC differently from Market1501, you might need to specify `--use-metric-cuhk03` for a fair comparison with their method. In addition, we support both `labeled` and `detected` modes. The default mode loads `detected` images. Specify `--cuhk03-labeled` if you want to train and test on `labeled` images.
DukeMTMC-reID [16, 17]:
- Create a directory under `data/` called `dukemtmc-reid`.
- Download the dataset `DukeMTMC-reID.zip` from https://github.com/layumi/DukeMTMC-reID_evaluation#download-dataset and put it into `data/dukemtmc-reid`. Extract the zip file, which leads to:
dukemtmc-reid/
DukeMTMC-reid.zip # (you can delete this zip file, it is ok)
DukeMTMC-reid/ # this folder contains 8 files.
- Use `-d dukemtmcreid` when running the training code.
MSMT17 [22]:
- Create a directory named `msmt17/` under `data/`.
- Download the dataset `MSMT17_V1.tar.gz` to `data/msmt17/` from http://www.pkuvmc.com/publications/msmt17.html. Extract the file under the same folder, so you will have:
msmt17/
MSMT17_V1.tar.gz # (do whatever you want with this .tar file)
MSMT17_V1/
train/
test/
list_train.txt
... (totally six .txt files)
- Use `-d msmt17` when running the training code.
MARS [8]:
- Create a directory named `mars/` under `data/`.
- Download the dataset to `data/mars/` from http://www.liangzheng.com.cn/Project/project_mars.html.
- Extract `bbox_train.zip` and `bbox_test.zip`.
- Download the split information from https://github.com/liangzheng06/MARS-evaluation/tree/master/info and put `info/` in `data/mars` (we want to follow the standard split in [8]). The data structure would look like:
mars/
bbox_test/
bbox_train/
info/
- Use `-d mars` when running the training code.
iLIDS-VID [11]:
- The code supports automatic download and formatting. Simply use `-d ilidsvid` when running the training code. The data structure would look like:
ilids-vid/
i-LIDS-VID/
train-test people splits/
splits.json
PRID [12]:
- Under `data/`, run `mkdir prid2011` to create a directory.
- Download the dataset from https://www.tugraz.at/institute/icg/research/team-bischof/lrs/downloads/PRID11/ and extract it under `data/prid2011`.
- Download the split created by iLIDS-VID from here, and put it in `data/prid2011/`. We follow [11] and use the 178 persons whose sequences are longer than a threshold, so that results on this dataset can be fairly compared with other approaches. The data structure would look like:
prid2011/
splits_prid2011.json
prid_2011/
multi_shot/
single_shot/
readme.txt
- Use `-d prid` when running the training code.
DukeMTMC-VideoReID [16, 23]:
- Make a directory `data/dukemtmc-vidreid`.
- Download `dukemtmc_videoReID.zip` from https://github.com/Yu-Wu/DukeMTMC-VideoReID. Unzip the file to `data/dukemtmc-vidreid`. You need to have:
dukemtmc-vidreid/
dukemtmc_videoReID/
train_split/
query_split/
gallery_split/
... (and two license files)
- Use `-d dukemtmcvidreid` when running the training code.
These are implemented in `dataset_loader.py`, where we have two main classes that subclass `torch.utils.data.Dataset`:
- `ImageDataset`: processes image-based person reid datasets.
- `VideoDataset`: processes video-based person reid datasets.
These two classes are used with `torch.utils.data.DataLoader`, which provides batched data. A data loader with `ImageDataset` outputs batches of shape (batch, channel, height, width), while a data loader with `VideoDataset` outputs batches of shape (batch, sequence, channel, height, width).
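For illustration, below is a minimal sketch of plugging these classes into `torch.utils.data.DataLoader`. The per-sample tuple layout (`(img_path, pid, camid)` for images, a list of frame paths per tracklet for videos), the `seq_len`/`sample` keyword arguments and the placeholder file paths are assumptions; check `dataset_loader.py` for the exact signatures.

```python
import torchvision.transforms as T
from torch.utils.data import DataLoader

from dataset_loader import ImageDataset, VideoDataset

transform = T.Compose([T.Resize((256, 128)), T.ToTensor()])

# Placeholder samples; in practice these lists come from the dataset managers.
image_samples = [('data/market1501/bounding_box_train/0001_c1s1_000001_00.jpg', 0, 0)]
video_samples = [(['frame_0001.jpg', 'frame_0002.jpg'], 0, 0)]

image_loader = DataLoader(ImageDataset(image_samples, transform=transform),
                          batch_size=32, shuffle=True)
video_loader = DataLoader(VideoDataset(video_samples, seq_len=15, sample='random',
                                       transform=transform),
                          batch_size=2, shuffle=True)

for imgs, pids, camids in image_loader:
    print(imgs.size())   # (batch, channel, height, width)

for clips, pids, camids in video_loader:
    print(clips.size())  # (batch, sequence, channel, height, width)
```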
- `models/ResNet.py`: ResNet50 [1], ResNet101 [1], ResNet50M [2].
- `models/ResNeXt.py`: ResNeXt101 [26].
- `models/SEResNet.py`: SEResNet50 [25], SEResNet101 [25], SEResNeXt50 [25], SEResNeXt101 [25].
- `models/DenseNet.py`: DenseNet121 [3].
- `models/MuDeep.py`: MuDeep [10].
- `models/HACNN.py`: HACNN [15].
- `models/SqueezeNet.py`: SqueezeNet [18].
- `models/MobileNet.py`: MobileNetV2 [19].
- `models/ShuffleNet.py`: ShuffleNet [20].
- `models/Xception.py`: Xception [21].
- `models/InceptionV4.py`: InceptionV4 [24].
- `models/InceptionResNetV2.py`: InceptionResNetV2 [24].
- `models/DPN.py`: DPN92 [27].
See `models/__init__.py` for details regarding what keys to use to call these models.
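As a hypothetical usage example (the factory name `init_model` and its keyword arguments below are assumptions rather than a quote of the actual API; consult `models/__init__.py` for the real helper and the valid model keys):

```python
import models  # the models/ package of this repo

# Hypothetical call: `init_model`, `num_classes` and `loss` are assumed names.
# 751 is the number of training identities in Market1501.
model = models.init_model(name='resnet50', num_classes=751, loss={'xent'})
print(sum(p.numel() for p in model.parameters()) / 1e6, 'M parameters')
```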
- `xent`: cross entropy + label smoothing regularizer [5].
- `htri`: triplet loss with hard positive/negative mining [4].
- `cent`: center loss [9].
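To make the `xent` entry concrete, here is a minimal, self-contained sketch of cross entropy with a label smoothing regularizer in the spirit of [5]. It is illustrative rather than a copy of the repo's loss code, and the smoothing value `epsilon=0.1` is an assumption.

```python
import torch
import torch.nn as nn

class CrossEntropyLabelSmooth(nn.Module):
    """Cross entropy with a label smoothing regularizer (illustrative sketch).

    The one-hot target is replaced by (1 - epsilon) * one_hot + epsilon / num_classes.
    """
    def __init__(self, num_classes, epsilon=0.1):
        super(CrossEntropyLabelSmooth, self).__init__()
        self.num_classes = num_classes
        self.epsilon = epsilon
        self.logsoftmax = nn.LogSoftmax(dim=1)

    def forward(self, inputs, targets):
        log_probs = self.logsoftmax(inputs)  # (batch, num_classes)
        one_hot = torch.zeros_like(log_probs).scatter_(1, targets.unsqueeze(1), 1)
        smooth = (1 - self.epsilon) * one_hot + self.epsilon / self.num_classes
        return (-smooth * log_probs).mean(0).sum()
```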
Optimizers are wrapped in `optimizers.py`, which supports `adam` (default) and `sgd`. Use `--optim string_name` to select the optimizer.
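A minimal sketch of what such a wrapper might look like (the function name `init_optim`, the SGD momentum of 0.9 and the error handling are assumptions; see `optimizers.py` for the actual implementation):

```python
import torch

def init_optim(optim_name, params, lr, weight_decay):
    # Illustrative wrapper; the real one lives in optimizers.py.
    if optim_name == 'adam':
        return torch.optim.Adam(params, lr=lr, weight_decay=weight_decay)
    elif optim_name == 'sgd':
        return torch.optim.SGD(params, lr=lr, momentum=0.9, weight_decay=weight_decay)
    else:
        raise KeyError("Unsupported optimizer: {}".format(optim_name))
```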
Training codes are implemented mainly in
- `train_img_model_xent.py`: train an image model with cross entropy loss.
- `train_img_model_xent_htri.py`: train an image model with a combination of cross entropy loss and hard triplet loss.
- `train_img_model_cent.py`: train an image model with center loss.
- `train_vid_model_xent.py`: train a video model with cross entropy loss.
- `train_vid_model_xent_htri.py`: train a video model with a combination of cross entropy loss and hard triplet loss.
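To illustrate the `htri` term used by the `*_xent_htri.py` scripts, below is a compact, self-contained sketch of a batch-hard triplet loss in the spirit of [4]. The class name, the margin of 0.3 and the distance computation are illustrative assumptions, not the scripts' actual code.

```python
import torch
import torch.nn as nn

class HardTripletLoss(nn.Module):
    """Triplet loss with batch-hard positive/negative mining (illustrative sketch)."""
    def __init__(self, margin=0.3):
        super(HardTripletLoss, self).__init__()
        self.ranking_loss = nn.MarginRankingLoss(margin=margin)

    def forward(self, feats, pids):
        n = feats.size(0)
        # Pairwise Euclidean distance matrix.
        dist = torch.pow(feats, 2).sum(dim=1, keepdim=True).expand(n, n)
        dist = dist + dist.t() - 2 * torch.mm(feats, feats.t())
        dist = dist.clamp(min=1e-12).sqrt()
        # Hardest positive (same pid, largest distance) and hardest
        # negative (different pid, smallest distance) for each anchor.
        same = pids.expand(n, n).eq(pids.expand(n, n).t())
        dist_ap = torch.stack([dist[i][same[i]].max() for i in range(n)])
        dist_an = torch.stack([dist[i][same[i] == 0].min() for i in range(n)])
        # Encourage dist_an > dist_ap + margin.
        return self.ranking_loss(dist_an, dist_ap, torch.ones_like(dist_an))
```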
For example, to train an image reid model using ResNet50 and cross entropy loss, run
`python train_img_model_xent.py -d market1501 -a resnet50 --max-epoch 60 --train-batch 32 --test-batch 32 --stepsize 20 --eval-step 20 --save-dir log/resnet50-xent-market1501 --gpu-devices 0`
To use multiple GPUs, you can set `--gpu-devices 0,1,2,3`.
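As a rough sketch of how multi-GPU training is typically wired up in PyTorch (an assumption about what the training scripts do with `--gpu-devices`, not a quote of their code):

```python
import os

# Restrict the visible GPUs, e.g. with the value passed via --gpu-devices.
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,2,3'

import torch
import torch.nn as nn
import torchvision.models as tvm

model = tvm.resnet50()  # stand-in for a reid model from models/
if torch.cuda.is_available():
    # Replicate the model across the visible GPUs; each batch is split along dim 0.
    model = nn.DataParallel(model).cuda()
```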
Please run `python train_blah_blah.py -h` for more details regarding arguments.
🐶 means that the model is initialized with ImageNet pretrained weights.
Market1501

| Model | Param Size (M) | Loss | Rank-1/5/10 (%) | mAP (%) | Model weights | Published Rank | Published mAP |
|---|---|---|---|---|---|---|---|
| DenseNet121:dog: | 7.72 | xent | 86.5/93.6/95.7 | 67.8 | download | ||
| DenseNet121:dog: | 7.72 | xent+htri | 89.5/96.3/97.5 | 72.6 | download | ||
| ResNet50:dog: | 25.05 | xent | 85.4/94.1/95.9 | 68.8 | download | ||
| ResNet50:dog: | 25.05 | xent+htri | 87.5/95.3/97.3 | 72.3 | download | ||
| ResNet50M:dog: | 30.01 | xent | 89.4/95.9/97.4 | 75.0 | download | 89.9/-/- | 75.6 |
| ResNet50M:dog: | 30.01 | xent+htri | 90.7/97.0/98.2 | 76.8 | download | ||
| MuDeep | 138.02 | xent+htri | 71.5/89.3/96.3 | 47.0 | download | ||
| SqueezeNet | 1.13 | xent | 65.1/82.3/87.9 | 41.6 | download | ||
| MobileNetV2 | 3.19 | xent | 77.0/89.5/92.8 | 56.3 | download | ||
| ShuffleNet | 1.63 | xent | 68.7/85.7/90.2 | 44.9 | download | ||
| Xception | 22.39 | xent | 72.1/88.2/92.1 | 52.8 | download | ||
| HACNN | 3.70 | xent | 88.7/95.3/97.4 | 71.2 | download | 91.2/-/- | 75.7 |
CUHK03 (detected, new protocol (767/700))
| Model | Param Size (M) | Loss | Rank-1/5/10 (%) | mAP (%) | Model weights | Published Rank | Published mAP |
|---|---|---|---|---|---|---|---|
| DenseNet121:dog: | 7.74 | xent | 41.0/61.7/71.5 | 40.6 | download | ||
| ResNet50:dog: | 25.08 | xent | 48.8/69.4/78.4 | 47.5 | download | ||
| ResNet50M:dog: | 30.06 | xent | 57.5/75.4/82.5 | 55.2 | download | 47.1/-/- | 43.5 |
| HACNN | 3.72 | xent | 42.4/60.9/70.5 | 40.9 | download | 41.7/-/- | 38.6 |
| SqueezeNet | 1.13 | xent | 20.0/38.4/48.2 | 20.0 | download | ||
| MobileNetV2 | 3.21 | xent | 35.1/55.8/64.7 | 33.8 | download | ||
| ShuffleNet | 1.64 | xent | 22.0/39.3/49.9 | 21.2 | download |
DukeMTMC-reID

| Model | Param Size (M) | Loss | Rank-1/5/10 (%) | mAP (%) | Model weights | Published Rank | Published mAP |
|---|---|---|---|---|---|---|---|
| DenseNet121:dog: | 7.67 | xent | 74.9/86.0/88.8 | 54.5 | download | ||
| ResNet50:dog: | 24.94 | xent | 76.3/87.1/90.9 | 59.5 | download | ||
| ResNet50M:dog: | 29.86 | xent | 80.5/89.8/92.4 | 63.3 | download | 80.4/-/- | 63.9 |
| SqueezeNet | 1.10 | xent | 50.2/68.9/75.3 | 30.3 | download | ||
| MobileNetV2 | 3.12 | xent | 65.6/79.2/83.7 | 43.6 | download | ||
| ShuffleNet | 1.58 | xent | 56.9/74.6/80.5 | 37.8 | download | ||
| HACNN | 3.65 | xent | 78.5/88.8/91.3 | 60.8 | download | 80.5/-/- | 63.8 |
MARS

| Model | Param Size (M) | Loss | Rank-1/5/10 (%) | mAP (%) | Model weights | Published Rank | Published mAP |
|---|---|---|---|---|---|---|---|
| DenseNet121:dog: | 7.59 | xent | 65.2/81.1/86.3 | 52.1 | download | ||
| DenseNet121:dog: | 7.59 | xent+htri | 82.6/93.2/95.4 | 74.6 | download | ||
| ResNet50:dog: | 24.79 | xent | 74.5/88.8/91.8 | 64.0 | download | ||
| ResNet50:dog: | 24.79 | xent+htri | 80.8/92.1/94.3 | 74.0 | download | ||
| ResNet50M:dog: | 29.63 | xent | 77.8/89.8/92.8 | 67.5 | - | ||
| ResNet50M:dog: | 29.63 | xent+htri | 82.3/93.8/95.3 | 75.4 | - |
Say you have downloaded ResNet50 trained with xent on Market1501. The path to this model is `saved-models/resnet50_xent_market1501.pth.tar` (create a directory to store model weights via `mkdir saved-models/`). Then, run the following command to test:
`python train_img_model_xent.py -d market1501 -a resnet50 --evaluate --resume saved-models/resnet50_xent_market1501.pth.tar --save-dir log/resnet50-xent-market1501 --test-batch 32`
Likewise, to test a video reid model, you should have a pretrained model saved under `saved-models/`, e.g. `saved-models/resnet50_xent_mars.pth.tar`, then run:
`python train_vid_model_xent.py -d mars -a resnet50 --evaluate --resume saved-models/resnet50_xent_mars.pth.tar --save-dir log/resnet50-xent-mars --test-batch 2`
Note that `--test-batch` in video reid represents the number of tracklets. If we set this argument to 2 and sample 15 images per tracklet, the resulting number of images per batch is 2*15=30. Adjust this argument according to your GPU memory.
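As a quick sanity check of that arithmetic (the 15 sampled images per tracklet come from the example above; the final reshape is an assumption about the implementation):

```python
# Number of images pushed through the network per batch in video reid.
test_batch = 2    # tracklets per batch (--test-batch)
seq_len = 15      # sampled images per tracklet
print(test_batch * seq_len)  # 30

# The corresponding batch has shape (batch, sequence, channel, height, width),
# e.g. (2, 15, 3, height, width); it is typically flattened to
# (30, 3, height, width) before the CNN forward pass.
```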
- How do I set different learning rates for different components in my model?
A: Instead of giving `model.parameters()` to the optimizer, you could pass an iterable of dicts, as described here. Please see the example below.
# First comment the following code.
#optimizer = torch.optim.Adam(model.parameters(), lr=args.lr, weight_decay=args.weight_decay)
param_groups = [
{'params': model.base.parameters(), 'lr': 0},
{'params': model.classifier.parameters()},
]
# Such that model.base will be frozen and model.classifier will be trained with
# the default learning rate, i.e. args.lr. This example code only applies to a model
# that has two components (base and classifier). Modify the code to adapt to your model.
optimizer = torch.optim.Adam(param_groups, lr=args.lr, weight_decay=args.weight_decay)
Of course, you can pass `model.classifier.parameters()` to the optimizer if you only need to train the classifier (in this case, setting `requires_grad` of the base model's parameters to `False` will be more efficient).
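Continuing the same hypothetical two-component example, a minimal sketch of the more efficient variant that freezes the base via `requires_grad`:

```python
# Freeze the base so no gradients are computed or stored for it.
for p in model.base.parameters():
    p.requires_grad = False

# Only pass the classifier's parameters to the optimizer.
optimizer = torch.optim.Adam(model.classifier.parameters(),
                             lr=args.lr, weight_decay=args.weight_decay)
```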
[1] He et al. Deep Residual Learning for Image Recognition. CVPR 2016.
[2] Yu et al. The Devil is in the Middle: Exploiting Mid-level Representations for Cross-Domain Instance Matching. arXiv:1711.08106.
[3] Huang et al. Densely Connected Convolutional Networks. CVPR 2017.
[4] Hermans et al. In Defense of the Triplet Loss for Person Re-Identification. arXiv:1703.07737.
[5] Szegedy et al. Rethinking the Inception Architecture for Computer Vision. CVPR 2016.
[6] Kingma and Ba. Adam: A Method for Stochastic Optimization. ICLR 2015.
[7] Zheng et al. Scalable Person Re-identification: A Benchmark. ICCV 2015.
[8] Zheng et al. MARS: A Video Benchmark for Large-Scale Person Re-identification. ECCV 2016.
[9] Wen et al. A Discriminative Feature Learning Approach for Deep Face Recognition. ECCV 2016.
[10] Qian et al. Multi-scale Deep Learning Architectures for Person Re-identification. ICCV 2017.
[11] Wang et al. Person Re-Identification by Video Ranking. ECCV 2014.
[12] Hirzer et al. Person Re-Identification by Descriptive and Discriminative Classification. SCIA 2011.
[13] Li et al. DeepReID: Deep Filter Pairing Neural Network for Person Re-identification. CVPR 2014.
[14] Zhong et al. Re-ranking Person Re-identification with k-reciprocal Encoding. CVPR 2017.
[15] Li et al. Harmonious Attention Network for Person Re-identification. CVPR 2018.
[16] Ristani et al. Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking. ECCVW 2016.
[17] Zheng et al. Unlabeled Samples Generated by GAN Improve the Person Re-identification Baseline in vitro. ICCV 2017.
[18] Iandola et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv:1602.07360.
[19] Sandler et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks. CVPR 2018.
[20] Zhang et al. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. CVPR 2018.
[21] Chollet. Xception: Deep Learning with Depthwise Separable Convolutions. CVPR 2017.
[22] Wei et al. Person Transfer GAN to Bridge Domain Gap for Person Re-Identification. CVPR 2018.
[23] Wu et al. Exploit the Unknown Gradually: One-Shot Video-Based Person Re-Identification by Stepwise Learning. CVPR 2018.
[24] Szegedy et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. ICLRW 2016.
[25] Hu et al. Squeeze-and-Excitation Networks. CVPR 2018.
[26] Xie et al. Aggregated Residual Transformations for Deep Neural Networks. CVPR 2017.
[27] Chen et al. Dual Path Networks. NIPS 2017.