Abstract
Most existing gait recognition approaches adopt a two-step procedure: a preprocessing step that extracts silhouettes or skeletons, followed by recognition. In this paper, we propose an end-to-end model-based gait recognition method. Specifically, we employ a skinned multi-person linear (SMPL) model for human modeling and estimate its parameters using a pre-trained human mesh recovery (HMR) network. Because the pre-trained HMR is not recognition-oriented, we fine-tune it within an end-to-end gait recognition framework. To cope with differences between the gait datasets and those used for pre-training the HMR, we introduce a reconstruction loss between the silhouette masks in the gait datasets and the silhouettes rendered from the estimated SMPL model by a differentiable renderer. This allows us to adapt the HMR to the gait datasets without requiring ground-truth joint locations as supervision. Experimental results on the OU-MVLP and CASIA-B datasets demonstrate state-of-the-art performance of the proposed method in both gait identification and verification scenarios, a direct consequence of the explicitly disentangled pose and shape features produced by the proposed end-to-end model-based framework.
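The reconstruction loss described above compares silhouette masks from the gait dataset against silhouettes rendered from the estimated SMPL mesh. As a minimal sketch of that loss term only (the paper's actual pipeline uses a differentiable renderer to produce the rendered silhouette; the function and array names below are illustrative, not from the paper):

```python
import numpy as np

def silhouette_reconstruction_loss(rendered, mask):
    """Mean squared per-pixel error between a rendered soft
    silhouette (values in [0, 1]) and a binary silhouette mask.
    Both arrays must share the same HxW shape."""
    rendered = np.asarray(rendered, dtype=np.float64)
    mask = np.asarray(mask, dtype=np.float64)
    assert rendered.shape == mask.shape
    return float(np.mean((rendered - mask) ** 2))

# Toy example: a 4x4 rendered soft silhouette vs. a binary mask.
rendered = np.array([[0.0, 0.9, 0.9, 0.0],
                     [0.0, 1.0, 1.0, 0.0],
                     [0.0, 1.0, 1.0, 0.0],
                     [0.0, 0.8, 0.8, 0.0]])
mask = np.array([[0, 1, 1, 0],
                 [0, 1, 1, 0],
                 [0, 1, 1, 0],
                 [0, 1, 1, 0]])
loss = silhouette_reconstruction_loss(rendered, mask)
```

In the full framework this scalar would be backpropagated through the differentiable renderer into the HMR parameters, which is what makes the unsupervised adaptation possible.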
Notes
1. Five dimensions of 23 joints plus one root joint sum up to \(5 \times (23 + 1) = 120\).
2. While the original GaitSet paper [22] reported results including the non-enrolled probes, the results here exclude them to ensure a fair comparison.
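The dimensionality in note 1 can be checked directly; the constant names below are illustrative, not taken from the paper:

```python
# Note 1 arithmetic: each of SMPL's 23 body joints plus the root
# joint carries a 5-dimensional code, giving the pose feature size.
N_BODY_JOINTS = 23   # SMPL joints excluding the root
DIMS_PER_JOINT = 5   # per-joint dimensionality stated in note 1
pose_dim = DIMS_PER_JOINT * (N_BODY_JOINTS + 1)
print(pose_dim)  # → 120
```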
References
Bouchrika, I., Goffredo, M., Carter, J., Nixon, M.: On using gait in forensic biometrics. J. Forensic Sci. 56, 882–889 (2011)
Iwama, H., Muramatsu, D., Makihara, Y., Yagi, Y.: Gait verification system for criminal investigation. IPSJ Trans. Comput. Vis. Appl. 5, 163–175 (2013)
Lynnerup, N., Larsen, P.: Gait as evidence. IET Biometrics 3, 47–54 (2014)
Wagg, D., Nixon, M.: On automated model-based extraction and analysis of gait. In: Proceedings of the 6th IEEE International Conference on Automatic Face and Gesture Recognition, pp. 11–16 (2004)
Yam, C., Nixon, M., Carter, J.: Automated person recognition by walking and running via model-based approaches. Pattern Recogn. 37, 1057–1072 (2004)
Bobick, A., Johnson, A.: Gait recognition using static activity-specific parameters. In: CVPR, vol. 1, pp. 423–430 (2001)
Cunado, D., Nixon, M., Carter, J.: Automatic extraction and description of human gait models for recognition purposes. Comput. Vis. Image Underst. 90, 1–41 (2003)
Yamauchi, K., Bhanu, B., Saito, H.: 3D human body modeling using range data. In: ICPR, pp. 3476–3479 (2010)
Ariyanto, G., Nixon, M.: Marionette mass-spring model for 3D gait biometrics. In: Proceedings of the 5th IAPR International Conference on Biometrics, pp. 354–359 (2012)
Feng, Y., Li, Y., Luo, J.: Learning effective gait features using LSTM. In: ICPR, pp. 325–330 (2016)
Liao, R., Cao, C., Garcia, E.B., Yu, S., Huang, Y.: Pose-based temporal-spatial network (PTSN) for gait recognition with carrying and clothing variations. In: Zhou, J., et al. (eds.) CCBR 2017. LNCS, vol. 10568, pp. 474–483. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69923-3_51
Liao, R., Yu, S., An, W., Huang, Y.: A model-based gait recognition method with body pose and human prior knowledge. Pattern Recogn. 98, 107069 (2020)
Han, J., Bhanu, B.: Individual recognition using gait energy image. IEEE Trans. Pattern Anal. Mach. Intell. 28, 316–322 (2006)
Xu, D., Yan, S., Tao, D., Zhang, L., Li, X., Zhang, H.: Human gait recognition with matrix representation. IEEE Trans. Circuits Syst. Video Technol. 16, 896–903 (2006)
Lu, J., Tan, Y.P.: Uncorrelated discriminant simplex analysis for view-invariant gait signal computing. Pattern Recogn. Lett. 31, 382–393 (2010)
Guan, Y., Li, C.T., Roli, F.: On reducing the effect of covariate factors in gait recognition: a classifier ensemble method. IEEE Trans. Pattern Anal. Mach. Intell. 37, 1521–1528 (2015)
Makihara, Y., Suzuki, A., Muramatsu, D., Li, X., Yagi, Y.: Joint intensity and spatial metric learning for robust gait recognition. In: CVPR, pp. 5705–5715 (2017)
Shiraga, K., Makihara, Y., Muramatsu, D., Echigo, T., Yagi, Y.: GEINet: view-invariant gait recognition using a convolutional neural network. In: ICB (2016)
Wu, Z., Huang, Y., Wang, L., Wang, X., Tan, T.: A comprehensive study on cross-view gait based human identification with deep CNNs. IEEE Trans. Pattern Anal. Mach. Intell. 39, 209–226 (2017)
Takemura, N., Makihara, Y., Muramatsu, D., Echigo, T., Yagi, Y.: On input/output architectures for convolutional neural network-based cross-view gait recognition. IEEE Trans. Circuits Syst. Video Technol. 29, 2708–2719 (2019)
Zhang, K., Luo, W., Ma, L., Liu, W., Li, H.: Learning joint gait representation via quintuplet loss minimization. In: CVPR (2019)
Chao, H., He, Y., Zhang, J., Feng, J.: GaitSet: regarding gait as a set for cross-view gait recognition. In: AAAI (2019)
Li, X., Makihara, Y., Xu, C., Yagi, Y., Ren, M.: Joint intensity transformer network for gait recognition robust against clothing and carrying status. IEEE Trans. Inf. Forensics Secur. 1 (2019)
Kusakunniran, W., Wu, Q., Zhang, J., Li, H.: Support vector regression for multi-view gait recognition based on local motion feature selection. In: CVPR, San Francisco, CA, USA, pp. 1–8 (2010)
Makihara, Y., Sagawa, R., Mukaigawa, Y., Echigo, T., Yagi, Y.: Gait recognition using a view transformation model in the frequency domain. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3953, pp. 151–163. Springer, Heidelberg (2006). https://doi.org/10.1007/11744078_12
Makihara, Y., Tsuji, A., Yagi, Y.: Silhouette transformation based on walking speed for gait identification. In: CVPR, San Francisco, CA, USA (2010)
Muramatsu, D., Shiraishi, A., Makihara, Y., Uddin, M., Yagi, Y.: Gait-based person recognition using arbitrary view transformation model. IEEE Trans. Image Process. 24, 140–154 (2015)
Mansur, A., Makihara, Y., Aqmar, R., Yagi, Y.: Gait recognition under speed transition. In: CVPR, pp. 2521–2528 (2014)
Akae, N., Mansur, A., Makihara, Y., Yagi, Y.: Video from nearly still: an application to low frame-rate gait recognition. In: CVPR, Providence, RI, USA, pp. 1537–1543 (2012)
Yu, S., et al.: GaitGANv2: invariant gait feature extraction using generative adversarial networks. Pattern Recogn. 87, 179–189 (2019)
He, Y., Zhang, J., Shan, H., Wang, L.: Multi-task GANs for view-specific feature learning in gait recognition. IEEE Trans. Inf. Forensics Secur. 14, 102–113 (2019)
Wang, C., Zhang, J., Wang, L., Pu, J., Yuan, X.: Human identification using temporal information preserving gait template. IEEE Trans. Pattern Anal. Mach. Intell. 34, 2164–2176 (2012)
Cao, Z., Hidalgo, G., Simon, T., Wei, S.E., Sheikh, Y.: OpenPose: realtime multi-person 2D pose estimation using Part Affinity Fields. arXiv preprint arXiv:1812.08008 (2018)
Lin, G., Milan, A., Shen, C., Reid, I.D.: RefineNet: multi-path refinement networks for high-resolution semantic segmentation. In: CVPR, pp. 5168–5177 (2017)
Song, C., Huang, Y., Huang, Y., Jia, N., Wang, L.: GaitNet: an end-to-end network for gait based human identification. Pattern Recogn. 96, 106988 (2019)
Zhang, Z., et al.: Gait recognition via disentangled representation learning. In: CVPR, Long Beach, CA (2019)
Kanazawa, A., Black, M.J., Jacobs, D.W., Malik, J.: End-to-end recovery of human shape and pose. In: CVPR, pp. 7122–7131 (2018)
Pavlakos, G., Zhu, L., Zhou, X., Daniilidis, K.: Learning to estimate 3D human pose and shape from a single color image. In: CVPR (2018)
Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: a skinned multi-person linear model. ACM Trans. Graph. (Proc. SIGGRAPH Asia) 34, 248:1–248:16 (2015)
Yu, S., Tan, D., Tan, T.: A framework for evaluating the effect of view angle, clothing and carrying condition on gait recognition. In: ICPR, Hong Kong, China, vol. 4, pp. 441–444 (2006)
Pfister, T., Charles, J., Zisserman, A.: Flowing convnets for human pose estimation in videos. In: ICCV (2015)
Fan, C., et al.: GaitPart: temporal part-based model for gait recognition. In: CVPR (2020)
Tran, L., Yin, X., Liu, X.: Disentangled representation learning GAN for pose-invariant face recognition. In: CVPR (2017)
Esser, P., Sutter, E., Ommer, B.: A variational U-net for conditional appearance and shape generation. In: CVPR (2018)
Li, X., Makihara, Y., Xu, C., Yagi, Y., Ren, M.: Gait recognition via semi-supervised disentangled representation learning to identity and covariate features. In: CVPR (2020)
Liu, W., Piao, Z., Min, J., Luo, W., Ma, L., Gao, S.: Liquid warping GAN: a unified framework for human motion imitation, appearance transfer and novel view synthesis. In: ICCV (2019)
He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 630–645. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_38
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Andriluka, M., Pishchulin, L., Gehler, P., Schiele, B.: 2D human pose estimation: New benchmark and state of the art analysis. In: CVPR (2014)
Johnson, S., Everingham, M.: Learning effective human pose estimation from inaccurate annotation. In: CVPR (2011)
Johnson, S., Everingham, M.: Clustered pose and nonlinear appearance models for human pose estimation. In: BMVC (2010)
Ionescu, C., Papava, D., Olaru, V., Sminchisescu, C.: Human3.6m: large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE Trans. Pattern Anal. Mach. Intell. 36, 1325–1339 (2014)
Mehta, D., et al.: Monocular 3D human pose estimation in the wild using improved CNN supervision. In: Fifth International Conference on 3D Vision (3DV) (2017)
Kato, H., Ushiku, Y., Harada, T.: Neural 3D mesh renderer. In: CVPR (2018)
Wang, J., et al.: Learning fine-grained image similarity with deep ranking. In: CVPR (2014)
Hadsell, R., Chopra, S., LeCun, Y.: Dimensionality reduction by learning an invariant mapping. In: CVPR, vol. 2, pp. 1735–1742 (2006)
He, K., Gkioxari, G., Dollár, P., Girshick, R.B.: Mask R-CNN. In: ICCV (2017)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Fang, H.S., Xie, S., Tai, Y.W., Lu, C.: RMPE: regional multi-person pose estimation. In: The IEEE International Conference on Computer Vision (ICCV) (2017)
Otsu, N.: Optimal linear and nonlinear solutions for least-square discriminant feature extraction. In: ICPR, pp. 557–560 (1982)
Xu, C., Makihara, Y., Li, X., Yagi, Y., Lu, J.: Cross-view gait recognition using pairwise spatial transformer networks. IEEE Trans. Circuits Syst. Video Technol. 1 (2020)
Hu, M., Wang, Y., Zhang, Z., Little, J.J., Huang, D.: View-invariant discriminative projection for multi-view gait-based human identification. IEEE Trans. Inf. Forensics Secur. 8, 2034–2045 (2013)
Acknowledgement
This work was supported by JSPS KAKENHI Grant No. JP18H04115, JP19H05692, and JP20H00607, and the National Natural Science Foundation of China (Grant No. 61727802).
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Li, X., Makihara, Y., Xu, C., Yagi, Y., Yu, S., Ren, M. (2021). End-to-End Model-Based Gait Recognition. In: Ishikawa, H., Liu, CL., Pajdla, T., Shi, J. (eds) Computer Vision – ACCV 2020. ACCV 2020. Lecture Notes in Computer Science(), vol 12624. Springer, Cham. https://doi.org/10.1007/978-3-030-69535-4_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69534-7
Online ISBN: 978-3-030-69535-4