Introduction:
Convolutional Neural Networks
for Visual Recognition
boris [email protected]
1
Acknowledgments
This presentation is heavily based on:
– http://cs.nyu.edu/~fergus/pmwiki/pmwiki.php
– http://deeplearning.net/reading-list/tutorials/
– http://deeplearning.net/tutorial/lenet.html
– http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial
… and many others
2
Agenda
1. Course overview
2. Introduction to Deep Learning
– Classical Computer Vision vs. Deep learning
3. Introduction to Convolutional Networks
– Basic CNN Architecture
– Large-Scale Image Classification
– How deep should Conv Nets be?
– Detection and Other Visual Apps
3
Course overview
1. Introduction
– Intro to Deep Learning
– Caffe: Getting started
– CNN: network topology, layer definitions
2. CNN Training
– Backward propagation
– Optimization for Deep Learning: SGD: momentum, rate
adaptation, Adagrad, SGD with Line Search, CGD
– “Regularization” (Dropout, Maxout)
4
Course overview
3. Localization and Detection
– Overfeat
– R-CNN (Regions with CNN)
4. CPU / GPU performance optimization
– CUDA
– VTune, OpenMP, and Intel MKL (Math Kernel Library)
5
Introduction to Deep Learning
6
Buzz…
7
Deep Learning – from Research to Technology
Deep Learning: a breakthrough in visual and speech recognition
8
Classical Computer Vision Pipeline
9
Classical Computer Vision Pipeline.
CV experts
1. Select / develop features: SURF, HoG, SIFT, RIFT,
…
2. Add Machine Learning on top of this for multi-class
recognition and train a classifier
Diagram: Feature Extraction (SIFT, HoG...) → Detection, Classification, Recognition
Classical CV feature definition is domain-specific and time-consuming
10
Deep Learning-based Vision Pipeline.
Deep Learning:
Build features automatically based on training data
Combine feature extraction and classification
DL experts: define NN topology and train NN
Diagram: Deep NN... → Deep NN... → Detection, Classification, Recognition
Deep Learning promise:
train good features automatically,
same method for different domains
11
Computer Vision + Deep Learning + Machine Learning
We want to combine Deep Learning + CV + ML
Combine pre-defined features with learned features;
Use best ML methods for multi-class recognition
CV+DL+ML experts needed to build the best-in-class
Diagram: CV features (HoG, SIFT) + Deep NN... → ML (AdaBoost, …)
Combine the best of Computer Vision,
Deep Learning and Machine Learning
12
Deep Learning Basics
Deep Learning is a set of machine learning
algorithms based on multi-layer networks
Diagram: INPUTS → HIDDEN NODES → OUTPUTS (CAT, DOG)
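The inputs → hidden nodes → outputs picture can be sketched as a tiny forward pass. This is only an illustration, not the network from the slides: the layer sizes, the tanh non-linearity, the softmax output and the random weights are all assumptions, and the helper names (`layer`, `forward`) are hypothetical.

```python
import math
import random

def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum of all inputs per node."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """INPUTS -> HIDDEN NODES (tanh, assumed) -> OUTPUTS (softmax, assumed)."""
    hidden = [math.tanh(a) for a in layer(inputs, w_hidden, b_hidden)]
    return softmax(layer(hidden, w_out, b_out))

random.seed(0)
n_in, n_hidden, n_out = 4, 5, 2           # assumed sizes, for illustration only
w_hidden = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [[random.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
b_out = [0.0] * n_out

labels = ["CAT", "DOG"]
inputs = [0.2, -0.1, 0.7, 0.4]            # made-up input features
probs = forward(inputs, w_hidden, b_hidden, w_out, b_out)
prediction = labels[probs.index(max(probs))]
print(dict(zip(labels, probs)), "->", prediction)
```

With untrained random weights the prediction is arbitrary; training adjusts the weights so that the correct label gets the highest probability.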
13
Deep Learning Basics
Deep Learning is a set of machine learning
algorithms based on multi-layer networks
CAT DOG
Training
14
Deep Learning Taxonomy
Supervised:
– Convolutional NN (LeCun)
– Recurrent Neural nets (Schmidhuber)
Unsupervised
– Deep Belief Nets / Stacked RBMs (Hinton)
– Stacked denoising autoencoders (Bengio)
– Sparse AutoEncoders (LeCun, A. Ng)
17
Convolutional Networks
18
Convolutional NN
A Convolutional Neural Network is an extension of the
traditional Multi-layer Perceptron, based on 3 ideas:
1. Local receptive fields
2. Shared weights
3. Spatial / temporal sub-sampling
See the LeCun et al. paper (1998) on document recognition:
http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
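The first two ideas (local receptive fields, shared weights) can be sketched in a few lines of plain Python. This is a minimal illustration under assumptions, not LeNet itself: the 4x4 image, the 2x2 edge kernel and the function name `conv2d_valid` are made up for the example, and the operation is written as cross-correlation (no kernel flip), as is common in CNN code.

```python
def conv2d_valid(image, kernel):
    """Slide ONE shared kernel over the image (stride 1, no padding).

    Each output value only sees a small kh x kw patch of the input
    (local receptive field), and every patch uses the same weights.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

# Made-up 4x4 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1] for _ in range(4)]
# Made-up 2x2 kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = conv2d_valid(image, kernel)
for row in feature_map:
    print(row)

# Weight count: shared kernel vs. a fully connected layer of the same shape.
shared_weights = len(kernel) * len(kernel[0])
dense_weights = (len(image) * len(image[0])) * (len(feature_map) * len(feature_map[0]))
print(shared_weights, "shared weights vs.", dense_weights, "fully connected weights")
```

The feature map is non-zero only where the edge is, and the layer needs 4 weights where a fully connected mapping from 16 inputs to 9 outputs would need 144.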
19
What is a Convolutional NN?
CNN is a multi-layer NN architecture:
– Convolutional + Non-Linear Layer
– Sub-sampling Layer
– Convolutional + Non-Linear Layer
– Fully connected layers
Diagram labels: Feature Extraction, Supervised Classification
20
What is a Convolutional NN?
Convolution + NL → 2x2 Sub-sampling → Convolution + NL
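The non-linearity and the 2x2 sub-sampling step can be sketched as follows. The slide does not say which non-linearity or which kind of sub-sampling is used, so ReLU and max pooling are assumptions here, and the 4x4 feature map is made up.

```python
def relu(feature_map):
    """Non-linear layer (assumed ReLU): negative responses become 0."""
    return [[max(0, v) for v in row] for row in feature_map]

def subsample_2x2(feature_map):
    """2x2 sub-sampling (assumed max pooling, stride 2).

    Each output value summarizes one non-overlapping 2x2 block,
    so width and height are both halved.
    """
    out = []
    for i in range(0, len(feature_map) - 1, 2):
        row = []
        for j in range(0, len(feature_map[0]) - 1, 2):
            block = (feature_map[i][j], feature_map[i][j + 1],
                     feature_map[i + 1][j], feature_map[i + 1][j + 1])
            row.append(max(block))
        out.append(row)
    return out

# Made-up 4x4 output of a convolution layer.
conv_output = [[ 1, -2,  3,  0],
               [ 4,  5, -6,  1],
               [-1,  0,  2,  2],
               [ 0, -3,  1,  7]]

activated = relu(conv_output)
pooled = subsample_2x2(activated)
print(pooled)
```

Here a 4x4 map shrinks to 2x2. The next Convolution + NL stage then works on this smaller map, so each of its units covers a larger region of the original image.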
21
CNN success story: ILSVRC 2012
ImageNet database: 14 million labeled images, 20K categories
22
ILSVRC: Classification
23
ImageNet Classification 2012
24
ILSVRC 2012: top rankers
http://www.image-net.org/challenges/LSVRC/2012/results.html
N  Error-5  Algorithm                                      Team                Authors
1  0.153    Deep Conv. Neural Network                      Univ. of Toronto    Krizhevsky et al.
2  0.262    Features + Fisher Vectors + Linear classifier  ISI                 Gunji et al.
3  0.270    Features + FV + SVM                            OXFORD_VGG          Simonyan et al.
4  0.271    SIFT + FV + PQ + SVM                           XRCE/INRIA          Perronnin et al.
5  0.300    Color desc. + SVM                              Univ. of Amsterdam  van de Sande et al.
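The Error-5 column is the top-5 error: a prediction counts as correct if the true label is among the 5 highest-scoring classes. A minimal sketch of that metric, with made-up scores for two images and 7 classes (the function name `top_k_error` is hypothetical):

```python
def top_k_error(scores_per_image, true_labels, k=5):
    """Fraction of images whose true label is NOT among the k best-scored classes."""
    errors = 0
    for scores, label in zip(scores_per_image, true_labels):
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label not in ranked[:k]:
            errors += 1
    return errors / len(true_labels)

# Made-up classifier scores: one row per image, one column per class.
scores_per_image = [
    [0.10, 0.30, 0.05, 0.20, 0.15, 0.12, 0.08],  # true class 2 has the lowest score
    [0.50, 0.10, 0.10, 0.10, 0.10, 0.05, 0.05],  # true class 0 has the highest score
]
true_labels = [2, 0]

error_5 = top_k_error(scores_per_image, true_labels, k=5)
print("top-5 error:", error_5)
```

On this toy data one of the two images is missed, so the top-5 error is 0.5. An Error-5 of 0.153 in the table means the true label was outside the top 5 guesses for about 15.3% of the test images.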
25
ImageNet 2013: top rankers
http://www.image-net.org/challenges/LSVRC/2013/results.php
N  Error-5  Algorithm                           Team                  Authors
1  0.117    Deep Convolutional Neural Network   Clarifai              Zeiler
2  0.129    Deep Convolutional Neural Networks  Nat. Univ. Singapore  Min Lin
3  0.135    Deep Convolutional Neural Networks  NYU                   Zeiler, Fergus
4  0.135    Deep Convolutional Neural Networks                        Andrew Howard
5  0.137    Deep Convolutional Neural Networks  Overfeat NYU          Pierre Sermanet et al.
26
ImageNet Classification 2013
27
Conv Net Topology
5 convolutional layers
3 fully connected layers + soft-max
650K neurons, 60 million weights
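The "60 million weights" figure can be checked with a rough parameter count. The slide gives only the totals, so the per-layer shapes below are assumptions taken from the published Krizhevsky et al. (2012) network, including its two-group split of conv2, conv4 and conv5.

```python
# ASSUMPTION: layer shapes follow the published Krizhevsky et al. (2012)
# network; they are not stated on the slide.
conv_layers = [
    # (name, filters, kernel_h, kernel_w, input channels seen by each filter)
    ("conv1",  96, 11, 11,   3),
    ("conv2", 256,  5,  5,  48),   # 48 = 96 / 2 groups
    ("conv3", 384,  3,  3, 256),
    ("conv4", 384,  3,  3, 192),   # 192 = 384 / 2 groups
    ("conv5", 256,  3,  3, 192),
]
fc_layers = [
    # (name, inputs, outputs)
    ("fc6", 6 * 6 * 256, 4096),
    ("fc7", 4096, 4096),
    ("fc8", 4096, 1000),           # 1000 ILSVRC classes, fed to soft-max
]

def conv_params(filters, kh, kw, in_ch):
    """Weights of one shared kernel per filter, plus one bias per filter."""
    return filters * (kh * kw * in_ch + 1)

def fc_params(n_in, n_out):
    """One weight per input-output pair, plus one bias per output."""
    return n_out * (n_in + 1)

conv_total = sum(conv_params(*shape) for _, *shape in conv_layers)
fc_total = sum(fc_params(*shape) for _, *shape in fc_layers)
total = conv_total + fc_total

print("convolutional layers:  ", conv_total)
print("fully connected layers:", fc_total)
print("total:                 ", total)
```

Under these assumptions the total is about 61 million, consistent with the slide, and roughly 96% of the parameters sit in the 3 fully connected layers; weight sharing keeps the 5 convolutional layers small.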
28
Why should ConvNets be Deep?
Rob Fergus, NIPS 2013
29
Conv Nets:
beyond Visual Classification
34
CNN applications
CNN is a big hammer
Plenty of low-hanging fruit
You just need the right nail!
35
Conv NN: Detection
Sermanet, CVPR 2014
36
Conv NN: Scene parsing
Farabet, PAMI 2013
37
CNN: Indoor semantic labeling (RGB-D)
Farabet, 2013
38
Conv NN: Action Detection
Taylor, ECCV 2010
39
Conv NN: Image Processing
Eigen, ICCV 2013
40
BACKUP
BUZZ
41
A lot of buzz about Deep Learning
July 2012 - Started DL lab
Nov 2012 - Big improvement in Speech, OCR:
– Speech: Error Rate reduced by 25%
– OCR: Error Rate reduced by 30%
2013 - Launched 5 DL-based products
– Voice search
– Photo Wonder
– Visual search
42
A lot of buzz about Deep Learning
Microsoft on Deep Learning for Speech (go to 3:00-5:10)
43
A lot of buzz about Deep Learning
Why Google invests in Deep Learning
44
A lot of buzz about Deep Learning
NYU “Deep Learning” Professor LeCun Will Head
Facebook’s New Artificial Intelligence Lab, Dec 10,
2013
45