Hands-on
Deep Learning in Python
Imry Kissos
Deep Learning Meetup
TLV August 2015
Outline
● Problem Definition
● Training a DNN
● Improving the DNN
● Open Source Packages
● Summary
Problem Definition
Deep Convolution Network
[1] http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/
Tutorial
● Goal: Detect facial landmarks on (normal) face images
● Data set provided by Dr. Yoshua Bengio
● Tutorial code available:
https://github.com/dnouri/kfkd-tutorial/blob/master/kfkd.py
Flow
Train General Model → Train Model “Nose Tip”, Train Model “Mouth Corners”, … → Predict Points on Test Set
Flow
Train Images + Train Points → Fit → Trained Net
Flow
Test Images → Predict → Predicted Points
Python Deep Learning Framework
High level → low level:
● nolearn - wrapper around Lasagne
● Lasagne - Theano extension for Deep Learning
● Theano - define, optimize, and evaluate mathematical expressions
● CUDA / cuDNN - efficient GPU code for DNNs
HW support: GPU & CPU
OS: Linux, OS X, Windows
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
3. Optimization
4. Training the DNN
Training a Deep Neural Network
1. Data Analysis
a. Exploration + Validation
b. Pre-Processing
c. Batch and Split
2. Architecture Engineering
3. Optimization
4. Training the DNN
Data Exploration + Validation
Data:
● 7K gray-scale images of detected faces
● 96x96 pixels per image
● 15 landmarks per image
Data validation:
● Some landmarks are missing
Pre-Processing
● Data normalization
● Shuffle the train data
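A loading/pre-processing sketch along the lines of the kfkd tutorial’s load(): the file path is illustrative, pixels are scaled to [0, 1], target coordinates to [-1, 1], and the train data is shuffled:

```python
import os
import numpy as np
from pandas import read_csv
from sklearn.utils import shuffle

FTRAIN = '~/data/kaggle-facial-keypoints/training.csv'  # illustrative path

def load():
    df = read_csv(os.path.expanduser(FTRAIN))
    # each image is stored as a space-separated pixel string
    df['Image'] = df['Image'].apply(lambda im: np.fromstring(im, sep=' '))
    df = df.dropna()                               # drop rows with missing landmarks
    X = np.vstack(df['Image'].values) / 255.0      # normalize pixels to [0, 1]
    y = (df[df.columns[:-1]].values - 48) / 48     # scale coordinates to [-1, 1]
    X, y = shuffle(X, y, random_state=42)          # shuffle the train data
    return X.astype(np.float32), y.astype(np.float32)
```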
Batch
[Diagram: several train batches + a validation batch make up one epoch’s data; a separate test batch]
Train/valid/test splits are constant
Train / Validation Split
Classification: the train/validation split should preserve the class proportions (stratified split)
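A minimal scikit-learn sketch of a stratified split for a classification task (function and variable names are illustrative; for the regression task here the validation set is a plain hold-out):

```python
from sklearn.model_selection import train_test_split

def split(X, y_labels):
    # stratified split: train and validation keep the same class proportions
    return train_test_split(X, y_labels, test_size=0.2,
                            stratify=y_labels, random_state=42)
```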
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
a. Layers Definition
b. Layers Implementation
3. Optimization
4. Training the DNN
Architecture
Input → Conv → Pool → Dense → Output (XY landmark coordinates)
Layers Definition
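A trimmed-down sketch of the nolearn NeuralNet layer definition in the spirit of the tutorial (layer sizes and hyper-parameters here are illustrative, not the tutorial’s exact values):

```python
from lasagne import layers
from lasagne.updates import nesterov_momentum
from nolearn.lasagne import NeuralNet

net = NeuralNet(
    layers=[
        ('input', layers.InputLayer),
        ('conv1', layers.Conv2DLayer),
        ('pool1', layers.MaxPool2DLayer),
        ('hidden', layers.DenseLayer),
        ('output', layers.DenseLayer),
    ],
    input_shape=(None, 1, 96, 96),          # 96x96 gray-scale images
    conv1_num_filters=32, conv1_filter_size=(3, 3),
    pool1_pool_size=(2, 2),
    hidden_num_units=100,
    output_num_units=30,                    # 15 landmarks -> 30 (x, y) values
    output_nonlinearity=None,               # linear output for regression

    update=nesterov_momentum,
    update_learning_rate=0.01,
    update_momentum=0.9,

    regression=True,
    max_epochs=400,
    verbose=1,
)
```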
Activation Function: ReLU
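ReLU zeroes out negative inputs and passes positives through; a one-line numpy illustration:

```python
import numpy as np

def relu(x):
    # rectified linear unit: element-wise max(0, x)
    return np.maximum(0.0, x)
```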
Dense Layer
Dropout
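A numpy sketch of (inverted) dropout to illustrate the idea; in Lasagne this is provided by layers.DropoutLayer:

```python
import numpy as np

def dropout(activations, p=0.5, deterministic=False):
    # training: randomly zero a fraction p of the activations and rescale the rest;
    # test time (deterministic=True): use the activations unchanged
    if deterministic:
        return activations
    mask = np.random.binomial(1, 1.0 - p, size=activations.shape)
    return activations * mask / (1.0 - p)
```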
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
3. Optimization
a. Back Propagation
b. Objective
c. SGD
d. Updates
e. Convergence Tuning
4. Training the DNN
Back Propagation
Forward Path: Conv → Dense → Output Points (XY)
Back Propagation
Forward Path: Conv → Dense → Output Points (XY), compared against the Training Points (XY)
Back Propagation
Backward Path: gradients of the loss flow back from the output through Dense → Conv
Back Propagation
Update: for all layers (Conv, Dense), weights are updated using their gradients
Objective
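The landmarks are predicted by regression, so the objective is mean squared error between predicted and true coordinates (nolearn’s default loss with regression=True is squared error); a minimal Theano-style sketch:

```python
import theano.tensor as T

def mse(predictions, targets):
    # mean squared error over the predicted (x, y) coordinates
    return T.mean(T.square(predictions - targets))
```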
S.G.D. updates the network after each mini-batch
Karpathy’s “babysitting” rule of thumb: weights/updates ratio ≈ 1e3 (each update is roughly 1e-3 of the weight magnitude)
Optimization - Updates
Image credit: Alec Radford
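The tutorial drives the updates with lasagne.updates.nesterov_momentum; a plain sketch of the underlying momentum step applied after each batch (Nesterov’s variant evaluates the gradient at the look-ahead position):

```python
def momentum_step(w, v, grad, learning_rate=0.01, momentum=0.9):
    # accumulate a velocity from past gradients, then move the weights along it
    v = momentum * v - learning_rate * grad
    w = w + v
    return w, v
```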
Adjusting Learning Rate & Momentum
Annealed linearly in the epoch number (see the sketch below)
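A sketch close to the tutorial’s AdjustVariable callback; it assumes the hyper-parameter (e.g. update_learning_rate) was passed to NeuralNet as a Theano shared variable, as in the tutorial:

```python
import numpy as np

class AdjustVariable(object):
    """Linearly anneal a NeuralNet hyper-parameter (e.g. learning rate) over epochs."""
    def __init__(self, name, start=0.03, stop=0.0001):
        self.name, self.start, self.stop = name, start, stop
        self.ls = None

    def __call__(self, nn, train_history):
        if self.ls is None:
            self.ls = np.linspace(self.start, self.stop, nn.max_epochs)
        epoch = train_history[-1]['epoch']
        new_value = np.float32(self.ls[epoch - 1])
        getattr(nn, self.name).set_value(new_value)  # name is e.g. 'update_learning_rate'
```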
Convergence Tuning
● Stops according to the validation loss
● Returns the best weights
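A sketch of the tutorial’s EarlyStopping callback: it stops when the validation loss has not improved for `patience` epochs and restores the best weights (method names follow the nolearn version the tutorial uses):

```python
import numpy as np

class EarlyStopping(object):
    def __init__(self, patience=100):
        self.patience = patience
        self.best_valid = np.inf
        self.best_valid_epoch = 0
        self.best_weights = None

    def __call__(self, nn, train_history):
        current_valid = train_history[-1]['valid_loss']
        current_epoch = train_history[-1]['epoch']
        if current_valid < self.best_valid:
            # new best validation loss: remember these weights
            self.best_valid = current_valid
            self.best_valid_epoch = current_epoch
            self.best_weights = nn.get_all_params_values()
        elif self.best_valid_epoch + self.patience < current_epoch:
            # no improvement for `patience` epochs: restore best weights and stop
            nn.load_params_from(self.best_weights)
            raise StopIteration()
```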
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
3. Optimization
4. Training the DNN
a. Fit
b. Fine Tune Pre-Trained
c. Learning Curves
Fit
● Loop over train batches: forward + backprop
● Loop over validation batches: forward only
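What net.fit(X, y) does every epoch, written out as a plain-Python sketch; model.train_step and model.eval_step are placeholder names for a forward+backprop step and a forward-only step:

```python
import numpy as np

def run_epoch(model, X_train, y_train, X_valid, y_valid, batch_size=128):
    train_losses, valid_losses = [], []
    for i in range(0, len(X_train), batch_size):
        xb, yb = X_train[i:i + batch_size], y_train[i:i + batch_size]
        train_losses.append(model.train_step(xb, yb))   # forward + backprop + update
    for i in range(0, len(X_valid), batch_size):
        xb, yb = X_valid[i:i + batch_size], y_valid[i:i + batch_size]
        valid_losses.append(model.eval_step(xb, yb))    # forward only
    return np.mean(train_losses), np.mean(valid_losses)
```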
Fine Tune Pre-Trained
● Change the output layer
● Load pre-trained weights
● Fine-tune the specialist net
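A sketch of specialist fine-tuning in the spirit of the tutorial’s fit_specialists; load_params_from copies weights layer by layer (older nolearn versions name it load_weights_from):

```python
from sklearn.base import clone

def make_specialist(general_net, X, y_subset):
    # clone the general architecture, resize the output layer for this landmark group,
    # reuse the pre-trained weights where shapes match, then fine-tune
    specialist = clone(general_net)
    specialist.output_num_units = y_subset.shape[1]
    specialist.load_params_from(general_net)
    specialist.fit(X, y_subset)
    return specialist
```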
Learning Curves
Loop over 6 nets, plotting RMSE over epochs for each (plotting sketch below)
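nolearn records per-epoch losses in net.train_history_; a plotting sketch for these curves (RMSE is the square root of the MSE objective):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_learning_curves(nets, labels):
    for net, label in zip(nets, labels):
        train = np.sqrt([row['train_loss'] for row in net.train_history_])
        valid = np.sqrt([row['valid_loss'] for row in net.train_history_])
        plt.plot(train, linestyle='--', label=label + ' train')
        plt.plot(valid, label=label + ' valid')
    plt.xlabel('epochs')
    plt.ylabel('RMSE')
    plt.yscale('log')
    plt.legend()
    plt.show()
```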
Learning Curves Analysis
[Plots: RMSE vs. epochs. Net 1 - convergence (with jittering); Net 2 - overfitting]
Part 1 Summary
Training a DNN: Data Analysis → Architecture Engineering → Optimization → Training
Part 1 End
Break
Part 2
Beyond Training
Outline
● Problem Definition
● Motivation
● Training a DNN
● Improving the DNN
● Open Source Packages
● Summary
Beyond Training
1. Improving the DNN
a. Analysis Capabilities
b. Augmentation
c. Forward - Backward Path
d. Monitor Layers’ Training
2. Open Source Packages
3. Summary
Improving the DNN
Very tempting:
● >1M images
● >1M parameters
● Large gap: Theory ↔ Practice
⇒ Brute-force experiments?!
Analysis Capabilities
1. Theoretical explanation
a. E.g. dropout and augmentation decrease overfitting
2. Empirical claims about a phenomenon
a. E.g. normalization improves convergence
3. Numerical understanding
a. E.g. exploding / vanishing updates
Reduce Overfitting
[RMSE vs. epochs for Net 1 and Net 2: Net 2 overfits]
Solution: Data Augmentation
Data Augmentation
Horizontal Flip, Perturbation (flip sketch below)
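A sketch along the lines of the tutorial’s FlipBatchIterator; it assumes images shaped (batch, 1, 96, 96) and landmark targets scaled to [-1, 1], and the flip_indices shown are only an illustrative subset of the left/right pairs:

```python
import numpy as np
from nolearn.lasagne import BatchIterator

class FlipBatchIterator(BatchIterator):
    # pairs of landmark columns that swap when the image is mirrored (illustrative subset)
    flip_indices = [(0, 2), (1, 3)]

    def transform(self, Xb, yb):
        Xb, yb = super(FlipBatchIterator, self).transform(Xb, yb)
        # flip half of the images in this batch horizontally
        bs = Xb.shape[0]
        indices = np.random.choice(bs, bs // 2, replace=False)
        Xb[indices] = Xb[indices, :, :, ::-1]
        if yb is not None:
            # mirror the x coordinates and swap left/right landmark pairs
            yb[indices, ::2] = yb[indices, ::2] * -1
            for a, b in self.flip_indices:
                yb[indices, a], yb[indices, b] = (
                    yb[indices, b], yb[indices, a])
        return Xb, yb
```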
Advanced Augmentation
http://benanne.github.io/2015/03/17/plankton.html
Convergence Challenges
[Plots: RMSE vs. epochs showing a normalization problem and a data error]
Need to monitor the forward + backward path
Forward - Backward Path
Forward: activations
Backward: gradients w.r.t. the parameters
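Theano makes the backward path explicit; a toy sketch of taking the gradient of a loss w.r.t. a parameter and folding it into an update:

```python
import numpy as np
import theano
import theano.tensor as T

# toy linear regression: the backward path is just T.grad on the loss
x = T.matrix('x')
y = T.vector('y')
w = theano.shared(np.zeros(3), name='w')
loss = T.mean(T.square(T.dot(x, w) - y))    # forward: predictions and loss
grad_w = T.grad(loss, w)                    # backward: gradient w.r.t. parameters
train_fn = theano.function([x, y], loss, updates=[(w, w - 0.01 * grad_w)])
```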
Monitor Layers’ Training
nolearn - visualize.py
Monitor Layers’ Training
X. Glorot, Y. Bengio, Understanding the difficulty of training deep feedforward neural networks:
“Monitoring activations and gradients across layers and training iterations is a powerful investigation tool”
Easy to monitor in the Theano framework
Weight Initialization matters (1)
Layer 1: gradients are close to zero - vanishing gradients
Weight Initialization matters (2)
Network returns close to zero values for all inputs
Monitoring Activation
Plateaus are sometimes seen when training neural networks: for most epochs the network returns close-to-zero output for all inputs.
Objective plateaus can sometimes be explained by saturation.
Monitoring weights/update ratio
[Plots over epochs - max of Conv1 weights: ~1e-1 to 3e-1; max of Conv1 updates: ~1e-3 to 3e-3]
http://cs231n.github.io/neural-networks-3/#baby
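A minimal numpy sketch of the ratio check; with nolearn the weight arrays can be pulled from net.get_all_params_values() before and after an epoch:

```python
import numpy as np

def update_weight_ratio(weights_before, weights_after):
    # Karpathy's rule of thumb: updates should be roughly 1e-3 of the weights,
    # i.e. a weights/updates ratio of ~1e3
    update = weights_after - weights_before
    return np.linalg.norm(update) / np.linalg.norm(weights_before)
```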
Beyond Training
1. Improving the DNN
2. Open Source Packages
a. Hardware and OS
b. Python Framework
c. Deep Learning Open Source Packages
d. Effort Estimation
3. Summary
Hardware and OS
● Amazon Cloud GPU:
AWS Lasagne GPU Setup
Spot ~ $0.0031 per GPU Instance Hour
● IBM Cloud GPU:
http://www-03.ibm.com/systems/platformcomputing/products/symphony/gpuharvesting.html
● Your Linux machine GPU:
pip install -r https://raw.githubusercontent.com/dnouri/kfkd-tutorial/master/requirements.txt
● Windows install:
http://deeplearning.net/software/theano/install_windows.html#install-windows
Starting Tips
● Sanity Checks:
○ DNN Architecture: “Overfit a tiny subset of data” (Karpathy)
○ Check: regularization ↗ ⇒ training loss ↗
● Use a pre-trained VGG as a baseline
● Start with ~3 conv layers of ~16 filters each - iterate quickly
Python
● Rich eco-system
● State-of-the-art
● Easy to port from prototype to production
Podcast : http://www.reversim.com/2015/10/277-scientific-python.html
Python Deep Learning Framework
Keras, pylearn2, OpenDeep, Lasagne - a common base (Theano)
Tips from Deep Learning Packages
● Torch: code organization
● Caffe: separation of configuration ↔ code
● NeuralNet → YAML text format for defining an experiment’s configuration
Deep Learning
Open Source Packages
Open source progresses rapidly → impossible to predict the industry’s standard
● Caffe - for applications
● Torch and Theano - for research on Deep Learning itself
http://fastml.com/torch-vs-theano/
White box ↔ black box
Disruptive Effort Estimation
Feature Engineering vs. Deep Learning
Still requires algorithmic expertise
Summary
● Dove into Training a DNN
● Presented Analysis Capabilities
● Reviewed Open Source Packages
References
Hinton’s Coursera Neural Networks course
https://www.coursera.org/course/neuralnets
Technion Deep Learning course
http://moodle.technion.ac.il/course/view.php?id=4128
Oxford Deep Learning course
https://www.youtube.com/playlist?list=PLE6Wd9FR--EfW8dtjAuPoTuPcqmOV53Fu
CS231n CNN for Visual Recognition
http://cs231n.github.io/
Deep Learning Book
http://www.iro.umontreal.ca/~bengioy/dlbook/
Montreal DL summer school
http://videolectures.net/deeplearning2015_montreal/
Questions?
Deep Convolution Regression Network