What is deep learning?
A family of techniques for learning compositional vector representations
of complex data.
Review: linear predictors
[diagram: inputs x1, x2, x3 feed through weights w into the output f_θ(x)]
Output:
f_θ(x) = w · x
Parameters: θ = w
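As a minimal sketch of this predictor (assuming NumPy; the weights and input are illustrative):

import numpy as np

w = np.array([0.5, -1.0, 2.0])   # parameters θ = w (illustrative values)
x = np.array([1.0, 2.0, 3.0])    # input features x1, x2, x3

def f(x, w):
    # linear predictor: f_θ(x) = w · x
    return np.dot(w, x)

print(f(x, w))  # 0.5*1 - 1*2 + 2*3 = 4.5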
Review: neural networks
[diagram: inputs x1, x2, x3 feed through weights V into hidden units h1, h2, then through weights w into the output f_θ(x)]
Intermediate hidden units:
h_j(x) = σ(v_j · x), where σ(z) = (1 + e^(−z))^(−1)
Output:
f_θ(x) = w · h(x)
Parameters: θ = (V, w)
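A minimal sketch of this two-layer network (assuming NumPy; the values of V, w, and x are illustrative):

import numpy as np

def sigma(z):
    # logistic function: σ(z) = (1 + e^(−z))^(−1)
    return 1.0 / (1.0 + np.exp(-z))

def f(x, V, w):
    # hidden units h_j(x) = σ(v_j · x); output f_θ(x) = w · h(x)
    h = sigma(V @ x)   # each row of V is one v_j
    return np.dot(w, h)

V = np.array([[1.0, -1.0,  0.5],   # v_1
              [0.2,  0.3, -0.4]])  # v_2
w = np.array([1.0, -2.0])
x = np.array([1.0, 0.0, 2.0])
print(f(x, V, w))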
Deep neural networks
1-layer neural network: score = w^⊤ x
2-layer neural network: score = w^⊤ σ(V x)
3-layer neural network: score = w^⊤ σ(U σ(V x))
...
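The same pattern extends to any depth; a generic sketch, reusing sigma from the previous sketch (the layer list is illustrative):

def score(x, layers, w):
    # k-layer network: score = w^⊤ σ(W_{k-1} σ(… σ(W_1 x)))
    h = x
    for W in layers:   # e.g. layers = [V] for 2 layers, [V, U] for 3 layers
        h = sigma(W @ h)
    return np.dot(w, h)

With layers = [], this reduces to the 1-layer score w^⊤ x.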
Depth
[diagram: input x passes through successive hidden layers h, h′, h″, h‴ to the output f_θ(x)]
Intuitions:
• Hierarchical feature representations
• Can simulate a bounded computation logic circuit (the original motivation from McCulloch/Pitts, 1943)
• Learn this computation (and potentially more, because networks are real-valued)
• Formal theory/understanding is still incomplete
• Some hypotheses are emerging: double descent, the lottery ticket hypothesis
What's learned?
[figure from Honglak Lee]
Review: optimization
Regression:
Loss(x, y, θ) = (f_θ(x) − y)²
Key idea: minimize training loss
TrainLoss(θ) = (1 / |D_train|) Σ_{(x,y) ∈ D_train} Loss(x, y, θ)
min_{θ ∈ ℝ^d} TrainLoss(θ)
Algorithm: stochastic gradient descent
For t = 1, …, T:
  For (x, y) ∈ D_train:
    θ ← θ − η_t ∇_θ Loss(x, y, θ)
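A minimal sketch of SGD for the regression loss above, using a linear predictor f_θ(x) = θ · x (the toy dataset, step size, and number of epochs are illustrative):

import numpy as np

def sgd(Dtrain, d, T=100, eta=0.1):
    # minimize TrainLoss(θ) with Loss(x, y, θ) = (f_θ(x) − y)^2;
    # for linear f, the gradient is ∇_θ Loss(x, y, θ) = 2 (θ·x − y) x
    theta = np.zeros(d)
    for t in range(T):
        for x, y in Dtrain:
            theta = theta - eta * 2 * (np.dot(theta, x) - y) * x
    return theta

# toy dataset generated by y = 2*x1 − x2
Dtrain = [(np.array([1.0, 0.0]), 2.0),
          (np.array([0.0, 1.0]), -1.0)]
print(sgd(Dtrain, d=2))  # approaches [2, -1]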
Training
• Non-convex optimization
• No theoretical guarantees that it works
• Before the 2000s, it was empirically very difficult to get working
What's different today
• Computation (time/memory)
• Information (data)
How to make it work
• More hidden units (over-parameterization)
• Adaptive step sizes (AdaGrad, Adam)
• Dropout to guard against overfitting (see the sketch below)
• Careful initialization (pre-training)
• Batch normalization
Model and optimization are tightly coupled
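As one concrete example from the list above, a minimal sketch of (inverted) dropout at training time, assuming NumPy; the drop probability p is illustrative:

import numpy as np

def dropout(h, p=0.5, train=True):
    # zero each hidden unit with probability p during training, and scale
    # the survivors by 1/(1−p) so expected activations are unchanged
    if not train:
        return h
    mask = np.random.rand(*h.shape) >= p
    return h * mask / (1.0 - p)

h = np.ones(8)
print(dropout(h))  # about half the units zeroed; survivors become 2.0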
Summary
• Deep networks learn hierarchical representations of data
• Train via SGD, use backpropagation to compute gradients
• Non-convex optimization, but works empirically given enough compute and data
Motivation
[diagram: image input x fed through a dense weight matrix W]
• Observation: images are not arbitrary vectors
• Goal: leverage the spatial structure of images (translation equivariance)
Idea: Convolutions
Prior knowledge
[figure from Andrej Karpathy]
• Local connectivity: each hidden unit operates on a local image patch (3 instead of 7 connections per hidden unit)
• Parameter sharing: the processing of each image patch is the same (3 parameters instead of 3 · 5); see the sketch below
• Intuition: try to match a pattern in the image
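A minimal sketch of these two ideas as a 1D convolution (assuming NumPy; the filter values are illustrative):

import numpy as np

def conv1d(x, f):
    # local connectivity: each hidden unit sees only len(f) inputs;
    # parameter sharing: every hidden unit reuses the same filter f
    k = len(f)
    return np.array([np.dot(f, x[i:i + k]) for i in range(len(x) - k + 1)])

x = np.arange(7.0)               # 7 inputs
f = np.array([1.0, 0.0, -1.0])   # 3 shared parameters
print(conv1d(x, f))              # 5 hidden units, 3 connections each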
Convolutional layers
• Instead of mapping a vector to a vector, a convolutional layer maps a volume to a volume
[Andrej Karpathy’s demo]
Max-pooling
[figure from Andrej Karpathy]
• Intuition: test whether a pattern exists anywhere in a neighborhood
• Reduces computation and helps prevent overfitting; see the sketch below
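A minimal sketch of 1D max-pooling (assuming NumPy; the window size and activations are illustrative):

import numpy as np

def max_pool1d(h, k=2):
    # keep only the strongest response in each window of k neighbors:
    # "does the pattern occur anywhere in this neighborhood?"
    n = len(h) - len(h) % k          # drop any leftover tail
    return h[:n].reshape(-1, k).max(axis=1)

h = np.array([0.1, 0.9, 0.3, 0.2, 0.8, 0.4])
print(max_pool1d(h))  # [0.9 0.3 0.8]; half as many values downstream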
Example of function evaluation
[Andrej Karpathy’s demo]
AlexNet
[Krizhevsky et al., 2012]
• Non-linearity: use ReLU (max(z, 0)) instead of the logistic function (see the sketch below)
• Data augmentation: translations, horizontal reflections, varied intensity; dropout (to guard against overfitting)
• Computation: parallelized across two GPUs (6 days)
• Results on ImageNet: 16.4% error (the next best was 25.8%)
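A minimal sketch of the ReLU non-linearity, for comparison with the logistic function used earlier (assuming NumPy; the inputs are illustrative):

import numpy as np

def relu(z):
    # ReLU: max(z, 0); unlike the logistic function, it does not
    # saturate for large positive inputs
    return np.maximum(z, 0.0)

print(relu(np.array([-2.0, -0.5, 0.0, 3.0])))  # [0. 0. 0. 3.]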
Residual networks
[He et al., 2015]
x ↦ σ(W x) + x
• Key idea: make it easy to learn the identity (a good inductive bias); see the sketch below
• Enables training 152-layer networks
• Results on ImageNet: 3.6% error
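A minimal sketch of one residual block, using ReLU for the non-linearity σ (assuming NumPy; the shapes are illustrative):

import numpy as np

def residual_block(x, W):
    # x ↦ σ(W x) + x: the skip connection means that when W is near zero
    # the block computes approximately the identity
    return np.maximum(W @ x, 0.0) + x

x = np.ones(4)
W = np.zeros((4, 4))
print(residual_block(x, W))  # exactly x: [1. 1. 1. 1.]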
Summary
• Key idea 1: locality of connections captures spatial structure
• Key idea 2: filters share parameters, capturing translational equivariance
• Depth matters
• Applications to images, text, Go, drug design, etc.