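[Figure: Logistic Regression vs. Multilayer Perceptron. Both models take the features of the data, xi1, xi2, …, xiM, as input. Logistic regression produces a single output σ(zi); the multilayer perceptron passes the features through hidden units σ(zi1), …, σ(ziK) and σ(ζi1), …, σ(ζiJ) before producing its output σ(yi).]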
Complex Relationships Using Deep Learning
Can be captured by using deep neural networks
Can be represented accurately and predicted well
Can give perfect performance on the training set
Can perform poorly in the real world
Must be validated
[Figure: data plotted over features x1 and x2]
Overfitting occurs when the learned model becomes complex enough to fit the observed training data too well, noise included
Such a model will not work on future data in the real world
Want to come up with a function f(x) to predict the observation given x
[Figure: observations plotted against x, with both axes running from 0 to 4]
Increasing Polynomial Order
[Figure: three panels fitting f(x) to the same observations over x from 0 to 4: 1st-order fit, 3rd-order fit, 8th-order fit]
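A small Python sketch of the polynomial-order effect. The data are hypothetical: the true relationship is assumed to be f(x) = x, the training observations add alternating ±0.3 noise, and held-out points sit halfway between the training points. The least-squares solver is a plain normal-equations implementation written for illustration, not any particular library's routine.

```python
def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations; returns a callable f(x).

    x is rescaled to [-1, 1] first so the normal equations stay well conditioned.
    """
    lo, hi = min(xs), max(xs)

    def scale(x):
        return (2 * x - lo - hi) / (hi - lo)

    us = [scale(x) for x in xs]
    n = degree + 1
    gram = [[sum(u ** (i + j) for u in us) for j in range(n)] for i in range(n)]
    rhs = [sum(y * u ** i for u, y in zip(us, ys)) for i in range(n)]
    coeffs = solve(gram, rhs)
    return lambda x: sum(c * scale(x) ** i for i, c in enumerate(coeffs))


def mse(f, xs, ys):
    """Mean squared error of f on the points (xs, ys)."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: true f(x) = x, training points carry alternating +/-0.3 noise.
train_x = [0.5 * k for k in range(9)]                 # 0.0, 0.5, ..., 4.0
train_y = [x + 0.3 * (-1) ** k for k, x in enumerate(train_x)]
test_x = [0.25 + 0.5 * k for k in range(8)]           # held-out midpoints 0.25, ..., 3.75
test_y = list(test_x)                                 # noise-free true values

results = {}
for degree in (1, 3, 8):
    f = fit_polynomial(train_x, train_y, degree)
    results[degree] = (mse(f, train_x, train_y), mse(f, test_x, test_y))
    train_err, test_err = results[degree]
    print(f"order {degree}: training MSE {train_err:.4f}, held-out MSE {test_err:.4f}")
```

Under these assumptions the 8th-order fit passes through all nine training points, so its training error is essentially zero, while its held-out error is far larger than that of the 1st- or 3rd-order fits.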
Problems with Overfitting
Increasing the number of parameters can increase the error rate on new data
A complex relationship may be too complex for reality
[Figure: the Multilayer Perceptron diagram, from the features of the data xi1, xi2, …, xiM through the hidden units to the output σ(yi)]
Training Set
(x1, y1), (x2, y2), …, (xN, yN) → learned parameters (b0, b1, …, bM)
A further problem with overfitting: the models and analysis do not generalize
How well will this work in the real world?
Standard Validation Strategy
Learn the parameters (b0, b1, …, bM) from the training set (x1, y1), …, (xN, yN)
Apply them to new real-world data (x1, y1), …, (xN, yN) to estimate real-world performance
Collecting new real-world data is costly: can we use existing data to estimate performance instead?
Split Data in Separate Groups
All available data (x1, y1), …, (xN, yN) is randomly assigned to three separate groups: training, validation, and testing
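One way to sketch the random assignment in Python. The function name split_data, the 70/15/15 proportions, and the fixed seed are illustrative assumptions, not values taken from the slides.

```python
import random


def split_data(pairs, train_frac=0.7, val_frac=0.15, seed=0):
    """Randomly assign (x, y) pairs to disjoint training, validation, and test groups.

    The default proportions and seed are illustrative assumptions.
    """
    shuffled = list(pairs)                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # random assignment, reproducible via the seed
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains becomes the test set
    return train, val, test


data = [(i, 2 * i + 1) for i in range(100)]  # hypothetical (x_i, y_i) pairs
train, val, test = split_data(data)
print(len(train), len(val), len(test))  # 70 15 15
```

Every pair lands in exactly one group, and fixing the seed means the same split can be recreated later, which matters when the test set has to stay untouched until the end.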
Test Set
Standard practice in machine learning
Created prior to any analysis
Never used to learn or fit any parameters
Used to evaluate the performance of the network
Analogous to running a new experiment
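A hypothetical way to enforce these rules in code is to wrap the test set so it can be scored only once and never exposes its data for fitting. OneShotTestSet is an illustrative name, not a library class, and mean squared error is assumed as the metric.

```python
class OneShotTestSet:
    """Hypothetical wrapper: held-out data that can be scored exactly once."""

    def __init__(self, xs, ys):
        self._xs = list(xs)
        self._ys = list(ys)
        self._used = False

    def evaluate(self, predict):
        """Return the mean squared error of `predict` on the test data; refuse a second call."""
        if self._used:
            raise RuntimeError("test set already used; a second look would bias the estimate")
        self._used = True
        errors = [(predict(x) - y) ** 2 for x, y in zip(self._xs, self._ys)]
        return sum(errors) / len(errors)


test_set = OneShotTestSet([1.0, 2.0, 3.0], [2.0, 4.0, 6.5])
print(test_set.evaluate(lambda x: 2.0 * x))  # the first and only evaluation
try:
    test_set.evaluate(lambda x: 2.1 * x)     # a tweaked model trying a second look
except RuntimeError as err:
    print("refused:", err)
```

Like running a new experiment, the evaluation happens once, after the model is fixed.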
Should ideally be used only once
Reusing a test set leads to bias
Biased results lead to optimistic performance estimates
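The optimistic bias from reusing a test set can be simulated. In this sketch all numbers are assumptions: the labels are pure coin flips, so every candidate model's true accuracy is 50%. Picking the best of 200 random guessers by their score on one reused test set still reports an accuracy well above 50%, while the same model scored on fresh data falls back to about 50%.

```python
import random


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


data_rng = random.Random(0)
# Labels are pure coin flips, so no model can truly beat 50% accuracy.
test_labels = [data_rng.randint(0, 1) for _ in range(100)]

# 200 hypothetical candidate "models" with no real signal: each one guesses at random.
models = [random.Random(seed) for seed in range(1, 201)]
test_scores = [accuracy([m.randint(0, 1) for _ in test_labels], test_labels) for m in models]

# Reusing the same test set to pick the winner...
best = max(range(len(models)), key=lambda i: test_scores[i])
reused_estimate = test_scores[best]

# ...versus scoring that same winner on genuinely new data.
fresh_labels = [data_rng.randint(0, 1) for _ in range(10_000)]
fresh_estimate = accuracy([models[best].randint(0, 1) for _ in fresh_labels], fresh_labels)

print(f"accuracy reported on the reused test set: {reused_estimate:.2f}")
print(f"accuracy of the same model on fresh data: {fresh_estimate:.2f}")
```

The gap comes entirely from selection: the winner was chosen because it got lucky on those particular 100 labels.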
Validation Set
Can be used to compare which approach is best
Not used to learn parameters
Used repeatedly to estimate the performance of a model
Can be used to pick out the best-performing model
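A sketch of using a validation set to pick the best-performing model. The candidates are hypothetical 1-D k-nearest-neighbor regressors that differ only in k, and the data (a noisy sine curve) are invented for illustration; the test set is deliberately left out of this loop.

```python
import math
import random


def knn_predict(train, x, k):
    """1-D k-nearest-neighbor regression: average the y of the k closest training points."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, points, k):
    """Mean squared error of k-NN predictions (fitted on `train`) over `points`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in points) / len(points)


rng = random.Random(42)


def sample(n):
    """Hypothetical data: y = sin(x) plus Gaussian noise with standard deviation 0.3."""
    xs = [rng.uniform(0.0, 6.0) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0.0, 0.3)) for x in xs]


train, validation = sample(60), sample(30)

# Each candidate k is a different model; the validation set is reused to compare them.
val_errors = {k: mse(train, validation, k) for k in (1, 3, 5, 9, 15, 31)}
best_k = min(val_errors, key=val_errors.get)

for k, err in val_errors.items():
    print(f"k = {k:2d}: validation MSE {err:.3f}")
print("selected k:", best_k)
print(f"k = 1 training MSE: {mse(train, train, 1):.3f}")  # memorizes the training set
```

The last line shows why the training set alone cannot drive this choice: with k = 1 every training point is its own nearest neighbor, so the training error is exactly zero regardless of how well that model generalizes.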
[Figure: the full workflow. The training set is used to learn the parameters (b0, b1, …, bM); performance is estimated on the validation set and used to refine the model; the testing set provides the final performance evaluation.]