F11 Handout
Neural nets

• A problem with the methods for image classification from last time is the need for feature selection.
• Neural networks is a class of methods that can be used to classify images into two classes.
• Model:
      y_2 = P(z = 1 | x) = f(x; \theta)
  for some non-linear function f of the pixel values.
• Likelihood for training the model from M images:
      \ell(\theta) = \prod_{i=1}^{M} f(x_i; \theta)^{z_i} \, (1 - f(x_i; \theta))^{1 - z_i}
• To make sure that y_1, y_2 are probabilities, we take g^{(2)}(x) as the softmax function:
      g_k^{(2)}(x_1, x_2) = \frac{e^{x_k}}{\sum_{l=1}^{2} e^{x_l}}
• We can represent this model as a network:
  [Figure: network with input layer L1, hidden layer L2 and output layer L3, connected by weights W^{(1)} and W^{(2)}]
• This is a feed-forward network since information only flows forward in the network.
• The nodes in the hidden layer are called neurons.
• The functions g^{(1)} and g^{(2)} are called activation functions (the forward pass is sketched below).
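A minimal NumPy sketch of this single-layer model and its likelihood. The choice of the logistic function for g^{(1)}, the absence of bias terms, and all names and shapes are illustrative assumptions, not taken from the handout:

    import numpy as np

    def softmax(v):
        # g^(2): exp(v_k) / sum_l exp(v_l), shifted by max(v) for numerical stability
        e = np.exp(v - np.max(v))
        return e / e.sum()

    def forward(x, W1, W2):
        # hidden layer: g^(1) assumed here to be the logistic function
        h = 1.0 / (1.0 + np.exp(-(W1 @ x)))
        # output layer: softmax turns the two scores into probabilities (y_1, y_2)
        return softmax(W2 @ h)

    def log_likelihood(images, z, W1, W2):
        # log of prod_i f(x_i; theta)^{z_i} (1 - f(x_i; theta))^{1 - z_i}, with f(x; theta) = y_2
        f = np.array([forward(x, W1, W2)[1] for x in images])
        return np.sum(z * np.log(f) + (1 - z) * np.log(1 - f))

Here W1 has shape (p_1, p) and W2 has shape (2, p_1), so forward(x, W1, W2) returns the pair (y_1, y_2).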
A single-layer neural net

• The main idea of neural networks is that we should be able to approximate any function f(x) in this way:
  [Figure: single-layer network diagram]

Example for binary classification

[Figure: network with input nodes x_1, x_2, x_3, ..., hidden neurons and output nodes y_1, y_2]
To speed up the estimation, it is common to replace the exact gradient by a stochastic estimate (both options are sketched below):

• Option 1: Define
      G(W) = \frac{1}{s} \sum_{i=1}^{M} J_i \frac{\partial R_i}{\partial W^{(l)}},
  where the J_i are independent Be(s) random variables. Thus, we are randomly selecting (on average) 100s% of the images in each iteration. Then
      E(G(W)) = \frac{1}{s} \sum_{i=1}^{M} E(J_i) \frac{\partial R_i}{\partial W^{(l)}} = \sum_{i=1}^{M} \frac{\partial R_i}{\partial W^{(l)}}
• Option 2: Divide the training data into m batches and randomly sample one of the batches in each iteration.
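A minimal sketch of the two options, assuming a user-supplied function grad_R_i(W, i) that returns the gradient of R_i with respect to W^{(l)} for image i; the function names and the rescaling in Option 2 are illustrative assumptions, not taken from the handout:

    import numpy as np

    rng = np.random.default_rng(0)

    def stochastic_gradient_option1(grad_R_i, W, M, s):
        # J_i ~ Be(s): each image is kept with probability s, so on average
        # 100s % of the M images contribute; dividing by s makes E(G(W)) the full gradient
        J = rng.random(M) < s
        return sum(grad_R_i(W, i) for i in range(M) if J[i]) / s

    def stochastic_gradient_option2(grad_R_i, W, M, m):
        # split the M images into m batches and sample one batch uniformly at random;
        # the factor m (an assumption) rescales the batch sum to an unbiased full-gradient estimate
        batches = np.array_split(rng.permutation(M), m)
        batch = batches[rng.integers(m)]
        return m * sum(grad_R_i(W, i) for i in batch)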
Convolution layer

A convolution layer has three stages (sketched in code below):
  1. Convolution stage: Convolve each input image with f different linear filters, with kernels of size q × q, producing f output images.
  2. Detector stage: Apply a non-linear function to each image. Typically the rectified linear function g(v) = max(0, v).
  3. Pooling stage: For each image, reduce each non-overlapping block of r × r pixels to one single value, by for example taking the largest value in the block.
[Figure: Input image → Convolution stage → Detector stage → Pooling stage → Output images]
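A minimal sketch of the three stages for one input image, using SciPy's 2-D convolution; the number of filters f, the kernel size q and the pooling block size r below are illustrative choices:

    import numpy as np
    from scipy.signal import convolve2d

    def convolution_layer(image, kernels, r):
        # image: n x n array; kernels: f arrays of size q x q; r: pooling block size
        outputs = []
        for K in kernels:
            c = convolve2d(image, K, mode="same")             # 1. convolution stage
            d = np.maximum(0.0, c)                            # 2. detector stage: g(v) = max(0, v)
            n = (d.shape[0] // r) * r                         # 3. pooling stage: max over each
            blocks = d[:n, :n].reshape(n // r, r, n // r, r)  #    non-overlapping r x r block
            outputs.append(blocks.max(axis=(1, 3)))
        return outputs                                        # f smaller output images

    rng = np.random.default_rng(0)
    image = rng.normal(size=(8, 8))
    kernels = [rng.normal(size=(3, 3)) for _ in range(4)]     # f = 4 filters with q = 3
    pooled = convolution_layer(image, kernels, r=2)           # four 4 x 4 images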
• Often, a fully connected network simply has too many parameters: For the first single-layer network for binary classification, we have p p_1 + 2 p_1 unknown weights. p = p_1 = 1000 thus gives 1 002 000 unknown parameters (the count is checked below)!
• The problem is that we have a separate weight between each pixel and each hidden node.
• The idea of Convolutional neural networks is to reduce the number of parameters by assuming that most of the weights are zero, and that the non-zero weights have a common structure.
• A CNN assumes that the input data has a lattice structure, like an image.
• Consists of a special type of layers called convolution layers, which are based on filtering the image with a kernel.
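A one-line check of the count above (p input pixels, p_1 hidden neurons, two output nodes, no bias terms assumed):

    p, p1 = 1000, 1000
    n_weights = p * p1 + 2 * p1   # W^(1) has p * p1 entries, W^(2) has 2 * p1
    print(n_weights)              # 1002000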
• One could view the convolution stage as a regular layer where most of the weights are zero: A pixel in the output image only depends on the q × q nearest pixels in the input image.
• The different nodes share parameters, since we use the same convolution kernel across the entire image.
• As a result, a convolution layer has f q^2 parameters, which is much less than a corresponding fully connected layer with (p p_l)^2 parameters (see the comparison below).
• Since pooling reduces the image size, we can in the next stage use more filters without increasing the total number of nodes.
• Pooling makes the output less sensitive to small translations of the input.
• Another variant of pooling is to take the max across different learned features. This can make the output invariant to other things, such as rotations.
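A rough comparison of the two counts, assuming that the (p p_l)^2 on the slide refers to a fully connected map between two layers of p·p_l nodes each; the numbers are illustrative only:

    f, q = 16, 5
    conv_params = f * q**2            # f q^2 shared kernel weights: 400
    p, pl = 1000, 1000
    dense_params = (p * pl)**2        # (p p_l)^2 weights for the fully connected alternative
    print(conv_params, dense_params)  # 400 versus 10**12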
Example of a CNN

[Figure: example CNN architecture, ending in output layer L5]

Comments