Lecture 11: Neural networks
Spatial Statistics and Image Analysis

David Bolin
University of Gothenburg
Gothenburg, May 13, 2019

Neural nets
• A problem with the methods for image classification from last time is the need for feature selection.
• Neural networks are a class of methods that can be used to design classifiers without the need to select features.
• Let us start with the binary classification problem: We have an image $x$ with pixels $x_1, \ldots, x_p$, which can belong to one of two classes.
• Model:
  $$y_1 = P(z = 0 \mid x) = f(x; \theta), \qquad y_2 = P(z = 1 \mid x) = 1 - f(x; \theta)$$
  for some non-linear function $f$ of the pixel values.
• Likelihood for training the model from $M$ images:
  $$\ell(\theta) = \prod_{i=1}^{M} f(x_i; \theta)^{z_i} \left(1 - f(x_i; \theta)\right)^{1 - z_i}$$

A single-layer neural net
• The idea of neural nets is to approximate $f(x)$ as a sequence of "simple" non-linear functions.
• Let's look at a single-layer model first.
• Start by forming $p_1$ different linear combinations of the data:
  $$W_1^{(1)} \cdot x,\; W_2^{(1)} \cdot x,\; \ldots,\; W_{p_1}^{(1)} \cdot x$$
  where $W_k^{(1)}$ are weights and $W_k^{(1)} \cdot x = w_{k0}^{(1)} + \sum_{i=1}^{p} w_{ki}^{(1)} x_i$.
• To each linear combination, apply a non-linear function $g^{(1)}$:
  $$\alpha_1 = g^{(1)}(W_1^{(1)} \cdot x),\; \alpha_2 = g^{(1)}(W_2^{(1)} \cdot x),\; \ldots,\; \alpha_{p_1} = g^{(1)}(W_{p_1}^{(1)} \cdot x)$$
• Finally, approximate $y_1$ and $y_2$ as transformed linear combinations of these values:
  $$y_1 = g_1^{(2)}(W_1^{(2)} \cdot \alpha,\, W_2^{(2)} \cdot \alpha), \qquad y_2 = g_2^{(2)}(W_1^{(2)} \cdot \alpha,\, W_2^{(2)} \cdot \alpha)$$
• To make sure that $y_1, y_2$ are probabilities, we take $g^{(2)}$ as the softmax function:
  $$g_k^{(2)}(x_1, x_2) = \frac{e^{x_k}}{\sum_{l=1}^{2} e^{x_l}}$$
• We can represent this model as a network:
  [Figure: network with input layer $L_1$ (nodes $x_1, x_2, x_3$), hidden layer $L_2$ (neurons $\alpha$, weights $W^{(1)}$), and output layer $L_3$ (nodes $y_1, y_2$, weights $W^{(2)}$).]
• This is a feed-forward network since information only flows forward in the network.
• The nodes in the hidden layer are called neurons.
• The functions $g^{(1)}$ and $g^{(2)}$ are called activation functions.
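As a concrete illustration, here is a minimal numpy sketch of the single-layer model above: it forms the $p_1$ linear combinations, applies a hidden activation $g^{(1)}$, and passes the two output scores through the softmax. The sigmoid choice for $g^{(1)}$, the separate intercept vectors, and the array shapes are assumptions made for the sketch, not part of the slides.

```python
import numpy as np

def sigmoid(v):
    # One possible choice of hidden activation g^(1)
    return 1.0 / (1.0 + np.exp(-v))

def softmax(z):
    # g^(2): turns the two scores into probabilities that sum to one
    e = np.exp(z - np.max(z))   # shift for numerical stability
    return e / e.sum()

def single_layer_net(x, W1, b1, W2, b2):
    """Single-layer model: x (p,) -> alpha (p1,) -> probabilities (2,).

    W1: (p1, p) weights w_ki^(1), b1: (p1,) intercepts w_k0^(1)
    W2: (2, p1) weights w_ki^(2), b2: (2,) intercepts w_k0^(2)
    """
    alpha = sigmoid(W1 @ x + b1)   # alpha_k = g^(1)(W_k^(1) . x)
    z = W2 @ alpha + b2            # two linear combinations of alpha
    return softmax(z)              # (y1, y2)

# Toy usage with random weights (p = 4 pixels, p1 = 3 hidden neurons)
rng = np.random.default_rng(0)
x = rng.normal(size=4)
y = single_layer_net(x, rng.normal(size=(3, 4)), np.zeros(3),
                     rng.normal(size=(2, 3)), np.zeros(2))
print(y, y.sum())  # two class probabilities summing to 1
```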
A single-layer neural net
• Our model for $y_1 = f(x)$ is thus
  $$f(x) = \frac{e^{z_1}}{e^{z_1} + e^{z_2}}, \qquad z_k = w_{k0}^{(2)} + \sum_{i=1}^{p_1} w_{ki}^{(2)}\, g^{(1)}\!\left(w_{i0}^{(1)} + \sum_{j=1}^{p} w_{ij}^{(1)} x_j\right)$$
  where all the weights $w$ should be estimated to give a good fit.
• The main idea of neural networks is that we should be able to approximate any function $f(x)$ in this way:

The universal approximation theorem
A feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of $\mathbb{R}^n$, under mild assumptions on the activation function.

Example for binary classification
[Figure: feed-forward network with input layer $L_1$ ($x_1, x_2, x_3$, values $\alpha^{(1)}$), hidden layers $L_2$, $L_3$, $L_4$ (values $\alpha^{(2)}, \alpha^{(3)}, \alpha^{(4)}$), and output layer $L_5$ ($y_1, y_2$, values $\alpha^{(5)}$), with weights $W^{(1)}, W^{(2)}, W^{(3)}, W^{(4)}$ between consecutive layers.]
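As a small illustration of the flexibility behind the universal approximation theorem, the sketch below represents the continuous function $|x|$ exactly with just two rectified-linear hidden neurons, since $|x| = \max(0, x) + \max(0, -x)$. The choice of target function and of ReLU units is mine, purely for illustration.

```python
import numpy as np

def relu(v):
    return np.maximum(0.0, v)

def abs_via_two_neurons(x):
    # Hidden layer: alpha_1 = relu(+1 * x), alpha_2 = relu(-1 * x)
    alpha = relu(np.array([1.0, -1.0]) * x)
    # Output layer: 1 * alpha_1 + 1 * alpha_2 (no output non-linearity needed here)
    return alpha.sum()

xs = np.linspace(-2, 2, 9)
print([abs_via_two_neurons(x) for x in xs])  # matches |x| exactly
print(np.abs(xs))
```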
General feed-forward neural nets for classification
• Input data $x_1, \ldots, x_p$. Output: probabilities for $K$ classes.
• In total $L - 1$ hidden layers in the model.
• We can allow for a non-linear transformation of the input data in the input layer, giving $\alpha_k^{(1)} = g^{(0)}(x_k)$.
• Usually we set $g^{(0)}$ to the identity function but keep the notation $\alpha_k^{(1)} = x_k$ to simplify the formulas.
• At layer $l$ in the model, define linear combinations of the neurons in the previous layer, and new neuron values:
  $$z_k^{(l)} = w_{k0}^{(l-1)} + \sum_{j=1}^{p_{l-1}} w_{kj}^{(l-1)} \alpha_j^{(l-1)}, \quad \text{i.e. } z^{(l)} = W^{(l-1)} \alpha^{(l-1)},$$
  $$\alpha^{(l)} = g^{(l)}(z^{(l)})$$
  for $l = 2, \ldots, L$, where $p_1 = p$ and $\alpha^{(1)} = x$.

Comments
• The output probabilities are given by $\alpha^{(L)}$.
• Common activation functions for the internal layers:
  • Rectified linear: $g(v) = \max(0, v)$. Sometimes called a Rectified Linear Unit (ReLU).
  • Sigmoid function: $g(v) = \frac{1}{1 + e^{-v}}$. Sometimes called a radial basis function (RBF network).
  • $g(v) = \tanh(v)$.
• Common activation function for the output layer for classification:
  • Softmax: $g_i(v_1, \ldots, v_K) = \frac{\exp(v_i)}{\sum_{k=1}^{K} \exp(v_k)}$.
  • A symmetric version of the logit link used for logistic regression.
• The neural network is nothing else than a hierarchically specified non-linear regression. Compare with logistic regression.
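A minimal sketch of the general forward pass described above, with ReLU activations in the internal layers and softmax at the output. Representing the weights as a list of matrices with the intercepts stored separately is an implementation choice of mine, not notation from the slides.

```python
import numpy as np

def relu(v):
    return np.maximum(0.0, v)

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def forward(x, weights, biases):
    """Forward pass: alpha^(1) = x, z^(l) = W^(l-1) alpha^(l-1), alpha^(l) = g^(l)(z^(l)).

    weights[l] has shape (p_{l+1}, p_l); biases[l] has shape (p_{l+1},).
    ReLU is used for the hidden layers and softmax for the output layer.
    """
    alpha = x
    for l, (W, b) in enumerate(zip(weights, biases)):
        z = W @ alpha + b
        is_output = (l == len(weights) - 1)
        alpha = softmax(z) if is_output else relu(z)
    return alpha  # alpha^(L): the K class probabilities

# Toy usage: p = 4 inputs, two hidden layers (5 and 3 neurons), K = 2 classes
rng = np.random.default_rng(1)
sizes = [4, 5, 3, 2]
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]
print(forward(rng.normal(size=4), weights, biases))
```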
Parameter estimation
• The neural network defines a non-linear function $f(x, W)$ of the input variables $x$, depending on the unknown weights $W = \{W^{(1)}, W^{(2)}, \ldots, W^{(L)}\}$.
• To estimate $W$ from some input data $\{x_i, y_i\}_{i=1}^{M}$, we can define a loss function $R(y, f(x, W))$ and compute
  $$\hat{W} = \arg\min_{W} \sum_{i=1}^{M} R(y_i, f(x_i, W))$$
• Simple examples of $R$:
  • For regression: squared loss $R(y, f(x, W)) = \frac{1}{2} \| y - f(x, W) \|^2$.
  • For classification: cross-entropy loss $R(y, f(x, W)) = -\sum_{k=1}^{K} 1(y = k) \log f_k(x, W)$.
• Estimate $W$ using gradient descent.

Backpropagation
The gradient of $R$ can be computed using the chain rule.
1 Feed-forward pass: Compute $\alpha_k^{(l)}$ for each layer $l$ and each node $k$ based on the current estimate of $W$.
2 For the output layer, compute
  $$\delta_k^{(L)} = \frac{\partial R}{\partial z_k^{(L)}} = \frac{\partial R}{\partial \alpha_k^{(L)}} \frac{\partial \alpha_k^{(L)}}{\partial z_k^{(L)}} = \frac{\partial R}{\partial \alpha_k^{(L)}} \, \dot{g}^{(L)}(z_k^{(L)})$$
3 For $l = L - 1, \ldots, 2$, compute
  $$\delta_k^{(l)} = \left( \sum_{j=1}^{p_{l+1}} w_{jk}^{(l)} \delta_j^{(l+1)} \right) \dot{g}^{(l)}(z_k^{(l)})$$
4 Compute $\dfrac{\partial R}{\partial w_{kj}^{(l)}} = \alpha_j^{(l)} \delta_k^{(l+1)}$.
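To make steps 1–4 concrete, here is a minimal sketch of backpropagation for a network with one sigmoid hidden layer and a softmax output trained with the cross-entropy loss. For that particular output/loss combination the output-layer delta simplifies to $\delta^{(L)} = \alpha^{(L)} - \mathbf{1}(y)$; that simplification, and the finite-difference check at the end, are standard devices I have added, not statements from the slides.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def forward(x, W1, b1, W2, b2):
    z2 = W1 @ x + b1          # step 1: feed-forward pass
    a2 = sigmoid(z2)
    z3 = W2 @ a2 + b2
    a3 = softmax(z3)
    return z2, a2, a3

def cross_entropy(a3, y):
    return -np.log(a3[y])     # R = -sum_k 1(y = k) log f_k

def backprop(x, y, W1, b1, W2, b2):
    z2, a2, a3 = forward(x, W1, b1, W2, b2)
    onehot = np.zeros_like(a3); onehot[y] = 1.0
    delta3 = a3 - onehot                       # step 2 (softmax + cross-entropy)
    delta2 = (W2.T @ delta3) * a2 * (1 - a2)   # step 3: (sum_j w_jk delta_j) g'(z_k)
    dW2 = np.outer(delta3, a2); db2 = delta3   # step 4: dR/dw_kj = alpha_j delta_k
    dW1 = np.outer(delta2, x);  db1 = delta2
    return dW1, db1, dW2, db2

# Check one gradient entry against a finite difference
rng = np.random.default_rng(2)
x, y = rng.normal(size=4), 1
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)
dW1, *_ = backprop(x, y, W1, b1, W2, b2)
eps = 1e-6
Wp = W1.copy(); Wp[0, 0] += eps
num = (cross_entropy(forward(x, Wp, b1, W2, b2)[2], y)
       - cross_entropy(forward(x, W1, b1, W2, b2)[2], y)) / eps
print(dW1[0, 0], num)   # the two values should agree to several decimals
```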
Regularization
• Neural networks in general have too many parameters and will overfit the data.
• An early solution to this problem was to stop the gradient-based estimation before convergence.
• A validation dataset can be used to determine when to stop.
• A more explicit method for regularization is to include a penalty on the weights in the loss function:
  $$\hat{W} = \arg\min_{W} \sum_{i=1}^{M} R(y_i, f(x_i, W)) + \lambda J(W)$$
• A common example is the weight-decay penalty $J(W) = \sum_{j,l,k} \big(w_{kj}^{(l)}\big)^2$, which will pull the weights towards zero.
• $\lambda$ is a tuning parameter: estimate it using cross-validation.

Gradient descent
• Update $w_{kj}^{(l)}$ using a gradient-descent step. Assuming the weight-decay penalty:
  $$w_{kj}^{(l)} \leftarrow w_{kj}^{(l)} - \gamma \left( \frac{\partial R}{\partial w_{kj}^{(l)}} + \lambda w_{kj}^{(l)} \right)$$
  where $\gamma$ is the step length.
• We need a lot of data to estimate these models, and for large datasets the computation of $\frac{\partial R}{\partial w_{kj}^{(l)}}$ is expensive: for $M$ training images with $p$ pixels and a network with $N$ hidden units, $O(pMN)$ operations are needed.
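A small sketch of the penalized update above. Applying the penalty to every parameter block (including intercepts) and the particular values of the step length and of $\lambda$ are illustrative choices of mine; the commented usage reuses the hypothetical objects from the backpropagation sketch above.

```python
def weight_decay_step(params, grads, gamma, lam):
    """One penalized gradient-descent step for each parameter block (numpy arrays):
    w <- w - gamma * (dR/dw + lambda * w)."""
    return [w - gamma * (g + lam * w) for w, g in zip(params, grads)]

# Illustrative usage with the backprop sketch above (hypothetical objects):
# for _ in range(100):
#     grads = backprop(x, y, W1, b1, W2, b2)
#     W1, b1, W2, b2 = weight_decay_step([W1, b1, W2, b2], grads,
#                                        gamma=0.1, lam=1e-3)
```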
Stochastic gradient descent
To speed up the estimation, it is common to replace the exact gradient by a stochastic estimate:
• Option 1: Define $G(W) = \frac{1}{s} \sum_{i=1}^{M} J_i \frac{\partial R_i}{\partial W^{(l)}}$, where the $J_i$ are independent $\mathrm{Be}(s)$ random variables and $R_i = R(y_i, f(x_i, W))$. Thus, we are randomly selecting (on average) $100s\%$ of the images in each iteration. Then
  $$\mathrm{E}(G(W)) = \frac{1}{s} \sum_{i=1}^{M} \mathrm{E}(J_i) \frac{\partial R_i}{\partial W^{(l)}} = \sum_{i=1}^{M} \frac{\partial R_i}{\partial W^{(l)}}$$
• Option 2: Divide the training data into $m$ batches and randomly sample one of the batches in each iteration.
There are several other tricks to speed up convergence, such as momentum updates.
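A sketch of Option 1, where each image enters the gradient estimate with probability $s$ via independent Bernoulli draws; Option 2 would instead iterate over pre-defined batches. The per-image gradient callback `grad_i` is a hypothetical stand-in for the backpropagation computation, and all default parameter values are arbitrary.

```python
import numpy as np

def sgd_option1(W, grad_i, M, s=0.1, gamma=0.01, iters=1000, seed=0):
    """Option 1 stochastic gradient descent.

    grad_i(i, W) is assumed to return dR_i/dW (an array like W) for image i.
    E(G(W)) equals the full-data gradient sum_i dR_i/dW.
    """
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        J = rng.random(M) < s                      # J_i ~ Be(s), independent
        idx = np.flatnonzero(J)
        if idx.size == 0:
            continue                               # nothing sampled this iteration
        G = sum(grad_i(i, W) for i in idx) / s     # G(W) = (1/s) sum_i J_i dR_i/dW
        W = W - gamma * G
    return W
```

In practice `grad_i` would wrap the backpropagation sketch shown earlier, with `W` collecting all weight blocks of the network.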
Convolutional neural networks
• Often, a fully connected network simply has too many parameters: for the first single-layer network for binary classification, we have $p p_1 + 2 p_1$ unknown weights. $p = p_1 = 1000$ thus gives 1,002,000 unknown parameters!
• The problem is that we have a separate weight between each pixel and each hidden node.
• The idea of convolutional neural networks is to reduce the number of parameters by assuming that most of the weights are zero, and that the non-zero weights have a common structure.
• A CNN assumes that the input data has a lattice structure, like an image.
• It consists of a special type of layers called convolution layers, which are based on filtering the image with a kernel.

Convolution layers
A convolution layer has three stages:
1 Convolution stage: Convolve each input image with $f$ different linear filters, with kernels of size $q \times q$, producing $f$ output images.
2 Detector stage: Apply a non-linear function to each image. Typically the rectified linear function $g(v) = \max(0, v)$.
3 Pooling stage: For each image, reduce each non-overlapping block of $r \times r$ pixels to one single value, by for example taking the largest value in the block.
[Figure: a convolution layer — input image → convolution stage → detector stage → pooling stage → output images.]

Comments
• One could view the convolution stage as a regular layer where most of the weights are zero: a pixel in the output image only depends on the $q \times q$ nearest pixels in the input image.
• The different nodes share parameters, since we use the same convolution kernel across the entire image.
• As a result, a convolution layer has $f q^2$ parameters, which is much less than a corresponding fully connected layer with $(p\, p_l)^2$ parameters.
• Since pooling reduces the image size, we can in the next stage use more filters without increasing the total number of nodes.
• Pooling makes the output less sensitive to small translations of the input.
• Another variant of pooling is to take the max across different learned features. This can make the output invariant to other things, such as rotations.
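A minimal numpy sketch of the three stages for a single input image and a single $q \times q$ kernel (so $f = 1$ here). The "valid" handling of the image border, the cropping inside the pooling stage, and the fact that the kernel is not flipped (as is common in CNN implementations) are choices made for this sketch; a real implementation would use a deep-learning framework or scipy.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Convolution stage: 'valid' 2-D filtering of img with a q x q kernel."""
    q = kernel.shape[0]
    H, W = img.shape
    out = np.zeros((H - q + 1, W - q + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + q, j:j + q] * kernel)
    return out

def relu(v):
    """Detector stage: rectified linear non-linearity."""
    return np.maximum(0.0, v)

def max_pool(img, r):
    """Pooling stage: max over non-overlapping r x r blocks
    (image cropped to a multiple of r)."""
    H, W = (img.shape[0] // r) * r, (img.shape[1] // r) * r
    blocks = img[:H, :W].reshape(H // r, r, W // r, r)
    return blocks.max(axis=(1, 3))

# One pass through a convolution layer with q = 3, r = 2
rng = np.random.default_rng(3)
image = rng.normal(size=(16, 16))
kernel = rng.normal(size=(3, 3))
out = max_pool(relu(conv2d_valid(image, kernel)), r=2)
print(out.shape)   # (7, 7): 16 - 3 + 1 = 14, then pooled by 2
```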
Example of a CNN
[Figure: input image ($256 \times 256$) → convolution layer ($64 \times 64 \times 2$) → convolution layer ($16 \times 16 \times 8$) → convolution layer ($4 \times 4 \times 32$) → vectorize ($512 \times 1$) → output layer $L_5$.]
• The first layer has $f = 2$ filters, the second has $f = 4$, the third has $f = 4$.
• Each pooling stage uses $r = 4$.
• The final hidden layer is a usual fully connected layer.
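A quick sanity check of the sizes in the example, assuming the convolution stage preserves the image size (so only the pooling with $r = 4$ changes the side length; that assumption is mine, matching the sizes shown in the figure).

```python
# Track (side length, number of images) through the example CNN
side, n_images = 256, 1
for f in (2, 4, 4):            # filters per input image in each convolution layer
    n_images *= f              # each input image produces f filtered images
    side //= 4                 # pooling with r = 4 shrinks each side by 4
    print(side, n_images)      # 64x64x2, 16x16x8, 4x4x32
print(side * side * n_images)  # 512 values after vectorizing
```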
Comments
• A CNN is a method for image classification using filtered images as features, but where we do not need to specify the features manually.
• Using CNNs for image classification re-popularized neural networks around 2010, and "Deep learning" was coined as a flashy name for using "deep" neural networks with more than one hidden layer.
• For further details on neural networks, see for example:
  1 Computer Age Statistical Inference by Efron and Hastie
  2 deeplearningbook.org
  3 MATLAB guides: Create Simple Deep Learning Network for Classification