F11 Handout
Neural nets

• A problem with the methods for image classification from last time is the need for feature selection.
• Neural networks is a class of methods that can be used to classify images into two classes.
• Model:
      y_2 = P(z = 1 | x) = f(x; \theta)
  for some non-linear function f of the pixel values.
• Likelihood for training the model from M images:
      \ell(\theta) = \prod_{i=1}^{M} f(x_i; \theta)^{z_i} \, (1 - f(x_i; \theta))^{1 - z_i}
• To make sure that y_1, y_2 are probabilities, we take g^{(2)}(x) as the softmax function:
      g_k^{(2)}(x_1, x_2) = \frac{e^{x_k}}{\sum_{l=1}^{2} e^{x_l}}
• We can represent this model as a network:
  [Figure: network with input layer L1, hidden layer L2 and output layer L3, connected by weights W^{(1)} and W^{(2)}]
• This is a feed-forward network since information only flows forward in the network.
• The nodes in the hidden layer are called neurons.
• The functions g^{(1)} and g^{(2)} are called activation functions (the forward pass is sketched below).
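A minimal NumPy sketch of this single-layer model and its likelihood. The choice of the logistic function for g^{(1)}, the absence of bias terms, and all names and shapes are illustrative assumptions, not taken from the handout:

    import numpy as np

    def softmax(v):
        # g^(2): exp(v_k) / sum_l exp(v_l), shifted by max(v) for numerical stability
        e = np.exp(v - np.max(v))
        return e / e.sum()

    def forward(x, W1, W2):
        # hidden layer: g^(1) assumed here to be the logistic function
        h = 1.0 / (1.0 + np.exp(-(W1 @ x)))
        # output layer: softmax turns the two scores into probabilities (y_1, y_2)
        return softmax(W2 @ h)

    def log_likelihood(images, z, W1, W2):
        # log of prod_i f(x_i; theta)^{z_i} (1 - f(x_i; theta))^{1 - z_i}, with f(x; theta) = y_2
        f = np.array([forward(x, W1, W2)[1] for x in images])
        return np.sum(z * np.log(f) + (1 - z) * np.log(1 - f))

Here W1 has shape (p_1, p) and W2 has shape (2, p_1), so forward(x, W1, W2) returns the pair (y_1, y_2).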
A single-layer neural net

• The main idea of neural networks is that we should be able to approximate any function f(x) in this way:
  [Figure: single-layer network diagram]

Example for binary classification

[Figure: network with input nodes x_1, x_2, x_3, ..., hidden neurons and output nodes y_1, y_2]
To speed up the estimation, it is common to replace the exact gradient by a stochastic estimate (both options are sketched below):

• Option 1: Define
      G(W) = \frac{1}{s} \sum_{i=1}^{M} J_i \frac{\partial R_i}{\partial W^{(l)}},
  where the J_i are independent Be(s) random variables. Thus, we are randomly selecting (on average) 100s% of the images in each iteration. Then
      E(G(W)) = \frac{1}{s} \sum_{i=1}^{M} E(J_i) \frac{\partial R_i}{\partial W^{(l)}} = \sum_{i=1}^{M} \frac{\partial R_i}{\partial W^{(l)}}
• Option 2: Divide the training data into m batches and randomly sample one of the batches in each iteration.
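A minimal sketch of the two options, assuming a user-supplied function grad_R_i(W, i) that returns the gradient of R_i with respect to W^{(l)} for image i; the function names and the rescaling in Option 2 are illustrative assumptions, not taken from the handout:

    import numpy as np

    rng = np.random.default_rng(0)

    def stochastic_gradient_option1(grad_R_i, W, M, s):
        # J_i ~ Be(s): each image is kept with probability s, so on average
        # 100s % of the M images contribute; dividing by s makes E(G(W)) the full gradient
        J = rng.random(M) < s
        return sum(grad_R_i(W, i) for i in range(M) if J[i]) / s

    def stochastic_gradient_option2(grad_R_i, W, M, m):
        # split the M images into m batches and sample one batch uniformly at random;
        # the factor m (an assumption) rescales the batch sum to an unbiased full-gradient estimate
        batches = np.array_split(rng.permutation(M), m)
        batch = batches[rng.integers(m)]
        return m * sum(grad_R_i(W, i) for i in batch)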
Convolution layer

A convolution layer has three stages (sketched in code below):
  1. Convolution stage: Convolve each input image with f different linear filters, with kernels of size q × q, producing f output images.
  2. Detector stage: Apply a non-linear function to each image. Typically the rectified linear function g(v) = max(0, v).
  3. Pooling stage: For each image, reduce each non-overlapping block of r × r pixels to one single value, by for example taking the largest value in the block.
[Figure: Input image → Convolution stage → Detector stage → Pooling stage → Output images]
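A minimal sketch of the three stages for one input image, using SciPy's 2-D convolution; the number of filters f, the kernel size q and the pooling block size r below are illustrative choices:

    import numpy as np
    from scipy.signal import convolve2d

    def convolution_layer(image, kernels, r):
        # image: n x n array; kernels: f arrays of size q x q; r: pooling block size
        outputs = []
        for K in kernels:
            c = convolve2d(image, K, mode="same")             # 1. convolution stage
            d = np.maximum(0.0, c)                            # 2. detector stage: g(v) = max(0, v)
            n = (d.shape[0] // r) * r                         # 3. pooling stage: max over each
            blocks = d[:n, :n].reshape(n // r, r, n // r, r)  #    non-overlapping r x r block
            outputs.append(blocks.max(axis=(1, 3)))
        return outputs                                        # f smaller output images

    rng = np.random.default_rng(0)
    image = rng.normal(size=(8, 8))
    kernels = [rng.normal(size=(3, 3)) for _ in range(4)]     # f = 4 filters with q = 3
    pooled = convolution_layer(image, kernels, r=2)           # four 4 x 4 images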
• Often, a fully connected network simply has too many parameters: For the first single-layer network for binary classification, we have p p_1 + 2 p_1 unknown weights. p = p_1 = 1000 thus gives 1 002 000 unknown parameters (the count is checked below)!
• The problem is that we have a separate weight between each pixel and each hidden node.
• The idea of Convolutional neural networks is to reduce the number of parameters by assuming that most of the weights are zero, and that the non-zero weights have a common structure.
• A CNN assumes that the input data has a lattice structure, like an image.
• Consists of a special type of layers called convolution layers, which are based on filtering the image with a kernel.
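A one-line check of the count above (p input pixels, p_1 hidden neurons, two output nodes, no bias terms assumed):

    p, p1 = 1000, 1000
    n_weights = p * p1 + 2 * p1   # W^(1) has p * p1 entries, W^(2) has 2 * p1
    print(n_weights)              # 1002000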
• One could view the convolution stage as a regular layer where most of the weights are zero: A pixel in the output image only depends on the q × q nearest pixels in the input image.
• The different nodes share parameters, since we use the same convolution kernel across the entire image.
• As a result, a convolution layer has f q^2 parameters, which is much less than a corresponding fully connected layer with (p p_l)^2 parameters (see the comparison below).
• Since pooling reduces the image size, we can in the next stage use more filters without increasing the total number of nodes.
• Pooling makes the output less sensitive to small translations of the input.
• Another variant of pooling is to take the max across different learned features. This can make the output invariant to other things, such as rotations.
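A rough comparison of the two counts, assuming that the (p p_l)^2 on the slide refers to a fully connected map between two layers of p·p_l nodes each; the numbers are illustrative only:

    f, q = 16, 5
    conv_params = f * q**2            # f q^2 shared kernel weights: 400
    p, pl = 1000, 1000
    dense_params = (p * pl)**2        # (p p_l)^2 weights for the fully connected alternative
    print(conv_params, dense_params)  # 400 versus 10**12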
Example of a CNN

[Figure: example CNN architecture, ending in output layer L5]

Comments