Neural Networks

COMP3411/9814: Artificial Intelligence


Lecture Overview
• Motivation

• Biological and artificial neurons

• Single-layer perceptron

• Multi-layer perceptron

• Neural network design

• Neural network architectures



Motivation

• Great ability of cognitive beings to carry out some tasks: shape recognition, speech and image processing, etc.

• It seemed important to understand and emulate successful mechanisms from humans and animals: parallelism and high connectivity.

• A branch of artificial intelligence: Artificial Neural Networks.


Motivation

• New (non-algorithmic) paradigm to process information (neurocomputing): learning and adaptation, distributed and parallel processing.

• New computational tools (faster and cheaper).

• In the future: more caution and theoretical support are needed. Open issue: generalization.
Motivation
• The general problem of function approximation can be divided into two subproblems:

• Classification: approximate a function that represents the membership of an entity – characterized by a set of input variables, either continuous or discrete – in a particular class (output with discrete values), e.g., character recognition.

• Regression: approximate the (unknown) generating function of a process by mapping the input variables to the output variables; usually, continuous values are used.

  y = f(x, w)
Lecture Overview
• Motivation

• Biological and artificial neurons

• Single-layer perceptron

• Multi-layer perceptron

• Neural network design

• Neural network architectures


Biological Neuron
Biological Neuron
• The brain is made up of neurons (nerve cells) which have
• a cell body (soma)
• dendrites (inputs)
• an axon (outputs)
• synapses (connections between cells)

• Synapses can be excitatory or inhibitory and may change over time.

• When the inputs reach some threshold an action potential (electrical


pulse) is sent along the axon to the outputs.
Biological Neuron

• The human brain has about 100 billion neurons (~10^10–10^11 neurons) with an average of 10,000 synapses each (some even with 100,000 synapses).

• Latency is about 3-6 milliseconds.

• At most a few hundred “steps” in any mental computation, but


massively parallel.
Artificial Neuron

• An automaton characterized by:

• An internal state.

• Input signals.

• Activation and transfer functions.


Artificial Neuron

• McCulloch-Pitts’
model (1943)
Artificial Neuron
• McCulloch-Pitts model:
• Inputs either 0 or 1.
• Output 0 or 1.
• Input can be either excitatory or inhibitory.

• Summing inputs: sum = x1·w1 + x2·w2 + x3·w3 + …
  • If an input is 1 and excitatory, add 1 to the sum.
  • If an input is 1 and inhibitory, subtract 1 from the sum.

• Threshold θ: if sum < θ, output 0; otherwise, output 1 (see the sketch below).
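A minimal sketch in Python (not from the slides; the function name and list-based encoding are assumptions) of a McCulloch-Pitts unit: each active excitatory input adds 1, each active inhibitory input subtracts 1, and the output is 1 only when the sum reaches the threshold θ.

```python
def mcculloch_pitts(inputs, excitatory, threshold):
    """McCulloch-Pitts unit: binary inputs, binary output.

    inputs      -- list of 0/1 input values
    excitatory  -- list of booleans; True = excitatory input, False = inhibitory
    threshold   -- the threshold theta
    """
    total = 0
    for x, is_exc in zip(inputs, excitatory):
        if x == 1:
            total += 1 if is_exc else -1   # excitatory adds 1, inhibitory subtracts 1
    return 1 if total >= threshold else 0  # output 0 if total < theta, else 1

# Example: two excitatory inputs and threshold 2 behave like an AND gate
print(mcculloch_pitts([1, 1], [True, True], threshold=2))  # -> 1
print(mcculloch_pitts([1, 0], [True, True], threshold=2))  # -> 0
```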
Learning
• The ability of a neuron (or neural net) to adjust its connections (weights) to produce the intended output, or one that meets certain criteria.

• Hebbian learning (1949): When a neuron A persistently activates


another nearby neuron B, the connection between the two neurons
becomes stronger. Specifically, a growth process occurs that
increases how effective neuron A is in activating neuron B. As a
result, the connection between those two neurons is strengthened
over time.

• “Neurons that fire together, wire together”, Hebb.


Artificial Neural Networks
• Information processing architecture loosely modelling the brain

• Consists of many interconnected processing units (neurons)


• Work in parallel to accomplish a global task

• Generally used to model relationships between inputs and outputs or


to find patterns in data

• Characterized by (i) number of neurons, (ii) interconnection


architecture, (iii) weight values, (iv) activation and transfer functions.
Artificial Neural Networks

• ANN nodes have
  • input edges with weights
  • output edges with weights
  • an activation level (a function of the inputs)

• Weights can be positive or negative and may change over time (learning).
• The input function is the weighted sum of the activation levels of inputs.
• The activation level is a non-linear transfer function g of this input:

  activation_j = g(s_j) = g( Σ_i w_ij · x_i )

Some nodes are inputs (sensing), some are outputs (action).
Artificial Neural Networks
Artificial Neural Networks

• Neural networks (NN) might work in two ways:

• Learning: adapting its weights, architecture, activation and


transfer functions.

• Simulation or recognition: it is used for information processing.

• NN learning: Supervised (through examples), unsupervised.


Activation Functions
The function g(s) takes the weighted sum of the inputs and produces the output of the node, given some threshold:

  g(s) = 1 if s ≥ 0
  g(s) = 0 if s < 0
Lecture Overview
• Motivation

• Biological and artificial neurons

• Single-layer perceptron

• Multi-layer perceptron

• Neural network design

• Neural network architectures


Single-layer Perceptron

• Frank Rosenblatt, 1957

• Use a logic threshold function for classification tasks.

• Classification and discriminant functions

• How can we make the classification?

• By using discriminant functions, defined over the vector space we want to classify, that produce values which can be compared.
Single-layer Perceptron

• Suppose we have a function g_i for each class s_i. The classification rule is:

  u ∈ s_i  iff  g_i(u) > g_j(u) for all j ≠ i

• For two-class problems, this can be reduced to one function:

  g(u) = g_1(u) − g_2(u);  then u ∈ s_1 if g(u) > 0, and u ∈ s_2 otherwise


Single-layer Perceptron

• Linear classification with a discriminant function using the perceptron:

  g(u) = w_1·u_1 + w_2·u_2 + … + w_n·u_n = w · u

• The value associated with g(u) = 0 corresponds to the border, a


hyperplane, that divides the two classes.

• Perceptrons are appropriate only for linearly separable classes.

• The learning problem is reduced to finding a hyperplane that


separates the two classes.
Single-layer Perceptron

A perceptron with inputs X0 and X1, weights ω0 and ω1, and threshold θ defines a linear decision boundary of the form y = a·x + b:

  X0·ω0 + X1·ω1 = θ  ⇒  X1 = −(ω0/ω1)·X0 + θ/ω1

[Figure: the perceptron unit, and the (X0, X1) plane in which this line separates class A points from class B points.]
Single-layer Perceptron

• Simplest output function


Single-layer Perceptron

• AND gate output

  Decision boundary: w1·I1 + w2·I2 = 1.5

[Figure: the four inputs (0,0), (0,1), (1,0), (1,1) in the (Input 1, Input 2) plane, with the line separating (1,1) from the other three points; see the code sketch below.]
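As a concrete check of the boundary above (a sketch; the specific weights w1 = w2 = 1 and threshold 1.5 are just one of many choices that work), a single threshold unit reproduces the AND gate:

```python
def and_unit(i1, i2, w1=1.0, w2=1.0, theta=1.5):
    # Output 1 if the weighted sum reaches the threshold 1.5, else 0
    return 1 if w1 * i1 + w2 * i2 >= theta else 0

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", and_unit(a, b))   # only (1, 1) gives 1
```

No single line of this form separates the XOR classes, which is why the next slide introduces a hidden layer.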
Single-layer Perceptron

• XOR gate ⇒ hidden layer needed

[Figure: the four inputs (0,0), (0,1), (1,0), (1,1); no single straight line can separate the XOR-true points (0,1) and (1,0) from the XOR-false points (0,0) and (1,1).]
Single-layer Perceptron
• Linearly separable if there is a hyperplane where classification is
true on one side of the hyperplane and false on the other side
• For the sigmoid function, the hyperplane is where:
  x_1·w_1 + … + x_n·w_n = 0
Single-layer Perceptron

• Learning rule
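The slide's figure is not reproduced here; the sketch below assumes the standard perceptron learning rule, w ← w + η(target − output)·x, applied example by example (NumPy; the names are chosen for illustration).

```python
import numpy as np

def train_perceptron(X, targets, eta=0.1, epochs=100):
    """Standard perceptron learning rule on rows of X with 0/1 targets."""
    X = np.hstack([X, np.ones((len(X), 1))])      # append a constant bias input
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, targets):
            y = 1 if np.dot(w, x) >= 0 else 0     # threshold activation
            w += eta * (t - y) * x                # weights change only on errors
    return w

# Example: the (linearly separable) AND function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(train_perceptron(X, np.array([0, 0, 0, 1])))
```

By the convergence theorem on the next slide, this loop stops making errors on linearly separable data after finitely many updates.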
Single-layer Perceptron
• Perceptron convergence theorem:

• For any data set that is linearly separable, the perceptron


learning rule is guaranteed to find a solution in a finite
number of iterations.
Historical Context
• In 1969, Minsky and Papert published a book highlighting limitations of
perceptrons.
• Funding agencies redirected funding away from neural network research
preferring instead logic-based methods such as expert systems.

• Known since 1960s that any logical function could be implemented in a 2-layer
neural network with step function activations.

• The problem was how to learn the weights of a multi-layer neural network from
training examples.

• Solution found in 1974 by Paul Werbos.


• Not widely known until rediscovered in 1986 by Rumelhart, Hinton and
Williams.
Lecture Overview
• Motivation

• Biological and artificial neurons

• Single-layer perceptron

• Multi-layer perceptron

• Neural network design

• Neural network architectures


Multi-layer Neural Network

• Given an explicit logical function, we can design a multi-layer neural network by


hand to compute that function.
• But, if we are just given a set of training data, can we train a multi-layer network
to fit these data?
Multi-layer Perceptron

• Definition: a network whose neurons are organised in successive layers. Each layer receives its inputs from the previous layer (or from the external input) and sends its outputs to the next layer. There are no connections within a layer.
Feedforward Propagation

w_i,j ≡ weight between node i and node j

• Feed-forward network = a parameterised family of nonlinear functions:

  a_5 = g( W_3,5 · a_3 + W_4,5 · a_4 )
      = g( W_3,5 · g( W_1,3 · a_1 + W_2,3 · a_2 ) + W_4,5 · g( W_1,4 · a_1 + W_2,4 · a_2 ) )
• Adjusting weights changes the function
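A direct transcription of this expression (a sketch; the sigmoid choice of g and the example weights are assumptions, not from the slides) for the 2-input, 2-hidden-unit, 1-output network:

```python
import math

def g(s):
    # Non-linear activation (sigmoid here); any non-linear g could be used
    return 1.0 / (1.0 + math.exp(-s))

def forward(a1, a2, W):
    """W maps node pairs (i, j) to the weight W_{i,j} between node i and node j."""
    a3 = g(W[(1, 3)] * a1 + W[(2, 3)] * a2)
    a4 = g(W[(1, 4)] * a1 + W[(2, 4)] * a2)
    a5 = g(W[(3, 5)] * a3 + W[(4, 5)] * a4)
    return a5

# Illustrative weights only; adjusting them changes the function computed
W = {(1, 3): 0.5, (2, 3): -0.4, (1, 4): 0.3, (2, 4): 0.8, (3, 5): 1.0, (4, 5): -1.0}
print(forward(0.2, 0.7, W))
```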
Feedforward Propagation

(a) is a step function or threshold function

(b) is a sigmoid function, 1 / (1 + e^(−x))

Changing the bias weight b moves the threshold.

[Figure: a two-layer feed-forward network; each input x1(t), …, xn(t) feeds weighted sums (weights v, biases b_in) through a non-linearity f_NL, whose outputs feed further weighted sums (weights w, biases b_out) to produce the outputs y1(t), …, ym(t).]
Backpropagation

1. Forward pass: apply inputs to the "lowest layer" and feed activations forward to get the output.

2. Calculate error: the difference between the desired output and the actual output, error = y − ŷ.

3. Backward pass: propagate errors back through the network to adjust the weights.

[Figure: a network with random initial weights w11, w12, …; the forward pass produces the network output ŷ, the error y − ŷ is calculated, and the error is backpropagated to update the weights.]
Backpropagation
Gradient descent

  E = (1 / 2N) · Σ (d − y)²

If transfer functions are smooth, can use multivariate calculus to adjust weights by taking the steepest downhill direction:

  w ← w − α · ∂E/∂w

Parameter α is the learning rate.

• ∂E/∂w: how the cost function is affected by the particular weight.
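A one-parameter illustration of the update rule (a sketch with an assumed toy cost function, not the network's error):

```python
# Gradient descent on a toy cost E(w) = (w - 3)^2, whose derivative is dE/dw = 2(w - 3).
# The minimum is at w = 3.
alpha = 0.1        # learning rate
w = 0.0            # arbitrary starting weight
for step in range(50):
    grad = 2 * (w - 3)       # dE/dw at the current w
    w = w - alpha * grad     # w <- w - alpha * dE/dw
print(w)           # close to 3.0
```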


Backpropagation
The derivative of a function is the slope of the tangent at a point.

  y = f(x) = m·x + b

  m = (change in y) / (change in x) = Δy / Δx

Written dy/dx.
Backpropagation
Partial derivative

• The derivative of a function of several variables with respect to one of these variables.

• If z = f(x, y, …), the derivative with respect to x is written ∂z/∂x.
Backpropagation
[Figure: plots of the step function, the sigmoid, and the hyperbolic tangent.]

A function must be continuous to be differentiable.

Replace the (discontinuous) step function with a differentiable function, such as the sigmoid:

  g(s) = 1 / (1 + e^(−s))

or the hyperbolic tangent (outputs from −1 to 1):

  g(s) = tanh(s) = (e^s − e^(−s)) / (e^s + e^(−s)) = 2 / (1 + e^(−2s)) − 1
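Putting the pieces together, here is a minimal sketch of backpropagation with the sigmoid above and the update w ← w − α·∂E/∂w, training a small 2-2-1 network on XOR (NumPy; the architecture, learning rate, and epoch count are assumptions, and a different random seed or more epochs may be needed for convergence).

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# XOR training data: inputs X and desired outputs d
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)   # input  -> hidden weights and biases
W2, b2 = rng.normal(size=(2, 1)), np.zeros(1)   # hidden -> output weights and biases
alpha = 0.5                                     # learning rate

for epoch in range(20000):
    # 1. forward pass
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    # 2. error gradient for E = 1/(2N) * sum (d - y)^2, using sigmoid' = y(1 - y)
    delta2 = (y - d) * y * (1 - y) / len(X)
    delta1 = (delta2 @ W2.T) * h * (1 - h)
    # 3. backward pass: steepest-descent weight updates
    W2 -= alpha * h.T @ delta2;  b2 -= alpha * delta2.sum(axis=0)
    W1 -= alpha * X.T @ delta1;  b1 -= alpha * delta1.sum(axis=0)

print(np.round(y, 2))   # should approach [0, 1, 1, 0]
```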
Lecture Overview
• Motivation

• Biological and artificial neurons

• Single-layer perceptron

• Multi-layer perceptron

• Neural network design

• Neural network architectures


Step 1: Exhaustive Analysis of the System

• This step should determine the number and type of the input variables and of the model output, reducing the number of variables where possible.

• Is it really necessary to use a neural model? Why not use any other
existing classic model (e.g., phenomenological)?

• A neural network is often only the second-best solution; prefer a classical model where one exists.

• If a neural model is used, do we have available data representing


properly the system to be modelled? Do we have enough?
Step 2: Preprocessing
• Data: a neural network is a black-box model (a.k.a. empirical model) for interpolation (never extrapolation); therefore, it depends greatly on the quality and quantity of the available data.

• Quality: related to the degree to which the available data represents the
function being approximated. Ideal: to obtain them by following a
properly designed survey/experimental plan.

• Quantity: It is extremely important because only an adequate amount of


data will allow us to correctly identify the parameters (weights) of our
neural model.

• If the quantity of data is small, we cannot expect to develop a


complex neural model.
Step 2: Preprocessing
• Visual examination of the data.

• Detect and, if possible, eliminate outliers, empty values, etc.

• It might also help to detect correlations between variables.

• Normalization of variables: It is necessary when variables with different units and,


therefore, potentially different magnitudes are involved. Sometimes, the magnitudes
can differ by several orders of magnitude.

• Xn = (X-Xmin)/(Xmax-Xmin); Xn ∈ [0,1]

• Xn = 2*(X-Xmin)/(Xmax-Xmin) – 1; Xn ∈ [-1,1]

• It is necessary to perform the corresponding denormalization at the output stage.
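A minimal sketch of these two transforms and their inverse (the function names are assumptions):

```python
import numpy as np

def normalise(x, x_min, x_max, low=0.0, high=1.0):
    """Min-max normalisation of x into [low, high]."""
    return low + (high - low) * (x - x_min) / (x_max - x_min)

def denormalise(xn, x_min, x_max, low=0.0, high=1.0):
    """Inverse transform, applied to the network outputs."""
    return x_min + (xn - low) * (x_max - x_min) / (high - low)

x = np.array([10.0, 25.0, 40.0])
xn = normalise(x, x.min(), x.max(), low=-1.0, high=1.0)   # -> [-1, 0, 1]
print(xn, denormalise(xn, x.min(), x.max(), low=-1.0, high=1.0))
```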


Step 3: Design of the Neural Model
• Input and output neurons depend on the previous analysis of the
system.
• But, what about the number of neurons Nh in the hidden layer?

• Rule of thumb: Nh should lead to a number of parameters (weights) Nw


that:

• Nw < (Number of samples) / 10

• The number of weights Nw of an MLP, with Ni neurons in its input layer, a


hidden layer with Nh neurons, and No neurons in the output layer is:

• Nw = (Ni+1)*Nh+(Nh+1)*No
Step 3: Design of the Neural Model

• An MLP with 3 inputs, 4 units in its hidden layer, and 2 outputs, has
a number of parameters:

• Nw = (3+1)*4+(4+1)*2 = 26

• Then, at least 260 samples are required to train the network weights.
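The same bookkeeping as a small helper (a sketch; the function name is an assumption):

```python
def n_weights(n_in, n_hidden, n_out):
    """Parameter count of a one-hidden-layer MLP; the +1 terms are the bias weights."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out

nw = n_weights(3, 4, 2)
print(nw, "weights -> at least", 10 * nw, "training samples")   # 26 -> 260
```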
Step 3: Design of the Neural Model

• For MLPs, it has been shown that one hidden layer with a proper number of neurons is sufficient to approximate any non-linear function to an arbitrary degree of precision (the universal approximation theorem).

• Activation functions: A usual criterion is to use sigmoid functions or


ReLUs in the hidden layer and linear functions in the output.
However, sigmoids or softmax can also be used in the output.
Step 4: Training
• Training a neural network is a hard process due to the complexity of the
error function solution space, which can have numerous local minima,
saddle (minimax) points, etc.

• There are three main problems that can arise during training:

• Bias

• Overparameterization

• Overfitting

• The latter two might affect the network's ability to generalize (high
variance).
Step 4: Training

[Figure: training bias: the approximation y(x) is too simple to follow the underlying data.]
Step 4: Training

• To decrease bias:

• Increase (prudently) the number of neurons in the hidden layer.

• Aim to reach a better local minimum by conducting a sufficient


number of different training processes, starting from randomly
chosen initial weights (20 or more attempts).
Step 4: Training

High variance problem (overparameterization and overfitting)

[Figure: the approximation y(x) follows the noise in the data rather than the underlying function.]
Step 4: Training
• To avoid the overfitting problem, work with two sets during training:

• Training set

• Test set

• The best is to visualize the error function simultaneously on both


sets.

• Characteristics of the training and test sets:

• Both sets should be large enough, and data should be


representative on both sets.
Step 4: Training

[Figure: the error as a function of the number of epochs for the training set (solid line) and the test set (dashed line); the test error reaches a minimum and then rises again.]
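One common way to act on this curve, not named on the slide, is early stopping: keep the weights from the epoch with the minimum test error and stop once that error has not improved for a while. A sketch (the patience mechanism and the dummy error values are assumptions):

```python
def early_stopping(test_errors, patience=10):
    """Return (best_epoch, best_error), stopping once the test error has not
    improved for `patience` consecutive epochs."""
    best_epoch, best_err = 0, float("inf")
    for epoch, err in enumerate(test_errors):
        if err < best_err:
            best_epoch, best_err = epoch, err        # remember the best weights here
        elif epoch - best_epoch >= patience:
            break                                    # test error keeps rising: stop
    return best_epoch, best_err

# Illustrative curve: the test error falls, then rises as the network overfits
errors = [1.0, 0.6, 0.4, 0.3, 0.25, 0.24, 0.26, 0.30, 0.35, 0.45, 0.6, 0.8]
print(early_stopping(errors, patience=5))   # -> (5, 0.24)
```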
Step 4: Training

[Figure: the error as a function of the number of network parameters.]
Step 4: Training

• Cross-validation: Different neural network models are developed


using the available data, splitting the training and test sets in
different ways. The model that achieves the minimum error on the
test set is chosen.

• Additional training aspects:

• Weight initialisation.
• Online or batch learning.
• Adjust the parameters, e.g., learning rate and epochs to suit the
particular task.
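A minimal sketch of k-fold splitting in this spirit (NumPy; the helper name and the choice of k are assumptions):

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index arrays; each fold serves once as the test set."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

# Usage: train a candidate model on each `train` split, evaluate on `test`,
# and keep the configuration with the lowest test error.
for train, test in k_fold_indices(10, k=5):
    print(len(train), len(test))
```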
Step 5: Generalisation

• To test the generalisation capability of the network, that is, its


performance on a different (never seen) set of data, a small (but
representative) third set might be reserved, the generalisation set.

• This set should also be representative of the phenomenon being


modelled as the previous sets (training and test).
Step 5: Generalisation

[Figure: the approximation y(x) of the underlying function, evaluated on the generalisation data.]
Lecture Overview
• Motivation

• Biological and artificial neurons

• Single-layer perceptron

• Multi-layer perceptron

• Neural network design

• Neural network architectures


Neural Network Architectures
• Two main network structures
Neural Network Architectures

Feed-forward network has connections only in one direction:

• Every node receives input from “upstream” nodes; delivers output


to “downstream” nodes.

• No loops.

• Represents a function of its current input.

• It has no internal state other than the weights themselves.


Neural Network Architectures
• Two main network structures
Neural Network Architectures
Recurrent network feeds outputs back into its own inputs:

• Activation levels of network form a dynamical system.

• It may reach a stable state or exhibit oscillations or even chaotic


behaviour.

• Response of network to an input depends on its initial state.

• This may depend on previous inputs.

• Can support short-term memory.


Deep Learning Architectures
• Multiple layers form a hierarchical model, known as deep learning.
• Convolutional neural networks are specialised for vision tasks.
• Recurrent neural networks are used for time series.

• Typical real-world network can have 10 to 20 layers with hundreds of


millions of weights:
• It can take hours, days, or months to learn on machines with
thousands of cores.
References

• Poole & Mackworth, Artificial


Intelligence: Foundations of
Computational Agents, Chapter 7.
• Russell & Norvig, Artificial Intelligence: a
Modern Approach, Chapters 18.6, 18.7
.
• Bishop, Neural Networks and Their
Applications, Review of Scientific
Instruments, 65(6): 1803-1832.
Feedback
• In case you want to provide anonymous
feedback on these lectures, please visit:

• https://forms.gle/KBkN744QuffuAZLF8

Thank you very much!
