
Amity School of Engineering & Technology

NEURAL NETWORK LEARNING RULES
CHAPTER 2
Neural Network Learning Rules


➢ Perceptron Learning Rule -- Supervised Learning

➢ Hebbian Learning Rule – Unsupervised Learning

➢ Delta Learning Rule -- Supervised Learning

➢ Widrow-Hoff Learning Rule -- Supervised Learning

➢ Correlation Learning Rule -- Supervised Learning

➢ Winner-Take-All Learning Rule -- Unsupervised Learning

➢ Outstar Learning Rule -- Supervised Learning



MP Neuron
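The McCulloch-Pitts (MP) neuron sums its binary inputs through fixed weights and fires when the sum reaches a threshold θ. A minimal sketch (the gate examples are illustrative):

```python
def mp_neuron(x, w, theta):
    """McCulloch-Pitts neuron: binary inputs, fixed weights, hard threshold."""
    net = sum(xi * wi for xi, wi in zip(x, w))
    return 1 if net >= theta else 0

# AND: both inputs are needed to reach the threshold of 2
print(mp_neuron([1, 1], [1, 1], theta=2))  # 1
print(mp_neuron([1, 0], [1, 1], theta=2))  # 0
# OR: a threshold of 1 lets either input fire the neuron
print(mp_neuron([0, 1], [1, 1], theta=1))  # 1
```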

Perceptron Learning Rule -- Supervised Learning
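The slides that follow work through this rule; as a reference, here is a minimal Python sketch of the update (function and variable names are illustrative, and the bipolar binary activation with threshold 0 matches the slides below):

```python
import numpy as np

def sign(net):
    # Bipolar binary activation with threshold 0
    return 1 if net >= 0 else -1

def perceptron_train(X, d, c=0.1, epochs=10):
    """Perceptron learning rule: w <- w + c*(d - o)*x, applied per sample."""
    w = np.zeros(X.shape[1])              # augment X with a constant-1 column for the bias
    for _ in range(epochs):
        for x, target in zip(X, d):
            o = sign(np.dot(w, x))        # predicted class (+1 or -1)
            w = w + c * (target - o) * x  # nonzero only on misclassification
    return w
```

Because the error term (d − o) is 0, +2, or −2, the perceptron changes its weights only when a pattern is misclassified.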



Bipolar binary activation, threshold = 0: f(net) = +1 if net ≥ 0, else −1




Q: Four steps of Hebbian learning of a single-neuron network have been implemented, starting with the initial weight vector

W1 = [1, -1]

and learning constant c = 1, using the following inputs:

X1 = [1, -2]
X2 = [0, 1]
X3 = [2, 3]
X4 = [1, -1]

Find the final weights for:

a) Bipolar binary f(net)
b) Bipolar continuous f(net)
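A sketch of the four Hebbian steps for this problem. The update is Δw = c · f(net) · x; λ = 1 is assumed for the bipolar continuous activation:

```python
import numpy as np

def f_binary(net):
    return 1.0 if net >= 0 else -1.0               # bipolar binary

def f_continuous(net, lam=1.0):
    return 2.0 / (1.0 + np.exp(-lam * net)) - 1.0  # bipolar continuous

def hebbian(w, inputs, c, f):
    for x in inputs:
        net = np.dot(w, x)
        w = w + c * f(net) * x      # Hebbian update: Δw = c · f(net) · x
    return w

w1 = np.array([1.0, -1.0])
X = [np.array(v, dtype=float) for v in ([1, -2], [0, 1], [2, 3], [1, -1])]
print(hebbian(w1, X, c=1, f=f_binary))      # part (a)
print(hebbian(w1, X, c=1, f=f_continuous))  # part (b)
```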

DELTA LEARNING RULE

• It is supervised learning.
• This rule states that the modification in the synaptic weight of a node is equal to the product of the error and the input.
• For a given input vector, the output vector is compared with the correct answer (the target vector). If the difference is zero, no learning takes place; otherwise, the network adjusts its weights to reduce this difference.
• In mathematical form, the change in weight from ui to uj is: Δwij = r · ai · ej,
where r is the learning rate, ai represents the activation of ui, and ej is the difference between the expected output and the actual output of uj.
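A sketch of a single delta-rule step for one unit, in the Δwij = r · ai · ej form above; the f′(net) factor shown here is the activation-derivative term used with continuous activations in the slides that follow (names are illustrative):

```python
import numpy as np

def delta_update(w, a, d, f, f_prime, r=0.1):
    """One delta-rule step for a unit with weight vector w and input activations a."""
    net = np.dot(w, a)
    o = f(net)                            # actual output
    e = d - o                             # error: expected minus actual
    return w + r * e * f_prime(net) * a   # Δw = r · e · f'(net) · a
```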

DELTA LEARNING RULE

Using gradient descent, the Delta Rule strives to find the best-fitting model. In other words:

What are the weights that would make my neural network fit the training data best, with the highest performance -> the least amount of error!

The Delta Rule uses gradient descent as an optimization technique: it tries different values for the weights in a neural network and, depending on how accurate the output of the network is (i.e., how close to the ground truth), it makes adjustments to the weights (increasing some and decreasing others). It adjusts the weights in a way that drives the error of the output down during training.

DELTA LEARNING RULE

For any given set of input data and weights, there will be an associated magnitude of error, which is measured by an error function (also known as a cost function). The Delta Rule employs the error function for what is known as gradient descent learning, which involves the 'modification of weights along the most direct path in weight-space to minimize error'; so the change applied to a given weight is proportional to the negative of the derivative of the error with respect to that weight.
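Written out for a single unit with squared error (notation follows the Δwij = r · ai · ej form used earlier):

E = ½ (di − oi)²,  with oi = f(net) and net = wᵀx

Δw = −η · ∂E/∂w = η (di − oi) · f′(net) · x

The f′(net) factor is why the derivative of the activation function appears in the worked examples that follow.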

f′(net) = ½(di² − oi²)

f′(net) = ½(1 − oi²)

(This is the derivative of the bipolar continuous activation; since the targets are bipolar, di = ±1, so di² = 1 and the two expressions coincide.)

−0.537 / 2.537 = −0.2113

Adaptive Linear Neuron -- Adaline

Adaline, which stands for Adaptive Linear Neuron, is a network having a single linear unit.
Some important points about Adaline are as follows −
•It uses a bipolar activation function.
•It uses the delta rule for training to minimize the mean squared error (MSE) between the actual output and the desired/target output.
•The weights and the bias are adjustable.

Adaptive Linear Neuron -- Adaline

Architecture
The basic structure of Adaline is similar to the perceptron, with an extra feedback loop through which the actual output is compared with the desired/target output. After comparison, on the basis of the training algorithm, the weights and bias are updated.

Training Algorithm -- Adaline
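A minimal sketch of Adaline training with the delta (LMS) rule; the class name, zero initialization, and per-epoch MSE printout are illustrative choices, not the slides' exact procedure:

```python
import numpy as np

class Adaline:
    def __init__(self, n_inputs, lr=0.1):
        self.w = np.zeros(n_inputs)
        self.b = 0.0
        self.lr = lr

    def net(self, x):
        return np.dot(self.w, x) + self.b    # linear output used for learning

    def fit(self, X, d, epochs=10):
        for epoch in range(epochs):
            sq_errors = []
            for x, target in zip(X, d):
                e = target - self.net(x)     # error on the LINEAR output
                self.w += self.lr * e * x    # LMS / delta update
                self.b += self.lr * e
                sq_errors.append(e ** 2)
            print(f"EPOCH {epoch + 1}: MSE = {np.mean(sq_errors):.4f}")

    def predict(self, x):
        return 1 if self.net(x) >= 0 else -1  # bipolar step applied only at the output
```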



The Perceptron and ADALINE: the fundamental difference

• The difference is the learning procedure used to update the weights of the network.
• The perceptron updates the weights by computing the difference between the expected and predicted class values. In other words, the perceptron always compares +1 or -1 (predicted values) to +1 or -1 (expected values). An important consequence of this is that the perceptron only learns when errors are made.
• In contrast, the ADALINE computes the difference between the expected class value y (+1 or -1) and the continuous output value ŷ from the linear function, which can be any real number.
• This is crucial because it means the ADALINE can learn even when no classification mistake has been made.
• Since the ADALINE learns all the time and the perceptron learns only after errors, the ADALINE will find a solution faster than the perceptron for the same problem.
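The contrast can be seen directly in the error term each rule computes (a sketch; w is assumed to include the bias via a constant input):

```python
import numpy as np

def sign(net):
    return 1 if net >= 0 else -1

def perceptron_error(w, x, d):
    return d - sign(np.dot(w, x))  # +/-1 vs +/-1: zero unless misclassified

def adaline_error(w, x, d):
    return d - np.dot(w, x)        # +/-1 vs a real number: almost never exactly zero
```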

Q: Implement an OR gate using an ADALINE network with bipolar inputs
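Applying the Adaline sketch from the training-algorithm section to the bipolar OR data. The initial values w1 = w2 = b = 0.1 and learning rate 0.1 are taken from the later OR-gate question; the exact per-epoch MSE depends on sample ordering and implementation details:

```python
import numpy as np

# Bipolar OR data: target is +1 unless both inputs are -1
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
d = np.array([1, 1, 1, -1], dtype=float)

model = Adaline(n_inputs=2, lr=0.1)  # class from the sketch above
model.w[:] = 0.1                     # initial weights and bias as in the slides
model.b = 0.1
model.fit(X, d, epochs=2)            # prints the per-epoch MSE
```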

EPOCH 1: MSE = 1.4275
EPOCH 2: MSE = 0.6075

[Final network: inputs X1, X2 with weights w1 = 0.55, w2 = −0.38, bias b = 0.43, output y.]

Multiple Adaptive Linear Neuron: Madaline

Madaline, which stands for Multiple Adaptive Linear Neuron, is a network which consists of many Adalines in parallel. It has a single output unit. Some important points about Madaline are as follows (a forward-pass sketch follows this list):

•It is just like a multilayer perceptron, where the Adalines act as hidden units between the input and the Madaline layer.

•The weights and the bias between the input and Adaline layers, as in the Adaline architecture, are adjustable.

•The weights between the Adaline and Madaline layers, and the output unit's bias, are fixed; they are not trained.
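A sketch of the Madaline forward pass implied by these points: trainable Adaline weights feed a fixed output combination. The fixed values v = [0.5, 0.5] and b_out = 0.5 are taken from the XOR example below; other formulations fix them differently:

```python
import numpy as np

def bipolar_step(net):
    return 1.0 if net >= 0 else -1.0

def madaline_forward(x, W, b, v=np.array([0.5, 0.5]), b_out=0.5):
    """W, b: trainable Adaline weights/biases. v, b_out: fixed output weights/bias."""
    z = np.array([bipolar_step(np.dot(W[j], x) + b[j]) for j in range(len(b))])
    return bipolar_step(np.dot(v, z) + b_out)
```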

Architecture
The architecture of Madaline consists of "n" neurons in the input layer, "m" neurons in the Adaline layer, and 1 neuron in the Madaline layer. The Adaline layer can be considered the hidden layer, as it lies between the input layer and the output layer, i.e. the Madaline layer.

Q: Using the Madaline network, implement the XOR function with bipolar inputs and targets. Assume the required parameters for training the network. Assume the initial weights as w11 = 0.05, w21 = 0.2, w12 = 0.1, w22 = 0.2, b1 = 0.3, b2 = 0.15, b3 = 0.5, v1 = v2 = 0.5. "w" denotes the weights between the input layer and the hidden layer; "v" denotes the weights between the hidden layer and the output layer.

α = 0.5
[Final network: inputs X1, X2 feed Adaline units Z1 and Z2, which feed the output unit Y.
Weights: w11 = 1.32, w21 = −1.34 (into Z1); w12 = −1.29, w22 = 1.29 (into Z2); v1 = v2 = 0.5 (into Y).
Biases: b1 = −1.07, b2 = −1.08, b3 = 0.5.]

Madaline Network for XOR function



Q: Use Adaline to design an OR gate

w1 = w2 = b = 0.1, learning rate = 0.1



Widrow-Hoff Learning Rule -- Supervised Learning
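The Widrow-Hoff rule, also called the LMS (least mean square) rule, is the delta rule with an identity activation, f(net) = net, so it is independent of the activation function:

Δw = c (di − wᵀx) x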



Correlation Learning Rule -- Supervised Learning
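The correlation rule is Hebbian in form but supervised: the desired response di replaces the actual output in the update, so the target must be supplied:

Δw = c · di · x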



Winner-Take-All Learning Rule -- Unsupervised Learning
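Only the winning neuron m, the one with the largest net input wmᵀx, is updated; its weight vector moves toward the current input, and all other neurons are left unchanged:

Δwm = c (x − wm)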



Outstar Learning Rule -- Supervised Learning
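The outstar rule adjusts the weights fanning out of a node so that the layer's output approaches the desired response vector d; it is supervised because d must be supplied:

Δw = c (d − w)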



ERROR BACK-PROPAGATION TRAINING

Back-Propagation
• Back-propagation is the essence of neural net training.
• It is the method of fine-tuning the weights of a neural net based on the error rate obtained in the previous epoch (i.e., iteration).
• Proper tuning of the weights reduces error rates and makes the model more reliable by improving its generalization.
• "Backpropagation" is short for "backward propagation of errors."
• It is a standard method of training artificial neural networks. It calculates the gradient of a loss function with respect to all the weights in the network.


Backpropagation Algorithm

Each training iteration of a NN has two main stages:

1. Forward pass/propagation
2. Backpropagation (BP)

The BP stage has the following steps (a sketch follows this list):

•Evaluate the error signal for each layer
•Use the error signal to compute the error gradients
•Update the layer parameters using the error gradients, with an optimization algorithm such as gradient descent (GD)
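A minimal sketch of these stages for a one-hidden-layer network with sigmoid activations and squared error, updated by plain gradient descent (shapes and names are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(x, d, W1, b1, W2, b2, lr=0.25):
    # Forward pass
    z = sigmoid(W1 @ x + b1)   # hidden activations
    y = sigmoid(W2 @ z + b2)   # output activations

    # BP step 1: error signal (delta) for each layer
    delta_out = (d - y) * y * (1 - y)              # output layer
    delta_hid = (W2.T @ delta_out) * z * (1 - z)   # hidden layer

    # BP steps 2-3: gradients from the deltas, then a gradient-descent update
    W2 += lr * np.outer(delta_out, z)
    b2 += lr * delta_out
    W1 += lr * np.outer(delta_hid, x)
    b1 += lr * delta_hid
    return W1, b1, W2, b2
```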

Advantages of Backpropagation are:


•Backpropagation is fast, simple and easy to program
•It is a flexible method as it does not require prior knowledge about the
network
•It is a standard method that generally works well
•It does not need any special mention of the features of the function to be
learned.

Disadvantages of using Backpropagation


•The actual performance of backpropagation on a specific
problem is dependent on the input data.
•Backpropagation can be quite sensitive to noisy data
•For efficiency, a matrix-based implementation (computing over a whole mini-batch at once) is needed, rather than looping over examples one at a time.

[Worked example network: inputs X1, X2; hidden units Z1, Z2; output unit Y.
Weights: X1→Z1 = 0.6, X1→Z2 = −0.3, X2→Z1 = −0.1, X2→Z2 = 0.4, Z1→Y = 0.4, Z2→Y = 0.1.
Biases: B1 = 0.3 (Z1), B2 = 0.5 (Z2), B3 = −0.2 (Y).]

Summary
•A neural network is a group of connected I/O units where each connection has a weight associated with it.
•Backpropagation is short for "backward propagation of errors." It is a standard method of training artificial neural networks.
•The backpropagation algorithm in machine learning is fast, simple and easy to program.
•A feedforward backpropagation network (BPN) is an artificial neural network.
•Two types of backpropagation networks are 1) static backpropagation and 2) recurrent backpropagation.
•Backpropagation in data mining simplifies the network structure by removing weighted links that have a minimal effect on the trained network.
•It is especially useful for deep neural networks working on error-prone projects, such as image or speech recognition.
•The biggest drawback of backpropagation is that it can be sensitive to noisy data.
