BACK PROPAGATION
NETWORK
Back propagation network (BPN)
A network trained with the back propagation learning
algorithm (BPLA).
BPLA is one of the most important developments in neural
networks.
BPLA is applied to multilayer feed-forward networks
consisting of processing elements with continuous,
differentiable activation functions.
BPN is used to classify input patterns correctly.
The weight update algorithm is based on the gradient
descent method (see the update rule below).
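For reference, the generic gradient descent update that underlies the algorithm (the standard form, not specific to these slides) is

$$ w \leftarrow w - \alpha \frac{\partial E}{\partial w}, $$

where $E$ is the error measure and $\alpha$ is the learning rate.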
Back propagation network (BPN)
The error is propagated back to the hidden units.
Aims to achieve a balance between the network's ability to
respond correctly to the training inputs and its ability to give
reasonable responses to inputs that are similar but not identical
to the training inputs.
Training stages in a BPN:
Feed-forward generation of the output for the input pattern
Calculation and back propagation of the error
Updating of the weights.
Architecture
BPN is a multilayer feed-forward neural network consisting of
Input layer
Hidden layer and
Output layer
During back propagation of the error, the signals are sent in the
reverse direction.
The inputs and outputs of a BPN may be binary or bipolar.
The activation function can be any function that is monotonically
increasing and differentiable.
Architecture (Contd.)
Notations
x – input training vector
t – target output vector
α - learning rate
v_0j – bias on the jth hidden unit
w_0k – bias on the kth output unit
z_j – hidden unit j
z_inj – net input to z_j
y_k – output unit k
Notations (Contd.)
δ_k – error-correction weight adjustment for w_jk (see the formulas below)
δ_j – error-correction weight adjustment for v_ij
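For reference, the standard feed-forward and error-correction equations of the BP algorithm, written in the notation above (where $f$ denotes the activation function and $y_{ink}$ the net input to output unit $y_k$), are

$$ z_{inj} = v_{0j} + \sum_i x_i v_{ij}, \qquad z_j = f(z_{inj}), $$
$$ y_{ink} = w_{0k} + \sum_j z_j w_{jk}, \qquad y_k = f(y_{ink}), $$
$$ \delta_k = (t_k - y_k)\, f'(y_{ink}), \qquad \delta_{inj} = \sum_k \delta_k w_{jk}, \qquad \delta_j = \delta_{inj}\, f'(z_{inj}), $$
$$ \Delta w_{jk} = \alpha\, \delta_k z_j, \qquad \Delta v_{ij} = \alpha\, \delta_j x_i. $$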
Commonly used activation functions (defined below):
Binary sigmoidal function
Bipolar sigmoidal function
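For reference, the usual definitions of these two functions and their derivatives, with steepness parameter $\lambda$ (commonly taken as 1), are

$$ f_{bin}(x) = \frac{1}{1 + e^{-\lambda x}}, \qquad f_{bin}'(x) = \lambda\, f_{bin}(x)\,[1 - f_{bin}(x)], $$
$$ f_{bip}(x) = \frac{2}{1 + e^{-\lambda x}} - 1, \qquad f_{bip}'(x) = \frac{\lambda}{2}\,[1 + f_{bip}(x)]\,[1 - f_{bip}(x)]. $$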
Properties of the activation function to be used in BPN:
Continuity
Differentiability
Nondecreasing monotonicity
Training patterns
Incremental (pattern-by-pattern) weight updating:
Weights are changed immediately after each training pattern
is presented.
Batch-mode training:
Weights are changed only after all the training patterns have
been presented.
Requires additional storage for each connection to accumulate
the individual weight changes.
Which approach is more effective depends on the problem (a small
sketch contrasting the two schedules follows below).
BPN – Equivalent to optimal Bayesian discriminant function
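As a rough illustration of the two update schedules, here is a minimal sketch. It is not taken from the slides: it uses a single linear unit trained by gradient descent rather than a full BPN, and the training set X, targets T and learning rate alpha below are made-up values.

import numpy as np

# Made-up toy data: three 2-dimensional patterns and their targets.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T = np.array([1.0, 0.0, 1.0])
alpha = 0.1   # assumed learning rate for this illustration

def incremental_epoch(w):
    # Incremental: weights change immediately after each pattern.
    for x, t in zip(X, T):
        y = x @ w
        w = w + alpha * (t - y) * x
    return w

def batch_epoch(w):
    # Batch mode: changes are accumulated in delta_w (the extra storage
    # per connection) and applied only after all patterns are presented.
    delta_w = np.zeros_like(w)
    for x, t in zip(X, T):
        y = x @ w
        delta_w += alpha * (t - y) * x
    return w + delta_w

w_inc = incremental_epoch(np.zeros(2))
w_bat = batch_epoch(np.zeros(2))
print("after one epoch:", w_inc, w_bat)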
BP learning algorithm
The BP learning algorithm converges and finds proper weights for
the network, given enough learning, only if the relation between
the input and output training patterns is deterministic and the
error surface is deterministic; in practice these conditions are
rarely met exactly.
BPN is therefore a special case of stochastic approximation.
The randomness of the algorithm helps it get out of local optima.
Factors affecting the BPN
The training and convergence of a BPN depend on the choice of
various parameters, such as
Initial weights
Learning rate
Update rule
Size and nature of training set
Architecture (i.e., number of layers and number of neurons per layer)
Factors affecting the BPN
Initial weights
Initialized to random values.
The choice of initial weights determines how fast the network converges.
They cannot be very large, since the sigmoidal activation functions used
here may saturate and the system may get stuck in a local optimum.
Method 1: the initial weights can be initialized in the range
$\left[-\frac{3}{\sqrt{o_i}},\ \frac{3}{\sqrt{o_i}}\right]$,
where $o_i$ is the number of processing elements $j$ that feed forward to
processing element $i$ (its fan-in).
Factors affecting the BPN
Method 2: Nguyen–Widrow initialization (see the sketch below)
This method leads to faster convergence of the network.
The concept is based on a geometric analysis of the response of the
hidden neurons to the input space.
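A minimal sketch of the standard Nguyen–Widrow recipe for the input-to-hidden weights: draw weights uniformly in [-0.5, 0.5], rescale each hidden unit's weight vector to length beta = 0.7 * p^(1/n) (p hidden units, n inputs), and draw biases in [-beta, beta]. The layer sizes used below are made-up for illustration.

import numpy as np

def nguyen_widrow_init(n_inputs, n_hidden, rng=np.random.default_rng(0)):
    beta = 0.7 * n_hidden ** (1.0 / n_inputs)          # scale factor
    v = rng.uniform(-0.5, 0.5, size=(n_inputs, n_hidden))
    # rescale each hidden unit's weight vector (one column) to length beta
    v = beta * v / np.linalg.norm(v, axis=0, keepdims=True)
    v0 = rng.uniform(-beta, beta, size=n_hidden)        # hidden biases
    return v, v0

v, v0 = nguyen_widrow_init(n_inputs=2, n_hidden=4)
print(np.linalg.norm(v, axis=0))   # each column now has norm beta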
Factors affecting the BPN
Learning rate:
Affects the convergence of the network.
Larger values
speed up convergence but may result in overshooting the minimum;
they lead to rapid learning but can cause oscillation of the weights.
Smaller values have the opposite effect (slower but more stable learning).
Typical range: $10^{-3}$ to $10$.
Factors affecting the BPN
Momentum
A very efficient and commonly used method that allows a larger
learning rate without oscillations is to add a momentum term to the
normal weight update rule.
The momentum factor is commonly assigned a value of 0.9.
Momentum can be used with pattern-by-pattern updating or with
batch-mode updating.
With pattern-by-pattern updating, the momentum factor carries some
useful information from earlier weight updates forward.
It helps in faster convergence.
Factors affecting the BPN
Weight update formula (with momentum)
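A standard way of writing the weight update with momentum (using $\eta$ for the momentum factor, with $\alpha$, $\delta_k$, $\delta_j$, $z_j$ and $x_i$ as in the notation above) is

$$ w_{jk}(t+1) = w_{jk}(t) + \alpha\, \delta_k z_j + \eta\,[\,w_{jk}(t) - w_{jk}(t-1)\,], $$
$$ v_{ij}(t+1) = v_{ij}(t) + \alpha\, \delta_j x_i + \eta\,[\,v_{ij}(t) - v_{ij}(t-1)\,], $$

where $\eta$ is the momentum factor (commonly 0.9, as noted above).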
Factors affecting the BPN
Generalization
A network is said to generalize when it sensibly interpolates on new
input patterns.
Over-fitting or over-training:
The network learns the training data well but does not generalize well
when there are too many trainable parameters for the amount of training
data available.
Making small changes in the input space of a pattern without changing
the output components can improve the ability of the network to
generalize to a test data set.
Smaller networks are preferred, since a network with a large number of
nodes is capable of memorizing the training set rather than generalizing
from it.
Factors affecting the BPN
Number of training data, T
The training data should be sufficient and proper.
The training data should cover the entire expected input space, and
during training the training-vector pairs should be selected randomly
from the set.
Suppose the input space is linearly separable into L disjoint regions,
and let T be the lower bound on the number of training patterns.
If a proper value of T is chosen such that T/L >> 1, then the network
is able to discriminate the pattern classes using a fine piecewise
hyperplane partitioning.
Factors affecting the BPN
Number of hidden layer nodes
If there is more than one hidden layer in a BPN, the calculations
performed for a single layer are repeated for the other layers and
summed up at the end.
For a network of reasonable size, the number of hidden nodes should be
a relatively small fraction of the size of the input layer.
For example:
If the network does not converge to a solution, it may need more hidden
nodes.
Conversely, if the network does converge, the user may try fewer hidden
nodes and then settle on a final size based on overall system
performance.
Example
Input pattern: [0, 1]. Target output: t = 1. Learning rate: α.
Example (Contd.)
Initial weights:
[v_11  v_21  v_01] = [0.6  -0.1  0.3]
[v_12  v_22  v_02] = [-0.3  0.4  0.5]
[w_1  w_2  w_0] = [0.4  0.1  -0.2]
Activation function used:
Example (Contd.)
Calculate the net inputs to the hidden units:
For $z_1$: $z_{in1} = v_{01} + x_1 v_{11} + x_2 v_{21}$
For $z_2$: $z_{in2} = v_{02} + x_1 v_{12} + x_2 v_{22}$
Example (Contd.)
Applying the activation function: $z_1 = f(z_{in1})$, $z_2 = f(z_{in2})$.
Calculate the net input to the output layer: $y_{in} = w_0 + z_1 w_1 + z_2 w_2$.
Example (Contd.)
Applying the activation function, we get $y = f(y_{in})$.
Compute the error term using $\delta_k = (t_k - y_k)\, f'(y_{ink})$.
Now, for the single output unit ($k = 1$), $\delta = (t - y)\, f'(y_{in})$.
Example (Contd.)
Therefore, $\delta$ can be evaluated from the values computed above.
Change in weights between the hidden and output layer:
$\Delta w_j = \alpha\, \delta\, z_j$ ($j = 1, 2$) and $\Delta w_0 = \alpha\, \delta$.
Example (Contd.)
Calculate the error term between the input and hidden layer using
$\delta_{inj} = \sum_k \delta_k w_{jk}$ and $\delta_j = \delta_{inj}\, f'(z_{inj})$.
Here $m = 1$ (a single output unit) and $j = 1$ to $2$.
Therefore, $\delta_{in1} = \delta\, w_1$ and $\delta_{in2} = \delta\, w_2$.
Example (Contd.)
Now, $\delta_1 = \delta_{in1}\, f'(z_{in1})$.
Example (Contd.)
Now, $\delta_2 = \delta_{in2}\, f'(z_{in2})$.
Example (Contd.)
Calculate the change in weights between the input and hidden layer:
$\Delta v_{ij} = \alpha\, \delta_j x_i$ and $\Delta v_{0j} = \alpha\, \delta_j$.
Example (Contd.)
The final weights are calculated as
$v_{ij}(\text{new}) = v_{ij}(\text{old}) + \Delta v_{ij}$, $v_{0j}(\text{new}) = v_{0j}(\text{old}) + \Delta v_{0j}$,
$w_j(\text{new}) = w_j(\text{old}) + \Delta w_j$, $w_0(\text{new}) = w_0(\text{old}) + \Delta w_0$.
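As a minimal sketch, the following code runs this worked example end to end. Two details are assumed because the slides above do not fix them: the activation function is taken to be the binary sigmoid and the learning rate is set to alpha = 0.25; the input pattern, target and initial weights come from the example.

import numpy as np

def f(x):                      # assumed activation: binary sigmoid
    return 1.0 / (1.0 + np.exp(-x))

def f_prime(x):                # its derivative, f(x) * (1 - f(x))
    fx = f(x)
    return fx * (1.0 - fx)

x = np.array([0.0, 1.0])       # input pattern from the example
t = 1.0                        # target output
alpha = 0.25                   # assumed learning rate (not given above)

# initial weights from the example: v[i, j] connects x_i to z_j
v  = np.array([[0.6, -0.3],
               [-0.1, 0.4]])
v0 = np.array([0.3, 0.5])      # hidden biases v_01, v_02
w  = np.array([0.4, 0.1])      # hidden-to-output weights w_1, w_2
w0 = -0.2                      # output bias w_0

# feed-forward phase
z_in = v0 + x @ v              # net inputs to the hidden units
z = f(z_in)
y_in = w0 + z @ w              # net input to the output unit
y = f(y_in)

# back propagation of the error
delta_k = (t - y) * f_prime(y_in)      # output error term
delta_in = delta_k * w                 # delta_in1, delta_in2
delta_j = delta_in * f_prime(z_in)     # hidden error terms

# weight updates
w_new  = w  + alpha * delta_k * z
w0_new = w0 + alpha * delta_k
v_new  = v  + alpha * np.outer(x, delta_j)
v0_new = v0 + alpha * delta_j

print("y =", round(y, 4), "delta_k =", round(delta_k, 4))
print("new w:", w_new, "new w0:", w0_new)
print("new v:\n", v_new, "\nnew v0:", v0_new)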