DL Unit 2a

UNIT-2

Feedforward Networks: Multilayer Perceptron, Gradient Descent, Backpropagation, Empirical Risk Minimization.
Autoencoders: Regularized Autoencoders, Representational Power, Layer Size, and Depth of Autoencoders, Stochastic Encoders and Decoders, Contractive Encoders.
Regularization: Bias-Variance Tradeoff, L2 regularization, Early stopping, Dataset augmentation, Parameter sharing and tying, Injecting noise at input, Ensemble methods, Dropout, Greedy
_____________________________________________________________________

Multilayer Perceptron:-

A multilayer perceptron (MLP) is a class of feedforward neural network. It consists of three types of layers: the input layer, the output layer, and the hidden layer. The input layer receives the input signal to be processed.

A basic MLP has three layers, including one hidden layer. If it has more than one hidden layer, it is called a deep ANN. An MLP is a typical example of a feedforward artificial neural network, and the ith activation unit in the lth layer is denoted as a_i^(l).

The number of layers and the number of neurons are referred to as hyperparameters of a neural network, and these need tuning; cross-validation techniques can be used to find good values for them.
Weight adjustment during training is done via backpropagation. Deeper neural networks can model more complex functions, but added depth can lead to the vanishing gradient problem, and special algorithms are required to address it.

The algorithm for the MLP is as follows:


1. Just as with the perceptron, the inputs are pushed forward through the MLP by
taking the dot product of the input with the weights that exist between the input
layer and the hidden layer (WH). This dot product yields a value at the hidden
layer. We do not push this value forward as we would with a perceptron
though.
2. MLPs utilize activation functions at each of their calculated layers. There are many activation functions to choose from, such as the rectified linear unit (ReLU), the sigmoid function, and tanh. Push the calculated output at the current layer through any of these activation functions.
3. Once the calculated output at the hidden layer has been pushed through the
activation function, push it to the next layer in the MLP by taking the dot
product with the corresponding weights.
4. Repeat steps two and three until the output layer is reached.
5. At the output layer, the calculations will either be used for
a backpropagation algorithm that corresponds to the activation function that
was selected for the MLP (in the case of training) or a decision will be made
based on the output (in the case of testing).

Notations

In the notation used here:

 a_i^(in) refers to the ith value in the input layer
 a_i^(h) refers to the ith unit in the hidden layer
 a_i^(out) refers to the ith unit in the output layer
 a_0^(in) is simply the bias unit and is equal to 1; it has the corresponding weight w_0
 The weight coefficient from layer l to layer l+1 is represented by w_{k,j}^(l)

A simplified view of the multilayer perceptron is a fully connected three-layer neural network with 3 input neurons and 3 output neurons, with a bias term added to the input vector.

Forward Propagation

In the following topics, let us look at the forward propagation in detail.

MLP Learning Procedure

The MLP learning procedure is as follows:

 Starting with the input layer, propagate data forward to the output layer. This step
is the forward propagation.
 Based on the output, calculate the error (the difference between the predicted and
known outcome). The error needs to be minimized.
 Backpropagate the error. Find its derivative with respect to each weight in the
network, and update the model.
Repeat the three steps given above over multiple epochs to learn ideal weights.

Finally, the output is taken via a threshold function to obtain the predicted class labels.

Forward Propagation in MLP

In the first step, calculate the activation units a^(h) of the hidden layer.
An activation unit is the result of applying an activation function φ to the net input z. The activation function must be differentiable so that the weights can be learned using gradient descent. The activation function φ is often the sigmoid (logistic) function.

It provides the nonlinearity needed to solve complex problems such as image processing.

Sigmoid Curve

The sigmoid curve is an S-shaped curve.
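It is defined as

φ(z) = 1 / (1 + e^(−z))

Its derivative can be written as φ′(z) = φ(z)(1 − φ(z)), which makes the gradients needed for backpropagation cheap to compute.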

Activation of Hidden Layer

The activation of the hidden layer is represented as:


z^(h) = a^(in) W^(h)

a^(h) = φ(z^(h))

For the output layer:

z^(out) = a^(h) W^(out)

a^(out) = φ(z^(out))
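As an illustration, the following is a minimal NumPy sketch of this forward pass (the shapes, weight values, and variable names are assumed for the example, and the bias terms are omitted for brevity):

import numpy as np

def sigmoid(z):
    # Logistic activation: phi(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W_h = rng.normal(size=(3, 4))      # weights between input and hidden layer
W_out = rng.normal(size=(4, 2))    # weights between hidden and output layer

a_in = np.array([0.5, -1.0, 2.0])  # one input sample with 3 features

z_h = a_in @ W_h                   # net input of the hidden layer
a_h = sigmoid(z_h)                 # hidden activation
z_out = a_h @ W_out                # net input of the output layer
a_out = sigmoid(z_out)             # network output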

Gradient Descent
Gradient descent is an optimization algorithm commonly used to train machine learning models and neural networks. Training data helps these models learn over time, and the cost function within gradient descent acts as a barometer, gauging the model's accuracy with each iteration of parameter updates. Gradient descent helps find a local minimum of a function.

The behavior of gradient-based updates near a local minimum or local maximum can be described as follows:

o If we move in the direction of the negative gradient, i.e., away from the gradient of the function at the current point, we move toward a local minimum of that function.
o If we move in the direction of the positive gradient, i.e., toward the gradient of the function at the current point, we move toward a local maximum of that function; this procedure is known as gradient ascent.
Moving against the gradient is gradient descent, also called the method of steepest descent. The main objective of using a gradient descent algorithm is to minimize the cost function through iteration. To achieve this goal, it performs two steps iteratively:

o Calculate the first-order derivative of the function to compute the gradient, or slope, at the current point.
o Move in the direction opposite to the gradient, stepping away from the current point by alpha times the gradient, where alpha is the learning rate. The learning rate is a tuning parameter in the optimization process that decides the length of the steps.
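A minimal sketch of these two steps in Python, applied to the illustrative one-dimensional function f(w) = (w − 3)^2 (the function, starting point, and learning rate are all assumed for the example):

# Gradient descent on f(w) = (w - 3)**2, whose derivative is f'(w) = 2 * (w - 3)
w = 0.0        # arbitrary starting point
alpha = 0.1    # learning rate (step size)

for step in range(100):
    grad = 2 * (w - 3)    # step 1: first-order derivative at the current point
    w = w - alpha * grad  # step 2: move against the gradient by alpha times the slope

print(w)  # approaches the minimum at w = 3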

Cost-function

The cost function is defined as the measurement of the difference, or error, between the actual values and the expected values at the current position, expressed as a single real number. It improves the efficiency of the machine learning model by providing feedback so that the model can minimize the error and find the local or global minimum. The algorithm iterates continuously along the direction of the negative gradient until the cost function approaches its minimum, at which point the model stops learning further. Although the cost function and the loss function are often treated as synonymous, there is a minor difference between them: the loss function refers to the error of a single training example, while the cost function calculates the average error across an entire training set.
The cost function is calculated after making a hypothesis with initial parameters; the parameters are then modified using the gradient descent algorithm over known data to reduce the cost function.

Hypothesis: h_θ(x) = θ_0 + θ_1 x

Parameters: θ_0, θ_1

Cost function: J(θ_0, θ_1) = (1/2n) Σ_i (h_θ(x^(i)) − y^(i))²

Goal: minimize J(θ_0, θ_1)

Before examining the working of gradient descent, recall how the slope of a line is expressed in linear regression. The equation for simple linear regression is:

Y = mX + c

where 'm' represents the slope of the line and 'c' represents the intercept on the y-axis.
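For example, with the mean squared error cost J(m, c) = (1/n) Σ_i (Y_i − (mX_i + c))², the partial derivatives that gradient descent follows are

∂J/∂m = −(2/n) Σ_i X_i (Y_i − (mX_i + c))
∂J/∂c = −(2/n) Σ_i (Y_i − (mX_i + c))

and each iteration updates m and c by subtracting alpha times the corresponding derivative.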

The starting point is just an arbitrary point used to evaluate performance. At this starting point, we take the first derivative, or slope, and use a tangent line to measure its steepness. This slope informs the updates to the parameters (the weights and the bias).

The slope is steeper at the starting point, but as new parameters are generated the steepness gradually reduces, until the algorithm reaches the lowest point, which is called the point of convergence.
The main objective of gradient descent is to minimize the cost function, i.e., the error between the expected and actual values. Minimizing the cost function requires two factors:

o Direction & Learning Rate

These two factors determine the partial-derivative calculations of future iterations and allow the algorithm to arrive at the point of convergence, i.e., the local or global minimum. Let's discuss the learning rate briefly:

Learning Rate:

It is defined as the step size taken to reach the minimum, or lowest, point. It is typically a small value that is evaluated and updated based on the behavior of the cost function. A high learning rate results in larger steps but risks overshooting the minimum; a low learning rate takes small steps, which sacrifices overall efficiency but offers more precision.

Types of Gradient Descent

Based on how much of the training data is used to compute each update, gradient descent can be divided into batch gradient descent, stochastic gradient descent, and mini-batch gradient descent. Let's understand these different types of gradient descent:
1. Batch Gradient Descent:

Batch gradient descent (BGD) computes the error for each point in the training set and updates the model only after all training examples have been evaluated. One such full pass over the training set is known as a training epoch. In simple words, it sums over all examples for each single update.

Advantages of Batch gradient descent:

o It produces less noise than the other types of gradient descent.
o It produces stable convergence.
o It is computationally efficient, since one update is computed from all training samples at once.

2. Stochastic gradient descent

Stochastic gradient descent (SGD) is a type of gradient descent that processes one training example per iteration; in other words, it updates the parameters after every single example in the dataset. Since it requires only one training example at a time, it is easy to fit in memory. However, it loses some computational efficiency compared to batch gradient descent, because the frequent updates add overhead, and those frequent updates also make the gradient noisy. On the other hand, this noise can sometimes help in escaping local minima and finding the global minimum.

Advantages of Stochastic gradient descent:

In stochastic gradient descent (SGD), learning happens on every example, which gives it a few advantages over the other types of gradient descent.

o It is easier to fit in the available memory.
o Each update is faster to compute than in batch gradient descent.
o It is more efficient for large datasets.

3. Mini-Batch Gradient Descent:

Mini-batch gradient descent is a combination of batch gradient descent and stochastic gradient descent. It divides the training dataset into small batches and performs an update on each batch. Splitting the training dataset into smaller batches strikes a balance between the computational efficiency of batch gradient descent and the speed of stochastic gradient descent. Hence, it achieves high computational efficiency with a less noisy gradient.

Advantages of Mini Batch gradient descent:

o It is easier to fit in allocated memory.


o It is computationally efficient.
o It produces stable gradient descent convergence.
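To make the three variants concrete, here is a minimal NumPy sketch showing one update pass of each scheme on a small linear regression problem (all names, shapes, and values are assumed for the example):

import numpy as np

def grad(w, X, y):
    # Gradient of the mean squared error for a linear model y_hat = X @ w
    return 2 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # 100 examples, 3 features
y = X @ np.array([1.0, -2.0, 0.5])     # illustrative targets
w = np.zeros(3)                        # parameters to learn
alpha = 0.1                            # learning rate

# Batch gradient descent: one update per epoch, using the full training set
w = w - alpha * grad(w, X, y)

# Stochastic gradient descent: one update per training example
for i in range(len(y)):
    w = w - alpha * grad(w, X[i:i+1], y[i:i+1])

# Mini-batch gradient descent: one update per small batch (batch size 10 here)
for start in range(0, len(y), 10):
    w = w - alpha * grad(w, X[start:start+10], y[start:start+10])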

Back Propagation Algorithm:-

Backpropagation, or backward propagation of errors, is an algorithm designed to trace errors back from the output nodes to the input nodes. It is an important mathematical tool for improving the accuracy of predictions in data mining and machine learning.

Backpropagation is a widely used algorithm for training feedforward neural networks. It computes the gradient of the loss function with respect to the network weights. It is far more efficient than naively computing the gradient with respect to each weight separately. This efficiency makes it feasible to use gradient methods to train multi-layer networks and update the weights to minimize the loss; variants such as gradient descent or stochastic gradient descent are often used.
The backpropagation algorithm works by computing the gradient of the loss function
with respect to each weight via the chain rule, computing the gradient layer by layer,
and iterating backward from the last layer to avoid redundant computation of
intermediate terms in the chain rule.
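Concretely, for a weight w_jk connecting hidden unit j to output unit k (in the notation of the training algorithm below), the chain rule factorizes the gradient as

∂L/∂w_jk = (∂L/∂y_k) · (∂y_k/∂y_in,k) · (∂y_in,k/∂w_jk) = (∂L/∂y_k) · f′(y_in,k) · z_j

The first two factors, computed once at the output layer, are reused when the gradients of the hidden-layer weights are computed; this reuse is exactly the redundant computation that the backward pass avoids.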

Features of Backpropagation:

1. It is a gradient descent method, as used in the case of a simple perceptron network with differentiable units.
2. It differs from other networks in the process by which the weights are calculated during the learning period of the network.
3. Training is done in three stages:
 the feed-forward of the input training pattern
 the calculation and backpropagation of the error
 the updating of the weights
Working of Backpropagation:
Neural networks use supervised learning to generate output vectors from input vectors. The network compares the generated output to the desired output and computes an error whenever the two do not match. It then adjusts the weights according to this error to move the output toward the desired result.

Backpropagation Algorithm:

Step 1: Inputs X arrive through the preconnected path.
Step 2: The input is modeled using weights W, which are usually chosen randomly at initialization.
Step 3: Calculate the output of each neuron from the input layer, through the hidden layer, to the output layer.
Step 4: Calculate the error in the outputs:
Error = Actual Output − Desired Output
Step 5: From the output layer, go back to the hidden layer and adjust the weights to reduce the error.
Step 6: Repeat the process until the desired output is achieved.

Parameters :

 x = input training vector, x = (x_1, x_2, …, x_n)
 t = target vector, t = (t_1, t_2, …, t_n)
 δ_k = error term at output unit k
 δ_j = error term at hidden unit j
 α = learning rate
 v_0j = bias of hidden unit j
Training Algorithm :
Step 1: Initialize the weights to small random values.
Step 2: While the stopping condition is false, do steps 3 to 9.
Step 3: For each training pair, do steps 4 to 8 (feed-forward, backpropagation of error, and weight updates).
Step 4: Each input unit receives the input signal x_i and transmits it to all the hidden units.
Step 5: Each hidden unit z_j (j = 1 to p) sums its weighted input signals to calculate its net input
z_in,j = v_0j + Σ_i x_i v_ij (i = 1 to n)
applies its activation function, z_j = f(z_in,j), and sends this signal to all units in the layer above, i.e., the output units.
Each output unit y_k (k = 1 to m) sums its weighted input signals
y_in,k = w_0k + Σ_j z_j w_jk (j = 1 to p)
and applies its activation function to calculate the output signal
y_k = f(y_in,k)
Backpropagation of error:
Step 6: Each output unit y_k (k = 1 to m) receives a target pattern corresponding to the input pattern; its error term is calculated as
δ_k = (t_k − y_k) f′(y_in,k)
Step 7: Each hidden unit z_j (j = 1 to p) sums its delta inputs from the units in the layer above
δ_in,j = Σ_k δ_k w_jk
and the error information term is calculated as
δ_j = δ_in,j f′(z_in,j)
Updating of weights and biases:
Step 8: Each output unit y_k (k = 1 to m) updates its bias and weights (j = 0 to p). The weight correction term is
Δw_jk = α δ_k z_j
and the bias correction term is
Δw_0k = α δ_k
Therefore w_jk(new) = w_jk(old) + Δw_jk and w_0k(new) = w_0k(old) + Δw_0k.
Each hidden unit z_j (j = 1 to p) updates its bias and weights (i = 0 to n). The weight correction term is
Δv_ij = α δ_j x_i
and the bias correction term is
Δv_0j = α δ_j
Therefore v_ij(new) = v_ij(old) + Δv_ij and v_0j(new) = v_0j(old) + Δv_0j.
Step 9: Test the stopping condition. The stopping condition can be the minimization of the error or a fixed number of epochs.
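The following is a minimal NumPy sketch of one training pair passing through this algorithm, assuming the sigmoid f(z) = 1/(1 + e^(−z)) as the activation (so that f′(z) = f(z)(1 − f(z))); the layer sizes, initial weights, and sample values are all illustrative:

import numpy as np

def f(z):
    # Sigmoid activation
    return 1.0 / (1.0 + np.exp(-z))

def f_prime(z):
    s = f(z)
    return s * (1.0 - s)

rng = np.random.default_rng(0)
n, p, m = 3, 4, 2                       # input, hidden, output layer sizes
v = rng.normal(scale=0.1, size=(n, p))  # input -> hidden weights v_ij
v0 = np.zeros(p)                        # hidden biases v_0j
w = rng.normal(scale=0.1, size=(p, m))  # hidden -> output weights w_jk
w0 = np.zeros(m)                        # output biases w_0k
alpha = 0.5                             # learning rate

x = np.array([0.1, 0.9, 0.3])           # one training input
t = np.array([1.0, 0.0])                # its target

# Feed-forward (steps 4-5)
z_in = v0 + x @ v
z = f(z_in)
y_in = w0 + z @ w
y = f(y_in)

# Backpropagation of error (steps 6-7)
delta_k = (t - y) * f_prime(y_in)        # output error terms
delta_in_j = w @ delta_k                 # summed delta inputs to hidden units
delta_j = delta_in_j * f_prime(z_in)     # hidden error terms

# Weight and bias updates (step 8)
w += alpha * np.outer(z, delta_k)
w0 += alpha * delta_k
v += alpha * np.outer(x, delta_j)
v0 += alpha * delta_j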
Need for Backpropagation:

Backpropagation, the "backward propagation of errors," is very useful for training neural networks. It is fast, simple, and easy to implement. Apart from the number of inputs, it does not require any parameters to be set, and because no prior knowledge of the network is required, it is a flexible method.

Types of Backpropagation

There are two types of backpropagation networks.


 Static backpropagation: Static backpropagation is a network designed to map static inputs to static outputs. These networks can solve static classification problems such as OCR (Optical Character Recognition).
 Recurrent backpropagation: Recurrent backpropagation is another network, used for fixed-point learning. Activation in recurrent backpropagation is fed forward until it reaches a fixed value. Static backpropagation provides an instant mapping, while recurrent backpropagation does not.

Advantages:

 It is simple, fast, and easy to program.
 Apart from the number of inputs, no other parameters need to be tuned.
 It is flexible and efficient.
 Users do not need to learn any special functions.

Disadvantages:

 It is sensitive to noisy data and irregularities; noisy data can lead to inaccurate results.
 Performance is highly dependent on the input data.
 Training can consume a lot of time.
 A matrix-based approach is preferred over a mini-batch approach.

Empirical Risk Minimization:-

Empirical risk minimization (ERM) is a principle in statistical learning


theory which defines a family of learning algorithms and is used to give theoretical
bounds on their performance. The core idea is that we cannot know exactly how well
an algorithm will work in practice (the true "risk") because we don't know the true
distribution of data that the algorithm will work on, but we can instead measure its
performance on a known set of training data (the "empirical" risk).
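In symbols: given a loss function L and a training set {(x_1, y_1), …, (x_n, y_n)} drawn from the unknown distribution, the empirical risk of a hypothesis h is

R_emp(h) = (1/n) Σ_i L(h(x_i), y_i)

and the ERM principle chooses the hypothesis that minimizes R_emp(h) over the hypothesis class, as a stand-in for minimizing the true (unknown) risk.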
