
DEEP LEARNING AND APPLICATIONS

MR20-1CS0158
UNIT III

Prepared by
Dr. M. Narayanan
Professor
Department of CSE
Malla Reddy University, Hyderabad
Artificial Neural Networks: Introduction, Perceptron Training Rule, Gradient Descent Rule.
Gradient Descent and Backpropagation: Gradient Descent, Stochastic Gradient Descent,
Backpropagation, Some Problems in ANN. Optimization and Regularization: Overfitting and
Capacity, Cross-Validation, Feature Selection, Regularization, Hyperparameters.

Text Book
1. Goodfellow, I., Bengio, Y., and Courville, A., Deep Learning, MIT Press, 2016
What is a Perceptron and how was it developed?
 The idea behind the Perceptron goes back to 1943, when Warren McCulloch and Walter
Pitts proposed the first mathematical model of a neuron. Later, Frank Rosenblatt, an American
psychologist and computer scientist, built the first Perceptron machine while doing research
at the Cornell Aeronautical Laboratory on image recognition.
 During this time the Perceptron machine became popular in the AI community and
was considered a fundamental building block of intelligent systems.
 The perceptron algorithm is based on the concept of a single neuron in the human brain.
A single neuron does a very simple thing: it receives some inputs, and if the inputs are
strong enough, it activates and sends a signal to the next neuron.
 The perceptron was designed to mimic this process, with the input data serving as the
input to the neuron and the weights representing the strength of the connections between
the input neurons and the output neuron.
The Perceptron Algorithm: How does it work?
 The Perceptron is a type of linear classifier (binary classifier), which means it can be used to
classify data that is linearly separable.
 A Perceptron looks similar to Logistic Regression at first glance, but it is different.
While Logistic Regression predicts the probability of a data point falling in a particular class,
a Perceptron only tells whether the data point is in a particular class or not, just like saying
"Yes" or "No". Here is the diagrammatic representation of the perceptron algorithm.
 A Perceptron is a kind of single artificial neuron, which is also known as a Threshold
Logic Unit (TLU).
 As you can see in the above diagram, the Perceptron has some input links X1, X2,
and X3.
 Each input has its own corresponding weight W1, W2, and W3. These weights are
the heart of the Perceptron: they determine the strength of each input signal to it.
 The Perceptron or TLU computes the weighted sum of the inputs (z = X1W1 + X2W2 +
... + XnWn), and this weighted sum is then passed through an
activation function, also known as the step function.
 The activation function determines whether the Perceptron needs to be activated
or not.
 Let's see an example to understand this.
 In the above example, three inputs are given to a Perceptron; the weighted sum of the
inputs is calculated and comes to 0.22. This is passed through an activation function
called the Heaviside activation function.
 But you may have noticed that one of the inputs to the perceptron is zero. This can be
a problem, since it affects the training process:
 if you try to change the corresponding weight, it has no effect, because the input is
still zero. Here we need to add a new term to the equation, known as the bias.
 The bias helps to shift the activation to the left or right during the training of the
Perceptron algorithm.
 So the new equation looks like this:

z = (X · W) + bias
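
As a minimal sketch in Python (the input, weight, and bias values below are illustrative assumptions, not values from the example above):

# A minimal sketch of a single Perceptron (TLU) forward pass.
# The inputs, weights, and bias below are illustrative values.
def perceptron(inputs, weights, bias):
    # Weighted sum: z = X1*W1 + X2*W2 + ... + Xn*Wn + bias
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Step (Heaviside) activation: activate only when z >= 0
    return 1 if z >= 0 else 0

# Three inputs X1..X3 with their corresponding weights W1..W3.
print(perceptron([1.0, 0.0, 0.5], [0.4, 0.3, 0.2], bias=0.1))  # -> 1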
Activation functions
 Activation functions are mathematical functions that can be used in Perceptrons to
determine the output for a given input.
 As we said, an activation function determines whether the neuron (Perceptron) needs to be activated or not.
 Activation functions take in a weighted sum of the input data, called the activation, and
produce an output that can be used for prediction.
 Activation functions are an essential part of Perceptrons and neural networks because
they allow the model to learn and make decisions based on the input data.
 They also help to introduce non-linearity into the model, which is necessary for learning
more complex relationships in the data.
 Some common types of activation functions used in Perceptrons are the Sign function,
Heaviside function, Sigmoid function, ReLU function, etc.
 Here the Heaviside and Sign functions are commonly used with Perceptrons, so let's
understand what these activation functions do.
Heaviside function
 The Heaviside activation function returns 0 when the weighted sum of inputs is less
than zero, and returns 1 when it is greater than or equal to 0.

Sign function
 The Sign function returns 0 if the weighted sum of inputs is 0, and returns +1 or -1
when the weighted sum of inputs is greater than or less than 0, respectively.
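
A minimal sketch of both step functions in Python (the test values are illustrative):

def heaviside(z):
    # Return 0 for z < 0 and 1 for z >= 0.
    return 1 if z >= 0 else 0

def sign(z):
    # Return -1 for z < 0, 0 for z == 0, and +1 for z > 0.
    if z > 0:
        return 1
    if z < 0:
        return -1
    return 0

print(heaviside(0.22), sign(-0.5), sign(0.0))  # -> 1 -1 0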
Advantages of Multi-Layer Perceptron:
 A multi-layered perceptron model can be used to solve complex non-linear problems.
 It works well with both small and large input data.
 It gives quick predictions after training.
 It achieves a similar accuracy ratio with large as well as small data.
Disadvantages of Multi-Layer Perceptron:
 In a multi-layer perceptron, computations are difficult and time-consuming.
 In a multi-layer perceptron, it is difficult to determine how much each independent
variable affects the dependent variable.
 The functioning of the model depends on the quality of the training.
Question Bank

1. What is a Perceptron and how was it developed?
2. Explain the Perceptron algorithm. How does it work?
3. Write a Python program to convert a video into frames using hyperparameter tuning.
4. Explain Gradient Descent and Backpropagation with a suitable example.
5. Explain the Stochastic Gradient Descent algorithm.
Gradient Descent
 Gradient Descent is one of the most commonly used optimization algorithms for training
machine learning models, by minimizing the error between actual and
expected results.
 Gradient descent is also used to train Neural Networks.
 In mathematical terminology, an optimization algorithm performs the task of
minimizing/maximizing an objective function f(x) parameterized by x.
 Similarly, in machine learning, optimization is the task of minimizing the cost function
parameterized by the model's parameters.
 The main objective of gradient descent is to minimize a convex function through iterative
parameter updates.
 Once machine learning models are optimized, they can be used as powerful
tools for Artificial Intelligence and various computer science applications.
What is Gradient Descent or Steepest Descent?
 Gradient descent was initially proposed by Augustin-Louis Cauchy in the mid-19th
century (1847).
 Gradient Descent is one of the most commonly used iterative optimization
algorithms in machine learning, used to train machine learning and deep learning models.
 It helps in finding the local minimum of a function.
 If we move towards the negative gradient, i.e. away from the gradient of the function at the
current point, we will reach the local minimum of that function. This procedure is known as
Gradient Descent, which is also called steepest descent.
 Conversely, whenever we move towards the positive gradient, i.e. towards the gradient of the
function at the current point, we will reach the local maximum of that function; that
procedure is known as Gradient Ascent.
 The main objective of using a gradient descent algorithm is to minimize the cost function
through iteration. To achieve this goal, it performs two steps iteratively:
 Calculate the first-order derivative of the function to compute the gradient, or slope,
of that function.
 Move away from the direction of the gradient, i.e. step from the current point in the
opposite direction of the slope, by alpha times the gradient, where alpha is the Learning Rate.
 The learning rate is a tuning parameter in the optimization process which decides the length
of the steps.
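
A minimal sketch of these two steps in Python, assuming a toy one-dimensional objective:

# Gradient descent on the toy convex function f(x) = (x - 3)**2,
# whose minimum is at x = 3.
def grad(x):
    return 2 * (x - 3)        # step 1: first-order derivative of f

x = 0.0                       # arbitrary starting point
alpha = 0.1                   # learning rate
for _ in range(100):
    x = x - alpha * grad(x)   # step 2: move against the gradient

print(round(x, 4))            # -> approximately 3.0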
What is a Cost Function?
 The cost function is defined as the measurement of the difference, or error, between actual
values and predicted values at the current position, expressed as a single real
number.
 It helps to improve machine learning efficiency by providing feedback to the
model so that it can minimize the error and find the local or global minimum.
 The algorithm continuously iterates along the direction of the negative gradient until the cost
function approaches its minimum (ideally zero).
 At this point, the model stops learning further. Although the cost function and the loss
function are often treated as synonymous, there is a minor difference between
them.
 The slight difference between the loss function and the cost function concerns the scope
of the error during the training of machine learning models: the loss function refers to the error of one
training example, while the cost function calculates the average error across an entire
training set.
 The cost function is calculated after making a hypothesis with initial parameters; these
parameters are then modified using the gradient descent algorithm over known data to reduce
the cost function.
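
As an illustration, one common choice of cost function is the mean squared error (MSE); the toy values below are assumptions:

# Mean squared error: the average error over the whole training set,
# as opposed to a loss computed on a single example.
def mse_cost(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy values for illustration.
print(mse_cost([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))  # -> about 0.02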
How does Gradient Descent work?
 Before looking at the working principle of gradient descent, we should know some basic
concepts for finding the slope of a line from linear regression. The equation for simple
linear regression is given as:

Y = mX + c

where 'm' represents the slope of the line, and 'c' represents the intercept on the y-axis.
 The starting point (shown in the above fig.) is just an arbitrary point used to evaluate
the performance.
 At this starting point, we derive the first derivative, or slope, and then use a tangent line
to calculate the steepness of this slope. This slope informs the updates to the
parameters (weights and bias).
 The slope is steep at the starting point or arbitrary point, but as new
parameters are generated, the steepness gradually reduces until the algorithm
approaches the lowest point, which is called the point of convergence.
 The main objective of gradient descent is to minimize the cost function, i.e. the error
between expected and actual values.
 To minimize the cost function, two factors are required:
Direction & Learning Rate
 These two factors determine the partial derivative calculations of future iterations
and guide the algorithm to the point of convergence, i.e. the local or global minimum.
Let's discuss the learning rate factor in brief.
Learning Rate:
 The learning rate is defined as the step size taken to reach the minimum or lowest point.
 It is typically a small value that is evaluated and updated based on the behavior of the
cost function.
 If the learning rate is high, it results in larger steps, but it also carries the risk of
overshooting the minimum.
 A low learning rate, on the other hand, gives small step sizes, which compromises
overall efficiency but gives the advantage of more precision.
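
A minimal sketch of this trade-off, reusing the toy objective f(x) = (x - 3)**2 from the earlier sketch (the three learning rates are illustrative assumptions):

# Effect of the learning rate on gradient descent for f(x) = (x - 3)**2.
def run(alpha, steps=20):
    x = 0.0                           # arbitrary starting point
    for _ in range(steps):
        x = x - alpha * 2 * (x - 3)   # gradient of f is 2*(x - 3)
    return x

print(run(0.05))   # low rate: small, precise steps, still short of 3
print(run(0.4))    # moderate rate: converges quickly to about 3
print(run(1.05))   # too high: overshoots the minimum and diverges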
Types of Gradient Descent
 Based on how much training data is used for each parameter update, the gradient descent
learning algorithm can be divided into batch gradient descent, stochastic gradient descent,
and mini-batch gradient descent.
 Let's understand these different types of gradient descent.
 Batch Gradient Descent:
 Batch gradient descent (BGD) computes the error for each point in the training set and
updates the model only after evaluating all training examples.
 One full pass over the training set is known as a training epoch. In simple words, it is a
greedy approach where we have to sum over all examples for each update.
Advantages of Batch gradient descent:
 It produces less noise in comparison to the other types of gradient descent.
 It produces stable gradient descent convergence.
 It is computationally efficient, as all resources are used to process all training samples
together.
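
A minimal batch gradient descent sketch for the simple linear regression Y = mX + c from above. The toy data and learning rate are illustrative assumptions; note that each update sums over every training example:

# Batch gradient descent fitting Y = m*X + c by minimizing MSE.
# Toy data generated from the line y = 2x + 1.
X = [0.0, 1.0, 2.0, 3.0]
Y = [1.0, 3.0, 5.0, 7.0]

m, c = 0.0, 0.0      # arbitrary starting point
alpha = 0.05         # learning rate
n = len(X)

for _ in range(2000):
    # Partial derivatives of MSE with respect to m and c,
    # summed over the entire training set (one epoch per update).
    grad_m = (-2 / n) * sum(x * (y - (m * x + c)) for x, y in zip(X, Y))
    grad_c = (-2 / n) * sum(y - (m * x + c) for x, y in zip(X, Y))
    m -= alpha * grad_m
    c -= alpha * grad_c

print(round(m, 3), round(c, 3))  # -> approximately 2.0 and 1.0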
Stochastic gradient descent
 Stochastic gradient descent (SGD) is a type of gradient descent that processes one training
example per iteration.
 In other words, it updates the model parameters for each training example within the
dataset, one example at a time.
 As it requires only one training example at a time, it is easier to fit in the allocated
memory.
 However, it loses some computational efficiency in comparison to batch gradient
descent, as its frequent updates require more computation overall.
 Further, due to the frequent updates, its gradient is also treated as a noisy gradient.
 However, this noise can sometimes be helpful in finding the global minimum and
escaping local minima.
Advantages of Stochastic gradient descent:
 In stochastic gradient descent (SGD), learning happens on every example, which gives it
a few advantages over the other types of gradient descent.
 It is easier to fit in the available memory.
 Each update is much cheaper to compute than in batch gradient descent.
 It is more efficient for large datasets.
Here's a simplified step-by-step explanation of Stochastic Gradient Descent (a minimal code sketch follows this list):

1. Initialization: Initialize the model parameters randomly.
2. Data Shuffling: Shuffle the training dataset to ensure that the optimization process
encounters a diverse range of data points in each iteration.
3. Iteration: For each iteration, randomly select one training example (or a small batch,
in the mini-batch variant) from the shuffled dataset.
4. Compute Gradient: Compute the gradient of the loss function with respect to the model
parameters using the selected example.
5. Update Parameters: Update the model parameters using the computed gradient. The
update is performed in the opposite direction of the gradient to minimize the loss.
6. Repeat: Repeat steps 3-5 until a predefined number of iterations or a convergence
criterion is met.
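
The promised sketch, using the plain one-example-at-a-time variant; the toy data, learning rate, and epoch count are illustrative assumptions:

# Stochastic gradient descent fitting Y = m*X + c, following the
# numbered steps above. Toy data from the line y = 2x + 1.
import random

X = [0.0, 1.0, 2.0, 3.0]
Y = [1.0, 3.0, 5.0, 7.0]

m, c = 0.0, 0.0                # step 1: arbitrary initialization
alpha = 0.05                   # learning rate
data = list(zip(X, Y))

for epoch in range(500):
    random.shuffle(data)       # step 2: shuffle the dataset
    for x, y in data:          # step 3: one example per update
        err = (m * x + c) - y
        grad_m = 2 * err * x   # step 4: per-example gradient
        grad_c = 2 * err
        m -= alpha * grad_m    # step 5: move against the gradient
        c -= alpha * grad_c
                               # step 6: repeat for every epoch

print(round(m, 2), round(c, 2))  # -> approximately 2.0 and 1.0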
Learning gives Creativity,
Creativity leads to Thinking,
Thinking provides Knowledge,
and Knowledge makes you great.
- A. P. J. Abdul Kalam
