UNIT- I
• Introduction
• Feedforward neural networks
• Gradient descent and the backpropagation algorithm
• Unit saturation
• The vanishing gradient problem and ways to mitigate it
• ReLU
• Heuristics for avoiding bad local minima
• Heuristics for faster training
• Nesterov's accelerated gradient descent
• Regularization
• Dropout
Introduction to Deep Learning
• Deep learning is a subfield of artificial
intelligence (AI) and machine learning that
focuses on training artificial neural
networks to perform tasks that typically
require human intelligence.
• It has gained widespread attention and
made significant advancements in various
applications, including image recognition,
natural language processing, speech
recognition, and more.
Here are some common types of deep learning:

Feedforward Neural Networks (FNNs):
• These are the fundamental building blocks of deep learning. FNNs consist of an input layer, one or more hidden layers, and an output layer.
• Each layer contains nodes (neurons) that process and transform the data.
• FNNs are used for various tasks, including regression and classification.

Convolutional Neural Networks (CNNs):
• CNNs are designed for processing grid-like data, such as images and videos.
• They use convolutional layers to automatically learn features from local regions of the input, making them highly effective in tasks like image classification, object detection, and image segmentation.
Common types of deep learning (contd..)

Recurrent Neural Networks (RNNs):
• RNNs are designed for sequential data, such as time series, text, and speech. They have feedback connections, allowing them to maintain a memory of previous inputs.
• RNNs are suitable for tasks like natural language processing (NLP), machine translation, and speech recognition.

Long Short-Term Memory (LSTM):
• LSTMs are a type of RNN architecture designed to capture long-range dependencies in sequential data more effectively.
• They use specialized memory cells to store and update information over longer sequences, making them suitable for tasks requiring understanding of context over time.
Common types of deep learning (contd..)

Gated Recurrent Unit (GRU):
• GRUs are another variant of RNNs that address the vanishing gradient problem, like LSTMs.
• They are computationally more efficient and often used for similar sequence-based tasks in NLP and speech recognition.

Autoencoders:
• Autoencoders are neural networks used for unsupervised learning and dimensionality reduction.
• They consist of an encoder that maps input data to a lower-dimensional representation (encoding) and a decoder that reconstructs the original data from this encoding.
• Autoencoders are used in applications like image denoising and anomaly detection.
Common types of deep learning (contd..)

Generative Adversarial Networks (GANs):
• GANs consist of two neural networks, a generator and a discriminator, that compete against each other.
• The generator tries to create data that is indistinguishable from real data, while the discriminator tries to tell real from fake.
• GANs are used for tasks like image generation, style transfer, and data augmentation.

Transformer Models:
• Transformers have revolutionized natural language processing (NLP) and have been adapted to various other domains.
• They use a self-attention mechanism to process input data in parallel, making them highly scalable and effective for sequence-to-sequence tasks.
• Notable transformer-based models include BERT, GPT (Generative Pre-trained Transformer), and T5.
Common types of deep learning (contd..)

Siamese Networks:
• These networks are designed for tasks involving similarity or distance measurement between pairs of inputs.
• Siamese networks have two identical subnetworks that process each input and produce embeddings that can be compared to measure similarity or dissimilarity.

Capsule Networks (CapsNets):
• CapsNets are designed to address the shortcomings of traditional CNNs, especially in handling pose variations and hierarchical features in images.
• They use capsules instead of neurons to represent different parts of an object.
Feedforward Neural Networks
• Deep feedforward networks, also called feedforward neural
networks, or multilayer perceptrons (MLPs), are the
quintessential deep learning models.
• The goal of a feedforward network is to approximate some
function f∗.
• For example, for a classifier, y=f∗(x) maps an input x to a
category y.
• A feedforward network defines a mapping y=f(x;θ) and learns
the value of the parameters θ that result in the best function
approximation.
These models are called feedforward because information flows through
the function being evaluated from x, through the intermediate
computations used to define f, and finally to the output y. There are no
feedback connections in which outputs of the model are fed back into
itself. When feedforward neural networks are extended to include
feedback connections, they are called recurrent neural networks.
Feedforward Neural Networks (Contd.)
• Feedforward neural networks are often referred to as "networks" because they are constructed by combining multiple functions.
• These networks are represented by a directed acyclic graph that
illustrates how these functions are interconnected.
• Typically, they are organized in a sequential manner, with functions like f(1), f(2), and f(3) linked together in a chain, forming an overall function f(x) = f(3)(f(2)(f(1)(x))).
• These chain-like structures are the most common configuration for neural networks. In this context, each function, such as f(1), f(2), etc., is termed a layer of the network, with f(1) being the first layer, f(2) the second layer, and so forth. The layers between the input and the output form the hidden layers.
• The overall length of the chain gives the depth of the model. The name “deep
learning” arose from this terminology. The final layer of a feedforward network is
called the output layer.
• Feedforward networks use activation functions to compute the hidden layer values, as the sketch below illustrates.
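To make the chain structure concrete, here is a minimal Python sketch of a two-hidden-layer network written as the composition f(x) = f(3)(f(2)(f(1)(x))); the layer sizes, random weights, and activation choices are illustrative assumptions, not values from these slides:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)

# Illustrative layer sizes: 2 inputs -> 3 hidden -> 3 hidden -> 1 output.
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 3)), np.zeros(3)
W3, b3 = rng.normal(size=(1, 3)), np.zeros(1)

def f1(x):   # first layer (hidden)
    return relu(W1 @ x + b1)

def f2(h):   # second layer (hidden)
    return relu(W2 @ h + b2)

def f3(h):   # third layer (output)
    return sigmoid(W3 @ h + b3)

x = np.array([1.0, 0.5])
y = f3(f2(f1(x)))   # the chain f(x) = f(3)(f(2)(f(1)(x)))
print(y)
```

Each function in the chain is one layer; the depth of this model is 3.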
Example: Learning XOR
• An example of a fully functioning feedforward network on a
very simple task: learning the XOR function.
• The XOR function (“exclusive or”) is an operation on two binary values, x1 and x2.
• When exactly one of these binary values is equal to 1, the XOR function returns 1. Otherwise, it returns 0.
• The XOR function provides the target function y=f∗(x) that we want to learn. Our model provides a function y=f(x;θ), and our learning algorithm will adapt the parameters θ to make f as similar as possible to f∗.
We want our network to perform correctly on the four points X = {[0, 0], [0, 1], [1, 0], [1, 1]}.
We will train the network on all four of these points. The only challenge is to fit the training set.
Suppose we choose a linear model, with θ consisting of w and b. Our model is defined to be f(x; w, b) = xᵀw + b.
Evaluated on our whole training set, the MSE loss function is J(θ) = (1/4) * Σ_{x∈X} (f∗(x) − f(x; θ))².
If we first compute Wᵀx + c for the four examples, they all lie along a line with slope 1 in that space. As we move along this line, the output needs to begin at 0, then rise to 1, then drop back down to 0; a linear model cannot implement such a function. To finish computing the value of h for each example, we apply the rectified linear transformation h = max{0, Wᵀx + c}, which bends this line so that a linear output layer can then separate the classes, as the sketch below shows.
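The standard two-layer solution can be checked directly in NumPy; the weight values below are the usual textbook solution for XOR, shown here as an illustration:

```python
import numpy as np

# The four XOR inputs and their target outputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# Hidden-layer parameters W, c and linear output parameters w, b.
W = np.array([[1, 1],
              [1, 1]])
c = np.array([0, -1])
w = np.array([1, -2])
b = 0

# Hidden representation: the rectified linear transformation.
h = np.maximum(0, X @ W + c)

# A linear output layer now fits the transformed points exactly.
y_pred = h @ w + b
print(y_pred)   # [0 1 1 0], matching the XOR targets
```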
GRADIENT DESCENT & BACK
PROPAGATION
• Gradient descent and the backpropagation
algorithm are fundamental techniques used in
training artificial neural networks for various
machine learning tasks, including image
recognition, natural language processing, and more.
• Gradient Descent:
• Gradient descent is an optimization algorithm used
to minimize a loss function by adjusting the
parameters (weights and biases) of a machine
learning model iteratively. The idea is to find the set
of parameters that minimizes the error between the
model's predictions and the actual target values.
Here's a simple example of
gradient descent with a linear
regression model:
• Objective: Minimize the mean squared error (MSE)
loss for a linear regression model.
• Linear Regression Model: The model has a single
parameter, a weight (w), and a bias (b). It predicts an
output (y_pred) given an input (x) as follows:
• y_pred = w * x + b
• Loss Function: The MSE loss for linear regression is
defined as:
• MSE = (1/n) * Σ(y_i - y_pred_i)^2
• Where:
• n is the number of data points.
• y_i is the actual target for the i-th data point.
• y_pred_i is the predicted output for the i-th data point.
Gradient Descent Algorithm:
1. Initialize w and b with random values.
2. Choose a learning rate (α), which scales the magnitude of parameter updates during gradient descent.
3. Repeat until the loss converges to a minimum value:
   a. Calculate the gradient of the loss with respect to w and b.
   b. Update w and b using the gradient and learning rate:
      w = w - α * ∂(MSE)/∂w
      b = b - α * ∂(MSE)/∂b
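A short, self-contained Python sketch of this algorithm (the synthetic data, learning rate, and iteration count below are illustrative assumptions):

```python
import numpy as np

# Synthetic data drawn from y = 2x + 1 plus a little noise (illustrative).
rng = np.random.default_rng(42)
x = rng.uniform(0, 1, size=100)
y = 2 * x + 1 + rng.normal(0, 0.05, size=100)

w, b = 0.0, 0.0     # step 1: initialize the parameters
alpha = 0.5         # step 2: choose a learning rate

for _ in range(500):            # step 3: repeat until convergence
    y_pred = w * x + b
    # Gradients of MSE = (1/n) * sum((y_i - y_pred_i)^2) w.r.t. w and b.
    dw = (-2 / len(x)) * np.sum((y - y_pred) * x)
    db = (-2 / len(x)) * np.sum(y - y_pred)
    w -= alpha * dw
    b -= alpha * db

print(w, b)   # both should be close to the true values 2 and 1
```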
A simple example of gradient
descent using a one-
dimensional function.
• Suppose we want to minimize the
following quadratic function:
• f(x) = x^2
• The goal is to find the minimum
value of this function using gradient
descent.
Gradient descent (contd..)
• The gradient is:
• ∂f/∂x = 2x
• Update x using the gradient and the learning rate:
• x = x - α * ∂f/∂x
• Repeat the gradient computation and the update for a specified number of iterations or until convergence.
• Let's perform a few iterations of gradient descent, as sketched below:
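A minimal sketch of those iterations (the starting point x = 2 and learning rate α = 0.1 are assumed for illustration):

```python
# Gradient descent on f(x) = x^2, where the gradient is df/dx = 2x.
x = 2.0        # assumed starting point
alpha = 0.1    # assumed learning rate

for i in range(1, 6):
    grad = 2 * x
    x = x - alpha * grad
    print(f"iteration {i}: x = {x:.4f}")

# Each update multiplies x by (1 - 2 * alpha) = 0.8,
# giving 1.6, 1.28, 1.024, 0.8192, 0.6554, ...
```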
As you can see, with each
iteration, x gets closer to 0,
which is the minimum of the
function.
This process continues until the convergence criteria are met or a specified number of iterations is reached.
In practice, gradient descent is
used to optimize more complex
functions with high-dimensional
parameter spaces, such as
training neural networks in deep
learning.
Back Propagation Algorithm
• Backpropagation is a fundamental
algorithm used for training artificial
neural networks, particularly feedforward
neural networks with multiple layers (also
known as deep neural networks).
• It enables the network to learn from data
by iteratively adjusting its parameters
(weights and biases) to minimize a
predefined loss or error function.
Key Concepts in
Backpropagation:
1. Feedforward Pass: In the feedforward pass, input data is
propagated through the network layer by layer, resulting in
an output prediction. Each neuron in a layer calculates a
weighted sum of its inputs, applies an activation function,
and passes the result to the next layer.
2. Loss Function: A loss function (also known as a cost
function) quantifies the error between the network's
predictions and the actual target values. Common loss
functions include mean squared error (MSE) for regression
tasks and cross-entropy for classification tasks.
3. Backpropagation of Error: After the feedforward pass,
the network computes the gradient of the loss with respect
to its parameters (weights and biases) using the chain rule
from calculus. This gradient information is then used to
update the parameters during the optimization process.
4. Gradient Descent: The optimization algorithm (usually gradient descent or its variants) adjusts the network's parameters in the opposite direction of the gradient to minimize the loss. The learning rate determines the step size for each parameter update.
Example of
Backpropagation:
• Let's consider training a feedforward neural
network for binary classification. The network
has one hidden layer with two neurons and an
output layer with a single neuron. We'll use a
simple dataset of two-dimensional points (x1,
x2) and binary labels (0 or 1) for the example.
The network's architecture is as follows:
• Input layer: 2 neurons (corresponding to x1
and x2)
• Hidden layer: 2 neurons (with sigmoid
activation)
• Output layer: 1 neuron (with sigmoid
activation)
Steps in
Backpropagation:
1. Forward Pass:
   • Input (x1, x2) is fed into the network.
   • Calculate the weighted sum and apply the sigmoid activation in the hidden layer.
   • Calculate the weighted sum and apply the sigmoid activation in the output layer.
2. Loss Calculation:
   • Compute the loss (e.g., cross-entropy) between the predicted output and the actual target label.
3. Backpropagation:
   • Calculate the gradient of the loss with respect to the output layer's weighted sum and biases.
   • Backpropagate this gradient to the hidden layer and compute gradients for its parameters.
   • Use these gradients to update the weights and biases in both layers using gradient descent.
4. Repeat:
   • Repeat the above steps for a batch of training examples (mini-batch) and iterate through the entire dataset for multiple epochs.
Here's a simplified example
of a single training iteration:
• Forward Pass:
  • Input (x1, x2) = (1.0, 0.5)
  • Hidden layer:
    • Weighted sum: z1 = w1 * x1 + w2 * x2 + b1
    • Activation: a1 = sigmoid(z1)
    • Similar calculations give z2 and a2 for neuron 2 in the hidden layer.
  • Output layer:
    • Weighted sum: z3 = w3 * a1 + w4 * a2 + b2
    • Activation: y_pred = sigmoid(z3)
• Loss Calculation:
  • Calculate the cross-entropy loss between the predicted output y_pred and the actual label (0 or 1).
• Backpropagation:
• Compute gradients for output layer parameters (e.g.,
w3, w4, b2).
• Propagate gradients backward to the hidden layer,
compute gradients for its parameters (e.g., w1, w2, b1).
• Update all weights and biases using gradient descent.
• This process is repeated for multiple training
iterations until the network's parameters
converge, and the loss reaches a satisfactory
minimum.
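The following NumPy sketch puts one full training iteration together for the 2-2-1 network above. The specific weights, input, label, and learning rate are illustrative assumptions, and the hidden-layer parameters are collected into a matrix rather than named individually:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Illustrative parameters (assumed values).
W_h = np.array([[0.1, -0.2],    # weights into hidden neuron 1 (w1, w2)
                [0.4,  0.3]])   # weights into hidden neuron 2
b_h = np.zeros(2)               # hidden biases (b1, ...)
w_o = np.array([0.5, -0.6])     # output weights (w3, w4)
b_o = 0.0                       # output bias (b2)

x = np.array([1.0, 0.5])        # input (x1, x2)
t = 1.0                         # actual label
lr = 0.1                        # assumed learning rate

# Forward pass.
z_h = W_h @ x + b_h
a_h = sigmoid(z_h)              # hidden activations (a1, a2)
z_o = w_o @ a_h + b_o
y_pred = sigmoid(z_o)           # network output

# Cross-entropy loss.
loss = -(t * np.log(y_pred) + (1 - t) * np.log(1 - y_pred))

# Backpropagation; for sigmoid + cross-entropy, dL/dz_o = y_pred - t.
dz_o = y_pred - t
dw_o = dz_o * a_h                       # gradients for w3, w4
db_o = dz_o                             # gradient for b2
dz_h = dz_o * w_o * a_h * (1 - a_h)     # chain rule into the hidden layer
dW_h = np.outer(dz_h, x)                # gradients for the hidden weights
db_h = dz_h

# Gradient descent update.
W_h -= lr * dW_h
b_h -= lr * db_h
w_o -= lr * dw_o
b_o -= lr * db_o
print(f"loss = {loss:.4f}, y_pred = {y_pred:.4f}")
```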
UNIT SATURATION
• Unit saturation, also known as saturation of a neural unit, is a phenomenon that occurs when the activation function of a neuron reaches extreme values, typically the bounds of its range (e.g., 0 or 1 for the sigmoid), and remains there for most input values.
• In other words, the neuron saturates when its input is either
very large (positive or negative) or very close to zero,
causing the output of the neuron to become insensitive to
further changes in input.
• This can pose problems during training because the
gradients with respect to the weights may become very
small, leading to slow convergence or vanishing gradients.
• Unit saturation is often associated with activation functions like the sigmoid and the hyperbolic tangent (tanh).
• Sigmoid Activation Function: The sigmoid
function is defined as follows:
• σ(x) = 1 / (1 + exp(-x))
• When x is very large (positive or negative), σ(x) approaches 1 or
0, respectively.
• When x is close to 0, σ(x) is approximately 0.5.
• Example of Unit Saturation:
• Consider a neural network with a sigmoid activation function
and a weight (w) connected to a neuron. Let's say that during
training, the network encounters an input value (x) of 10 for this
neuron:
• x = 10
• Now, let's compute the output of the neuron using the sigmoid function:
• σ(10) ≈ 0.9999546
• At this point, the neuron has effectively saturated. Even small changes in w or x may
not significantly affect the neuron's output because the output is already close to 1.
• As a result:
• The gradient with respect to w (needed for weight updates during training) becomes
very small, causing slow learning or convergence issues.
• The neuron is not effectively contributing to the learning process since it responds
similarly to large variations in input.
• In practice, this phenomenon can lead to challenges in training deep neural networks,
especially when using activation functions like sigmoid or tanh. To mitigate unit
saturation, other activation functions such as ReLU (Rectified Linear Unit) or variants
like Leaky ReLU and Parametric ReLU are often used.
• These activation functions do not saturate for positive inputs and allow gradients to flow more effectively during training, which can lead to faster convergence and better learning, as the short sketch below illustrates.
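A minimal Python sketch comparing the local gradients of sigmoid and ReLU at the large input x = 10 from the example above:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = 10.0

# Sigmoid output and its derivative sigma'(x) = sigma(x) * (1 - sigma(x)).
s = sigmoid(x)
sigmoid_grad = s * (1 - s)

# ReLU derivative: 1 for x > 0, 0 otherwise.
relu_grad = 1.0 if x > 0 else 0.0

print(f"sigmoid(10) = {s:.7f}, gradient = {sigmoid_grad:.2e}")   # ~4.5e-05
print(f"ReLU gradient at x = 10: {relu_grad}")                   # 1.0
```

The sigmoid gradient at x = 10 is about 4.5e-05, so weight updates flowing through a saturated sigmoid unit are tiny, while the ReLU gradient stays at 1 for any positive input.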