Module 2

The document provides an overview of feed forward neural networks, detailing their architecture, activation functions, and historical context. It covers gradient-based learning, cost functions, optimization techniques, and regularization methods to prevent overfitting. Additionally, it discusses advanced concepts such as batch normalization and weight initialization, along with practical implementation considerations for deep learning models.

Uploaded by

dhanushree.c

21CS743 | DEEP LEARNING

Module-02:
Feed forward Networks and Deep Learning

1. Introduction to Feed forward Neural Networks

1.1 Basic Concepts

 A feed forward neural network is the simplest form of an artificial neural network
(ANN).
 Information moves in only one direction: forward, from input nodes through hidden
nodes to output nodes.
 No cycles or loops exist in the network structure.
1.2 Historical Context
1. Origins
o Inspired by biological neural networks.
o First proposed by Warren McCulloch and Walter Pitts (1943).
o Significant advancement with the perceptron by Frank Rosenblatt (1958).
2. Evolution
o Transition from single-layer to multi-layer networks.
o Popularization of backpropagation for training multi-layer networks (1986).
o Modern deep learning revolution (2012–present).

Dept.,of AD Page 1

1.3 Network Architecture

1. Input Layer
 Receives raw input data
 No computation performed
 Number of neurons equals the number of input features
 Standardization/normalization often applied here
2. Hidden Layers
 Performs intermediate computations
 Can have multiple hidden layers
 Each neuron is connected to all neurons in the previous layer

3. Output Layer
 Produces the final network output
 Number of neurons depends on the problem type
 Classification: typically one neuron per class
 Regression: usually one neuron


1.4 Activation Functions

1. Sigmoid (Logistic)
 Formula: σ(x) = 1 / (1 + e^(-x))
 Range: (0, 1)
 Used in: Binary classification
 Properties:
o Smooth gradient
o Clear prediction probability
o Suffers from the vanishing gradient problem
2. Hyperbolic Tangent (tanh)
 Formula: tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
 Range: (-1, 1)
 Often performs better than sigmoid
 Properties:
o Zero-centered
o Stronger gradients than sigmoid
o Still has the vanishing gradient issue
3. ReLU (Rectified Linear Unit)
 Formula: f(x) = max(0, x)
 Most commonly used
 Mitigates the vanishing gradient problem (the gradient is 1 for positive inputs)
 Properties:
o Computationally efficient
o No saturation in the positive region
o Suffers from the dying ReLU problem
4. Leaky ReLU
 Formula: f(x) = max(0.01x, x)
 Addresses the dying ReLU problem
 Small negative slope
 Properties:
o Never completely dies
o Allows for negative values
o More robust than standard ReLU
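
The four activation functions above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation; the function names and the `slope` parameter are chosen here for the example.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid 1 / (1 + e^(-x)); output lies strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Hyperbolic tangent; output lies strictly between -1 and 1."""
    return np.tanh(x)

def relu(x):
    """ReLU: max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    """Leaky ReLU: max(slope * x, x); `slope` is the small negative-side slope."""
    return np.maximum(slope * x, x)

x = np.array([-2.0, 0.0, 2.0])
for fn in (sigmoid, tanh, relu, leaky_relu):
    print(fn.__name__, fn(x))
```

Note that ReLU returns exactly zero for negative inputs, while Leaky ReLU keeps a small negative output, which is what lets its gradient stay non-zero there.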
2. Gradient-Based Learning
2.1 Understanding Gradients
1. Definition

 Gradient is a vector of partial derivatives

 Points in the direction of the steepest increase
 Used to minimize the loss function

2. Properties

 Direction indicates the fastest increase

 Magnitude indicates steepness
 Negative gradient is used for minimization
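
As a small illustration of these properties, the sketch below uses a toy loss f(x, y) = x^2 + 3y^2 (a hypothetical example, not from the notes) and checks that a step along the negative gradient lowers the loss while a step along the gradient raises it.

```python
import numpy as np

def f(theta):
    """Toy loss f(x, y) = x^2 + 3*y^2 (hypothetical example)."""
    x, y = theta
    return x ** 2 + 3.0 * y ** 2

def grad_f(theta):
    """Gradient of f: the vector of partial derivatives (2x, 6y)."""
    x, y = theta
    return np.array([2.0 * x, 6.0 * y])

theta = np.array([1.0, 1.0])
g = grad_f(theta)

step_down = theta - 0.1 * g  # move against the gradient (minimization)
step_up = theta + 0.1 * g    # move along the gradient (steepest increase)

print(f(step_down), f(theta), f(step_up))
```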

2.2 Cost Functions

1. Mean Squared Error (MSE)

o Used for regression problems
o Formula: MSE = (1/n) * Σ (y_true - y_pred)^2
o Properties:
 Always non-negative
 Penalizes larger errors more
 Differentiable
2. Cross-Entropy Loss
o Used for classification problems
o Formula: -Σ (y_true * log(y_pred))
o Properties:
 Measures probability distribution difference
 Better for classification than MSE

 Provides stronger gradients

3. Huber Loss
o Combines MSE and MAE
o Less sensitive to outliers
o Formula:
 L = 0.5 * (y - f(x))^2 if |y - f(x)| ≤ δ
 L = δ * |y - f(x)| - 0.5 * δ^2 otherwise
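
The three cost functions can be written directly from the formulas above. The sketch below makes two assumptions that are not in the notes: the Huber loss is averaged over samples, and predictions are clipped by a small `eps` before taking the logarithm to avoid log(0).

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: (1/n) * sum((y_true - y_pred)^2)."""
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy: -sum(y_true * log(y_pred)); eps is an assumed guard against log(0)."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred))

def huber(y_true, y_pred, delta=1.0):
    """Huber loss, averaged over samples (averaging is an assumption here)."""
    err = np.abs(y_true - y_pred)
    quadratic = 0.5 * err ** 2                  # used where |error| <= delta
    linear = delta * err - 0.5 * delta ** 2     # used otherwise
    return np.mean(np.where(err <= delta, quadratic, linear))

print(mse(np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 5.0])))
print(cross_entropy(np.array([0.0, 1.0, 0.0]), np.array([0.2, 0.7, 0.1])))
print(huber(np.array([0.0, 0.0]), np.array([0.5, 3.0]), delta=1.0))
```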

2.3 Gradient Descent Types

1. Batch Gradient Descent
o Uses the entire dataset for each update
o More stable but slower
o Formula: θ = θ - α * ∇J(θ)
o Memory intensive for large datasets
2. Stochastic Gradient Descent (SGD)
o Updates parameters after each sample
o Faster but less stable
o Better for large datasets
o High variance in parameter updates
3. Mini-batch Gradient Descent
o Compromise between batch and SGD
o Updates parameters after small batches
o Most commonly used in practice
o Typical batch sizes: 32, 64, 128
4. Advanced Optimizers
a) Adam (Adaptive Moment Estimation)
 Combines momentum and RMSprop
 Adaptive per-parameter learning rates
 Update uses running averages of the gradient (first moment) and the squared gradient (second moment)
b) RMSprop
 Adaptive learning rates
 Divides each update by the square root of a running average of squared gradients
c) Momentum
 Adds a fraction of the previous update to the current one
 Can help carry updates past shallow local minima and plateaus
 Reduces oscillation
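
The update rules of the three optimizers can be sketched as single-step functions. The toy objective J(θ) = Σ θ^2, the hyperparameter defaults, and all function names are assumptions made for this illustration; the Adam step follows the commonly used bias-corrected form.

```python
import numpy as np

def grad(theta):
    """Gradient of the toy objective J(theta) = sum(theta^2) (hypothetical)."""
    return 2.0 * theta

def momentum_step(theta, v, lr=0.1, beta=0.9):
    """Momentum: keep a fraction beta of the previous update."""
    v = beta * v - lr * grad(theta)
    return theta + v, v

def rmsprop_step(theta, s, lr=0.01, rho=0.9, eps=1e-8):
    """RMSprop: scale the step by a running average of squared gradients."""
    g = grad(theta)
    s = rho * s + (1 - rho) * g ** 2
    return theta - lr * g / (np.sqrt(s) + eps), s

def adam_step(theta, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: first and second moment estimates with bias correction (t starts at 1)."""
    g = grad(theta)
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Run momentum on the toy objective; theta should approach the minimum at 0.
theta, v = np.array([3.0, -2.0]), np.zeros(2)
for _ in range(300):
    theta, v = momentum_step(theta, v)
print(theta)
```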
3. Backpropagation and Chain Rule
3.1 Chain Rule Fundamentals

1. Mathematical Basis
o df/dx = df/dy * dy/dx
o Allows computation of composite function derivatives
o Essential for neural network training
2. Application in Neural Networks
o Computes gradients layer by layer
o Propagates error backwards
o Updates weights based on contribution to error
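
A one-neuron example makes the chain rule concrete. The sketch below assumes a single sigmoid unit with loss L = 0.5 * (a - y)^2, where a = σ(z) and z = w * x (a setup chosen for illustration), and compares the chain-rule gradient with a finite-difference estimate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, x, y):
    """L = 0.5 * (a - y)^2 with a = sigmoid(z) and z = w * x."""
    return 0.5 * (sigmoid(w * x) - y) ** 2

def loss_grad(w, x, y):
    """dL/dw = dL/da * da/dz * dz/dw (chain rule)."""
    z = w * x
    a = sigmoid(z)
    dL_da = a - y            # derivative of the squared-error term
    da_dz = a * (1.0 - a)    # derivative of the sigmoid
    dz_dw = x                # derivative of the linear term
    return dL_da * da_dz * dz_dw

w, x, y = 0.5, 2.0, 1.0
h = 1e-6
numeric = (loss(w + h, x, y) - loss(w - h, x, y)) / (2 * h)  # central difference
analytic = loss_grad(w, x, y)
print(numeric, analytic)
```

The two values should agree to many decimal places, which is the usual way to sanity-check a hand-derived gradient.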

3.2 Forward Pass

1. Input Processing

 Data normalization
 Weight initialization
 Bias addition

2. Layer Computation
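# Pseudo-code for forward pass (NumPy-style; `network` is assumed to be a list of (W, b, activation) tuples)
A = X  # Input activations, one column per sample
for W, b, activation in network:
    Z = W @ A + b      # Linear transformation (matrix product, not element-wise)
    A = activation(Z)  # Apply activation function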

3. Output Generation
 Final layer activation
 Prediction computation
 Error calculation


3.3 Backward Pass

1. Error Calculation
 Compare output with target
 Calculate loss using cost function
 Initialize gradient computation
2. Weight Updates
 Calculate gradients using chain rule
 Update weights:
 W_new = W_old - learning_rate * gradient
 Update biases similarly
3. Detailed Steps
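# Pseudo-code for backward pass (NumPy-style; m = number of samples, one column per sample)
# Output layer:
dZ = A - Y  # Holds for a linear output with MSE (0.5 * squared error) or sigmoid/softmax with cross-entropy
dW = (1/m) * dZ @ A_prev.T
db = (1/m) * np.sum(dZ, axis=1, keepdims=True)
# Hidden layers:
dA = W_next.T @ dZ_next  # Error propagated back from the next layer
dZ = dA * activation_derivative(Z)
dW = (1/m) * dZ @ A_prev.T
db = (1/m) * np.sum(dZ, axis=1, keepdims=True)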
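
The forward and backward passes can be put together for a small two-layer network. The sketch below is one possible arrangement under stated assumptions: a tanh hidden layer, a sigmoid output with binary cross-entropy (so that dZ = A - Y holds at the output), samples stored as columns, and random toy data. None of these choices come from the notes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    """Forward pass: tanh hidden layer, sigmoid output."""
    Z1 = W1 @ X + b1
    A1 = np.tanh(Z1)
    Z2 = W2 @ A1 + b2
    A2 = sigmoid(Z2)
    return Z1, A1, Z2, A2

def bce(A2, Y):
    """Binary cross-entropy averaged over the m samples (columns)."""
    m = Y.shape[1]
    return -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m

def backward(X, Y, W2, A1, A2):
    """Backward pass via the chain rule, layer by layer."""
    m = X.shape[1]
    dZ2 = A2 - Y                                  # sigmoid + cross-entropy
    dW2 = dZ2 @ A1.T / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dA1 = W2.T @ dZ2                              # propagate error backwards
    dZ1 = dA1 * (1.0 - A1 ** 2)                   # tanh'(Z1) = 1 - tanh(Z1)^2
    dW1 = dZ1 @ X.T / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return dW1, db1, dW2, db2

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))                       # 3 features, 5 samples
Y = (rng.random((1, 5)) > 0.5).astype(float)
W1, b1 = rng.normal(size=(4, 3)) * 0.5, np.zeros((4, 1))
W2, b2 = rng.normal(size=(1, 4)) * 0.5, np.zeros((1, 1))

_, A1, _, A2 = forward(X, W1, b1, W2, b2)
dW1, db1, dW2, db2 = backward(X, Y, W2, A1, A2)
print(bce(A2, Y), dW1.shape, dW2.shape)
```

Each gradient has the same shape as the parameter it updates, so the update W_new = W_old - learning_rate * gradient applies directly.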

4. Regularization for Deep Learning

4.1 L1 Regularization

1. Mathematical Form

 Adds absolute value of weights to loss

 Formula: L1=λ∑ |W|

2. Properties

 Feature selection capability

 Produces sparse models
 Less sensitive to outliers

4.2 L2 Regularization

1. Mathematical Form

 Adds squared weights to loss

 Formula: L2 = λ∑ W^2
 Prevents large weights

2. Properties

 Smooth weight decay

 No sparse solutions
 More stable training
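
Both penalties and their gradients are short enough to write out. In the sketch below the function names, the weight matrix, and the value of λ (`lam`) are illustrative assumptions; the L1 gradient uses the sign of each weight, which is a subgradient at zero.

```python
import numpy as np

def l1_penalty(W, lam):
    """L1 = lam * sum(|W|)."""
    return lam * np.sum(np.abs(W))

def l1_grad(W, lam):
    """Subgradient of the L1 penalty: constant pull toward zero, 0 at W == 0."""
    return lam * np.sign(W)

def l2_penalty(W, lam):
    """L2 = lam * sum(W^2)."""
    return lam * np.sum(W ** 2)

def l2_grad(W, lam):
    """Gradient of the L2 penalty: pull proportional to the weight (weight decay)."""
    return 2.0 * lam * W

W = np.array([[0.5, -1.0], [0.0, 2.0]])
lam = 0.1
print(l1_penalty(W, lam), l2_penalty(W, lam))
print(l1_grad(W, lam))
print(l2_grad(W, lam))
```

The L1 gradient has the same magnitude for every non-zero weight, which is why small weights get driven all the way to zero (sparsity), whereas the L2 gradient shrinks as the weight shrinks.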

4.3 Dropout

1. Basic Concept

 Randomly deactivate neurons

 Probability p of keeping neurons

 Different network for each training batch

2. Implementation Details

# Pseudo-code for dropout (inverted dropout; p is the probability of keeping a neuron)
import numpy as np
mask = np.random.binomial(1, p, size=A.shape)  # 1 = keep, 0 = drop
A = A * mask
A = A / p  # Scale to maintain expected value

3. Training vs. Testing

 Used only during training

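 With the inverted scaling used here (dividing by p during training), no rescaling is needed at inference; otherwise activations are multiplied by p at test time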
 Acts as model ensemble
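
The training/testing distinction can be captured with a single flag. The following is a minimal sketch of inverted dropout, assuming `p` is the keep probability as in the notes; the `training` flag and the explicit `rng` argument are conventions chosen for this example.

```python
import numpy as np

def dropout(A, p, training, rng):
    """Inverted dropout; p is the probability of KEEPING a unit."""
    if not training:
        return A                          # inference: activations pass through unchanged
    mask = rng.binomial(1, p, size=A.shape)
    return A * mask / p                   # rescale so the expected activation is unchanged

rng = np.random.default_rng(0)
A = np.ones((1000, 100))
train_out = dropout(A, 0.8, True, rng)
test_out = dropout(A, 0.8, False, rng)
print(train_out.mean(), test_out.mean())
```

With all-ones input, the training-mode mean stays close to 1 even though roughly 20% of the units are zeroed, which is the effect of dividing by p.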

4.4 Early Stopping

1. Implementation

 Monitor validation error

 Save best model
 Stop when validation error has not improved for a set number of epochs (patience)

2. Benefits

 Prevents overfitting
 Reduces training time
 Automatic model selection
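
The monitoring logic can be sketched independently of any particular model. The function below takes a list of per-epoch validation losses (hypothetical values here) and a `patience` setting, both introduced for this illustration, and reports which epoch held the best model and at which epoch training would stop.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: a real training loop would save the model here.
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch  # stop: no improvement for `patience` epochs
    return best_epoch, len(val_losses) - 1

val_losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.50]
best_epoch, stop_epoch = early_stopping_epoch(val_losses, patience=3)
print(best_epoch, stop_epoch)
```

In this example the later improvement at the last epoch is never reached, which shows the trade-off in choosing `patience`: a small value saves time but can stop before a late improvement.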

5. Advanced Concepts

5.1 Batch Normalization

1. Purpose

 Normalizes layer inputs

 Reduces internal covariate shift
 Speeds up training

2. Algorithm

# Pseudo-code for batch normalization (x has shape (batch_size, features))
import numpy as np
eps = 1e-5  # Small constant for numerical stability (typical value; assumed)
mean = np.mean(x, axis=0)
var = np.var(x, axis=0)
x_norm = (x - mean) / np.sqrt(var + eps)
out = gamma * x_norm + beta  # Learnable scale (gamma) and shift (beta)
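
A fuller sketch of the batch normalization forward pass is given below. It adds one thing the pseudo-code does not cover: running averages of the mean and variance for use at inference time. That part, along with the `momentum` default and the `running` dictionary, is an assumption based on common practice rather than something stated in the notes.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, running, training, momentum=0.9, eps=1e-5):
    """Batch norm over axis 0; `running` holds statistics for inference (assumed design)."""
    if training:
        mean = x.mean(axis=0)
        var = x.var(axis=0)
        # Exponential moving averages of the batch statistics.
        running["mean"] = momentum * running["mean"] + (1 - momentum) * mean
        running["var"] = momentum * running["var"] + (1 - momentum) * var
    else:
        mean, var = running["mean"], running["var"]
    x_norm = (x - mean) / np.sqrt(var + eps)
    return gamma * x_norm + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))   # 64 samples, 4 features
running = {"mean": np.zeros(4), "var": np.ones(4)}
out = batchnorm_forward(x, np.ones(4), np.zeros(4), running, training=True)
print(out.mean(axis=0), out.var(axis=0))
```

With gamma = 1 and beta = 0, each feature of the output has mean close to 0 and variance close to 1 regardless of the input scale.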

5.2 Weight Initialization

1. Xavier/Glorot Initialization

 Variance = 2 / (n_in + n_out), where n_in and n_out are the numbers of input and output units of the layer

 Suitable for tanh and sigmoid activations

2. He Initialization

 Variance = 2 / n_in
 Better for ReLU activation
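
Both schemes amount to drawing weights with a prescribed variance. The sketch below assumes zero-mean normal draws and a weight shape of (n_out, n_in); uniform variants also exist, and the function names are chosen for this example.

```python
import numpy as np

def xavier_init(n_in, n_out, rng):
    """Xavier/Glorot: variance 2 / (n_in + n_out)."""
    std = np.sqrt(2.0 / (n_in + n_out))
    return rng.normal(0.0, std, size=(n_out, n_in))

def he_init(n_in, n_out, rng):
    """He: variance 2 / n_in."""
    std = np.sqrt(2.0 / n_in)
    return rng.normal(0.0, std, size=(n_out, n_in))

rng = np.random.default_rng(0)
W_x = xavier_init(256, 128, rng)
W_h = he_init(256, 128, rng)
print(W_x.var(), 2.0 / (256 + 128))
print(W_h.var(), 2.0 / 256)
```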

6. Practical Implementation

6.1 Network Design Considerations

1. Architecture Choices

 Number of layers
 Neurons per layer
 Activation functions

2. Hyperparameter Selection

 Learning rate
 Batch size

 Regularization strength

1. Basic Concepts

 Explain the role of activation functions in neural networks

 Compare and contrast different types of gradient descent
 Describe the vanishing gradient problem

2. Mathematical Problems

 Calculate gradients for a simple 2-layer network

 Implement batch normalization equations
 Compute different loss functions

3. Implementation Challenges

 Design a network for MNIST classification

 Implement dropout in Python
 Create a custom loss function

6.2 Training Process

1. Data Preparation

 Splitting data
 Normalization
 Augmentation

2. Training Loop

 Forward pass
 Loss computation
 Backward pass
 Parameter updates
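
The steps above can be sketched end to end on a deliberately simple model. The example below trains a linear regressor with mini-batch gradient descent on synthetic data; the data, the model, and the hyperparameters are all assumptions made for illustration, and augmentation is omitted because it does not apply to this kind of data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (hypothetical): y = 3*x1 - 2*x2 + 1 + noise
X = rng.normal(loc=10.0, scale=4.0, size=(500, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0 + rng.normal(scale=0.1, size=500)

# 1. Data preparation: split, then normalize with training statistics only
X_train, X_val = X[:400], X[400:]
y_train, y_val = y[:400], y[400:]
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train, X_val = (X_train - mu) / sigma, (X_val - mu) / sigma

# 2. Training loop (mini-batch gradient descent)
w, b = np.zeros(2), 0.0
lr, batch_size = 0.1, 32
for epoch in range(50):
    order = rng.permutation(len(X_train))          # reshuffle every epoch
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        Xb, yb = X_train[idx], y_train[idx]
        pred = Xb @ w + b                          # forward pass
        err = pred - yb
        loss = np.mean(err ** 2)                   # loss computation (MSE)
        grad_w = 2.0 * Xb.T @ err / len(idx)       # backward pass
        grad_b = 2.0 * np.mean(err)
        w -= lr * grad_w                           # parameter updates
        b -= lr * grad_b

val_mse = np.mean((X_val @ w + b - y_val) ** 2)
print(val_mse)
```

Normalizing the validation set with the training mean and standard deviation, rather than its own, keeps the evaluation free of information from held-out data.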

Key Formulas Reference Sheet

1. Activation Functions

 Sigmoid: σ(x) = 1 / (1 + e⁻ˣ)
 Tanh: tanh(x) = (eˣ - e⁻ˣ) / (eˣ + e⁻ˣ)
 ReLU: f(x) = max(0, x)

2. Loss Functions

 Mean Squared Error (MSE): (1/n) ∑(y_true - y_pred)²

 Cross-Entropy: -∑(y_true × log(y_pred))

3. Regularization

 L1 Regularization: L₁ = λ∑|W|
 L2 Regularization: L₂ = λ∑W²

4. Gradient Descent

 Update Rule: θ = θ - α∇J(θ)

 Momentum: v = βv - α∇J(θ), then θ = θ + v


Common Issues and Solutions

1. Vanishing Gradients

 Use ReLU activation

 Implement batch normalization
 Try residual connections

2. Overfitting

 Add dropout
 Use regularization
 Implement early stopping

3. Poor Convergence

 Adjust learning rate

 Try different optimizers
 Check data normalization
