Module 2
1. Explain batch normalization with a relevant example.
2. Write a code snippet for transfer learning with Keras (see the sketch after this list).
3. Explain, with relevant illustrations, in which cases you would want to
use each of the following activation functions: ELU, leaky ReLU (and its
variants), ReLU, tanh, logistic, and softmax.
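For question 2, a minimal transfer-learning sketch with tf.keras; the MobileNetV2 base, the 224×224 RGB input size, and the 5 target classes are illustrative assumptions, not specifics from these notes:

```python
# Minimal transfer-learning sketch with tf.keras (illustrative base model,
# input size, and number of classes).
from tensorflow import keras

# Load a convolutional base pre-trained on ImageNet, without its classifier head.
base_model = keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base_model.trainable = False  # freeze the pre-trained weights

# Add a new classification head for the target task (5 classes assumed here).
model = keras.Sequential([
    base_model,
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(5, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, epochs=5)  # train only the new head on the target data
```

The pre-trained base is frozen so that only the new head is trained on the target data; the base can later be unfrozen for fine-tuning with a low learning rate.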
Detailed Answer
1. ELU (Exponential Linear Unit):
The Exponential Linear Unit (ELU) addresses the issue of dying
neurons in ReLU by allowing small negative outputs for negative
inputs. It helps improve learning speed and leads to smoother
convergence.
When to Use:
When training deep networks to achieve faster convergence.
In cases where small negative outputs are needed to push the
mean activation closer to zero.
Formula:
f(x) = x if x > 0
f(x) = α(e^x − 1) if x ≤ 0
where α is a positive constant.
Use Case:
Image classification and convolutional neural networks (CNNs)
where fast convergence is crucial.
Works well with batch normalization.
2. Leaky ReLU (and Variants: Parametric ReLU, Randomized Leaky
ReLU):
Leaky ReLU addresses the dying ReLU problem by allowing a small,
non-zero slope for negative values of x. This keeps neurons active and
learning even when the input is negative.
Variants:
Parametric ReLU (PReLU): Allows the slope of negative values to be
learned.
Randomized Leaky ReLU (RReLU): Randomizes the slope during
training for regularization.
When to Use:
When dealing with deep networks prone to the dying ReLU
problem.
For time-series data or speech recognition tasks where some
negative values can hold important information.
Formula:
f(x) = x if x > 0
f(x) = αx if x ≤ 0
where α is a small positive constant (e.g., α = 0.01).
3. ReLU (Rectified Linear Unit):
ReLU is the most commonly used activation function due to its
simplicity and effectiveness. It outputs 0 for negative inputs and the
input itself for positive inputs.
When to Use:
In hidden layers of deep neural networks.
Works well for image-related tasks and object detection.
Formula:
f(x) = max(0, x)
Limitations:
Prone to the dying ReLU problem (neurons stop learning when
stuck in the negative region).
Use Case:
Convolutional Neural Networks (CNNs) for image recognition.
Feedforward networks where computational efficiency is crucial.
4. tanh (Hyperbolic Tangent):
The tanh activation function outputs values between -1 and 1, making it
zero-centered. This helps to avoid shifting gradients in one direction
during backpropagation.
When to Use:
When dealing with classification tasks that require negative values
as well as positive values.
In hidden layers of networks where zero-centered outputs are
beneficial.
Formula:
f(x) = (e^x − e^(−x)) / (e^x + e^(−x))
Use Case:
Recurrent Neural Networks (RNNs) for time-series predictions.
Binary classification tasks where outputs need to be zero-
centered.
5. Logistic (Sigmoid) Function:
The sigmoid function outputs values between 0 and 1, making it
suitable for binary classification problems.
When to Use:
For binary classification tasks where the output is a probability
(e.g., predicting whether an email is spam or not).
In output layers when a probability score is required.
Formula:
f(x) = 1 / (1 + e^(−x))
Limitations:
Causes vanishing gradients for very large or very small inputs.
The output is not zero-centered, which can slow down learning.
Use Case:
Logistic regression and binary classification tasks.
Probability-based models where outputs need to be interpreted as
probabilities.
6. Softmax Function:
The softmax function is used to convert the outputs of a neural network
into a probability distribution over multiple classes.
When to Use:
In the output layer of a multiclass classification model.
When the task requires predicting one class out of multiple
possible classes.
Formula:
σ(z_i) = e^(z_i) / Σ_{j=1}^{K} e^(z_j)
where z_i is the output for class i, and K is the total number of classes.
Use Case:
Multiclass classification problems, such as identifying handwritten
digits (0-9) in the MNIST dataset.
Natural language processing (NLP) tasks like named entity
recognition.
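The formulas above can be sketched directly in NumPy; the α values below (0.01 for leaky ReLU, 1.0 for ELU) are illustrative defaults, not part of the original notes:

```python
# NumPy sketch of the six activation functions discussed in this answer.
import numpy as np

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def relu(x):
    return np.maximum(0, x)

def tanh(x):
    return np.tanh(x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    z = z - np.max(z)        # subtract the max for numerical stability
    e = np.exp(z)
    return e / np.sum(e)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x), leaky_relu(x), elu(x))
print(tanh(x), sigmoid(x))
print(softmax(np.array([1.0, 2.0, 3.0])))  # sums to 1: a probability distribution
```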
4. Discuss how to train the DNN on this training set. For each
image pair, you can simultaneously feed the first image to DNN
A and the second image to DNN B. The whole network will
gradually learn to tell whether two images belong to the same
class or not.
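A hedged Keras sketch of this two-branch setup is shown below; the 28×28 input size, the layer widths, and the names dnn_a/dnn_b are illustrative assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

def make_branch(name):
    # One sub-network (DNN A or DNN B) mapping an image to a feature vector.
    return keras.Sequential([
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(64, activation="relu"),
    ], name=name)

input_a = keras.Input(shape=(28, 28))  # first image of the pair
input_b = keras.Input(shape=(28, 28))  # second image of the pair
features_a = make_branch("dnn_a")(input_a)
features_b = make_branch("dnn_b")(input_b)

# Combine both feature vectors and predict same class (1) vs. different (0).
merged = layers.concatenate([features_a, features_b])
output = layers.Dense(1, activation="sigmoid")(merged)

pair_model = keras.Model(inputs=[input_a, input_b], outputs=output)
pair_model.compile(optimizer="adam", loss="binary_crossentropy",
                   metrics=["accuracy"])
# pair_model.fit([images_a, images_b], same_class_labels, epochs=5)
```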
5. Explain the vanishing gradients problem in neural network.
OR
Explain the vanishing and exploding gradients problem in
neural network.
Vanishing Gradients:
When training a neural network using backpropagation, the error gradients
computed during backpropagation tend to decrease as they are propagated
backward through the network's layers.
In deep networks, especially when using activation functions like the
sigmoid or hyperbolic tangent, this can cause the gradients to shrink
exponentially, making them too small to cause meaningful updates to the
weights in the earlier layers.
As a result:
The earlier layers learn very slowly, or not at all.
The model fails to converge to a good solution.
Exploding Gradients:
In contrast, the exploding gradients problem occurs when the gradients
become excessively large during backpropagation, resulting in large updates to
the network weights. This can cause:
The model parameters to become unstable.
Divergence in the training process, where the loss function increases
instead of decreasing.
These problems are more pronounced in very deep networks and recurrent
neural networks (RNNs).
Causes:
The vanishing/exploding gradients problem can be attributed to factors such
as:
Poor initialization of weights.
Use of saturating activation functions like sigmoid and tanh.
The accumulation of small gradients through many layers.
Solutions:
Several techniques can mitigate these issues:
1. Use of Non-saturating Activation Functions: Functions like ReLU (Rectified
Linear Unit) do not saturate for positive inputs, reducing the risk of
vanishing gradients.
2. Proper Weight Initialization: Xavier and He initialization methods help
maintain a balance in gradient propagation.
3. Batch Normalization: Normalizing inputs within the network can reduce
internal covariate shift, stabilizing gradients during training.
4. Gradient Clipping: This technique caps the gradients during
backpropagation to prevent them from becoming too large, addressing
exploding gradients.
These methods have significantly improved the training of deep neural
networks, enabling them to handle more complex tasks effectively.
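A minimal tf.keras sketch combining these remedies (He initialization, the non-saturating ReLU activation, Batch Normalization, and gradient clipping via the optimizer); the layer sizes and the 784-dimensional input are illustrative:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(784,)),                        # illustrative input size
    layers.Dense(256, kernel_initializer="he_normal"),
    layers.BatchNormalization(),                      # stabilizes the gradients
    layers.Activation("relu"),                        # non-saturating activation
    layers.Dense(128, kernel_initializer="he_normal"),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.Dense(10, activation="softmax"),
])

# clipnorm caps the gradient norm during backpropagation (gradient clipping).
optimizer = keras.optimizers.SGD(learning_rate=0.01, clipnorm=1.0)
model.compile(optimizer=optimizer,
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```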
6. Discuss the problem that Glorot initialization and He
initialization aim to fix.
Both Glorot initialization and He initialization were proposed to address
vanishing and exploding gradients, especially in deep neural networks.
These problems occur because weights are poorly initialized, causing
gradients to shrink (vanish) or grow (explode) as they propagate backward
through the layers.
The vanishing and exploding gradients problem is particularly severe
when:
The network is deep (many layers).
Activations are not properly scaled, leading to either:
Vanishing gradients: When gradients become very small and fail to
update the weights of earlier layers.
Exploding gradients: When gradients grow too large and make the
network unstable.
The root cause is how weights are initialized. If weights are too small or too
large, the signal passing through layers is either diminished or amplified
exponentially, causing instability.
Xavier Initialization:
Proposed by Xavier Glorot and Yoshua Bengio, Xavier initialization works
well for sigmoid and tanh activation functions, which are prone to
saturation (leading to vanishing gradients).
Balances the variance of the activations and gradients to keep them within
a reasonable range across layers.
Formula for weight initialization:
W ∼ N(0, 2/(n_in + n_out))
where n_in and n_out are the numbers of input and output neurons.
He Initialization:
Proposed by Kaiming He et al. for ReLU and variants of ReLU (like Leaky
ReLU).
ReLU does not saturate like sigmoid or tanh, but it can suffer from dying
neurons if weights are not initialized properly.
The scaling factor 2/n_in accounts for the fact that ReLU only activates about
half of the neurons on average, preventing the gradients from vanishing.
Formula for weight initialization:
W ∼ N(0, 2/n_in)
where n_in is the number of input neurons.
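A short tf.keras sketch showing how the two initializations are typically paired with activations, Glorot (Xavier) with tanh or sigmoid and He with ReLU; the layer sizes are illustrative:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(100,)),
    # Glorot (Xavier) initialization for a saturating activation (tanh).
    layers.Dense(64, activation="tanh", kernel_initializer="glorot_uniform"),
    # He initialization for a non-saturating activation (ReLU).
    layers.Dense(64, activation="relu", kernel_initializer="he_normal"),
    layers.Dense(1, activation="sigmoid"),
])
```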
7. Differentiate Non-saturating and Saturating activation
functions with example.
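A small NumPy sketch of the difference: a saturating function such as the sigmoid has a derivative that collapses toward zero for large inputs, while a non-saturating function such as ReLU keeps a constant gradient of 1 for positive inputs.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):          # derivative of the saturating sigmoid
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):             # derivative of the non-saturating ReLU
    return 1.0 if x > 0 else 0.0

for x in [0.5, 2.0, 5.0, 10.0]:
    print(f"x = {x:5.1f}   sigmoid' = {sigmoid_grad(x):.6f}   ReLU' = {relu_grad(x):.1f}")
```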
8. Explain the variants of ReLU activation function.
1. ReLU (Rectified Linear Unit):
The original ReLU is the most widely used activation function in deep
learning. It outputs 0 for negative inputs and x for positive inputs.
Formula:
f(x) = max(0, x)
Pros:
Simple and efficient.
Helps reduce vanishing gradient problems.
Computationally inexpensive.
Cons:
Dying ReLU problem: Neurons can stop learning if they get stuck in the
negative region and always output zero.
2. Leaky ReLU:
Leaky ReLU fixes the dying ReLU problem by allowing a small, non-zero
slope for negative inputs.
Formula:
f(x) = x if x > 0
f(x) = αx if x ≤ 0
where α is a hyperparameter (the slope for negative inputs).
Pros:
Prevents neurons from dying.
Allows negative values to propagate through the network.
Cons:
Choosing the right value of α can be tricky.
Use Case:
Used in GANs (Generative Adversarial Networks), RNNs, and
networks prone to the dying ReLU problem.
3. Parametric ReLU (PReLU):
Parametric ReLU is a variant of Leaky ReLU where the slope for negative
inputs is learnable during training.
Formula:
f(x) = x if x > 0
f(x) = ax if x ≤ 0
where a is a learnable parameter.
Pros:
The network can adapt the slope for negative values based on the data.
Reduces the risk of dying neurons.
Cons:
Can lead to overfitting if not regularized.
Use Case:
Effective in large-scale image classification tasks and deep neural
networks.
4. Randomized Leaky ReLU (RReLU):
Randomized Leaky ReLU randomly chooses the slope α for negative inputs
from a given range during training.
Formula:
f(x) = x if x > 0
f(x) = αx if x ≤ 0
where α is randomly sampled from a range [l, u] during training and fixed
during testing.
Pros:
Helps with regularization.
Reduces the risk of overfitting.
Cons:
The choice of the range [l, u] can impact performance.
Use Case:
Useful in low-resource environments or for noise-tolerant networks.
5. Exponential Linear Unit (ELU):
ELU allows small negative outputs, which helps push the mean activation
closer to zero, improving learning.
Formula:
f(x) = x if x > 0
f(x) = α(e^x − 1) if x ≤ 0
where α is a constant.
Pros:
Pushes the mean activation closer to zero, which improves learning speed.
Reduces vanishing gradients more effectively than ReLU.
Cons:
Computationally expensive compared to ReLU.
Use Case:
Used in convolutional neural networks (CNNs) and deep residual
networks.
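A brief tf.keras sketch using these variants as layers; the slope and α values are illustrative, and RReLU is omitted because it is not a built-in Keras layer:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(64,)),
    layers.Dense(32),
    layers.ReLU(),             # plain ReLU
    layers.Dense(32),
    layers.LeakyReLU(0.01),    # Leaky ReLU with a fixed small negative slope
    layers.Dense(32),
    layers.PReLU(),            # Parametric ReLU: the slope is learned
    layers.Dense(32),
    layers.ELU(alpha=1.0),     # Exponential Linear Unit
    layers.Dense(1, activation="sigmoid"),
])
```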
9. Discuss the different strategies to fix the vanishing gradient
issue.
The vanishing gradients problem poses a significant challenge in training deep
neural networks, hindering the effective learning of lower layers as gradients
diminish during backpropagation.
Here are several strategies to mitigate this issue:
1. Xavier and He Initialization:
These weight initialization techniques aim to maintain consistent variance
of both outputs and gradients throughout the network.
Xavier Initialization: Suitable for the logistic activation function, it initializes
weights randomly, ensuring the variance of outputs matches the variance
of inputs.
Formula for weight initialization:
W ∼ N(0, 2/(n_in + n_out))
where n_in and n_out are the numbers of input and output neurons.
He Initialization: Designed for the ReLU activation function and its variants,
it accounts for the fact that ReLU only activates for positive values.
It typically uses a normal distribution with a mean of 0 and a standard
deviation of σ = sqrt(2 / n_inputs).
By employing these initialization methods, training can be accelerated
significantly, and deeper networks can be trained effectively.
2. Non-saturating Activation Functions:
The choice of activation function plays a crucial role in mitigating vanishing
gradients.
The sigmoid activation function, once popular, suffers from saturation for
large input values, leading to gradients close to zero.
ReLU and its variants (Leaky ReLU, ELU, RReLU, PReLU) address this issue
by not saturating for positive values.
The ELU activation function, in particular, has shown promising results in
speeding up training and improving model performance, although it may be
computationally slower than ReLU at test time.
3. Batch Normalization:
This technique tackles the issue of internal covariate shift, where the
distribution of each layer's inputs changes during training.
It normalizes the inputs of each layer, stabilizing the gradients and allowing
the use of higher learning rates.
Batch Normalization has been shown to significantly reduce vanishing
gradients, speed up training, and improve the overall performance of deep
neural networks.
4. Gradient Clipping:
Primarily used for recurrent neural networks, gradient clipping involves
capping gradients during backpropagation to prevent them from exceeding
a certain threshold.
This helps prevent exploding gradients, a problem where gradients become
excessively large.
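A short tf.keras sketch of strategy 4: gradient clipping can be configured directly on the optimizer, either by clipping each gradient component (clipvalue) or by rescaling the whole gradient when its norm exceeds a threshold (clipnorm); the thresholds below are illustrative:

```python
from tensorflow import keras

# Clip every gradient component to the range [-1.0, 1.0].
opt_clip_value = keras.optimizers.SGD(learning_rate=0.01, clipvalue=1.0)

# Rescale the gradient vector whenever its L2 norm exceeds 1.0.
opt_clip_norm = keras.optimizers.SGD(learning_rate=0.01, clipnorm=1.0)

# model.compile(optimizer=opt_clip_norm, loss="sparse_categorical_crossentropy")
```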
10. Discuss the various ways in which we can reuse a pre-trained
model.
11. Describe pretraining on an auxiliary task.