
Activation Function

An activation function in a neural network is a mathematical function applied to each neuron's output to determine whether it should be activated or not. It takes the weighted sum of inputs plus bias, applies a transformation (often non-linear), and passes the result to the next layer.
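
A minimal sketch of this idea in NumPy (the function name and the example weights, bias, and inputs below are illustrative, not taken from these notes):

import numpy as np

def neuron_output(x, w, b, activation):
    """Weighted sum of inputs plus bias, passed through an activation."""
    z = np.dot(w, x) + b          # pre-activation: weighted sum + bias
    return activation(z)          # transformed output sent to the next layer

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
x = np.array([0.5, -1.2, 3.0])    # inputs (illustrative)
w = np.array([0.4, 0.7, -0.2])    # weights (illustrative)
b = 0.1                           # bias (illustrative)
print(neuron_output(x, w, b, sigmoid))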
Uses of activation function
• Introduces Non-Linearity – Helps neural networks learn complex, non-
linear patterns.
• Enables Learning of Complex Patterns – Makes the network capable of
approximating any function.
• Transforms Signal Between Layers – Converts a neuron’s input into
output for the next layer.
• Controls Neuron Activation – Decides whether a neuron should be
active or not.
• Allows Learning of Arbitrary Mappings – Essential for handling tasks
like image, speech, and video processing.
• Supports Backpropagation – Facilitates gradient flow during training.
• Adds Flexibility to the Model – Allows networks to adapt to various
types of data and tasks.
• Improves Model Accuracy and Performance
Types of Activation Functions
Activation Function: Binary Step
The Binary Step activation function is one of the simplest activation functions used in neural networks. It outputs only two possible values, typically 0 or 1, depending on whether the input is below or above a threshold.

Gradient of the Binary Step Function:

• The derivative of f(x) with respect to x is zero: f'(x) = 0 for all x (undefined exactly at the threshold).
• Gradients are used to update the weights and biases; since the gradient of this function is zero, the weights and biases do not update during training.
• This function can be used as an activation function when building a binary classifier.
Limitation:
• This function will not be useful when there are multiple classes in the
target variable.
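
A minimal NumPy sketch of the binary step function and its zero gradient (the threshold of 0 and the function names are illustrative):

import numpy as np

def binary_step(x, threshold=0.0):
    """Outputs 1 if the input is at or above the threshold, otherwise 0."""
    return np.where(x >= threshold, 1, 0)

def binary_step_grad(x):
    """The gradient is 0 everywhere (undefined exactly at the threshold),
    so weights and biases receive no update signal."""
    return np.zeros_like(x)

print(binary_step(np.array([-2.0, 0.5, 3.0])))   # -> [0 1 1]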
Activation Function: Sigmoid
The sigmoid activation function maps any real-valued input to a value between 0
and 1, making it useful for binary classification and as a squashing function in

neural networks.

Takes a real-valued number and “squashes” it into range between 0 and 1.


sigmoid(x) = 1 / (1 + e^(−x)),  mapping ℝ → (0, 1)
Nice interpretation as the firing rate of a neuron
• 0 = not firing at all
• 1 = fully firing
Advantages:
• Smooth gradient
• Output can be treated as probability
• Good for binary classification
Disadvantages:

- Sigmoid neurons saturate and kill gradients, so the network barely learns:
• when the neuron's activations are near 0 or 1 (saturation)
• the gradient in these regions is almost zero
• almost no signal flows back to its weights
• if the initial weights are too large, most neurons saturate
• Not zero-centered
• Slow convergence in deep networks
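
A small NumPy sketch of the sigmoid and its gradient, illustrating the saturation described above (function names and sample inputs are illustrative):

import numpy as np

def sigmoid(x):
    """Squashes any real value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative sigmoid(x) * (1 - sigmoid(x)); close to 0 when the neuron saturates."""
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-10.0, 0.0, 10.0])
print(sigmoid(x))        # outputs approach 0 and 1 at the extremes
print(sigmoid_grad(x))   # gradients nearly vanish at saturated inputs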
Activation Function: Tanh
The tanh (hyperbolic tangent) activation function is a nonlinear activation that
maps input values to the range (-1, 1). It is zero-centered, which helps
optimization converge faster compared to sigmoid.

Takes a real-valued number and "squashes" it into the range between −1 and 1:
tanh : ℝ → (−1, 1)

- Like sigmoid, tanh neurons saturate


- Unlike sigmoid, output is zero-centered
- Tanh is a scaled sigmoid: tanh(x) = 2·sigmoid(2x) − 1
Advantages:
• Zero-centered output
• Stronger gradients than sigmoid
Disadvantages:
• Still suffers from vanishing gradients
• Saturates at large values
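
A short NumPy sketch of tanh, its gradient, and its relationship to the sigmoid (sample inputs are illustrative):

import numpy as np

def tanh_grad(x):
    """Derivative 1 - tanh(x)^2; still vanishes for large |x|."""
    return 1.0 - np.tanh(x) ** 2

x = np.array([-3.0, 0.0, 3.0])
print(np.tanh(x), tanh_grad(x))   # zero-centered outputs in (-1, 1)

# Scaled-sigmoid relationship: tanh(x) == 2*sigmoid(2x) - 1
print(np.allclose(np.tanh(x), 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0))   # True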
Activation Function: ReLU
ReLU is the most commonly used activation function in modern neural
networks. It outputs the input directly if it is positive; otherwise, it outputs zero.
• ReLU is the most commonly used activation function in CNNs and ANNs.
• Its output range is from 0 to infinity: [0, ∞).
• It returns the input x if x > 0, otherwise it returns 0.
• Though it appears linear in the positive region, ReLU is a non-linear function overall.
• A combination of ReLU functions is also non-linear and can approximate any function.
• ReLU is considered a good function approximator in neural networks.
• Training with ReLU has been reported to converge about six times faster than with the hyperbolic tangent (tanh) function.
• ReLU should only be used in the hidden layers of a neural network.
• For classification problems, the softmax function should be used in the output layer.
• For regression problems, a linear activation function is preferred in the output layer.
• ReLU has a drawback called the "dying ReLU" problem, where some neurons become inactive and always output 0.
• This occurs when weight updates push a neuron into a state where it never activates on any input again.
Advantages:
• Trains much faster
1. accelerates the convergence of SGD
2. due to its linear, non-saturating form
• Less expensive operations
1. compared to sigmoid/tanh (no exponentials etc.)
2. implemented by simply thresholding a matrix at zero
• More expressive
• Sparse activation
• Works well in CNNs, MLPs, RNNs
• Prevents the vanishing gradient problem for positive inputs

Disadvantages:
• Dead ReLU problem: neurons can die (always output 0 if stuck in
negative region)
• Not zero-centered
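
A minimal NumPy sketch of ReLU and its gradient (function names and sample inputs are illustrative):

import numpy as np

def relu(x):
    """Returns x for x > 0, otherwise 0 (simple thresholding at zero)."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Gradient is 1 for positive inputs and 0 otherwise (the 'dead' region)."""
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 1.5, 4.0])
print(relu(x))       # [0.  0.  0.  1.5 4. ]
print(relu_grad(x))  # gradient flows only for positive inputs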
Activation Function: LeakyReLU
Leaky ReLU is a modified version of ReLU that allows a small negative slope for inputs less than 0. It was designed to fix the "dying ReLU" problem, where neurons output 0 and stop learning.
• x > 0 → behaves like normal ReLU (returns x)
• x ≤ 0 → returns a small negative value (typically 0.01·x) instead of 0
Advantages:
• Fixes dead neuron issue in ReLU
• Allows small gradient when x < 0
Disadvantages:
• Slope value (0.01) is arbitrary and not learned
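
A minimal NumPy sketch of Leaky ReLU with the commonly used slope of 0.01 (names and inputs are illustrative):

import numpy as np

def leaky_relu(x, alpha=0.01):
    """Like ReLU for x > 0; a small negative slope alpha*x for x <= 0."""
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    """Gradient is 1 for x > 0 and alpha otherwise, so neurons never fully die."""
    return np.where(x > 0, 1.0, alpha)

x = np.array([-3.0, -0.5, 2.0])
print(leaky_relu(x))       # [-0.03  -0.005  2.   ]
print(leaky_relu_grad(x))  # [0.01 0.01 1.  ]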
Activation Function: Swish
Swish is a smooth, non-monotonic activation function, commonly defined as swish(x) = x · sigmoid(x), that often performs better than ReLU in deep networks.
• For large positive x, Swish behaves like ReLU
• For negative x, Swish is smoothly negative, unlike ReLU
• It is non-monotonic: it dips slightly below zero for small negative inputs before rising again, which can help deeper models learn better
Advantages:
• Smooth and non-monotonic
• Performs better than ReLU in many deep models
Disadvantages:
• Slightly slower than ReLU
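
A minimal NumPy sketch of Swish with beta = 1 (names and inputs are illustrative):

import numpy as np

def swish(x, beta=1.0):
    """Swish: x * sigmoid(beta * x); smooth and non-monotonic."""
    return x / (1.0 + np.exp(-beta * x))

x = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
print(swish(x))
# Large positive x: close to x (ReLU-like); small negative x: smoothly negative.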
GELU (Gaussian Error Linear Unit)
GELU is an advanced activation function that is widely used in transformers
and large language models like BERT and GPT. GELU blends ideas from ReLU
and probability theory. GELU weights inputs by how likely they are to be positive under a standard normal distribution: GELU(x) = x · Φ(x), where Φ is the standard normal CDF.

Advantages:
• Used in BERT, GPT (transformers)
• Improves gradient flow
• Combines linear & nonlinear behaviour (passes positive x almost unchanged and shrinks large negative values toward 0)
• Smooth, differentiable, and better than ReLU
• Probabilistic weighting
Disadvantages:
• More complex to compute
• Nonlinear, non-monotonic
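
A minimal NumPy sketch using the common tanh approximation of GELU; the exact form uses the standard normal CDF (names and inputs are illustrative):

import numpy as np

def gelu(x):
    """Tanh approximation of GELU, as used in many BERT/GPT implementations:
    0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x**3)))."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(gelu(x))
# Positive inputs pass nearly unchanged; large negative inputs are shrunk toward 0.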
ELU (Exponential Linear Unit)
ELU returns x for x > 0 and α(eˣ − 1) for x ≤ 0, so negative inputs produce smooth, bounded negative outputs. This keeps mean activations closer to zero and avoids dead neurons, at the cost of computing an exponential.
SoftMax
SoftMax converts a vector of raw scores (logits) into a probability distribution that sums to 1: softmax(xᵢ) = e^(xᵢ) / Σⱼ e^(xⱼ). It is typically used in the output layer for multi-class classification.
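
Minimal NumPy sketches of ELU and SoftMax (the value alpha = 1 and the sample logits are illustrative):

import numpy as np

def elu(x, alpha=1.0):
    """ELU: x for x > 0, alpha * (exp(x) - 1) for x <= 0 (smooth, slightly negative)."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def softmax(logits):
    """Turns a vector of scores into a probability distribution that sums to 1."""
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

print(elu(np.array([-2.0, 0.0, 3.0])))
print(softmax(np.array([2.0, 1.0, 0.1])))   # e.g. class probabilities in an output layer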
