
Introduction to Neural Network

Lecture 10-11: Data Science

Outline
• Introduction to Neural Network
• Mathematical Model for Neural Network
• Differentiation and its Application to Train Neural Network

Artificial Neural Network
• An Artificial Neural Network (ANN) is a mathematical model that loosely simulates the structure and functionality of the biological nervous system to map inputs to outputs.
Block Diagram of Biological Nervous System:
Stimulus → Receptors → Neural Network (or Brain) → Effectors → Response
[Figure: typical human brain; a biological neuron with its cell body]
Human Brain Neuron vs Artificial Neuron
Artificial Neuron

[Figure: kth artificial neuron with inputs x1 … xn, weights Wk1 … Wkn, bias bk, and summation output Vk]

Vk = Wk1*x1 + Wk2*x2 + Wk3*x3 + … + Wkn*xn + bk


Artificial Neuron

[Figure: kth artificial neuron with inputs x1 … xn, weights Wk1 … Wkn, bias bk, summation Vk, and an activation function producing output yk]

Vk = ∑(j=1 to n) Wkj*xj + bk
yk = f(Vk)
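A minimal NumPy sketch of this forward pass, assuming a sigmoid as the activation f and made-up input values (none of these numbers come from the slides):

```python
import numpy as np

def neuron_forward(x, weights, bias, f):
    """Compute Vk = sum_j(Wkj * xj) + bk and yk = f(Vk) for one neuron."""
    v_k = np.dot(weights, x) + bias
    return f(v_k)

# Hypothetical values for n = 3 inputs; sigmoid is an assumed choice of f
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
x = np.array([0.5, -1.2, 3.0])
weights = np.array([0.1, 0.4, -0.2])
bias = 0.05
print(neuron_forward(x, weights, bias, sigmoid))
```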
Single Neuron Model

[Figure: single neuron with one input x1, weight Wk1, and bias bk; the output is linearly dependent on the input parameters]

Vk = Wk1*x1 + bk
yk = f(Vk) = Wk1*x1 + bk
Single Neuron Model
• Application
– For data fitting applications where we have to fit a straight line y = m*x + c to a large data set,
  where m = slope of the straight line, c = intercept, x = Height, y = Weight.

[Plot: Weight (y-axis, 0–80) vs Height (x-axis, 1–6) scatter data with a fitted straight line]
Single Neuron Model
Error Calculation
• The error Ei = (Actual value − Predicted value) = (Ti − yi)
• To make it positive, Ei = (Ti − yi)^2   [error for the ith input instance]

[Plot: Weight vs Height data with the fitted straight line]
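As a small worked illustration of this squared error, assuming hypothetical target and predicted Weight values:

```python
import numpy as np

targets = np.array([62.0, 71.5, 80.0])      # Ti: actual Weight values (hypothetical)
predictions = np.array([60.0, 73.0, 78.5])  # yi: predicted Weight values (hypothetical)

errors = (targets - predictions) ** 2       # Ei = (Ti - yi)^2, always non-negative
print(errors)        # per-instance squared errors: [4.   2.25 2.25]
print(errors.sum())  # total squared error over the data set: 8.5
```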
Linear Neural Network
• Error Calculation
– It is done to adjust the slope (m) and intercept (c) for better fitting next time.

[Plot: Weight (0–80) vs Height (1–6) data with the adjusted fitted line]
Linear Neural Network

yk = Wk1*x1 + bk   ↔   y = m*x + c
(the weight Wk1 plays the role of the slope m, and the bias bk plays the role of the intercept c)

[Figure: single neuron with input x1, weight Wk1, and bias bk; the output is linearly dependent on the input parameters]

Vk = Wk1*x1 + bk
yk = f(Vk) = Wk1*x1 + bk


Plotting Error

[Plot: Error as a function of the weight Wk1]
Differentiation…

y = f(x)

dy/dx = df/dx = y′ = f′

How much does y change as x changes:
Δy/Δx = (y2 − y1)/(x2 − x1) = p/b = tan(θ)   (perpendicular over base)

dy/dx = lim(Δx→0) Δy/Δx

[Figure: curve y = f(x) with a secant line through (x1, y1) and (x2, y2) making angle θ with the x-axis]
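This limit can be checked numerically with a finite difference; a minimal sketch, assuming the sample function f(x) = x^2:

```python
def derivative(f, x, dx=1e-6):
    """Approximate dy/dx by the ratio Δy/Δx for a small Δx."""
    return (f(x + dx) - f(x)) / dx

f = lambda x: x ** 2
print(derivative(f, 3.0))   # tends to 2*x = 6 as dx -> 0
```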
Differentiation…

y = f(x)

dy/dx = lim(Δx→0) Δy/Δx

As Δx → 0 we obtain a tangent at x.

dy/dx = tan(θ) = slope of the tangent at x = x1
dy/dx = slope of the tangent with respect to the x-axis at x = x1

[Figure: as x2 approaches x1, the secant line becomes the tangent at x1]
Differentiation…

y = f(x)

For 0 < θ < 90°: tan(θ) is +ve.  At θ = 90°: tan(90°) is undefined.

[Figure: curve with a rising tangent at x1]
Differentiation…

y = f(x)

For θ > 90°: tan(θ) is −ve.

[Figure: curve with a falling tangent at x1]
Differentiation…

y = f(x)

For θ = 0: tan(θ) = 0.

[Figure: curve with a horizontal tangent at x1]
Differentiation…

y = f(x)

For θ = 0: tan(θ) = 0.

[Figure: curve with a maximum and a minimum; the tangent is horizontal at both]

Note: At a minimum and a maximum the slope is 0, i.e. tan(θ) = 0, so dy/dx = 0.
Differentiation…
Distinguishing between a Minima & Maxima
Let f(x) = x^2 − 3x + 2
Set df/dx = 0:
2x − 3 = 0, so x = 1.5
f(1.5) = −0.25

Take a point near 1.5, say x = 1:
f(1) = 1 − 3 + 2 = 0, which is larger than f(1.5).

Since a nearby point gives a larger value, x = 1.5 cannot be a maximum. It is a minimum.
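The same comparison can be done in a few lines of code; a minimal sketch of evaluating f at and around the stationary point x = 1.5:

```python
f = lambda x: x**2 - 3*x + 2

for x in (1.0, 1.5, 2.0):       # the stationary point and two nearby points
    print(x, f(x))
# f(1.0) = 0.0 and f(2.0) = 0.0 are both larger than f(1.5) = -0.25,
# so x = 1.5 is a minimum, not a maximum.
```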


Error Function with Minima and No Maxima

[Plot: error function y = f(x) with a single minimum and no maximum]
Error Function with a Maxima and No Minima

[Plot: error function y = f(x) with a single maximum and no minimum]
Error Function without a Maxima and Minima

[Plot: error function y = f(x) with neither a maximum nor a minimum]
Error Function with multiple Maxima and Minima

[Plot: error function y = f(x) with multiple maxima and minima, showing a global minimum and a local minimum]
TRAINING A SINGLE-NEURON MODEL
[Figure: kth neuron with inputs xi1 … xid, weights Wk1 … Wkd, bias bk, summation Vk, and an activation function producing y'k]

Vk = ∑(j=1 to d) Wkj*xij + bk
y'k = f(Vk)
L = ∑(i=1 to n) (yi − f(w^T xi + b))^2

• Step-1: Define the loss function

• Step-2: Define the optimization problem
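A minimal sketch of Step-1 for this neuron: computing L = ∑(i=1 to n) (yi − f(w^T xi + b))^2 over a data set. The linear activation f and the toy data are assumptions for illustration only.

```python
import numpy as np

def loss(W, b, X, y, f=lambda v: v):
    """Squared-error loss L = sum_i (yi - f(w^T xi + b))^2 over all n instances."""
    predictions = f(X @ W + b)          # forward pass for the whole data set at once
    return np.sum((y - predictions) ** 2)

# Hypothetical data set: n = 4 instances with d = 2 features each
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5], [4.0, 2.5]])
y = np.array([3.0, 2.5, 4.5, 6.5])
print(loss(np.zeros(2), 0.0, X, y))     # loss with all-zero weights: 77.75
```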


TRAINING A SINGLE-NEURON MODEL
[Figure: neuron with inputs xi1 … xin, weights W1 … Wn, bias bk, summation Vk, and an activation function producing y'k]

Vk = ∑(j=1 to d) Wj*xij + bk
y'k = f(Vk)
L = (y − y')^2
• Step-3: Solve the optimization problem
– Randomly initialize the weights
– Feed forward the inputs and compute the loss function; the gradient of the loss with respect to the prediction is −2(y − y')
– Update the weights using the gradient −2(y − y')*x, as sketched below
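A minimal gradient-descent sketch of Step-3 for a single linear neuron, using the gradients −2(y − y') and −2(y − y')*x from above; the learning rate, number of epochs, and Height/Weight data are assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([1.0, 2.0, 3.0, 4.0])       # hypothetical Height values
y = np.array([20.0, 40.0, 60.0, 80.0])   # hypothetical Weight values (targets)

W = rng.normal(0.0, 0.1)                 # randomly initialize the weight
b = 0.0                                  # and the bias
lr = 0.01                                # learning rate (an assumed value)

for epoch in range(1000):
    y_pred = W * X + b                       # feed forward: y' = W*x + b (linear activation)
    grad_W = np.sum(-2 * (y - y_pred) * X)   # dL/dW = sum_i -2(yi - y'i) * xi
    grad_b = np.sum(-2 * (y - y_pred))       # dL/db = sum_i -2(yi - y'i)
    W -= lr * grad_W                         # update the weight
    b -= lr * grad_b                         # update the bias

print(W, b)                              # approaches W ≈ 20, b ≈ 0 for this data
```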

TYPES OF NEURAL NETWORK

WHY MULTILAYER NEURAL NETWORK?
• Biological Inspiration
• Universal Approximators: Can approximate any nonlinear
function to any desired level of accuracy.
• Results in Powerful Models

TRAINING MULTILAYER NEURAL NETWORK

Sample labeled data → Randomly initialize the weights → Forward it through the network, get predictions → Back-propagate the errors → Update the network weights

• Back-Propagation: Chain Rule + Memoization


– In Stochastic Gradient Descent (SGD) you take one point (input vector)
– In Mini-Batch SGD, you take a set of points (input vectors)
– In Gradient Descent, you take all the input vectors (see the sketch below)
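The three variants differ only in how many input vectors are drawn for each weight update; a small sketch, where the batch size of 32 and the random data are illustrative assumptions:

```python
import numpy as np

def sample_batch(X, y, mode="mini-batch", batch_size=32, seed=0):
    """Return the input vectors (and labels) used for one weight update."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    if mode == "sgd":             # Stochastic Gradient Descent: one point
        idx = rng.integers(0, n, size=1)
    elif mode == "mini-batch":    # Mini-Batch SGD: a set of points
        idx = rng.choice(n, size=min(batch_size, n), replace=False)
    else:                         # (full) Gradient Descent: all the input vectors
        idx = np.arange(n)
    return X[idx], y[idx]

X, y = np.random.rand(100, 3), np.random.rand(100)
print(sample_batch(X, y, mode="sgd")[0].shape)         # (1, 3)
print(sample_batch(X, y, mode="mini-batch")[0].shape)  # (32, 3)
print(sample_batch(X, y, mode="batch")[0].shape)       # (100, 3)
```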

AI vs Machine Learning vs Deep Learning
Deep Learning
• A type of machine learning based on artificial
neural networks in which multiple layers of
processing are used to extract progressively higher
level features from data.

- "Deep Learning with Python", Francois Chollet

DEEP LEARNING APPROACH
• Standard Approach (Mathematician's way)
– Build new theories
– Perform experiments
• New Deep Learning Approach (Engineer's way)
– Given a huge amount of computational power
– People first experiment and then try to build a theory

Why Deep Learning ? Why Now ?
• Computer Vision: Convolutional Neural Networks and Backpropagation, well understood since 1989
• Time Series Forecasting: Long Short-Term Memory (LSTM), well understood since 1997

- "Deep Learning with Python", Francois Chollet

Why Deep Learning ? Why Now ?

Algorithmic Advancements…
• Better Activation Functions for neural layers.
• Better Weight Initialization Schemes, starting with layer-wise pretraining.
• To avoid Overfitting, concepts like Dropout are introduced.
• Better optimization schemes, such as RMSProp and Adam.

Activation Functions…
• An Activation Function (Transfer Function) maps the weighted summation of inputs to the output.
• An Activation Function is used to add nonlinearity so that the network can learn complex patterns.

Sigmoid Activation Functions
• Characteristics:
– Differentiable
– Nonlinear
– Output lies in [0, 1]
– Fast
– Vanishing Gradient Problem
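The slide lists characteristics only; the standard sigmoid formula σ(x) = 1/(1 + e^(−x)) and its derivative σ(x)(1 − σ(x)) can be sketched as:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)        # never exceeds 0.25 (its value at x = 0)

print(sigmoid(0.0), sigmoid_derivative(0.0))   # 0.5 0.25
```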

VANISHING GRADIENT PROBLEM
• With the sigmoid activation function the derivative is less than 1, and when many such derivatives are multiplied together the result is a very small number, so the weights change very little.
• It usually occurs when the derivatives are less than 1.
• With the sigmoid and tanh activation functions it occurs frequently.

dL/dw = (dL/df1) × (df1/df2) × (df2/df3) × … × (dfn/dw)
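A small numerical illustration of why this product becomes tiny: each sigmoid derivative is at most 0.25, so chaining even 10 layers (an assumed depth, in the best case of zero pre-activations) shrinks the gradient factor below 10^-6:

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
dsigmoid = lambda x: sigmoid(x) * (1.0 - sigmoid(x))

pre_activations = np.zeros(10)               # 10 layers, best case (derivative = 0.25 each)
factor = np.prod(dsigmoid(pre_activations))  # 0.25 ** 10
print(factor)                                # ~9.5e-07: the weights barely change
```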

ReLU Activation Function
• f(x)= x, when x>0
= 0, when x<=0
• Avoids Vanishing Gradient Problem.
• Derivative is Simple
– f’(x)= 1 for x>0
= 0 for x<=0
• Problem:
– Dead ReLU Units
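A minimal sketch of ReLU and its derivative as defined above:

```python
import numpy as np

def relu(x):
    return np.where(x > 0, x, 0.0)       # f(x) = x when x > 0, else 0

def relu_derivative(x):
    return np.where(x > 0, 1.0, 0.0)     # f'(x) = 1 when x > 0, else 0

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))              # [0. 0. 3.]
print(relu_derivative(x))   # [0. 0. 1.]
```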

Leaky ReLU Activation Function
• f(x)= x, when x>0
= 0.1x, when x<=0
• The advantages of Leaky ReLU are the same as those of ReLU.
• In addition, it enables Backpropagation, even for
negative input values.
• Avoids Dead ReLU
• Simple Derivative
– f’(x)= 1 for x>0
= 0.1 for x<=0
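A corresponding sketch of Leaky ReLU with the 0.1 slope from the slide:

```python
import numpy as np

def leaky_relu(x, alpha=0.1):
    return np.where(x > 0, x, alpha * x)   # f(x) = x when x > 0, else 0.1*x

def leaky_relu_derivative(x, alpha=0.1):
    return np.where(x > 0, 1.0, alpha)     # the gradient never becomes exactly zero

x = np.array([-2.0, 3.0])
print(leaky_relu(x))              # [-0.2  3. ]
print(leaky_relu_derivative(x))   # [0.1 1. ]
```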

WEIGHT INITIALIZATION

[Plot: Error as a function of the Weights]

WEIGHT INITIALIZATION
• Mostly used guidelines:
– We should never initialize all weights to the same values; asymmetry is necessary.
– We should not initialize to large –ve values (vanishing gradient problems).
– Weights should be small (but not too small).
– Weights should have good variance.
– Weights should come from a normal distribution with mean zero and small variance.
– Weights should have some +ve and some –ve values (a small sketch follows).
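A minimal sketch of an initialization following these guidelines: zero-mean normal with a small standard deviation, which gives a mix of small positive and negative values; the layer sizes (784 × 128) are assumed for illustration.

```python
import numpy as np

def init_weights(n_in, n_out, std=0.01, seed=42):
    """Zero-mean normal with small variance; breaks symmetry between neurons."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=0.0, scale=std, size=(n_in, n_out))

W = init_weights(784, 128)                 # assumed layer sizes, for illustration
print(W.mean(), W.std())                   # mean ~0, std ~0.01
print((W > 0).any(), (W < 0).any())        # both +ve and -ve values: True True
```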

