
Module 2

Syllabus
Multi-layer Perceptron– Going Forwards – Going Backwards: Back Propagation
Error – Multi-layer Perceptron in Practice – Examples of using the MLP –
Overview – Deriving Back-Propagation – Radial Basis Functions and Splines –
Concepts – RBF Network – Curse of Dimensionality – Interpolations and Basis
Functions – Support Vector Machines
Perceptron
It is an artificial neural network
It is the simplest possible neural network
It is a binary classifier with three main components:
1. input nodes/input layer
2. Weights and bias
3. Activation function
Types of Activation Functions

Step function
Multilayer Perceptron
•Single-layer networks can only create linear decision boundaries (planes).
•Multi-layer networks can create complex decision boundaries by transforming the input space
through non-linear activation functions in hidden layers.
•This allows the network to solve non-linearly separable problems by finding linear separations in
higher-dimensional spaces created by the hidden layers.
•Hidden layers consist of neurons that apply non-linear
transformations to the input data
XOR problem
Going Forward
Forward Pass (Recall)
•Purpose: To compute the predicted output of the network given the input data.
•Steps:
• Input Layer: The input data is fed into the network.
• Hidden Layers: Each neuron's output is calculated using the weighted sum of inputs plus a bias,
followed by an activation function.
• Output Layer: The final output is computed in the same manner, providing the network's prediction.

•Output: The network's prediction for the given input data.
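To make the forward pass concrete, here is a minimal NumPy sketch of a single-hidden-layer network. The layer sizes, random weights and sigmoid activation are illustrative assumptions, not values from these notes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward_pass(x, V, b_hidden, W, b_out):
    """One forward pass through a single-hidden-layer MLP."""
    # Hidden layer: weighted sum of inputs plus bias, then activation
    h = sigmoid(V @ x + b_hidden)
    # Output layer: same computation applied to the hidden activations
    y = sigmoid(W @ h + b_out)
    return y

# Example with 2 inputs, 3 hidden neurons, 1 output (random weights)
rng = np.random.default_rng(0)
V, b_hidden = rng.normal(size=(3, 2)), rng.normal(size=3)
W, b_out = rng.normal(size=(1, 3)), rng.normal(size=1)
print(forward_pass(np.array([1.0, 0.0]), V, b_hidden, W, b_out))
```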


Implementation of XOR
•Network Structure:
•Input layer: Nodes A and B
•Hidden layer: Nodes C and D
•Output layer: Node E
•For input (1, 0):
a. Hidden Layer:
•Node C: Input = -1×0.5 + 1×1 + 0×1 = 0.5. Result: C fires (output 1) as 0.5 > 0 (threshold)
•Node D: Input = -1×1 + 1×1 + 0×1 = 0. Result: D doesn't fire (output 0) as 0 ≤ 0 (threshold)
b. Output Layer:
•Node E: Input = -1×0.5 + 1×1 + 0×-1 = 0.5 Result: E fires (output 1) as 0.5 > 0 (threshold)
•XOR Function:
•E fires when A and B are different (1,0 or 0,1)
•E doesn't fire when A and B are the same (0,0 or 1,1)
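A small Python sketch of the threshold network above, using the bias input of -1 and the weights listed for nodes C, D and E, confirms the XOR behaviour. This is one possible weight assignment that realises XOR, mirroring the worked example.

```python
def step(x):
    # Threshold activation: fire (1) only when the weighted sum is positive
    return 1 if x > 0 else 0

def xor_net(a, b):
    # Hidden layer (bias input is -1, weights as in the worked example)
    c = step(-1 * 0.5 + a * 1 + b * 1)   # node C
    d = step(-1 * 1.0 + a * 1 + b * 1)   # node D
    # Output layer combines C and D
    e = step(-1 * 0.5 + c * 1 + d * -1)  # node E
    return e

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor_net(a, b))
# Prints 0, 1, 1, 0: E fires exactly when A and B differ
```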
GOING BACKWARDS: BACK-PROPAGATION OF ERROR
Backward Pass (Weight Update)

•Purpose: To update the network's weights and biases to minimize the error between the predicted output and the actual target.

•Steps:
• Calculate Error: Determine the difference between the predicted output and the actual target.
• Compute Gradients: Calculate the gradient of the error with respect to each weight and bias in the network using the
chain rule. This involves:
• Output Layer: Compute the gradient of the error with respect to the output layer's inputs (derivative of the loss
function with respect to the network's output).
• Hidden Layers: Propagate the error back through the network, computing the gradient of the error with respect to the
inputs of each hidden layer neuron.
• Update Weights and Biases: Adjust each weight and bias by a small amount proportional to the negative of its
gradient (using the learning rate to control the size of the update).

Output: Updated weights and biases for the network, aimed at reducing the prediction error.
1. Error minimization in multi-layer perceptrons (MLPs) is more complex than in simple perceptrons
due to multiple layers of weights.

2. Determining which weights caused the error (credit assignment problem) is challenging in MLPs.

3. The simple error function used for perceptrons (Σ(yk - tk)) is inadequate for MLPs as positive and
negative errors can cancel out.

4. A sum-of-squares error function is introduced: E = (1/2) Σk (yk − tk)²

5. This new error function ensures all errors contribute positively to the total error.

6. The (1/2) factor in the error function simplifies differentiation.


•If we differentiate a function, we get its gradient, which tells us the direction along which the function increases and decreases the most. So if we differentiate the error function, we get the gradient of the error.

•The weights of the network are trained so that the error goes downhill until it reaches a local minimum, just
like a ball rolling under gravity
The Multi-layer Perceptron Algorithm
Introduction

The inputs are fed forward through the network, and the error is computed as the sum-of-squares difference between the network outputs and the targets.

The error is fed backwards through the network in order to


[Figure: network with L input nodes, M hidden nodes, and N output nodes]
• first update the second-layer weights

• and then afterwards, the first-layer weights

Initialisation

– initialise all weights to small random values

• Training – repeat:

∗ for each input vector:


Forwards phase:
•compute the activation of each neuron j in the hidden layer(s) as the weighted sum of its inputs passed through the activation function, h_j = g(Σ_i x_i v_ij), where g is the sigmoid
•the hidden activations then go forward to the output neurons, whose activations y_k = g(Σ_j h_j w_jk) are computed with the same activation function


Backward Phase
•compute the error (delta) term at each output neuron, δ_k = (yk - tk) yk(1 - yk), where (yk - tk) is the error term and yk(1 - yk) is the derivative of the sigmoid activation function
•propagate these deltas back through the network to get the hidden-layer deltas, then update the second-layer weights and afterwards the first-layer weights
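Putting the forwards and backward phases together, the sketch below trains a small MLP on XOR with NumPy. The sigmoid activation, sum-of-squares error, learning rate and number of epochs are assumptions for illustration, and convergence depends on the random initialisation (a different seed or more epochs may be needed).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# XOR data; a bias input of -1 is appended to each input vector
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
X = np.hstack([X, -np.ones((X.shape[0], 1))])

rng = np.random.default_rng(1)
L, M, N = 2, 3, 1                              # input, hidden, output sizes
V = rng.normal(scale=0.5, size=(L + 1, M))     # first-layer weights (incl. bias)
W = rng.normal(scale=0.5, size=(M + 1, N))     # second-layer weights (incl. bias)
eta = 0.5                                      # learning rate

for epoch in range(20000):
    # Forwards phase
    H = sigmoid(X @ V)                              # hidden activations
    Hb = np.hstack([H, -np.ones((H.shape[0], 1))])  # append bias input
    Y = sigmoid(Hb @ W)                             # network outputs

    # Backward phase: deltas use (y - t) and the sigmoid derivative y(1 - y)
    delta_o = (Y - T) * Y * (1 - Y)
    delta_h = H * (1 - H) * (delta_o @ W[:-1].T)    # error propagated back

    # Update the second-layer weights first, then the first-layer weights
    W -= eta * Hb.T @ delta_o
    V -= eta * X.T @ delta_h

# Network outputs after training (should approach the XOR targets 0, 1, 1, 0)
H = sigmoid(X @ V)
print(np.round(sigmoid(np.hstack([H, -np.ones((4, 1))]) @ W), 2))
```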
Multilayer perceptron in practice
We are going to look in more detail at the choices that can be made about the network in order to use it for solving real problems.

i) Amount of training data

• Multi-layer perceptrons (MLPs) with one hidden layer contain a substantial number of adjustable parameters, specifically (L + 1) × M + (M + 1) × N weights, where L, M, and N represent the number of nodes in the input, hidden, and output layers, respectively.

• Training these networks involves setting these numerous weights through the back-propagation algorithm, which relies on
errors derived from training data. While more training data generally improves learning outcomes, it also increases
training time.

• There is no precise formula to determine the minimum required amount of data, as it varies depending on the problem.
ii) Number of hidden layers

First Hidden Layer: This layer typically combines sigmoid functions to create ridge-like or hill-shaped functions (as shown in Figures (a) and (b)). The outputs of individual neurons in this layer are not yet "bumps" but rather sigmoid-shaped curves or hills.

Second Hidden Layer (if present): This layer combines the outputs from the first hidden layer. It is at this stage that true "bump" functions can be formed (Figure (c)). The combination of hills from the first layer, when oriented properly (e.g., at 90° to each other), creates localized bump responses.

Output Layer: This is where the final addition of bumps typically occurs. The outputs from the previous layer (either the
first or second hidden layer) are combined linearly to approximate the desired function. If using two hidden layers, this
layer combines the bump functions to create the final output.
The effective learning at each layer
When to stop learning?
Training Process: The MLP is trained over multiple epochs (iterations over
the entire dataset). Weights are adjusted as the network makes errors in each
iteration.

Stopping Criteria: Simple methods like setting a fixed number of iterations or a minimum error threshold are not sufficient; these can lead to overfitting or underfitting.

Validation Set: A separate dataset used to monitor the network's generalization ability during training.

Error Curves: Training error typically decreases rapidly at first, then slows down. Validation error initially decreases but may start increasing at some point.

Early Stopping: The technique of stopping training when the validation error starts to increase.
Examples of MLP
Given Information:
•Input layer: x1 = 0.35, x2 = 0.7
•Hidden layer: h1, h2
•Output layer: o3
•Weights: w11 = 0.2, w21 = 0.2, w12 = 0.3, w22
= 0.3, w13 = 0.3, w23 = 0.9
•Activation function: Sigmoid
•Actual output (y) = 0.5
Steps:
1.Calculate inputs to hidden layer neurons: For h1: net_h1 = x1 * w11 + x2 * w21 = 0.35 * 0.2 + 0.7 *
0.2 = 0.07 + 0.14 = 0.21 For h2: net_h2 = x1 * w12 + x2 * w22 = 0.35 * 0.3 + 0.7 * 0.3 = 0.105 +
0.21 = 0.315
2.Apply sigmoid activation function to hidden layer: sigmoid(x) = 1 / (1 + e^-x) h1 = sigmoid(0.21) = 1
/ (1 + e^-0.21) ≈ 0.5523 h2 = sigmoid(0.315) = 1 / (1 + e^-0.315) ≈ 0.5781
3.Calculate input to output neuron: net_o3 = h1 * w13 + h2 * w23 = 0.5523 * 0.3 + 0.5781 * 0.9 =
0.1657 + 0.5203 = 0.686
4.Apply sigmoid activation to output neuron: o3 = sigmoid(0.686) = 1 / (1 + e^-0.686) ≈ 0.6651
5.Calculate error: Error = (y - o3)^2 / 2 = (0.5 - 0.6651)^2 / 2 ≈ 0.0136
Forward propagation output: 0.6651
Error: 0.0136
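The same forward-propagation calculation can be reproduced in a few lines of Python; the weights, inputs and target are exactly those given above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x1, x2 = 0.35, 0.7
w11, w21, w12, w22, w13, w23 = 0.2, 0.2, 0.3, 0.3, 0.3, 0.9
y = 0.5                                   # target output

h1 = sigmoid(x1 * w11 + x2 * w21)         # net_h1 = 0.21   -> h1 ≈ 0.5523
h2 = sigmoid(x1 * w12 + x2 * w22)         # net_h2 = 0.315  -> h2 ≈ 0.5781
o3 = sigmoid(h1 * w13 + h2 * w23)         # net_o3 ≈ 0.686  -> o3 ≈ 0.6651
error = 0.5 * (y - o3) ** 2               # ≈ 0.0136

print(round(o3, 4), round(error, 4))
```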
Deriving Back Propagation
Prerequisites:
1. d/dx (½x²) = x
2. Chain rule: dy/dx = (dy/dt)·(dt/dx)
3. dy/dx = 0 if y is not a function of x
The output of the neural network (the end of the forward phase of the algorithm) is a function of three
things:
• the current input (x)
• the activation function g(·) of the nodes of the network
• the weights of the network (v for the first layer and w for the second)
1. The Error of the network
The error function (for example, the sum-of-squares error, a scaled version of the mean squared error) for the neural network is defined as E(w) = (1/2) Σk (yk − tk)², with the sum running over k = 1, …, N,

where:
• yk is the output of the network for the k-th training example.
• tk is the target value for the k-th training example.
•N is the number of training examples.
Taking the Partial Derivative
To update the weights using gradient descent, we need to compute the gradient of the error function with
respect to the weights
Start with the error function:
We are going to use a gradient descent algorithm that adjusts each weight wικ for fixed
values of ι and κ, in the direction of the negative gradient of E(w)

Taking the partial derivative ∂E/∂wικ of the error with respect to each weight, the weight update rule is that we follow the gradient downhill, that is, in the direction of −∂E/∂wικ:

wικ ← wικ − η ∂E/∂wικ, where η is the learning rate
Requirement of an activation function
To effectively model a neuron in a neural network, an activation function should have the
following properties:
1.Differentiable:
Must be differentiable to compute gradients during backpropagation.

2.Saturation:
Should saturate at both ends of its range, allowing the neuron to either fire or not.

3.Rapid Transition:
Should change quickly in the middle of its range for sensitivity to input changes.
Derivation of activation function
The sigmoid function is defined as g(x) = 1 / (1 + e^(−x)); a useful property is that its derivative can be written in terms of the function itself, g′(x) = g(x)(1 − g(x)).
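A short numerical check (illustrative, not from the notes) that the sigmoid's derivative really is g(x)(1 − g(x)):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Compare the analytic derivative g(x)(1 - g(x)) with a central-difference estimate
x, eps = 0.5, 1e-6
analytic = sigmoid(x) * (1 - sigmoid(x))
numerical = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
print(analytic, numerical)   # both ≈ 0.2350
```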
Back Propagation Error
By the chain rule, the output of output layer neuron κ is yκ = g(Σj aj wjκ), where aj are the hidden-layer activations, wjκ the second-layer weights and g the sigmoid.

Error term or delta term at the output:
δo(κ) = (yκ − tκ) yκ(1 − yκ)

Applying the chain rule to the expression for the error at the output gives the weight update for the second-layer weights:
wjκ ← wjκ − η δo(κ) aj

Each hidden node contributes to the activation of all of the output nodes, and so we need to consider all of these contributions (with the relevant weights); the hidden-layer delta is
δh(j) = aj(1 − aj) Σκ wjκ δo(κ)

Now the weight update rule for the hidden-layer (first-layer) weights v is
vij ← vij − η δh(j) xi
Radial Basis Function
Receptive fields: A receptive field refers to the
specific region of the input space that a neuron or
node responds to

Neuron Firing Behavior: If the input x is near to the


center of a neuron's receptive field, that neuron will
fire more strongly. The "center" here refers to the
point in input space where the neuron is most
sensitive.

A Radial Basis Function is a real-valued function


whose value depends only on the distance from a fixed
point, called the center.
Gaussian Function Behavior:In the RBF network, each neuron's response is modeled by a
Gaussian function. The center of this Gaussian corresponds to the neuron's optimal input.
φ(x) = exp(−‖x − c‖² / (2σ²)), where c is the centre and σ controls the width of the receptive field

This graph illustrates a Gaussian Radial Basis Function:


1.The x-axis represents the input space.
2.The y-axis represents the output of the RBF, φ(x).
3.The red dashed line indicates the center (c) of the function.
4.The blue curve shows how the function's output decreases
symmetrically as the distance from the center increases.
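A minimal sketch of the Gaussian RBF above; the centre, width σ and sample inputs are assumed for illustration.

```python
import numpy as np

def gaussian_rbf(x, c, sigma=1.0):
    # Output depends only on the distance of x from the centre c
    return np.exp(-np.linalg.norm(x - c) ** 2 / (2 * sigma ** 2))

c = np.array([0.0])
for x in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(x, round(gaussian_rbf(np.array([x]), c), 4))
# Response is maximal (1.0) at the centre and falls off symmetrically with distance
```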
THE RADIAL BASIS FUNCTION (RBF)
NETWORK

Radial Basis Function (RBF) networks are a special category of feed-forward neural networks comprising three layers:
 Input Layer: Receives input data and passes it
to the hidden layer.
 Hidden Layer: The core computational layer
where RBF neurons process the data.
 Output Layer: Produces the network’s
predictions, suitable for classification or
regression tasks.
RBF Working
 Input Vector: The network receives an n-dimensional input vector that needs classification or
regression.
 RBF Neurons: Each neuron in the hidden layer represents a prototype vector from the training set. The
network computes the Euclidean distance between the input vector and each neuron’s center.
 Activation Function: The Euclidean distance is transformed using a Radial Basis Function (typically a
Gaussian function) to compute the neuron’s activation value. This value decreases exponentially as the
distance increases.
 Output Nodes: Each output node calculates a score based on a weighted sum of the activation values
from all RBF neurons. For classification, the category with the highest score is chosen.
The Radial Basis Function Algorithm
Step 1: Selecting the Centres: Centres can be picked at random from the training set, or by applying techniques such as k-means clustering.
K-Means Clustering: In this widely used centre-selection technique, the input data are grouped into k clusters and the centres of these clusters are employed as the centres for the RBF neurons.

Step 2: Calculate the activations of the RBF (hidden) nodes, typically a Gaussian of the distance between the input and each centre


Step 3: Train the output weights by either:
– using the Perceptron OR
– computing the pseudo-inverse of the activations of the RBF centres
Pseudo inverse calculation for weights
Activation Matrix G:
•For each input vector, the activations of all hidden nodes are computed.
•These activations are assembled into a matrix G.
•Outputs of the network can then be computed as y = GW, where W is the matrix of output weights.
•If all the outputs are correct, t = GW, so we want W = G⁻¹t.
•The matrix inverse is only defined if a matrix is square; if G is not square we can use the pseudo-inverse instead, giving W = G⁺t.
•The pseudo-inverse is defined as G⁺ = (GᵀG)⁻¹Gᵀ.
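A sketch of the pseudo-inverse training step in NumPy; the toy 1-D data, the choice of every third training point as a centre, and the Gaussian width are assumptions for illustration.

```python
import numpy as np

def gaussian_rbf(r, sigma=1.0):
    return np.exp(-r ** 2 / (2 * sigma ** 2))

# Toy 1-D regression data; centres picked from the training set
X = np.linspace(0, 1, 20).reshape(-1, 1)
t = np.sin(2 * np.pi * X).ravel()
centres = X[::3]                           # every 3rd training point as a centre

# Activation matrix G: one row per input, one column per RBF centre
dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
G = gaussian_rbf(dists, sigma=0.2)

# Output weights from the pseudo-inverse: W = G+ t
W = np.linalg.pinv(G) @ t
y = G @ W                                   # network outputs on the training data
print(np.round(np.abs(y - t).max(), 3))     # maximum absolute training error
```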
Curse of Dimensionality
•The curse of dimensionality highlights the challenges of working with high-dimensional data. As the
number of dimensions increases, the volume of the unit hypersphere tends to zero, making it
harder to analyze and interpret the data effectively.

•This phenomenon underscores the importance of dimensionality reduction techniques and careful
feature selection in machine learning and data analysis.
Unit hypersphere
The unit hypersphere is a generalization of circles and spheres to higher dimensions. In a d-dimensional
space, a unit hypersphere is the set of all points that are exactly one unit distance away from a central point
(usually the origin).

•In 2 Dimensions (2D): A unit hypersphere is a circle with radius 1 centered at the origin (0, 0).

•In 3 Dimensions (3D): A unit hypersphere is a sphere with radius 1 centered at the origin (0, 0, 0).
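The shrinking volume can be checked directly from the standard formula V_d = π^(d/2) / Γ(d/2 + 1) for the volume of the unit hypersphere in d dimensions (the formula itself is a known result, not stated in the notes):

```python
import math

def unit_hypersphere_volume(d):
    # Volume of the unit hypersphere in d dimensions: pi^(d/2) / Gamma(d/2 + 1)
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

for d in [1, 2, 3, 5, 10, 20, 50]:
    print(d, unit_hypersphere_volume(d))
# d = 2 -> 3.14159 (circle area), d = 3 -> 4.18879 (sphere volume),
# d = 20 -> ~0.026, d = 50 -> ~2e-13: the volume tends to zero as d grows
```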
How do higher dimensions affect ML?
•The curse of dimensionality affects our machine learning algorithms because as the number of
input dimensions increases, we need more data to help the algorithm generalize well.

•Since our algorithms classify data based on features, more features mean we need more data
points. Therefore, we must be selective about the information we provide to the algorithm,
which requires some prior understanding of the data. So we perform dimensionality reduction
techniques.
Dimensionality Reduction Techniques:
 Feature Selection
 Feature Extraction
 Data Preprocessing
 Handling Missing Values
Interpolation
Interpolation is a fundamental concept in numerical analysis and data science, used to estimate values between known data
points. Imagine you have a set of scattered points on a graph, and you want to draw a smooth line or curve that passes through
or near all these points. This process of filling in the gaps between known data points is called interpolation.

• Why interpolate?

1. Discrete data: In practice, we usually have access only to discrete data points, not the full continuous function.

2. Approximation: Interpolation allows us to estimate function values between known data points.

3. Computational efficiency: A simple representation (like piecewise linear) can be more efficient to compute with than a
complex underlying function.

4. Noise reduction: If the original data contains noise, interpolation can help smooth it out.
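As a simple illustration (with made-up data points), piecewise linear interpolation between known points can be done with NumPy's np.interp:

```python
import numpy as np

# Known (discrete) data points sampled from some underlying function
x_known = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_known = np.array([0.0, 0.8, 0.9, 0.1, -0.8])

# Estimate values between the known points with piecewise linear interpolation
x_new = np.array([0.5, 1.5, 2.5, 3.5])
y_new = np.interp(x_new, x_known, y_known)
print(y_new)   # [0.4, 0.85, 0.5, -0.35]
```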
Fig 1: The true function we are trying to approximate.
Fig 2: The data points available from the function.
Fig 3: The step-function interpolation; the step shape denotes the output after interpolation.
Conclusion: Not a good interpolation.
Types
1. Step function interpolation (refer Fig 3)

2. Linear Interpolation with Derivative Matching
• Uses straight lines between data points
• Lines are not necessarily horizontal
• Slopes match the first derivative of the function at each point

3. Continuous Piecewise Linear Interpolation
• Linear segments between data points
• Lines meet at data points, ensuring continuity
• Continuous function, but may have discontinuous derivatives

4. Spline Interpolation

5. Radial Basis Function (RBF) interpolation


Spline
A spline is a mathematical function used for interpolation and smoothing of data.

Definition:
1. A spline is a piecewise polynomial function
2. It is composed of multiple polynomial segments joined together at points called knots

Characteristics:
1. Smooth: typically continuous up to a certain degree of derivative
2. Flexible: can approximate complex shapes while maintaining smoothness
3. Local control: changes in one segment don't significantly affect distant parts

Common types:
1. Linear splines: use first-degree polynomials (straight lines), f(x) = mx + c
2. Cubic splines: a powerful interpolation method used to create smooth curves through a set of data points.
• Piecewise cubic polynomials that connect data points
• Each segment is a cubic function f(x) = ax³ + bx² + cx + d, where a, b, c and d are coefficients determined so that the curve passes through the data points
• Use third-degree polynomials; very popular due to their balance of smoothness and computational efficiency
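A short cubic-spline sketch, assuming SciPy is available; the data points are illustrative.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Data points (knots) to interpolate
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.sin(x)

# Fit piecewise cubic polynomials joined smoothly at the knots
cs = CubicSpline(x, y)

x_new = np.linspace(0, 4, 9)
print(np.round(cs(x_new), 3))       # spline estimates between the knots
print(np.round(np.sin(x_new), 3))   # true values, for comparison
```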
RBF interpolation
It is a method of interpolation that uses radial basis functions to approximate the underlying data.

Uses a combination of radial basis functions centered at each data point to construct the interpolating function.

RBF (Radial Basis Function) interpolation is a method for estimating unknown values in a dataset. It works by:

1.Placing an RBF at each known data point

2.Adjusting the height of each RBF

3.Summing all RBFs to create a smooth surface

An RBF is a function whose value depends only on the distance from its center. The most common RBF is the Gaussian
(bell-shaped) curve.
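A minimal sketch of Gaussian RBF interpolation: one RBF is centred at each known point and the weights are found by solving a linear system so the weighted sum passes through every point. The data and width are assumed for illustration.

```python
import numpy as np

def gaussian(r, sigma=0.5):
    return np.exp(-r ** 2 / (2 * sigma ** 2))

# Known data points; one Gaussian RBF is centred at each of them
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 0.8, 0.9, 0.1, -0.8])

# Solve Phi w = y so that the sum of weighted RBFs passes through every point
Phi = gaussian(np.abs(x[:, None] - x[None, :]))
w = np.linalg.solve(Phi, y)

def interpolate(x_new):
    return gaussian(np.abs(x_new[:, None] - x[None, :])) @ w

print(np.round(interpolate(x), 3))                      # reproduces y at known points
print(np.round(interpolate(np.array([0.5, 2.5])), 3))   # estimates in between
```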
Applications of Interpolation

 Image Processing
 Computer Graphics
 Numerical Analysis
 Signal Processing
 Mathematical Modeling
 Geographic Information Systems (GIS)
 Audio Processing
Support Vector Machine
Support Vector Machines (SVMs) are powerful and versatile machine learning algorithms used primarily for classification and
regression tasks.

Core Concept:
1. SVMs aim to find the optimal hyperplane that best separates different classes in the feature space.
2. For non-linearly separable data, SVMs use the "kernel trick" to map data into a higher-dimensional space where it
becomes linearly separable.

Key Components:
a. Hyperplane: The decision boundary that separates classes.
b. Margin: The distance between the hyperplane and the nearest data points (support vectors).
c. Support Vectors: The data points closest to the hyperplane that define the margin.

Objective:
Maximize the margin between classes to achieve the best generalization.


Which hyperplane gives the best separation of the classes? The separating hyperplane is defined by w·x + b = 0.

•For binary classification:


•w · x + b > 0 classifies as one class (often labeled +1)
•w · x + b < 0 classifies as the other class (often labeled -1)
Support vectors are the data points that are closest to the
hyperplane and influence its position.
Support Vector Machine Terminology
Hyperplane: The decision boundary that is used to separate the data points of different classes in a feature space. In the case of linear classification, it is a linear equation, i.e. w·x + b = 0.

Margin: The distance between the support vectors and the hyperplane. The main objective of the SVM algorithm is to maximize the margin; a wider margin indicates better classification performance.

Support Vectors: The closest data points to the hyperplane, which play a critical role in deciding the hyperplane and margin.

Kernel: The mathematical function used in SVM to map the original input data points into high-dimensional feature spaces, so that the hyperplane can be found easily even if the data points are not linearly separable in the original input space.
Steps to Determine a Linear Decision Boundary using SVM
1. Data Preparation: Prepare the dataset with features (X) and binary labels (y).
2. Solve the Optimization Problem: Objective: maximize the margin between classes. Constraints: ensure correct classification of the training points.
3. Identify Support Vectors: These points define the decision boundary.
4. Determine the Bias Term (b): Use any support vector (xₛ, yₛ): b = yₛ - w · xₛ.
5. Formulate the Decision Boundary: The linear decision boundary is given by the equation w · x + b = 0.
6. Classification of New Points: For a new point x, if w · x + b > 0 classify as the positive class; if w · x + b < 0 classify as the negative class.
7. Evaluate and Refine: Test the model and adjust if necessary.
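A sketch of these steps using scikit-learn's linear-kernel SVC, assuming scikit-learn is installed; the toy dataset is made up for illustration. The library solves the margin-maximisation problem and exposes w, b and the support vectors.

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable dataset: features X and binary labels y
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

# Solve the margin-maximisation problem with a linear kernel
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]      # decision boundary: w . x + b = 0
print("w =", w, "b =", b)
print("support vectors:", clf.support_vectors_)

# Classify a new point by the sign of w . x + b
x_new = np.array([4, 4])
print("class:", clf.predict([x_new])[0], "score:", w @ x_new + b)
```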
Advantages and Disadvantages of SVM
Advantages:
•Effective in High-Dimensional Spaces: performs well even when the number of dimensions exceeds the number of samples.
•Memory Efficient: uses a subset of training points (support vectors) in the decision function.
•Versatile: different kernel functions can be specified for various decision functions.
•Works Well with a Clear Margin of Separation: highly effective when there is a clear margin of separation between classes.
•Robust to Overfitting: especially in high-dimensional spaces, due to regularization.

Disadvantages:
•Not Suitable for Large Datasets: training time can be high for large datasets.
•Sensitive to Noisy Data: performs poorly with overlapping classes.
•No Probabilistic Explanation: does not directly provide probability estimates.
•Kernel Selection Can Be Challenging: choosing the right kernel and tuning parameters can be complex.
•Interpretability Issues: especially with non-linear kernels, the model can be hard to interpret.
