

Unit-4 IML - Introduction to Machine learning

Computer Science (Jawaharlal Nehru Technological University, Hyderabad)


UNIT-IV

Support Vector Machines (SVM)

Support Vector Machines (SVMs) are a powerful and versatile class of supervised machine
learning algorithms used for classification and regression tasks. At their core, SVMs aim to find an optimal hyperplane that best separates data points into distinct classes. This hyperplane is chosen
to maximize the margin between the two classes, meaning it should be positioned in such a way
that it maximally separates the closest data points from each class. These closest data points are
known as "support vectors." By focusing on support vectors, SVMs achieve better generalization
to new, unseen data.

Introduction to Linear and Non-linear Datasets


Linear Data Sets
A linear dataset is one in which the relationship between the input features and the target variable
can be approximated or modeled by a straight line or a hyperplane. Linear datasets are
straightforward to work with because the underlying patterns are relatively simple and can be
captured by linear models. Here's an example:

Example: Predicting House Prices

 Suppose you want to predict house prices based on features like the number of bedrooms, square footage, and age of the house. If the relationship between these features and the house price can be accurately represented by a linear equation such as

House Price = (Number of Bedrooms * w1) + (Square Footage * w2) + (Age of the House * w3) + b

 then the dataset is considered linear, and linear regression is a common algorithm used for such problems.
 If you were to plot the data points on a graph, they would form a clear straight line or a
hyperplane that approximates the relationship between the features and the target
variable.
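As a concrete (and purely hypothetical) illustration, a linear model of this form can be fitted in a few lines with scikit-learn; the house data below is made up:

# A minimal sketch, assuming scikit-learn is installed; the data values are invented.
import numpy as np
from sklearn.linear_model import LinearRegression

# Columns: number of bedrooms, square footage, age of the house (hypothetical values)
X = np.array([[2, 900, 30], [3, 1400, 20], [3, 1600, 15], [4, 2100, 8], [5, 2600, 3]])
y = np.array([120_000, 190_000, 215_000, 300_000, 370_000])  # house prices

model = LinearRegression().fit(X, y)
print(model.coef_)                     # learned weights w1, w2, w3
print(model.intercept_)                # learned bias b
print(model.predict([[3, 1500, 10]]))  # predicted price for a new house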

Non-Linear Data Sets


A non-linear dataset is one in which the relationship between the input features and the target
variable cannot be effectively modeled by a straight line or a hyperplane. Non-linear datasets
often exhibit complex, curved, or irregular patterns that require more sophisticated models to
capture. Here's an example:


Example: Predicting a Student's Exam Score

 Suppose you want to predict a student's exam score based on the number of hours they
studied and the number of hours they slept the night before. The relationship between
these features and the exam score might not be linear.
 It could be that studying more hours initially increases the score, but after a certain point,
further studying becomes less effective. Also, the quality of sleep might interact with the
effect of studying.
 These relationships can be highly non-linear and are better captured by non-linear
models like decision trees, random forests, or support vector machines with non-linear
kernels.
 In this case, plotting the data points would result in a curve or a complex shape that
doesn't fit a straight line or a simple plane.
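To make the contrast concrete, the short sketch below (synthetic data, illustrative only) fits both a linear model and a non-linear one (a depth-limited decision tree) to exam scores that follow a "diminishing returns" curve; the non-linear model tracks the pattern noticeably better:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
hours_studied = rng.uniform(0, 10, size=200).reshape(-1, 1)
# Synthetic scores: rise quickly at first, then flatten out (a non-linear relationship)
score = 40 + 50 * (1 - np.exp(-0.5 * hours_studied[:, 0])) + rng.normal(0, 2, 200)

linear = LinearRegression().fit(hours_studied, score)
tree = DecisionTreeRegressor(max_depth=3).fit(hours_studied, score)

print("linear R^2:", linear.score(hours_studied, score))
print("tree R^2:  ", tree.score(hours_studied, score))  # typically higher on curved data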

Understanding whether your data is linear or non-linear is crucial for selecting the
appropriate machine learning model and algorithm. Linear regression, for example, is ideal for
linear datasets, while non-linear datasets require more advanced techniques to capture the
underlying patterns effectively.

Support Vector Machine (SVM)


Support Vector Machine (SVM) is a powerful and versatile machine learning algorithm used for
classification and regression tasks. Its primary goal is to find an optimal hyperplane that best
separates different classes of data points in a high-dimensional space. SVM is particularly useful
when dealing with both linearly and non-linearly separable datasets. Here's a brief introduction
to SVM with relevant diagrams.

1. Intuition:

At its core, SVM aims to find a hyperplane that maximizes the margin between two classes of
data points. This hyperplane is known as the decision boundary, and the margin represents the
distance between the boundary and the nearest data points from each class. The idea is to ensure
that the decision boundary has the greatest separation, making it robust to classify new, unseen
data accurately.

2. Linear Separation

For linearly separable data, SVM finds a hyperplane that separates the two classes with the
largest possible margin. In a two-dimensional feature space, this hyperplane is a straight line.
Here's a simplified diagram to illustrate this concept:


The solid line is the decision boundary, and the dashed lines represent the margin.

3. Support Vectors

Support vectors are the data points closest to the decision boundary. These points play a critical
role in defining the margin and the overall performance of the SVM. The margin is determined
by the distance between the support vectors and the decision boundary, as shown below:

4. Non-linear Separation

In many real-world scenarios, data may not be linearly separable. SVM can handle such cases by
mapping the data into a higher-dimensional space where it becomes linearly separable. This
transformation is done using a kernel function. Common kernel functions include the linear,


polynomial, and radial basis function (RBF) kernels. The following diagram demonstrates this
concept:

In this example, data is transformed into a 3D space where it becomes linearly separable.

In summary,

Support Vector Machine is a versatile machine learning algorithm that excels in both linear and non-linear classification tasks. It aims to find the optimal hyperplane that maximizes the margin between classes, and it uses support vectors to define this margin. By introducing kernel functions, SVM can handle non-linear separations, making it a valuable tool in various applications, from image classification to financial analysis.
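As a brief illustration, the sketch below (assuming scikit-learn and synthetic data) fits a linear-kernel SVM and inspects its support vectors directly:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated synthetic classes
X, y = make_blobs(n_samples=100, centers=2, random_state=42)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

print(clf.support_vectors_)       # the data points closest to the decision boundary
print(clf.n_support_)             # number of support vectors per class
print(clf.predict([[0.0, 0.0]]))  # classify a new, unseen point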

Advantages of SVMs:

 They provide strong theoretical foundations, making them a preferred choice for many
researchers and practitioners.
 They are effective with high-dimensional data, relatively resistant to overfitting, and can
handle both binary and multiclass classification problems.
 Moreover, SVMs are also used in regression tasks, known as Support Vector Regression
(SVR), where they aim to fit a hyperplane that approximates the target values as closely
as possible.


Linear Discriminant functions for Binary classification using SVM:


Linear discriminant functions are essential in binary classification tasks, and Support Vector
Machines (SVMs) utilize them to separate data into two distinct classes.

Let's explain this concept using simple equations and diagrams

In SVM, the linear discriminant function for binary classification can be represented as

f(x) = sign(w⋅x + b)


Where

f(x) : The decision function that determines the class label of a data point x.

w: The weight vector that defines the orientation of the decision boundary.

x: The feature vector representing the input data point.

b: The bias or threshold value that shifts the decision boundary.

sign(.): The sign function, which assigns a class label based on the sign of the result.

Example

Let's illustrate this with a simple example. Suppose we have a 2D feature space for binary classification, where Class A is represented by blue points and Class B is represented by black points. The linear discriminant function is:

f(x) = sign(w1x1 + w2x2 + b)


Where

w1 and w2 are the weights that define the orientation of the decision boundary.

x1 and x2 are the features of a data point.

b is the bias term.

Here's a diagram


In the diagram, the decision boundary is a straight line, represented by

w1 x1 + w2 x2 + b = 0

This line separates the two classes. Any data point x falling on one side of the decision boundary is classified as Class A, and any point on the other side as Class B.

 SVM's objective is to find the optimal values of w and b that maximize the margin
between the two classes while minimizing classification errors.
 The support vectors are the data points that are closest to the decision boundary, and they
are used to define the margin in an SVM.
 The margin is the perpendicular distance from the decision boundary to the support
vectors.
 SVM aims to maximize this margin while ensuring that data points are correctly
classified.
 If the data points are not linearly separable, SVM can use kernel functions to map the
data to a higher-dimensional space where linear separation is possible.
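The decision rule itself is easy to evaluate directly. A minimal numpy sketch of f(x) = sign(w1x1 + w2x2 + b), using made-up weights and points rather than learned values:

import numpy as np

# Hypothetical parameters of a decision boundary w1*x1 + w2*x2 + b = 0
w = np.array([2.0, -1.0])  # weight vector (w1, w2)
b = -0.5                   # bias term

points = np.array([[1.0, 0.5], [0.0, 2.0], [3.0, 1.0]])

scores = points @ w + b    # w.x + b for each point
labels = np.sign(scores)   # +1 -> Class A, -1 -> Class B (by convention)
print(scores, labels)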

Large margin classifier for linearly separable data using SVM:


A key concept in Support Vector Machines (SVM) is the idea of a large margin classifier for
linearly separable data. The goal is to find a hyperplane that maximizes the margin between two
classes of data points. Here's an explanation with simple equations and diagrams:

 Consider a binary classification problem where you have two classes: Class A and Class
B. The objective of SVM is to find a hyperplane that best separates these two classes
while maximizing the margin. The equation for this hyperplane is

w⋅x+b = 0
Where

w: The weight vector that defines the orientation of the hyperplane.

x: The feature vector representing an input data point.

b: The bias or threshold value that shifts the hyperplane.

Maximizing the Margin:

 The margin is the distance between the hyperplane and the nearest data points from each
class. To maximize this margin, we want to find w and b such that the distance from the
hyperplane to the closest point in Class A and the closest point in Class B is maximized.
Mathematically, this can be represented as:


Margin = 2 / ||w||

Where ||w|| is the Euclidean norm (magnitude) of the weight vector w. The objective of SVM is
to maximize this margin.

Here are diagrams illustrating this concept:

In the diagram, the decision hyperplane (the straight line) separates Class A from Class B. The
margin is the distance from the hyperplane to the closest data points from each class.

 SVM's objective is to find the optimal w and b that maximize this margin while ensuring
that data points are correctly classified.
 In this ideal scenario of linearly separable data, the support vectors are the data points
closest to the hyperplane, and they are used to define the margin.
 SVM finds these support vectors and optimizes the margin by solving a constrained
optimization problem.

The large margin classifier provides a robust solution for linearly separable data, ensuring a
wider separation between classes and making it less sensitive to noise in the data.
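For a trained linear SVM, this margin can be read directly off the learned weights as 2/||w||. A minimal sketch, assuming scikit-learn and synthetic, well-separated data:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=80, centers=2, cluster_std=0.8, random_state=0)

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # a very large C approximates the hard-margin case

w = clf.coef_[0]                   # weight vector of the separating hyperplane
margin = 2.0 / np.linalg.norm(w)   # margin = 2 / ||w||
print("||w|| =", np.linalg.norm(w), " margin =", margin)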

Fig: Small and large margins in an SVM classifier model


Fig: Representing good and bad SVM classifier models in small and large margin cases

A large margin classifier in SVM for linearly separable data aims to find an optimal hyperplane
that maximizes the margin between two classes, ensuring a robust separation. Support vectors
define this margin, and SVM finds the best hyperplane by minimizing classification errors while
maximizing the margin, enhancing classification accuracy and robustness.

Linear soft margin classifier for overlapping classes


A linear soft margin classifier in SVM addresses overlapping classes by allowing for some
classification errors. It introduces a penalty for misclassifications, striking a balance between
maximizing the margin and tolerating some errors.

Linear Soft Margin Classifier:

 The linear soft margin classifier in SVM aims to find a hyperplane that best separates
overlapping classes, even when perfect separation isn't possible. It introduces a "slack
variable" (ξ) to account for classification errors. The objective function is modified as
follows
min_{w,b} (1/2)∥w∥² + C ∑_{i=1}^{n} ξi

Subject to:

yi(w⋅xi+b) ≥ 1−ξi ; ξi≥0


Where

w: The weight vector.

b: The bias or threshold.

C: A hyperparameter that controls the trade-off between maximizing the margin and minimizing the misclassification error.


ξi: Slack variable for the ith data point.

n: The number of data points.

yi: The class label of the ith data point.

A simple diagram illustrating the concept of a linear soft margin classifier:

Class 1 is represented by points labeled as -1.

Class 2 is represented by points labeled as +1.

 The decision hyperplane (a straight line) attempts to separate the classes, but due to
overlapping, some data points may lie on the wrong side.
 The slack variables (ξ) allow for some misclassifications while still trying to maximize the margin. The parameter C controls the balance between maximizing the margin (small C) and minimizing classification errors (large C).
 It helps SVM adapt to overlapping classes and create a margin that balances the trade-off
between classification accuracy and margin size.
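A short sketch of this trade-off with scikit-learn, using overlapping synthetic classes; only the value of C differs between the two models:

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping classes: cluster_std is large enough that perfect separation is impossible
X, y = make_blobs(n_samples=200, centers=2, cluster_std=3.0, random_state=1)

soft = SVC(kernel="linear", C=0.01).fit(X, y)   # small C: wide margin, more slack allowed
hard = SVC(kernel="linear", C=100.0).fit(X, y)  # large C: misclassifications penalized heavily

print("small C support vectors:", len(soft.support_))  # typically many points lie inside the margin
print("large C support vectors:", len(hard.support_))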

Kernel-induced feature spaces (Nonlinear classifier)


Kernel-induced feature spaces are a critical concept in Support Vector Machines (SVM) that
allows SVM to handle non-linearly separable data by implicitly transforming it into a higher-
dimensional space where it might become linearly separable.

 In SVM, the kernel function plays a central role. The kernel function, denoted as K(x, y),
takes two input data points x and y and returns a measure of similarity between them.
 It implicitly maps the data into a higher-dimensional feature space where linear
separation might be possible.

The equation for SVM's decision boundary in the feature space is:

f(x) = sign( ∑_{i=1}^{n} αi yi K(xi, x) + b )


Where

f(x): The decision function.


αi: Lagrange multipliers determined during the SVM optimization.

yi: Class labels of the data points.

K(xi, x): The kernel function that maps xi and x into the feature space.

b: The bias or threshold.

 Consider a simple 2D dataset where Class A (green points) and Class B (blue points) are not linearly separable in the original feature space:

Fig: Non-linearly separable data in the 2D space

 In this diagram, it's evident that a straight-line decision boundary cannot separate the
classes effectively in the original 2D space.
 Now, by using a kernel function, we implicitly map this data to a higher-dimensional
feature space, often referred to as a "kernel-induced feature space." Let's say we use a
radial basis function (RBF) kernel
 This RBF kernel implicitly maps the data to a higher-dimensional space where the classes
might become linearly separable

Fig: Non-linearly separable data in the 3D kernel space and the 2D space


 In this new feature space, the data points might be linearly separable with the right choice
of kernel and kernel parameters, enabling SVM to find an optimal decision boundary that
maximizes the margin between classes.

The transformation into the kernel-induced feature space is implicit and doesn't require explicit
calculation of the transformed feature vectors. It allows SVM to handle non-linearly separable
data effectively.
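A minimal sketch with scikit-learn: the classic "two concentric circles" dataset cannot be separated by a straight line in 2D, but an RBF-kernel SVM handles it without any explicit feature mapping:

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two classes arranged as concentric circles: no straight line can separate them
X, y = make_circles(n_samples=300, factor=0.4, noise=0.05, random_state=0)

linear_clf = SVC(kernel="linear").fit(X, y)
rbf_clf = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X, y)

print("linear kernel accuracy:", linear_clf.score(X, y))  # close to chance level
print("RBF kernel accuracy:   ", rbf_clf.score(X, y))     # close to 1.0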

Perceptron Algorithm:
The Perceptron algorithm is a simple supervised learning algorithm used for binary
classification. It's designed to find a linear decision boundary that separates data points of two
classes. Here's an explanation of the Perceptron algorithm with suitable diagrams:

The Perceptron algorithm works as follows:

1. Initialize the weights (w) and bias (b) to small random values or zeros.

2. For each data point (x) in the training dataset, compute the predicted class label (ŷ) using the
following formula

ŷ = sign(w⋅x + b)
Here, w represents the weight vector, x is the feature vector of the data point, and sign (.) is a
function that returns +1 for values greater than or equal to zero and -1 for values less than zero.

3. Compare the predicted class label (ŷ) to the true class label (y) of the data point. If they don't
match, update the weights and bias as follows:

w = w + α⋅(y − ŷ)⋅x

b = b + α⋅(y − ŷ)

Here, (α) is the learning rate, and (y − ŷ) is the classification error. These updates help the
Perceptron adjust the decision boundary to classify the data points correctly.

4. Repeat the above steps for a fixed number of iterations or until the algorithm converges, meaning no more misclassifications occur.
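These steps translate almost line-for-line into code. A minimal from-scratch sketch in Python (it assumes -1/+1 labels and linearly separable data; the sample points are made up):

import numpy as np

def perceptron_train(X, y, lr=0.1, max_epochs=100):
    """Train a perceptron; X is (n_samples, n_features), y holds -1/+1 labels."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            y_hat = 1 if np.dot(w, xi) + b >= 0 else -1
            if y_hat != yi:                  # misclassified: update w and b
                w += lr * (yi - y_hat) * xi
                b += lr * (yi - y_hat)
                errors += 1
        if errors == 0:                      # converged: no misclassifications left
            break
    return w, b

# Tiny linearly separable example (hypothetical points)
X = np.array([[2.0, 1.0], [3.0, 4.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = perceptron_train(X, y)
print(w, b)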


Fig: Perceptron

Let's illustrate the Perceptron algorithm with a simple 2D dataset and a linear decision boundary

Fig: Linear decision boundary using Perceptron algorithm

 Suppose you have a 2D feature space with two classes, Class 1 (labeled as -1) and Class 2
(labeled as +1), and you want to separate them using the Perceptron algorithm.
 In the initial state, the decision boundary (a straight line) is randomly placed. The
Perceptron algorithm starts making predictions and adjusting the decision boundary based
on classification errors.
 As it iterates through the data points, it gradually shifts the decision boundary to correctly
classify the points into their respective classes.
 The process continues until no misclassifications remain, or a maximum number of iterations is reached. The final decision boundary separates the two classes effectively.
 The Perceptron algorithm finds a linear decision boundary that minimizes classification
errors and correctly classifies the data points based on the training data. It's a basic
algorithm suitable for linearly separable datasets and serves as the foundation for more
complex neural networks.

The key difference between the Perceptron and SVM is that SVM aims to find the optimal
hyperplane that maximizes the margin, whereas the Perceptron algorithm doesn't consider
margin maximization. SVM is a more sophisticated and powerful classification algorithm,
especially suitable for scenarios where data may not be perfectly separable and a margin is
essential for generalization and reducing overfitting.


Regression by SVM (Support Vector Machine):


Support Vector Machines (SVM) are not just limited to classification tasks; they can also be used for regression. In regression tasks, the goal is to predict a continuous target variable rather than class labels. SVM for regression is known as Support Vector Regression (SVR). It is classified into two models:

1. Linear regression by SVM


2. Non-Linear Regression by SVM

Linear regression by SVM:

Linear regression using Support Vector Machines (SVM) is a variation of SVM designed for
regression tasks. It aims to find a linear relationship between input features and a continuous
target variable.

 In linear regression using SVM, the goal is to find a linear function that best
approximates the relationship between input features and the target variable. This linear
function is represented as:

f(x) = w⋅x+b
f(x): The predicted target variable.

w: The weight vector.

x: The feature vector representing the input data point.

b : The bias or intercept term.

 The linear regression objective is to minimize the mean squared error (MSE) between the
predictions and the true target values
min_{w,b} (1/2)∥w∥² + C ∑_{i=1}^{n} (yi − f(xi))²

yi: The true target variable of the ith data point.

C: A regularization parameter controlling the trade-off between fitting the data and
keeping the model simple.

 The target variable (y) is represented on the vertical axis, and the input features (x) are on
the horizontal axis.
 The linear function f(x) = w.x + b is the best-fitting line that minimizes the mean squared
error by adjusting the weight vector (w) and the bias term (b).


 This linear model can be used for regression tasks to predict continuous target variables
based on input features.
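A minimal sketch using scikit-learn's LinearSVR on synthetic, roughly linear data. Note that scikit-learn's SVR variants actually optimize an ε-insensitive loss rather than a plain squared error, but the fitted model has the same f(x) = w⋅x + b form:

import numpy as np
from sklearn.svm import LinearSVR

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 5.0 + rng.normal(0, 0.5, 100)  # roughly linear target

reg = LinearSVR(C=1.0, epsilon=0.1, max_iter=10_000).fit(X, y)
print(reg.coef_, reg.intercept_)  # learned w and b, close to 3.0 and 5.0
print(reg.predict([[4.0]]))       # prediction for a new input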

Non-Linear Regression by SVM:

Non-linear regression by Support Vector Machines (SVM) uses the principles of SVM to model
non-linear relationships between input features and a continuous target variable. The key idea is
to use kernel functions to implicitly map the data into a higher-dimensional space, where a linear
regression model can be applied effectively

 In non-linear regression using SVM, the goal is to find a non-linear function that best fits
the relationship between input features and the target variable.
 Unlike linear regression, which assumes a linear relationship, non-linear regression
allows for more complex, non-linear patterns.
 The non-linear regression objective is to minimize the mean squared error (MSE)
between the predictions and the true target values:
min_{w,b} (1/2)∥w∥² + C ∑_{i=1}^{n} (yi − f(xi))²

yi: The true target variable of the ith data point.

C: A regularization parameter controlling the trade-off between fitting the data and keeping the model simple.

To account for non-linearity, the non-linear function is represented as

f(x) = ∑_{i=1}^{n} αi K(xi, x) + b


f(x): The predicted target variable.

αi: Lagrange multipliers determined during the SVM optimization.

K(xi, x): The kernel function that implicitly maps xi and x into a higher-dimensional
feature space.

b : The bias or intercept term.

 The target variable (y) is represented on the vertical axis, and the input features (x) are on
the horizontal axis.
 The non-linear function f(x) = ∑_{i=1}^{n} αi K(xi, x) + b captures non-linear relationships between input features and the target variable by implicitly mapping the data into a higher-dimensional feature space using the kernel function.
 The model can then make non-linear predictions based on the input features.
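A minimal sketch of non-linear regression with scikit-learn's SVR and an RBF kernel, fit to a synthetic sine-shaped target; the hyperparameter values are only illustrative:

import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, size=(120, 1)), axis=0)
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 120)  # non-linear target

svr = SVR(kernel="rbf", C=10.0, gamma=0.5, epsilon=0.05).fit(X, y)
print(svr.predict([[1.5], [4.5]]))  # follows the sine curve rather than a straight line
print(len(svr.support_))            # number of support vectors used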


Learning with neural networks toward cognitive machines:


Learning with neural networks toward cognitive machines represents an approach to developing
intelligent systems that can reason, learn, and adapt like human cognition.

 Cognitive machines are designed to emulate human-like thinking and problem-solving processes. Here's an overview of how learning with neural networks contributes to the development of cognitive machines:

1. Neural Networks as a Foundation

Neural networks, particularly deep learning models like convolutional neural networks (CNNs)
and recurrent neural networks (RNNs), serve as the foundation for cognitive machine learning.
These models are capable of handling complex data, learning patterns, and making predictions.

2. Supervised and Unsupervised Learning:

Cognitive machines employ both supervised and unsupervised learning techniques. In supervised learning, neural networks are trained on labeled data, while unsupervised learning allows them to discover hidden patterns and structures in data.

3. Reinforcement Learning:

Cognitive machines also incorporate reinforcement learning, enabling them to learn through
interactions with their environment. Agents learn by receiving rewards and penalties based on
their actions, enabling them to make decisions and adapt over time.

4. Transfer Learning:

To mimic cognitive abilities, neural networks use transfer learning. Pre-trained models are
fine-tuned for specific tasks, which is akin to humans applying knowledge learned in one context
to solve related problems.

5. Multimodal Data Processing:

Cognitive machines process data from various sources (text, images, audio) simultaneously,
fostering a more comprehensive understanding of the environment. They can analyze multiple
data modalities to make informed decisions.

6. Memory and Reasoning:

Cognitive machines integrate memory networks and reasoning modules, enabling them to store
and retrieve information and perform logical reasoning. This allows them to solve problems by
considering context and past experiences.


7. Natural Language Understanding and Generation:

Cognitive machines excel in natural language processing tasks. They can understand and
generate human-like text and engage in meaningful conversations, making them highly
interactive and adaptive.

8. Contextual Awareness:

These machines have contextual awareness, recognizing the importance of the context in which
they operate. They can adapt their behavior, decisions, and responses based on the current
situation.

9. Continuous Learning:

Cognitive machines don't stop learning after initial training. They engage in continuous
learning and self-improvement, allowing them to adapt to changing conditions and acquire new
knowledge over time.

10. Emulating Human Cognition:

The ultimate goal of learning with neural networks toward cognitive machines is to create
systems that replicate and augment human-like cognition. They mimic human problem-solving,
decision-making, creativity, and adaptability.

In summary, learning with neural networks toward cognitive machines involves a holistic
approach to developing intelligent systems. By combining various learning techniques, these
machines can process complex data, reason, understand language, adapt to changing situations,
and replicate cognitive functions, bringing us closer to creating intelligent systems that emulate
human cognition and understanding.


Neuron Models:
Let us discuss two neuron models

1. Biological neuron
2. Artificial neuron

1. Biological neuron:

Fig: Biological Neuron Structure

 Neuron Structure:

A typical human neuron consists of three main parts:

Cell Body (Soma): The cell body contains the nucleus and other organelles.

Dendrites: These are the branched extensions that receive signals from other neurons.

Axon: The axon is a long, slender extension that transmits signals to other neurons or
cells.

 Synapses:

Neurons communicate with each other through synapses, which are small gaps between
the axon of one neuron and the dendrites of another. Neurotransmitters are released at the
synapse to transmit signals.

 Action Potential:

Neurons transmit electrical signals in the form of action potentials. An action potential is
a brief change in the neuron's electrical charge, leading to the propagation of a signal
along the axon.

 Resting Potential:


Neurons maintain a resting potential, which is a difference in electrical charge across the cell membrane. It is around -70 millivolts and is essential for neural signaling.

 Threshold and Firing:

When the electrical charge inside the neuron reaches a certain threshold, an action
potential is initiated. This action potential travels down the axon and signals the release
of neurotransmitters at the synapse.

 Excitatory and Inhibitory Neurons:

Neurons can be classified as either excitatory or inhibitory. Excitatory neurons promote action potentials, while inhibitory neurons reduce the likelihood of an action potential.

 Neural Networks:

Neurons are interconnected in complex networks. These networks allow for information
processing, learning, and memory formation.

2. Artificial Neuron:

Fig: Artificial Neuron

 Inputs (x1, x2... xn):


Artificial neurons receive multiple input signals, each associated with a weight (w1, w2...
wn). These inputs represent the features of the data being processed.

 Weights (w1, w2... wn):


Each input signal is multiplied by a weight. The weights determine the importance of
each input in the neuron's computation.

 Summation (Σ):
The weighted inputs are summed together, typically with a bias term (b), to compute the
net input:


Net Input = (w1∗x1) + (w2∗x2) + ... + (wn∗xn) + b

 Activation Function (f):


The net input is passed through an activation function, such as the sigmoid, ReLU, or
tanh function. The activation function introduces non-linearity to the model. Common
choices include the sigmoid function
Output = f(Net Input) = 1 / (1 + e^(−Net Input))

 Output (y):
The result of the activation function is the output of the artificial neuron. It represents the
neuron's response to the input signals.
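As a minimal sketch, the full computation of a single artificial neuron (weighted sum, bias, sigmoid activation) in numpy, with hypothetical inputs and weights:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])  # inputs x1, x2, x3
w = np.array([0.8, 0.2, -0.5])  # weights w1, w2, w3
b = 0.1                         # bias

net_input = np.dot(w, x) + b    # (w1*x1) + (w2*x2) + (w3*x3) + b
output = sigmoid(net_input)     # the neuron's response
print(net_input, output)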

Neural Network Architectures:

In the biological brain, a huge number of neurons are interconnected to form a network and perform advanced intelligent activities. An artificial neural network is built from neuron models. Many different types of artificial neural networks have been proposed, just as there are many theories on how biological neural processing works. We may classify the organization of neural networks into two types:

1. Single layer neural networks


2. Multilayer neural networks

Single layer neural networks:

A single-layer neural network, also known as a single-layer Perceptron, is the simplest neural
network architecture. It consists of an input layer, which directly connects to an output layer,
without any hidden layers. Single-layer networks are mainly used for binary classification
problems or linearly separable tasks.

Fig: Single layer Artificial Neural Network


 Weighted Sum (z):

The weighted sum of input features is computed as follows

z = w1⋅x1 + w2⋅x2 + … + wn⋅xn + b

Where:

z is the weighted sum.

x1, x2, …., xn are the input features.

w1, w2… wn are the corresponding weights.

b is the bias.

 Activation Function (f(z)):

A step function, also known as the Heaviside step function, is often used as the activation
function. It outputs 1 if the weighted sum z is greater than or equal to 0, and 0 otherwise

f(z) = 1 if z ≥ 0

0 if z < 0

 In the diagram, input features (x1, x2... xn) are connected to the weighted sum calculation,
followed by the activation function (step function), which produces a binary output (0 or
1).
 This single-layer neural network can make binary decisions based on the weighted sum
of its input features, which is often used for linearly separable classification problems.
 Single-layer networks are limited in their capability compared to more complex neural
architectures like multi-layer perceptrons (MLPs) or deep neural networks.
 They can only solve problems that are linearly separable and cannot capture complex
non-linear relationships in data. While simple, they are foundational in understanding
neural networks and are a starting point for more sophisticated architectures. To handle
more complex tasks, deeper neural networks with hidden layers are employed.


Multilayer neural networks:

Multi-layer neural networks, often referred to as multi-layer perceptrons (MLPs), are a type of
artificial neural network with multiple layers of interconnected neurons. These networks are
designed to handle more complex tasks by introducing hidden layers between the input and
output layers.

Fig: Multilayer artificial neural network

 Weighted Sum (z) in a Hidden Layer:

The weighted sum for each neuron in a hidden layer is calculated as follows:

zj = ∑_{i=1}^{n} wij⋅xi + bj
Where

zj is the weighted sum for neuron j in the hidden layer.

wij is the weight connecting input i to neuron j.

xi is the input from the previous layer.

bj is the bias for neuron j.

 Activation Function (f(z)) for Hidden Layers:

Common activation functions for hidden layers include the sigmoid, ReLU, or tanh
functions

f(z) = Activation function (z)

 Weighted Sum (z) in the Output Layer:

The weighted sum for each neuron in the output layer is calculated similarly to the hidden layer:

zk = ∑_{j=1}^{m} w′kj⋅f(zj) + b′k


Where

zk is the weighted sum for neuron k in the output layer.

w'kj is the weight connecting neuron j in the hidden layer to neuron k in the output layer.

f(zj) is the output of neuron j in the hidden layer.

b'k is the bias for neuron k in the output layer.

 Activation Function (f(z)) for Output Layer:

The activation function in the output layer depends on the type of problem. For binary
classification, you might use a sigmoid function. For multiclass classification, a softmax
function is common.

In this diagram, input features (x1, x2... xn) are connected to the weighted sum
calculations in the hidden layer, followed by the activation function for the hidden layer. The
output of the hidden layer is then connected to the weighted sum calculations in the output layer,
followed by the activation function for the output layer. This network structure allows multi-
layer neural networks to capture complex relationships and solve a wide range of tasks, including
classification, regression, and more.
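A minimal numpy sketch of one forward pass through such a network (2 inputs, 3 sigmoid hidden neurons, 1 sigmoid output neuron); the weights here are random placeholders rather than trained values:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(3, 2))  # weights wij: 3 hidden neurons, 2 inputs
b_hidden = rng.normal(size=3)       # biases bj
W_out = rng.normal(size=(1, 3))     # weights w'kj: 1 output neuron, 3 hidden outputs
b_out = rng.normal(size=1)          # bias b'k

x = np.array([0.5, -1.2])           # input features x1, x2

z_hidden = W_hidden @ x + b_hidden  # zj = sum_i wij * xi + bj
a_hidden = sigmoid(z_hidden)        # f(zj)
z_out = W_out @ a_hidden + b_out    # zk = sum_j w'kj * f(zj) + b'k
y = sigmoid(z_out)                  # output activation (sigmoid, as for binary classification)
print(y)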

Linear neuron and the widrow-Hoff Learning Rule:

A linear neuron, also known as a McCulloch-Pitts neuron or a threshold neuron, is a simplified model of a biological neuron that computes a weighted sum of its inputs and compares it to a threshold to produce an output.

Linear Neuron:

 Inputs (x1, x2... xn): A linear neuron takes multiple input values (x1, x2... xn). Each input is
associated with a weight (w1, w2... wn), which represents the importance of that input.
 Weighted Sum (z): The weighted sum of inputs is computed as

z = w1∗x1 + w2∗x2 + ... + wn∗xn

 Threshold (θ): The weighted sum is compared to a threshold (θ) to produce the output.
 Output (y): If the weighted sum z is greater than or equal to the threshold θ, the neuron's
output is 1. Otherwise, the output is 0.

y(z) = 1 if z ≥ θ

0 if z < θ


A linear neuron can be used for binary classification, where it acts as a simple decision-maker,
and the weights and threshold are adjusted to make correct classifications.

Widrow-Hoff Learning Rule:

The Widrow-Hoff learning rule, also known as the delta rule or the LMS (Least Mean Squares)
algorithm, is a supervised learning algorithm used to adjust the weights of a linear neuron to
minimize the error in classification or regression tasks. It updates the weights based on the
prediction error and the input values. The update rule for the ith weight is as follows

winew = wiold + α⋅(target−output)⋅xi


Where

winew is the new weight.

wiold is the old weight.

α is the learning rate, controlling the step size for weight updates.

target is the desired output or target value.

output is the actual output of the neuron.

xi is the input associated with weight wi.

 The learning rule adjusts the weights in the direction that reduces the error. It continues to
update the weights in an iterative process until the error is minimized or converges to a
satisfactory level.

The Widrow-Hoff learning rule is a foundational concept in machine learning and neural
networks, providing a mechanism for training linear neurons to make accurate binary
classifications or predictions in a supervised learning context.
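A minimal sketch of the rule as a training loop in numpy. Here the neuron's output is taken as the raw weighted sum before thresholding (the usual LMS setting), and the data, targets, and learning rate are all hypothetical:

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 2))
targets = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5  # targets generated by a known linear rule

w = np.zeros(2)
b = 0.0
alpha = 0.05                                   # learning rate

for _ in range(200):                           # iterate until the error becomes small
    for xi, t in zip(X, targets):
        output = np.dot(w, xi) + b             # linear neuron output
        error = t - output                     # (target - output)
        w += alpha * error * xi                # wi_new = wi_old + a*(target - output)*xi
        b += alpha * error                     # the bias is updated the same way

print(w, b)  # converges toward (2.0, -1.0) and 0.5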

Error Correction Delta Rule:

The Error Correction Delta Rule, often referred to simply as the Delta Rule or the Delta Learning
Rule, is a supervised learning algorithm used to adjust the weights of artificial neurons in a
neural network, specifically in the context of supervised learning tasks. The primary goal of this
rule is to minimize the error between the actual output of the neuron and the desired target
output.

Components of the Error Correction Delta Rule:

 Actual Output (Y): This is the output produced by the artificial neuron or network based
on the current set of weights and inputs.


 Desired Target Output (D): This is the expected or correct output for the given input. It's
provided during the training phase.
 Error (E): The error is the difference between the actual output and the desired target output:

E = D − Y

The Weight Update Rule:

The goal of the Error Correction Delta Rule is to adjust the weights to minimize the error (E).
The update for the ith weight wi is given by

winew = wiold + α⋅E⋅xi


Where

winew is the new weight.

wiold is the old weight.

α is the learning rate, controlling the step size for weight updates.

E is the error as calculated above.

xi is the input associated with weight wi.

Weight Adjustment Process:

 Calculate the error (E) by taking the difference between the desired target output (D) and
the actual output (Y).
 Adjust each weight (wi) based on the weight update rule, considering the learning rate
(α).

This weight adjustment process is repeated iteratively for multiple data points during the training
process until the error converges to a satisfactory level, meaning that the difference between the
desired and actual outputs is minimized.

The Error Correction Delta Rule is a foundational concept in supervised learning for
neural networks. It's used to train the network by iteratively adjusting the weights to make the
network's predictions more accurate and aligned with the desired target outputs. The choice of
the learning rate is crucial, as it affects the speed and stability of the learning process.

Y.Naga Prasanthi
Assistant Professor
Department of ECE
DIET
