SC-mod3
Neural Network
Topic: Introduction to Neural Networks – Advent of Modern Neuroscience,
Classical AI and Neural Networks
1. Introduction to Neural Networks
A Neural Network (NN) is a computational paradigm inspired by the structure and functioning of the
human brain. Neural networks are a central component of soft computing, known for their capacity to
learn from data and adaptively improve over time.
They consist of layers of nodes (neurons), with each connection having a weight.
Neural networks are used in tasks like classification, regression, prediction, and pattern
recognition.
Neural networks fall under the broader category of machine learning algorithms and are capable of
function approximation, i.e., learning a mapping from input data to output.
2. Advent of Modern Neuroscience
The development of neural networks is deeply tied to progress in neuroscience, particularly in
understanding how the brain processes information.
Key Historical Milestones:
1943: McCulloch and Pitts Model
Proposed the first model of an artificial neuron using binary threshold logic.
This model laid the groundwork for connecting logic and neural computation.
1949: Hebbian Learning (Donald Hebb)
Suggested that connections between neurons strengthen when they are activated together, an idea
often summarized as “cells that fire together, wire together.”
This concept influenced unsupervised learning and weight adjustment algorithms in ANNs.
1958: Perceptron Model (Frank Rosenblatt)
Introduced the perceptron, a single-layer neural network capable of binary classification.
Could learn weights using a supervised learning rule, known as the Perceptron Learning Rule.
1969: Minsky and Papert's Critique
Highlighted limitations of the perceptron, especially its inability to solve non-linearly separable
problems like XOR.
Temporarily stalled research in neural networks.
1980s: Revival with Backpropagation
The backpropagation algorithm, popularized by Rumelhart, Hinton, and Williams (1986), enabled
training of multi-layer neural networks.
Marked the beginning of modern neural network research.
3. Classical AI vs Neural Networks
Classical AI and neural networks are two fundamentally different paradigms for building intelligent
systems.
Classical AI:
Also known as symbolic AI or good old-fashioned AI (GOFAI).
Based on explicit rule-based systems, logic, and predefined knowledge bases.
Suited for problems with well-defined rules (e.g., theorem proving, games like chess).
Neural Networks (Connectionist AI):
Inspired by the human brain’s architecture and learning processes.
Focuses on learning from examples, rather than explicit rules.
Capable of handling ambiguity, noise, and incomplete data.
Feature | Classical AI | Neural Networks
Knowledge Representation | Explicit (rules, logic) | Implicit (learned weights)
Learning Mechanism | Manual rule encoding | Data-driven learning
Performance | Rigid, struggles with uncertain data | Flexible, generalizes from examples
Interpretability | High (rules are transparent) | Low (black-box nature)
Domains | Logical reasoning, expert systems | Vision, speech, language, robotics
4. Transition from Classical AI to Neural Networks
Classical AI dominated the early stages of AI research but struggled with real-world tasks
involving noisy, fuzzy, or incomplete data.
Neural networks gained prominence as they offered robust performance in such environments.
The shift was driven by the success of NNs in tasks like image recognition, speech processing,
and natural language understanding.
Biological Neurons and Artificial Neural Network; Model of Artificial
Neuron
Biological Neuron: The Inspiration
The concept of artificial neural networks is inspired by the structure and function of biological
neurons, which are the fundamental units of the human nervous system.
Structure of a Biological Neuron:
Dendrites: Receive electrical signals (inputs) from other neurons.
Cell Body (Soma): Processes incoming signals and determines if the neuron should fire.
Axon: Transmits the electrical signal away from the cell body.
Axon Terminals (Synaptic Knobs): Pass the signal to the next neuron via synapses using
neurotransmitters.
Working Principle:
When a neuron receives enough signals (i.e., the input exceeds a threshold), it fires and transmits
the signal to the next neuron.
This process is non-linear and adaptive, forming the basis of learning in the brain.
Artificial Neural Network (ANN)
An Artificial Neural Network is a computational system made up of artificial neurons that mimics the
behavior of biological neurons.
Key Characteristics:
Distributed parallel processing system: Each unit (neuron) works simultaneously with others.
Learning from data: Adjusts weights based on input-output examples using a learning algorithm.
Generalization: Capable of making predictions on unseen data after training.
Components of ANN:
Input Layer: Receives raw data.
Hidden Layers: Intermediate layers where computations happen via activation functions.
Output Layer: Produces the final output (e.g., class label, value).
Weights: Each connection has an associated weight that determines the importance of the input.
Bias: Allows shifting of the activation function to better fit the data.
Activation Function: Introduces non-linearity and determines the firing of the neuron.
Model of an Artificial Neuron (Mathematical Formulation)
An artificial neuron is a mathematical function that takes inputs, applies weights, and produces an
output using an activation function.
Structure:
Let the inputs be $x_1, x_2, \dots, x_n$
Let the corresponding weights be $w_1, w_2, \dots, w_n$
Let the bias be $b$
Then, the neuron performs the following operations:
1. Weighted Sum:
$$z = \sum_{i=1}^{n} w_i x_i + b$$
2. Activation:
$$y = f(z)$$
where $f$ is the activation function (e.g., sigmoid, tanh, ReLU).
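The two operations above (weighted sum, then activation) can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the sample inputs, weights, and bias are assumptions chosen for the example, and the sigmoid is just one possible choice of f.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, one common choice for the activation f."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Compute y = f(z) with z = sum_i w_i * x_i + b."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Illustrative values only (not taken from the notes).
x = [1.0, 2.0]
w = [0.5, -0.25]
b = 0.0
y = neuron_output(x, w, b)  # z = 0.5*1.0 - 0.25*2.0 + 0.0 = 0.0, so y = sigmoid(0) = 0.5
```

Passing a different function as `activation` (for example a ReLU) changes only the second step; the weighted sum stays the same.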
Differences Between Biological and Artificial Neurons
Feature | Biological Neuron | Artificial Neuron
Signal Transmission | Electrochemical (via synapse) | Numerical (weighted sum)
Learning | Hebbian learning | Gradient-based (e.g., backpropagation)
Complexity | Very high (many interconnections) | Simplified abstraction
Adaptability | Real-time adaptation, neuroplasticity | Requires training epochs
Practical Implications:
The artificial neuron is the core building block of neural networks.
ANN behavior heavily depends on weight adjustments, which occur during training.
The choice of activation function and number of hidden layers significantly affects the
performance of the model.
Learning Methods: Hebbian, Competitive, Boltzmann, etc.
Overview
Learning in neural networks refers to the method of updating the connection weights so that the
network can perform a specific task (e.g., classification or prediction) more accurately over time.
Learning is typically based on data and experience.
There are three broad categories of learning methods:
1. Supervised Learning: Target output is known.
2. Unsupervised Learning: No explicit target; patterns must be discovered.
3. Reinforcement Learning: Learns through feedback in the form of rewards/punishments.
The following learning rules fall into these categories.
Hebbian Learning (Unsupervised)
Proposed by: Donald Hebb (1949).
Principle: “Neurons that fire together, wire together.”
The weight between two neurons increases if both neurons are activated simultaneously.
Weight Update Rule:
$$\Delta w_{ij} = \eta \, x_i \, y_j$$
Where:
$x_i$: input from neuron $i$
$y_j$: output of neuron $j$
$\eta$: learning rate
Characteristics:
Unsupervised
Simple biologically inspired rule
Limited by uncontrolled weight growth unless normalized
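A small sketch of the Hebbian update may help make the rule and its weight-growth problem concrete. The function name, the nested-list weight layout (`weights[i][j]` for the connection from input i to output j), and the numeric values are assumptions for illustration only.

```python
def hebbian_update(weights, x, y, eta=0.1):
    """Apply delta_w[i][j] = eta * x[i] * y[j] to every connection i -> j."""
    return [[w_ij + eta * x_i * y_j for w_ij, y_j in zip(row, y)]
            for row, x_i in zip(weights, x)]

# Two inputs, one output neuron; values are illustrative.
w = [[0.0], [0.0]]
x = [1.0, 0.0]   # only the first input is active
y = [1.0]        # the output neuron is active
for _ in range(3):
    w = hebbian_update(w, x, y)
# w[0][0] grows by eta on each step (to about 0.3); w[1][0] stays 0.0
```

Because nothing in the rule ever decreases a weight, repeating the loop keeps increasing `w[0][0]`, which is why a normalization step is usually added in practice.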
Competitive Learning (Unsupervised)
Mechanism: Neurons compete to become active.
The “winning” neuron (with the strongest response) updates its weights.
Encourages specialization—each neuron becomes tuned to a specific type of input.
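Weight Update Rule (for winning neuron $j$):
$$\Delta w_j = \eta (x - w_j)$$
where $x$ is the input vector and $w_j$ is the weight vector of the winning neuron.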
Applications:
Vector Quantization
Self-Organizing Maps (Kohonen Networks)
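The winner-take-all update can be sketched as follows. One assumption to note: the winner is chosen here as the neuron whose weight vector has the smallest Euclidean distance to the input, which matches "strongest response" when weight vectors are normalized. Function names and values are hypothetical.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def competitive_step(weights, x, eta=0.5):
    """Find the winner (closest weight vector) and move only it toward x."""
    winner = min(range(len(weights)),
                 key=lambda j: squared_distance(x, weights[j]))
    weights[winner] = [w + eta * (xi - w) for w, xi in zip(weights[winner], x)]
    return winner

# Two competing neurons with illustrative starting weights.
weights = [[0.0, 0.0], [1.0, 1.0]]
winner = competitive_step(weights, [0.8, 1.0])
# Neuron 1 is closer to the input, so only its weights change.
```

Repeating this step over many inputs pulls each neuron toward the center of the inputs it wins, which is the basis of vector quantization.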
Boltzmann Learning (Stochastic / Energy-Based)
Used in Boltzmann Machines – stochastic, generative neural networks.
Involves adjusting weights to minimize a global energy function.
Uses simulated annealing to explore solution space.
Learning Characteristics:
Stochastic, based on probabilities
Weight updates rely on the difference between pairwise unit correlations measured with the data
clamped (desired) and with the network running freely (actual)
Slow convergence but good for complex distribution modeling
Energy Function (simplified):
$$E = -\sum_{i<j} w_{ij} s_i s_j - \sum_{i} b_i s_i$$
where $s_i$ are the neuron states, $w_{ij}$ are the weights, and $b_i$ are the biases.
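To make the energy function concrete, the sketch below evaluates E = -sum over pairs i<j of w_ij s_i s_j, minus the sum of b_i s_i, for two states of a tiny network. The 3-unit weight matrix, biases, and binary (0/1) states are made-up illustrative values.

```python
def boltzmann_energy(states, weights, biases):
    """E = -sum_{i<j} w_ij * s_i * s_j - sum_i b_i * s_i."""
    n = len(states)
    pair_term = sum(weights[i][j] * states[i] * states[j]
                    for i in range(n) for j in range(i + 1, n))
    bias_term = sum(b * s for b, s in zip(biases, states))
    return -pair_term - bias_term

# Illustrative 3-unit network with symmetric weights and binary states.
W = [[0.0, 1.0, -2.0],
     [1.0, 0.0, 0.5],
     [-2.0, 0.5, 0.0]]
b = [0.1, 0.0, -0.3]
e_a = boltzmann_energy([1, 1, 0], W, b)  # -(1.0) - (0.1) = -1.1
e_b = boltzmann_energy([1, 0, 1], W, b)  # -(-2.0) - (0.1 - 0.3) = 2.2
```

The state with the lower energy (`e_a` here) is the more probable one at equilibrium, since a Boltzmann machine visits states with probability proportional to exp(-E/T).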
Other Learning Methods (Brief Mentions)
Error-Correction Learning: Used in supervised learning like perceptron and backpropagation.
Weights are adjusted to reduce the output error.
Reinforcement Learning: Weights are updated based on a reward signal; no precise error signal is
given.
Backpropagation: A supervised method that propagates error from output layer to hidden layers to
adjust weights using gradient descent.
Comparative Table
Learning Method | Type | Key Idea | Use Case
Hebbian | Unsupervised | Co-activation strengthens weights | Pattern recognition, early models
Competitive | Unsupervised | Winner-take-all updates | Clustering, feature mapping
Boltzmann | Stochastic | Minimizes energy via annealing | Complex pattern learning
Backpropagation | Supervised | Gradient descent on error | Deep learning models
Reinforcement | Feedback-based | Learns via reward signals | Game AI, robotics, control systems
Neural Network Models: Perceptron, Adaline, Madaline; Single-Layer Networks
Introduction to Single-Layer Neural Networks
A single-layer neural network consists of:
An input layer (just passes inputs to the next layer)
A single layer of output neurons (no hidden layers)
These models are mainly suitable for linearly separable problems. While simple, they form the
foundation for more advanced architectures.
1. Perceptron Model
Proposed by:
Frank Rosenblatt (1958)
Structure:
Inputs $x_1, x_2, \dots, x_n$
Weights $w_1, w_2, \dots, w_n$
Bias $b$
Activation Function: Step function
Learning Rule:
$$w_i(t+1) = w_i(t) + \eta (d - y) x_i$$
Where:
$d$: desired output
$y$: actual output
$\eta$: learning rate
Properties:
Works only for linearly separable problems
Binary classifier (0/1)
Fast and simple
Cannot solve XOR-type problems
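The sketch below trains a perceptron with the Perceptron Learning Rule, w_i <- w_i + eta * (d - y) * x_i, on the AND and XOR truth tables. It is a minimal illustration under stated assumptions: weights start at zero, the step function outputs 1 when z >= 0, and eta = 1.0 is chosen only to keep the arithmetic exact. All function names are hypothetical.

```python
def step(z):
    return 1 if z >= 0 else 0

def predict(w, b, x):
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_perceptron(samples, eta=1.0, epochs=100):
    """Perceptron learning rule: w_i <- w_i + eta * (d - y) * x_i."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, d in samples:
            error = d - predict(w, b, x)   # 0 when the sample is classified correctly
            w = [wi + eta * error * xi for wi, xi in zip(w, x)]
            b += eta * error
    return w, b

AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

w, b = train_perceptron(AND)
and_ok = all(predict(w, b, x) == d for x, d in AND)      # linearly separable

w2, b2 = train_perceptron(XOR)
xor_ok = all(predict(w2, b2, x) == d for x, d in XOR)    # not linearly separable
```

AND is learned because a single line separates its classes; for XOR no setting of the weights can classify all four points, so `xor_ok` stays false however long training runs.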
2. Adaline (Adaptive Linear Neuron)
Proposed by:
Bernard Widrow & Ted Hoff (1960)
Difference from Perceptron:
Uses a linear activation function (no thresholding)
Learning is based on continuous output, not binary
Output:
$$y = \sum_{i=1}^{n} w_i x_i + b$$
Learning Rule:
Minimizes Mean Squared Error (MSE) between actual and target output.
Based on the Least Mean Squares (LMS) algorithm:
$$w_i(t+1) = w_i(t) + \eta (d - y) x_i$$
Properties:
Better for regression and continuous-valued outputs
Converges more smoothly due to differentiability
Still limited to linear separability
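A minimal sketch of Adaline training with the LMS rule follows. Unlike the perceptron, the error (d - y) is computed from the linear output before any thresholding. The target function, learning rate, and epoch count are hypothetical values chosen so that this small example converges.

```python
def train_adaline(samples, eta=0.05, epochs=500):
    """LMS rule: w_i <- w_i + eta * (d - y) * x_i with linear output y."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, d in samples:
            y = sum(wi * xi for wi, xi in zip(w, x)) + b   # no thresholding
            error = d - y
            w = [wi + eta * error * xi for wi, xi in zip(w, x)]
            b += eta * error
    return w, b

# Noise-free samples of the hypothetical target d = 2*x1 - x2 + 0.5.
data = [([x1, x2], 2 * x1 - x2 + 0.5)
        for x1 in (0.0, 1.0, 2.0) for x2 in (0.0, 1.0, 2.0)]
w, b = train_adaline(data)
```

After training, `w` and `b` should be close to the coefficients of the target. With noisy data the weights would instead settle near the least-squares fit.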
3. Madaline (Multiple Adaline)
Proposed by:
Bernard Widrow and colleagues (early 1960s)
Architecture:
A multi-layer network built from multiple Adalines
Typically has:
Input layer
Hidden layer (comprising Adaline units)
Output layer (also Adaline units)
Learning Rule:
Uses Madaline Rule I and II (heuristic, not gradient-based)
Madaline Rule II follows the minimal disturbance principle: it trial-flips the output of the hidden
Adaline whose weighted sum is closest to zero and keeps the change only if the output error decreases
Advantages:
Among the earliest trainable multi-layer neural networks
Capable of solving some non-linearly separable problems
Predecessor of modern multilayer networks and backpropagation
Limitations:
Training is complex due to non-differentiable transfer functions
Not as efficient as backpropagation in deep networks
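To show why stacking Adaline units helps with non-linearly separable problems, the sketch below wires a small Madaline that computes XOR on bipolar (-1/+1) inputs. The weights are hand-chosen for illustration; they are not learned with Madaline Rule I or II, and the function names are hypothetical.

```python
def sign(z):
    return 1 if z >= 0 else -1

def adaline_unit(x, w, b):
    """One Adaline followed by a hard limiter (bipolar output)."""
    return sign(sum(wi * xi for wi, xi in zip(w, x)) + b)

def madaline_xor(x1, x2):
    h1 = adaline_unit([x1, x2], [1.0, -1.0], -1.0)   # fires only for (+1, -1)
    h2 = adaline_unit([x1, x2], [-1.0, 1.0], -1.0)   # fires only for (-1, +1)
    return adaline_unit([h1, h2], [1.0, 1.0], 1.0)   # logical OR of h1 and h2

table = {(x1, x2): madaline_xor(x1, x2) for x1 in (-1, 1) for x2 in (-1, 1)}
```

Each hidden unit draws one line in the input plane, and the output unit combines the two half-planes, which a single-layer network cannot do.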
Comparison Table
Feature | Perceptron | Adaline | Madaline
Activation | Step function | Linear | Step function (multi-unit)
Learning Rule | Classification error | LMS (gradient descent) | Heuristic (MR-II)
Output Type | Binary (0 or 1) | Continuous real values | Binary
Network Type | Single-layer | Single-layer | Multi-layer
Problem Type | Classification | Classification/Regression | Non-linear classification
Limitation | Linearly separable | Linearly separable | Training complexity
Applications:
Perceptron: Pattern recognition, classification
Adaline: Signal processing, adaptive filters
Madaline: Early speech recognition, noise reduction
Backpropagation and Multilayer Networks
Introduction to Multilayer Networks
A Multilayer Neural Network (also known as a Multilayer Perceptron – MLP) is composed of:
Input layer (accepts input features)
One or more hidden layers
Output layer
Each neuron in a layer is connected to every neuron in the next layer — forming a fully connected
feedforward architecture.
Why Multilayer?
Single-layer networks (like Perceptron or Adaline) can only solve linearly separable problems.
Multilayer networks can solve non-linearly separable problems (e.g., XOR).
The power of the network comes from its hidden layers, which can model complex patterns.
The Problem of Learning in Multilayer Networks
Unlike single-layer networks, multilayer networks:
Don’t have direct target outputs for hidden layers.
Require an algorithm that can compute the error and propagate it backward to update the hidden
layers.
This leads to the development of the Backpropagation Algorithm.
Backpropagation Algorithm
Objective:
To minimize the error between predicted and actual output using gradient descent by adjusting the
weights in all layers of the network.
Assumptions:
Uses a differentiable activation function (e.g., sigmoid, tanh; ReLU is differentiable everywhere
except at 0, where a subgradient is used)
Employs supervised learning
Working of Backpropagation (Two Phases)
1. Forward Pass:
Inputs are fed into the network.
Outputs are computed layer by layer using activation functions.
The final output is compared with the actual label to compute error (using Mean Squared Error
or Cross Entropy).
2. Backward Pass:
The error is propagated backward from output to input layers.
Gradients are calculated using chain rule of calculus.
Weights are updated to minimize the error:
$$w_{ij}(t+1) = w_{ij}(t) - \eta \frac{\partial E}{\partial w_{ij}}$$
where $\eta$ is the learning rate and $E$ is the loss function.
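The forward pass, backward pass, and weight update can be sketched for a tiny 2-2-1 sigmoid network trained on XOR with a mean squared error loss. This is an illustrative sketch, not a tuned implementation: the network size, random initialization, seed, learning rate, and number of steps are all assumptions, and the short run only demonstrates that the loss goes down, not that XOR is fully learned.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """Forward pass of a 2-2-1 network; returns hidden and output activations."""
    W1, b1, W2, b2 = params
    h = [sigmoid(sum(W1[j][i] * x[i] for i in range(2)) + b1[j]) for j in range(2)]
    y = sigmoid(sum(W2[j] * h[j] for j in range(2)) + b2)
    return h, y

def loss(params, data):
    """Mean over samples of 0.5 * (y - d)^2."""
    return sum(0.5 * (forward(params, x)[1] - d) ** 2 for x, d in data) / len(data)

def gradients(params, data):
    """Backward pass: chain rule from the output error back to every weight."""
    W1, b1, W2, b2 = params
    gW1 = [[0.0, 0.0], [0.0, 0.0]]
    gb1 = [0.0, 0.0]
    gW2 = [0.0, 0.0]
    gb2 = 0.0
    for x, d in data:
        h, y = forward(params, x)
        delta_out = (y - d) * y * (1.0 - y)                    # dE/dz at the output
        gb2 += delta_out
        for j in range(2):
            gW2[j] += delta_out * h[j]
            delta_h = delta_out * W2[j] * h[j] * (1.0 - h[j])  # dE/dz at hidden unit j
            gb1[j] += delta_h
            for i in range(2):
                gW1[j][i] += delta_h * x[i]
    n = len(data)
    return ([[g / n for g in row] for row in gW1], [g / n for g in gb1],
            [g / n for g in gW2], gb2 / n)

def gradient_step(params, grads, eta):
    """w <- w - eta * dE/dw for every parameter."""
    W1, b1, W2, b2 = params
    gW1, gb1, gW2, gb2 = grads
    return ([[w - eta * g for w, g in zip(rw, rg)] for rw, rg in zip(W1, gW1)],
            [b - eta * g for b, g in zip(b1, gb1)],
            [w - eta * g for w, g in zip(W2, gW2)],
            b2 - eta * gb2)

random.seed(0)  # arbitrary seed so the run is repeatable
params = ([[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)],
          [0.0, 0.0],
          [random.uniform(-1, 1) for _ in range(2)],
          0.0)
XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

initial_loss = loss(params, XOR)
for _ in range(100):
    params = gradient_step(params, gradients(params, XOR), eta=0.5)
final_loss = loss(params, XOR)
```

A useful sanity check for any hand-written backward pass is to compare one analytic gradient entry against a finite-difference estimate of the loss.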
Key Components
Activation Functions:
Sigmoid: Smooth curve, output between 0 and 1
Tanh: Zero-centered, output between –1 and 1
ReLU: Faster convergence, mitigates the vanishing gradient problem (gradient is 1 for positive inputs)
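The gradients of the three activation functions can be compared directly. The sketch below is illustrative; the function names are hypothetical and the gradient of ReLU at exactly 0 is set to 0 by convention.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)             # at most 0.25, tiny for large |z|

def tanh_grad(z):
    return 1.0 - math.tanh(z) ** 2   # at most 1.0, tiny for large |z|

def relu(z):
    return max(0.0, z)

def relu_grad(z):
    return 1.0 if z > 0 else 0.0     # convention: gradient 0 at z = 0

for z in (0.0, 5.0):
    print(z, sigmoid_grad(z), tanh_grad(z), relu_grad(z))
```

At z = 5 the sigmoid and tanh gradients are already close to zero, while the ReLU gradient is still 1. Multiplying many such small factors through the chain rule is what makes gradients vanish in deep sigmoid or tanh networks.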
Loss Functions:
MSE: Used for regression tasks
Cross-Entropy: Commonly used for classification tasks
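Both loss functions are short enough to write out. The sketch below uses the binary form of cross-entropy; the function names, the clipping constant `eps`, and the sample values are assumptions for illustration.

```python
import math

def mse(targets, predictions):
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

def binary_cross_entropy(targets, probabilities, eps=1e-12):
    """Average of -[t*log(p) + (1-t)*log(1-p)]; eps guards against log(0)."""
    total = 0.0
    for t, p in zip(targets, probabilities):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(targets)

regression_loss = mse([1.0, 2.0], [1.5, 2.0])               # (0.25 + 0.0) / 2 = 0.125
confident_right = binary_cross_entropy([1, 0], [0.9, 0.1])
confident_wrong = binary_cross_entropy([1, 0], [0.1, 0.9])
```

Cross-entropy penalizes confident wrong predictions much more heavily than confident right ones, which is one reason it is preferred for classification.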
Learning Rate:
Controls how large a step is taken while updating weights.
Too high: might overshoot optimal solution.
Too low: convergence becomes very slow.
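The effect of the learning rate is easy to see on a toy loss. The sketch below minimizes E(w) = w^2, whose gradient is 2w, so each update multiplies w by (1 - 2*eta). The loss, starting point, and the three rates are illustrative assumptions.

```python
def gradient_descent(eta, steps=20, w0=1.0):
    """Minimize the toy loss E(w) = w**2 (gradient 2*w) starting from w0."""
    w = w0
    for _ in range(steps):
        w = w - eta * 2.0 * w
    return w

w_good = gradient_descent(eta=0.3)     # shrinks by a factor 0.4 per step
w_slow = gradient_descent(eta=0.001)   # shrinks by a factor 0.998 per step
w_high = gradient_descent(eta=1.1)     # factor -1.2 per step: oscillates and grows
```

After 20 steps the moderate rate is essentially at the minimum, the small rate has barely moved, and the large rate has overshot repeatedly and diverged.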
Advantages of Backpropagation:
Efficient method for training deep neural networks.
Trains multilayer networks that can approximate any continuous function on a compact domain given
enough hidden neurons (Universal Approximation Theorem).
Automates learning across multiple layers.
Limitations:
Can get stuck in local minima (non-convex optimization).
Suffers from the vanishing gradient problem in deep networks (especially with sigmoid/tanh).
Requires large datasets and computational resources.
Applications:
Image and speech recognition
Natural language processing
Financial prediction
Medical diagnostics
Competitive Learning Networks: Kohonen Self-Organizing
Networks, Hebbian Learning
Competitive Learning: Core Idea
In competitive learning, neurons in a network compete to respond to a given input. Only one neuron
wins (or a small subset wins), and only that neuron’s weights are updated. This is in contrast to
backpropagation, where all weights are updated.
Used primarily in unsupervised learning
Neurons learn to recognize clusters or patterns in the input space
Often applied in clustering, dimensionality reduction, and vector quantization
Kohonen Self-Organizing Map (SOM)
Developed by:
Teuvo Kohonen (1982)
Objective:
To produce a topological mapping of input data — i.e., to convert high-dimensional data into a 2D
representation while preserving the relative structure of the data.
Structure:
Consists of:
Input layer: Feature vector
Output layer: Usually a 2D grid of neurons (map)
Each neuron in the output layer is associated with a weight vector of the same dimension as the
input
Learning Process:
1. Initialization: Randomly initialize weight vectors.
2. Input selection: Choose an input vector $x$.
3. Winner selection: Find the Best Matching Unit (BMU), the winning neuron whose weight vector
$w_j$ is closest to $x$:
$$\text{BMU} = \arg\min_j \lVert x - w_j \rVert$$
4. Weight update:
$$w_j(t+1) = w_j(t) + \eta(t) \, h_{j,i}(t) \, (x - w_j(t))$$
$\eta(t)$: learning rate
$h_{j,i}(t)$: neighborhood function (decays with the distance of neuron $j$ from the BMU $i$)
5. Repeat for several epochs, reducing learning rate and neighborhood size over time
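The learning process above can be sketched for a one-dimensional map. Several details are assumptions made for illustration: the Gaussian form of the neighborhood function, the linear decay schedule, the floor of 0.1 on the neighborhood width, and all function names and numeric values.

```python
import math

def best_matching_unit(weights, x):
    """Index of the neuron whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda j: sum((xi - wi) ** 2 for xi, wi in zip(x, weights[j])))

def som_step(weights, x, eta, sigma):
    """One SOM update on a 1-D grid: neurons near the BMU move toward x."""
    bmu = best_matching_unit(weights, x)
    for j, w in enumerate(weights):
        h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))  # Gaussian neighborhood
        weights[j] = [wi + eta * h * (xi - wi) for wi, xi in zip(w, x)]
    return bmu

def train_som(weights, data, epochs=50, eta0=0.5, sigma0=1.0):
    """Repeat the update while shrinking the learning rate and neighborhood."""
    for t in range(epochs):
        decay = 1.0 - t / epochs               # linear decay, one simple choice
        eta = eta0 * decay
        sigma = max(sigma0 * decay, 0.1)
        for x in data:
            som_step(weights, x, eta, sigma)
    return weights

# Three map neurons on a line, 2-D inputs; starting weights are illustrative.
weights = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
bmu = som_step(weights, [0.9, 1.0], eta=0.5, sigma=1.0)
```

In the single step shown, neuron 2 is the BMU and moves the most, its grid neighbor (neuron 1) moves less, and neuron 0 moves least. This graded update is what lets nearby map neurons end up representing similar inputs.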
Key Properties:
Performs dimensionality reduction
Learns topological relationships
Ideal for visualizing high-dimensional data
Applications:
Pattern recognition
Clustering
Feature compression
Data visualization (e.g., document classification, genomic data)
Hebbian Learning (Revisited)
Although already introduced earlier, in the context of competitive learning networks, Hebbian learning
often forms the foundation.
Core Principle:
"When neuron A repeatedly assists in firing neuron B, the connection between
them strengthens."
Characteristics in Competitive Setting:
Reinforces weight changes in the direction of input patterns
Often combined with normalization to avoid unbounded weight growth
Encourages formation of feature detectors
Comparison: Kohonen SOM vs Hebbian Learning
Feature | Kohonen SOM | Hebbian Learning
Type | Competitive, unsupervised | Correlation-based, unsupervised
Weight Update | Only BMU and neighbors | All neurons involved
Output | Topological map | Not topological
Use Case | Dimensionality reduction, clustering | Pattern association, correlation
Biological Plausibility | Moderate | High
Key Takeaways:
Competitive learning is useful for unsupervised pattern discovery.
Kohonen SOM organizes input into spatially meaningful output maps — particularly useful in high-
dimensional datasets.
Hebbian learning is the biological inspiration that underlies many unsupervised weight adjustment
mechanisms.
Hopfield Networks and Neuro-Fuzzy Modelling
Hopfield Networks
Developed by:
John J. Hopfield (1982)
Overview:
A Hopfield Network is a type of recurrent neural network where:
Each neuron is connected to every other neuron
Connections are symmetric: $w_{ij} = w_{ji}$, and $w_{ii} = 0$
The network acts as an associative memory system that stores patterns and retrieves them even
from noisy inputs
Key Characteristics:
Binary threshold units (output: 0 or 1 / –1 or +1 depending on version)
Works in discrete time with asynchronous updates
Designed to converge to a stable state (local minima of an energy function)
Energy Function:
Hopfield defined an energy function that decreases over time:
$$E = -\frac{1}{2} \sum_{i \neq j} w_{ij} s_i s_j + \sum_{i} \theta_i s_i$$
$s_i$: state of neuron $i$
$w_{ij}$: weight between neurons $i$ and $j$
$\theta_i$: threshold of neuron $i$
The network evolves until it reaches a stable minimum energy state, which corresponds to a stored
pattern.
Learning Rule (Hebbian-based):
$$w_{ij} = \sum_{k=1}^{p} x_i^{(k)} x_j^{(k)} \quad (i \neq j)$$
where $x^{(k)}$ is the $k$-th pattern to be stored.
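Hebbian storage, the energy function, and asynchronous recall fit together in a short sketch. It assumes bipolar (-1/+1) states, zero thresholds during recall, and a fixed update order; the two stored patterns and the function names are hypothetical.

```python
def train_hopfield(patterns):
    """Hebbian storage: w_ij = sum_k x_i^(k) * x_j^(k), with w_ii = 0."""
    n = len(patterns[0])
    W = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j]
    return W

def energy(W, s, theta=None):
    """E = -1/2 * sum_{i != j} w_ij s_i s_j + sum_i theta_i s_i."""
    n = len(s)
    theta = theta or [0] * n
    pair = sum(W[i][j] * s[i] * s[j] for i in range(n) for j in range(n) if i != j)
    return -0.5 * pair + sum(t * si for t, si in zip(theta, s))

def recall(W, s, sweeps=5):
    """Asynchronous updates in a fixed order (zero thresholds assumed)."""
    s = list(s)
    for _ in range(sweeps):
        for i in range(len(s)):
            field = sum(W[i][j] * s[j] for j in range(len(s)))
            s[i] = 1 if field >= 0 else -1
    return s

stored = [[1, -1, 1, -1, 1, -1, 1, -1],
          [1, 1, 1, 1, -1, -1, -1, -1]]
W = train_hopfield(stored)
noisy = [-1, -1, 1, -1, 1, -1, 1, -1]    # first pattern with bit 0 flipped
restored = recall(W, noisy)
```

Here the corrupted input settles back onto the first stored pattern, and the energy of the restored state is lower than that of the noisy one, which illustrates recall as descent to a local energy minimum.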
Applications:
Associative memory
Error correction
Pattern completion (e.g., image restoration)
Limitations:
Limited capacity: can reliably store only about 0.15n patterns for n neurons
May converge to spurious (unstored) states
Only suitable for static pattern recall, not sequential data
Neuro-Fuzzy Modelling
Definition:
Neuro-Fuzzy systems combine:
Neural networks (learning from data)
Fuzzy logic (handling imprecision and uncertainty)
The goal is to build adaptive systems that learn fuzzy rules from data and tune membership functions
automatically.
Popular Model: Adaptive Neuro-Fuzzy Inference System (ANFIS)
ANFIS is a hybrid model that:
Uses a fuzzy inference system (FIS) (typically Sugeno-type)
Learns its parameters using neural network learning methods (e.g., backpropagation and/or least
squares)
ANFIS Architecture (5 Layers)
1. Input Layer:
Passes crisp inputs to the next layer.
2. Fuzzification Layer:
Applies membership functions to inputs (e.g., Gaussian, triangular).
Outputs degree of membership.
3. Rule Layer:
Each node represents a fuzzy rule.
Outputs the firing strength of a rule.
4. Normalization Layer:
Normalizes firing strengths of rules.
5. Output Layer:
Computes the final output as a weighted average of rule outputs.
Example: Rule Structure in ANFIS
Fuzzy rule:
IF Temperature is Hot AND Humidity is Low THEN FanSpeed = 0.8
Premise part is fuzzy (with membership functions)
Consequent is typically a linear function of the inputs (first-order Sugeno FIS); the constant 0.8
in the rule above is the zero-order special case
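The forward pass through the five layers can be sketched for a two-rule, zero-order Sugeno system. The first rule is the one from the notes; everything else is a hypothetical assumption for illustration: the second rule (Cold AND High humidity gives 0.2), the Gaussian membership centers and widths, the use of the product as the AND operator, and the function names. No parameter learning is shown.

```python
import math

def gaussian_mf(x, center, width):
    """Gaussian membership function with hypothetical center and width."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))

def sugeno_forward(temperature, humidity):
    # Layer 2 (fuzzification): degree of membership for each linguistic term.
    hot = gaussian_mf(temperature, center=35.0, width=5.0)
    cold = gaussian_mf(temperature, center=15.0, width=5.0)
    low = gaussian_mf(humidity, center=30.0, width=15.0)
    high = gaussian_mf(humidity, center=80.0, width=15.0)
    # Layer 3 (rules): firing strength, using the product as the AND operator.
    strengths = [hot * low, cold * high]
    # Zero-order Sugeno consequents: one constant fan speed per rule.
    consequents = [0.8, 0.2]
    # Layer 4 (normalization) and layer 5 (weighted average of rule outputs).
    total = sum(strengths)
    normalized = [s / total for s in strengths]
    return sum(n * c for n, c in zip(normalized, consequents))

fan_hot_dry = sugeno_forward(temperature=35.0, humidity=30.0)
fan_cold_humid = sugeno_forward(temperature=15.0, humidity=80.0)
```

Inputs that match one rule strongly give an output near that rule's consequent, and inputs between the two rules give a blend. In ANFIS the membership parameters and consequents would be tuned from data rather than fixed by hand.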
Learning in ANFIS:
Uses hybrid learning:
Forward pass: Least Squares estimates for output parameters.
Backward pass: Gradient descent updates for membership function parameters.
Applications:
Forecasting (e.g., stock market, weather)
Control systems (e.g., robotics, HVAC)
Pattern classification
Function approximation
Comparison: Hopfield vs Neuro-Fuzzy
Aspect | Hopfield Network | Neuro-Fuzzy System
Type | Recurrent, associative memory | Hybrid (NN + fuzzy logic)
Learning | Hebbian rule | Backpropagation + fuzzy tuning
Output | Binary or bipolar | Crisp or fuzzy values
Suitable for | Pattern recall | Modeling imprecise, nonlinear systems
Interpretability | Low | High (rule-based reasoning)