03 NEURAL NETWORKS I
Spring 2020 CS791/CS159 Machine Learning
Credits
1. B1: Machine learning: an algorithmic perspective. 2nd Edition, Marsland, Stephen. CRC press,
2015
2. B2: Principles of Soft Computing. 3rd Edition. S. N. Sivanandam, S. N. Deepa. Wiley,
2018.
3. www.d.umn.edu/~alam0026/NeuralNetwork.ppt
4. www.ohio.edu/people/starzykj/network/Class/ee690/.../NeuralNets%20overview.ppt
5. https://www.staff.ncl.ac.uk/peter.andras/annintro.ppt
6. https://tmohammed.files.wordpress.com/2012/03/w1-01-introtonn.ppt
7. http://aass.oru.se/~lilien/ml/seminars/2007_02_01b-Janecek-Perceptron.pdf
8. http://www.cems.uvm.edu/~rsnapp/teaching/cs295ml/notes/perceptron.pdf
9. http://www.atmos.washington.edu/~dennis/MatrixCalculus.pdf
10. https://en.wikipedia.org/wiki/Matrix_calculus
11. https://data-flair.training/blogs/learning-rules-in-neural-network/
Assignment
Read:
B1: Chapter 3.
B2: Chapter 2, 3.
Problems:
B1: 3.1, 3.2, 3.3
B2: Chapter 2, 3: Solved Problems
Neural Networks
Inspired by how the human brain processes information
Neuron – the processing unit of the human brain
Neuron collects signals from others through a host of fine structures called
dendrites.
Neuron sends out spikes of electrical activity through a long, thin strand
known as an axon, which splits into thousands of branches.
At the end of each branch, a structure called a synapse converts the
activity from the axon into electrical effects that inhibit or excite activity in
the connected neurons.
An estimated 10¹¹ neurons are present in a human brain.
Each neuron is connected to thousands of other neurons.
About 10¹⁴ synapses exist in a human brain.
Input signals collected through dendrites affect the electrical potential
inside the neuron body – called membrane potential.
Spiking of neuron happens when this membrane potential crosses a certain
threshold value.
After firing, the neuron must wait for some time to recover its energy (the
refractory period) before it can fire again.
Each neuron can be seen as a separate processor doing a simple task:
whether to fire or not.
Brain is a massively parallel supercomputer with 10¹¹ processing elements
and dense interconnections.
Learning in the brain happens on the principle of plasticity:
Modifying the strength of synaptic connections between neurons, and creating
new connections.
McCulloch and Pitts Neuron Model
Set of weighted inputs 𝒙𝒊 , 𝒘𝒊 that correspond to the synapses
Adder that sums the input signals (equivalent to the membrane of the cell
that collects electrical charge)
Activation function (initially a threshold function) that decides whether
the neuron fires (‘spikes’) for the current inputs
Analogy
𝑥𝑖 = 1 if the connected input neuron fired, = 0 if it did not; an
intermediate value (e.g., 0.5) can be taken as something in between.
𝑤𝑖 denotes the strength of synaptic connection
Input signal is proportional to the strength of the synaptic weight, so we compute
ℎ = Σ_{i=1}^{m} 𝑤𝑖 𝑥𝑖
Analogy
𝜃 is the threshold (“membrane threshold”)
A simple model, which has limitations
Incapable of emulating all the behaviors of real biological neurons
A network of such neurons (Neural Network) can model whatever a computer
can do
Neurons will be updated sequentially (based on a clock)
Weights can be positive (excitatory connections) or negative (inhibitory
connections)
Inputs can also be negative or positive
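The model above — weighted inputs, an adder, and a threshold activation — can be written in a few lines. This is a minimal sketch of our own; the function name and example values are not from B1/B2:

```python
# Minimal sketch of a McCulloch-Pitts neuron (names and values are our own).
def mcp_neuron(x, w, theta):
    """Fire (return 1) if the weighted input sum crosses the threshold theta."""
    h = sum(wi * xi for wi, xi in zip(w, x))  # adder: "membrane potential"
    return 1 if h > theta else 0              # threshold activation

# Excitatory weights; the neuron fires only when enough inputs are active.
print(mcp_neuron([1, 1], [0.6, 0.6], 1.0))   # 1.2 > 1.0 -> fires: 1
print(mcp_neuron([1, 0], [0.6, 0.6], 1.0))   # 0.6 <= 1.0 -> stays quiet: 0
```

Negative (inhibitory) weights and negative inputs work unchanged, since the adder is a plain dot product.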
How does Neuron learn?
Inputs cannot change
Only weights and threshold function can change
Learning in a neural network:
How to change weights and threshold functions of the neurons so that the
neural network gives correct output
The Perceptron
A set of McCulloch and Pitts Neurons joined by Weighted Connections
The Perceptron
Adder not explicitly shown
There can be m inputs and n outputs
𝑚 ≠ 𝑛 or 𝑚 = 𝑛
𝑤𝑖𝑗 represents the weight on the signal from the i-th input to the j-th neuron,
1 ≤ 𝑖 ≤ 𝑚 and 1 ≤ 𝑗 ≤ 𝑛
Learning Rules in Neural Networks
Perceptron learning rule
Hebbian Learning Rule
Delta learning rule or Widrow-Hoff rule
Perceptron Learning Rule
Supervised Learning Approach
The modification in the synaptic weight of a node is equal to the
product of the error and the input:
𝑤𝑖𝑗 ← 𝑤𝑖𝑗 + 𝜂 (𝑡𝑗 − 𝑦𝑗) ∙ 𝑥𝑖
or, 𝑤𝑖𝑗 ← 𝑤𝑖𝑗 − 𝜂 (𝑦𝑗 − 𝑡𝑗) ∙ 𝑥𝑖
where
𝑦𝑗 : actual output at the j-th neuron
𝑡𝑗 : target output corresponding to the j-th neuron
𝜂: learning rate
Input 𝑥𝑖 , target 𝑡𝑗 and output 𝑦𝑗 are beyond our control
𝑤𝑖𝑗 and 𝜂 are what we can change
High value of 𝜂: learning is too fast (dramatic) and the system may
never stabilize
Low value of 𝜂: learning is too slow – the system has to see the input
many times before it learns, but it is more resistant to noise
Ideally 0.1 < 𝜂 < 0.4
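The update rule above can be transcribed directly. This is our own minimal sketch, with 𝑤 stored as an m × n list of lists (one row per input, one column per neuron):

```python
# Direct transcription of w_ij <- w_ij + eta*(t_j - y_j)*x_i (our own sketch).
def perceptron_update(w, x, t, y, eta=0.25):
    m, n = len(x), len(t)
    return [[w[i][j] + eta * (t[j] - y[j]) * x[i] for j in range(n)]
            for i in range(m)]

w = [[0.0], [0.2]]   # m=2 inputs, n=1 neuron
# Neuron should have fired (t=1) but did not (y=0): weight on the active
# input x_0=1 increases by eta; the inactive input x_1=0 is untouched.
print(perceptron_update(w, x=[1, 0], t=[1], y=[0]))  # [[0.25], [0.2]]
```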
Bias Input
What if all inputs are zero and we want one or more neurons to fire?
Solution:
Introduce a non-zero (say −1) “bias” input indexed at 0
Introduce weights 𝑤0𝑗 : the weight of the bias input to the j-th neuron.
Perceptron Algorithm
Algorithmic complexity?
Ο(𝑇𝑚𝑛𝑘)
T: #iterations
m: #inputs
n: #outputs
k: #samples
Simulating OR output
Bias
Inputs
Take 𝑤0 = −0.05, 𝑤1 = −0.02, 𝑤2 = 0.02, 𝜂 = 0.25
Let us iterate
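The iteration can be sketched as follows, using the weights and learning rate above on the OR data; the bias input is taken as −1 and the code layout is our own:

```python
# Perceptron training on OR with the slide's starting weights (our sketch).
def train_perceptron(X, T, w, eta=0.25, iters=10):
    # Each row of X already includes the bias input -1 as its first element.
    for _ in range(iters):
        for x, t in zip(X, T):
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            w = [wi + eta * (t - y) * xi for wi, xi in zip(w, x)]
    return w

X = [[-1, 0, 0], [-1, 0, 1], [-1, 1, 0], [-1, 1, 1]]  # bias -1 + two inputs
T = [0, 1, 1, 1]                                      # OR targets
w = train_perceptron(X, T, [-0.05, -0.02, 0.02])
preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0 for x in X]
print(preds)  # [0, 1, 1, 1] once the weights have converged
```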
B1 vs. B2:
B1: 𝑤𝑛𝑒𝑤 ← 𝑤𝑜𝑙𝑑 − 𝜂 (𝑦 − 𝑡) 𝑥
(assuming binary data, using the Perceptron Rule)
B2: 𝑤𝑛𝑒𝑤 ← 𝑤𝑜𝑙𝑑 + 𝛼𝑡𝑥
(assuming bipolar data, using the Perceptron Rule)
Hebb’s Rule
Donald Hebb in 1949
Changes in the strength of synaptic connections are proportional to the
correlation in the firing of the two connecting neurons.
If two neighboring neurons activate and deactivate at the same time, then the
weight connecting these neurons should increase.
For neurons operating in opposite phases, the weight between them should
decrease.
If there is no signal correlation, the weight should not change / the connection
should die away.
Δ𝑤𝑖𝑗 ← 𝑥𝑖 × 𝑦𝑗 ;  𝑥𝑖 , 𝑦𝑗 ∈ {−1, 1}
Generally, the activation function is the linear identity function, 𝑡𝑗 = 𝑓(𝑦𝑗 ) = 𝑦𝑗
At the start, values of all weights are set to zero
Unsupervised learning rule
Target values are not used
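A small sketch of the rule under these assumptions — bipolar units, weights starting at zero, no targets; all names are our own:

```python
# Hebb's rule sketch: dw_ij = x_i * y_j for bipolar values (our own names).
def hebb_train(samples, n_in, n_out):
    w = [[0.0] * n_out for _ in range(n_in)]   # weights start at zero
    for x, y in samples:                       # x, y components in {-1, +1}
        for i in range(n_in):
            for j in range(n_out):
                w[i][j] += x[i] * y[j]         # correlated firing strengthens
    return w

# Two correlated firings raise the weight; one anti-correlated firing lowers it.
w = hebb_train([([1], [1]), ([-1], [-1]), ([1], [-1])], n_in=1, n_out=1)
print(w)  # [[1.0]]  (+1 +1 -1)
```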
Delta learning rule
Similar to perceptron rule, but
Based on minimization or LMS (Least Mean Square) error using Gradient
Descent Technique
Works for differentiable activation functions (e.g., linear) vs. the step
function in perceptron rule
Perceptron rule is guaranteed to converge if the data is linearly separable,
but the gradient-descent approach continues forever, converging only
asymptotically to the solution (tries to minimize error in case of inseparable
data).
We will stick to perceptron rule for now on, will discuss Gradient
Descent later
𝑤 is an (m: #dimensions of input, n: #neurons or #dimensions of output)
matrix
𝑦 and 𝑡 are each a (1, n: #neurons) matrix
𝑥 is a (1, m: #dimensions of input) matrix
𝑤 ← 𝑤 − 𝜂 𝑥^T (𝑦 − 𝑡)
Element-wise:
𝑤𝑖𝑗 ← 𝑤𝑖𝑗 − 𝜂 Σ_{k=1}^{1} 𝑥^T_{ik} (𝑦_{kj} − 𝑡_{kj}) = 𝑤𝑖𝑗 − 𝜂 Σ_{k=1}^{1} 𝑥_{ki} (𝑦_{kj} − 𝑡_{kj})
𝑥_{ki}: value of the i-th (dimension of) input
(𝑦_{kj} − 𝑡_{kj}): difference (predicted − target) at the j-th neuron
Batch Mode Learning
Let input dataset have 𝑠 samples, each sample have 𝑚 inputs (i.e., 𝑚
dimensions) and let there be 𝑛 neurons
Input dataset 𝑥 is an (s, m: #dimensions of input) matrix
𝑦 and 𝑡 each is an (s, n: #neurons) matrix
𝑤 is an (m: #dimensions of input, n: #neurons) matrix
Algorithm
For 𝑃 iterations do:
Predict 𝑦 for all 𝑠 input samples
Update 𝑤 for the combined effect of all 𝑠 input samples, i.e.,
𝑤𝑖𝑗 ← 𝑤𝑖𝑗 − 𝜂 Σ_{k=1}^{s} 𝑥_{ki} (𝑦_{kj} − 𝑡_{kj}) = 𝑤𝑖𝑗 − 𝜂 Σ_{k=1}^{s} 𝑥^T_{ik} (𝑦_{kj} − 𝑡_{kj})
i.e., 𝑤 ← 𝑤 − 𝜂 𝑥^T (𝑦 − 𝑡)
Batch mode often works better (than updating after every sample)
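The batch update can be sketched with NumPy; the OR data and starting weights from the earlier example are reused, and the array shapes follow the matrices defined above:

```python
import numpy as np

# Batch-mode sketch: accumulate the error over all s samples, then apply
# w <- w - eta * x^T (y - t) once per iteration (layout is our own).
def batch_update(w, x, t, eta=0.25):
    y = (x @ w > 0).astype(float)     # predict for all s samples at once
    return w - eta * x.T @ (y - t)    # combined update, shape (m, n)

x = np.array([[-1., 0, 0], [-1, 0, 1], [-1, 1, 0], [-1, 1, 1]])  # bias col -1
t = np.array([[0.], [1], [1], [1]])                              # OR targets
w = np.array([[-0.05], [-0.02], [0.02]])
for _ in range(20):
    w = batch_update(w, x, t)
print((x @ w > 0).astype(int).ravel())   # [0 1 1 1] after training
```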
Bias
Inputs
Take 𝑤0 = −0.05, 𝑤1 = −0.02, 𝑤2 = 0.02, 𝜂 = 0.25
Let us iterate in batch mode.
Decision Boundary for OR function
Perceptron tries to find a straight line (in 2D, a
plane in 3D, and a hyperplane in higher
dimensions) – called decision boundary.
What is the decision boundary and how is it a line? (for 2D case)
Activation Value (say):
Σ_{i=0}^{m} 𝑤𝑖𝑗 𝑥𝑖 = 𝑥 ⋅ 𝑤𝑗
where 𝑤𝑗 is the column vector corresponding to the j-th neuron.
The j-th neuron fires if 𝑥 ⋅ 𝑤𝑗 > 0 and does not fire otherwise
So, the j-th neuron acts as a two-class classifier:
Class I: 𝑥 ⋅ 𝑤𝑗 > 0
Class II: 𝑥 ⋅ 𝑤𝑗 ≤ 0
𝑥 ⋅ 𝑤𝑗 = 0 can be considered as the decision boundary for the j-th neuron
For 2-D OR case with 1 neuron, this becomes:
𝑥0 𝑤0 + 𝑥1 𝑤1 + 𝑥2 𝑤2 = 0
−𝑤0 + 𝑥1 𝑤1 + 𝑥2 𝑤2 = 0 (the −𝑤0 term comes from the bias input 𝑥0 = −1)
The above is the equation for a straight line.
The line −𝑤0 + 𝑥1 𝑤1 + 𝑥2 𝑤2 = 0 lies at distance 𝑤0 / √(𝑤1² + 𝑤2²) from the origin.
Another Perspective
Let 𝑥^(1) = (−1, 𝑥1^(1), 𝑥2^(1)) and 𝑥^(2) = (−1, 𝑥1^(2), 𝑥2^(2)) be two points on the
decision boundary. Then
𝑥^(1) ⋅ 𝑤𝑗 = 0 and 𝑥^(2) ⋅ 𝑤𝑗 = 0, i.e.,
(𝑥^(1) − 𝑥^(2)) ⋅ 𝑤𝑗 = 0
That is, the vector 𝑤𝑗 is perpendicular to the line 𝑥^(1) − 𝑥^(2), and this holds
for any two points 𝑥^(1) and 𝑥^(2) on the decision boundary.
Hence, decision boundary is a line and 𝑤𝑗 is a vector perpendicular to it.
Decision boundary is a line in 2D case, plane in 3D case and hyperplane
in higher dimensions.
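The perpendicularity argument can be checked numerically. The weights below are illustrative values of our own (bias input −1, as in the OR example), not from the text:

```python
import numpy as np

# For one neuron with bias input -1 and weights (w0, w1, w2), the boundary
# -w0 + x1*w1 + x2*w2 = 0 is the line x2 = (w0 - w1*x1) / w2 (our sketch).
w0, w1, w2 = 0.2, 0.23, 0.27          # illustrative weights (our own)

def boundary_x2(x1):
    return (w0 - w1 * x1) / w2

# Any two boundary points p, q satisfy (p - q) . (w1, w2) = 0, i.e. the
# weight vector is perpendicular to the decision boundary.
p = np.array([0.0, boundary_x2(0.0)])
q = np.array([1.0, boundary_x2(1.0)])
print(np.dot(p - q, [w1, w2]))        # ~0: perpendicularity check
```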
Convergence Theorem
If the data is linearly separable, the fixed-increment perceptron
algorithm terminates after a finite number of weight updates.
Proof taken from the slides by Prof. Robert Snapp, Department of
Computer Science, University of Vermont, Vermont, USA as part of his
course CS 295: Machine Learning
Proof of Convergence Theorem
Consider a single neuron.
Let 𝑤 represent the weight vector.
Let 𝑥𝑖 represent the i-th sample vector
Let 𝑡𝑖 represent the target label of the i-th sample, 𝑡𝑖 ∈ {0, 1}
Let the activation function be:
𝑦𝑖 = 1 if 𝑥𝑖 𝑤 > 0
𝑦𝑖 = 0 if 𝑥𝑖 𝑤 ≤ 0
Update rule
𝑤 = 𝑤 − 𝜂𝑥𝑖𝑇 (𝑦𝑖 − 𝑡𝑖 )
Let
𝑙𝑖 = −1 if 𝑡𝑖 = 0
𝑙𝑖 = +1 if 𝑡𝑖 = 1
Then, on a misclassified sample, the update rule
𝑤 = 𝑤 − 𝜂 𝑥𝑖^T (𝑦𝑖 − 𝑡𝑖)
becomes
𝑤 += 𝜂 𝑥𝑖^T 𝑙𝑖
Let 𝑤* represent a solution that separates the given data.
Let 𝑥̃𝑖 = 𝑥𝑖 𝑙𝑖
Then,
𝑥̃𝑖 𝑤* > 0, ∀𝑖
And, the weight update becomes
𝑤 += 𝜂 𝑥̃𝑖^T
Let 𝑤(𝑘) represent the weight vector after the k-th update.
Let 𝑥̃(𝑘) represent the input sample that triggered the k-th update.
Thus,
𝑤(1) = 𝑤(0) + 𝜂 𝑥̃^T(1)
𝑤(2) = 𝑤(1) + 𝜂 𝑥̃^T(2)
⋮
𝑤(𝑘) = 𝑤(𝑘−1) + 𝜂 𝑥̃^T(𝑘)
We shall prove
𝐴𝑘² ≤ ‖𝑤(𝑘) − 𝑤(0)‖² ≤ 𝐵𝑘
for constants A and B.
Thus, the network must converge after no more than 𝑘max = 𝐵/𝐴
updates.
Cauchy-Schwarz Inequality
Let 𝑎, 𝑏 ∈ ℝⁿ. Then
‖𝑎‖² ‖𝑏‖² ≥ (𝑎^T 𝑏)²
𝑤(1) = 𝑤(0) + 𝜂 𝑥̃^T(1)
𝑤(2) = 𝑤(1) + 𝜂 𝑥̃^T(2)
⋮
𝑤(𝑘) = 𝑤(𝑘−1) + 𝜂 𝑥̃^T(𝑘)
Adding the above 𝑘 equations yields
𝑤(𝑘) = 𝑤(0) + 𝜂 (𝑥̃^T(1) + 𝑥̃^T(2) + ⋯ + 𝑥̃^T(𝑘))
𝑤(𝑘) − 𝑤(0) = 𝜂 (𝑥̃^T(1) + 𝑥̃^T(2) + ⋯ + 𝑥̃^T(𝑘))
Multiplying both sides by the solution 𝑤*^T:
𝑤*^T (𝑤(𝑘) − 𝑤(0)) = 𝜂 𝑤*^T (𝑥̃^T(1) + 𝑥̃^T(2) + ⋯ + 𝑥̃^T(𝑘))
Let
𝑎 = min_𝑥̃ 𝑤*^T 𝑥̃^T > 0
Thus,
𝑤*^T (𝑤(𝑘) − 𝑤(0)) ≥ 𝜂𝑎𝑘 > 0
Squaring both sides and applying the Cauchy-Schwarz inequality yields
‖𝑤*‖² ‖𝑤(𝑘) − 𝑤(0)‖² ≥ (𝑤*^T (𝑤(𝑘) − 𝑤(0)))² ≥ (𝜂𝑎𝑘)²
Thus,
‖𝑤(𝑘) − 𝑤(0)‖² ≥ (𝜂𝑎 / ‖𝑤*‖)² 𝑘²
This gives the lower bound.
Proof: Upper Bound
𝑤(1) = 𝑤(0) + 𝜂 𝑥̃^T(1)
𝑤(2) = 𝑤(1) + 𝜂 𝑥̃^T(2)
⋮
𝑤(𝑘) = 𝑤(𝑘−1) + 𝜂 𝑥̃^T(𝑘)
Subtracting 𝑤(0) from both sides yields
𝑤(1) − 𝑤(0) = 𝜂 𝑥̃^T(1)
𝑤(2) − 𝑤(0) = (𝑤(1) − 𝑤(0)) + 𝜂 𝑥̃^T(2)
⋮
𝑤(𝑘) − 𝑤(0) = (𝑤(𝑘−1) − 𝑤(0)) + 𝜂 𝑥̃^T(𝑘)
Squaring both sides yields
‖𝑤(1) − 𝑤(0)‖² = 𝜂² ‖𝑥̃^T(1)‖²
‖𝑤(2) − 𝑤(0)‖² = ‖𝑤(1) − 𝑤(0)‖² + 2𝜂 (𝑤(1) − 𝑤(0))^T 𝑥̃^T(2) + 𝜂² ‖𝑥̃^T(2)‖²
⋮
‖𝑤(𝑘) − 𝑤(0)‖² = ‖𝑤(𝑘−1) − 𝑤(0)‖² + 2𝜂 (𝑤(𝑘−1) − 𝑤(0))^T 𝑥̃^T(𝑘) + 𝜂² ‖𝑥̃^T(𝑘)‖²
Since 𝑥̃^T(1) triggers an update, it must have been misclassified by the weight
vector 𝑤(0), i.e., 𝑤(0)^T 𝑥̃^T(1) < 0
Similarly,
𝑤(𝑗−1)^T 𝑥̃^T(𝑗) < 0, for 𝑗 = 1, 2, …, 𝑘
Hence, since (𝑤(𝑗−1) − 𝑤(0))^T 𝑥̃^T(𝑗) = 𝑤(𝑗−1)^T 𝑥̃^T(𝑗) − 𝑤(0)^T 𝑥̃^T(𝑗) ≤ −𝑤(0)^T 𝑥̃^T(𝑗):
‖𝑤(1) − 𝑤(0)‖² = 𝜂² ‖𝑥̃^T(1)‖²
‖𝑤(2) − 𝑤(0)‖² ≤ ‖𝑤(1) − 𝑤(0)‖² − 2𝜂 𝑤(0)^T 𝑥̃^T(2) + 𝜂² ‖𝑥̃^T(2)‖²
⋮
‖𝑤(𝑘) − 𝑤(0)‖² ≤ ‖𝑤(𝑘−1) − 𝑤(0)‖² − 2𝜂 𝑤(0)^T 𝑥̃^T(𝑘) + 𝜂² ‖𝑥̃^T(𝑘)‖²
Summing the 𝑘 inequalities yields
‖𝑤(𝑘) − 𝑤(0)‖² ≤ 𝜂² (‖𝑥̃^T(1)‖² + ‖𝑥̃^T(2)‖² + ⋯ + ‖𝑥̃^T(𝑘)‖²) − 2𝜂 𝑤(0)^T (𝑥̃^T(2) + ⋯ + 𝑥̃^T(𝑘))
Define
𝑀 = max_𝑥̃ ‖𝑥̃^T‖²
𝜇 = 2 min_𝑥̃ 𝑤(0)^T 𝑥̃^T < 0 (over the misclassifications)
The inequality above then becomes
‖𝑤(𝑘) − 𝑤(0)‖² ≤ (𝜂²𝑀 − 𝜂𝜇) 𝑘
Hence, we have shown
𝐴𝑘² ≤ ‖𝑤(𝑘) − 𝑤(0)‖² ≤ 𝐵𝑘
with
𝐴 = (𝜂𝑎 / ‖𝑤*‖)² and 𝐵 = 𝜂²𝑀 − 𝜂𝜇
Thus,
𝑘max = 𝐵/𝐴 = (𝜂𝑀 − 𝜇) ‖𝑤*‖² / (𝜂𝑎²)
LINEAR SEPARABILITY
A straight line decision boundary may not always exist
Linearly separable cases – when a straight (linear) decision boundary
is possible
Multiple Neurons May Help!
XOR Function – Linearly Inseparable
XOR – separable in 3D
Added Dimension
It is always possible to separate out two classes with a linear function,
provided that you project the data into the correct set of dimensions.
Kernel classifiers – basis of Support Vector Machines
Data Normalization/Standardization
Scaling input data to lie in (−1, +1)
Additionally, with zero mean and unit variance – a little better, as it does not
allow outliers to dominate as much:
𝑥 = (𝑥 − 𝜇)/𝜎
Discretization: partitioning data by range into integral values
Choosing a subset of features can improve accuracy
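The zero-mean, unit-variance standardization above can be sketched as follows (function name and sample data are our own):

```python
import numpy as np

# Standardization sketch: x = (x - mu) / sigma, applied per feature column.
def standardize(X):
    mu = X.mean(axis=0)        # per-feature mean
    sigma = X.std(axis=0)      # per-feature standard deviation
    return (X - mu) / sigma

# The second feature has a much larger scale; after standardization both
# columns have zero mean and unit variance.
X = np.array([[1., 200.], [2., 300.], [3., 400.]])
Z = standardize(X)
print(Z.mean(axis=0))   # ~[0. 0.]
print(Z.std(axis=0))    # [1. 1.]
```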
LINEAR REGRESSION
Classification: find a line that separates out the classes
Regression: fit a line to data
Classification as instance of Regression
1. Fit a line to the target data
2. Do regression for each class separately, i.e., fit a line to the data points of
each class separately
In regression, we are computing lines (in 2D) that can predict target
values closely, i.e., 𝑦 = 𝛽1 𝑥 + 𝛽0
General form:
𝑦 = Σ_{i=0}^{M} 𝛽𝑖 𝑥𝑖
where 𝑀 is the #dimensions of an input vector
𝛽 = (𝛽0 , 𝛽1 … , 𝛽𝑀 ) defines a line in 2-D, plane in 3-D and hyperplane
in higher dimensions.
Linear regression in two and three dimensions
How do we define the line/plane/hyperplane that best fits the data?
Minimize the distance between the line and the data points.
Least-squares Optimization
Minimize the sum of squared errors over the data: Σ_{k=1}^{N} (𝑡𝑘 − 𝑥𝑘 𝛽)²
where
N: #data points
M: #dimensions of input vector
𝑥𝑘 : the k-th input vector
In matrix form, the above can be written as
𝑡 − 𝑋𝛽 𝑇 (𝑡 − 𝑋𝛽)
Where,
𝑡 is an (𝑁 × 1) vector containing target values
𝑋 is an (𝑁 × 𝑀) matrix denoting input values (including bias)
𝑋𝑖𝑗 : denotes the value of the j-th dimension of the i-th input vector
𝛽 is an (𝑀 × 1) vector defining the hyperplane.
To minimize least-squares error:
𝑑((𝑡 − 𝑋𝛽)^T (𝑡 − 𝑋𝛽)) / 𝑑𝛽 = 0
𝑑((𝑡^T − 𝛽^T 𝑋^T)(𝑡 − 𝑋𝛽)) / 𝑑𝛽 = 0
𝑑(𝑡^T 𝑡)/𝑑𝛽 − 𝑑(𝑡^T 𝑋𝛽)/𝑑𝛽 − 𝑑(𝛽^T 𝑋^T 𝑡)/𝑑𝛽 + 𝑑(𝛽^T 𝑋^T 𝑋𝛽)/𝑑𝛽 = 0
0 − 𝑡^T 𝑋 − 𝑡^T 𝑋 + 𝛽^T (𝑋^T 𝑋 + (𝑋^T 𝑋)^T) = −2𝑡^T 𝑋 + 2𝛽^T 𝑋^T 𝑋 = 0
𝛽^T 𝑋^T 𝑋 − 𝑡^T 𝑋 = 0
𝑋^T (𝑋𝛽 − 𝑡) = 0
Hence, 𝛽 = (𝑋^T 𝑋)^{−1} 𝑋^T 𝑡 [assuming (𝑋^T 𝑋)^{−1} exists]
The following links may be helpful in finding matrix calculus identities
used in the previous proof:
https://en.wikipedia.org/wiki/Matrix_calculus
https://en.wikipedia.org/wiki/Matrix_calculus#Vector-by-vector
http://www.math.nyu.edu/~neylon/linalgfall04/project1/dj/proptranspose.htm
Fill in the details in the proof (left as homework assignment)
Linear Regression for AND, OR and XOR
Inputs   AND data   OR data   XOR data
[0,0]    -0.25       0.25      0.5
[0,1]     0.25       0.75      0.5
[1,0]     0.25       0.75      0.5
[1,1]     0.75       1.25      0.5
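The table can be reproduced with the closed form 𝛽 = (𝑋^T 𝑋)^{−1} 𝑋^T 𝑡 derived above; taking the bias input as 1 is an assumption of this sketch:

```python
import numpy as np

# Reproducing the AND/OR/XOR regression table via the normal equations.
# The bias input is taken as 1 here (our assumption for this sketch).
X = np.array([[1., 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])
targets = {"AND": [0, 0, 0, 1], "OR": [0, 1, 1, 1], "XOR": [0, 1, 1, 0]}
preds = {}
for name, t in targets.items():
    t = np.array(t, dtype=float)
    beta = np.linalg.inv(X.T @ X) @ X.T @ t   # beta = (X^T X)^{-1} X^T t
    preds[name] = X @ beta                    # fitted values for each input
    print(name, preds[name])
# AND -> [-0.25 0.25 0.25 0.75], OR -> [0.25 0.75 0.75 1.25], XOR -> all 0.5
```

Note the XOR column: the best linear fit predicts 0.5 everywhere, which is another way of seeing that XOR is linearly inseparable.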
Miscellaneous Topics
Adaline: Adaptive Linear Neuron
A single linear unit that uses the input to the activation function (the activation
potential) for calculating the error, rather than the output of the activation function
Update Rule
𝑤𝑖 ← 𝑤𝑖 − 𝜂 (𝑦𝑖𝑛 − 𝑡) ∙ 𝑥𝑖
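A sketch of the Adaline rule (names, data, and learning rate are our own): the error is computed from the activation potential itself, not from a thresholded output:

```python
# Adaline sketch: error uses the activation potential y_in = w . x directly.
def adaline_step(w, x, t, eta=0.1):
    y_in = sum(wi * xi for wi, xi in zip(w, x))      # activation potential
    return [wi - eta * (y_in - t) * xi for wi, xi in zip(w, x)]

# Learn t = x1 from two samples; the first component is a bias input of 1.
w = [0.0, 0.0]
for _ in range(500):
    for x, t in [([1, 0], 0), ([1, 1], 1)]:
        w = adaline_step(w, x, t)
print(w)   # converges to approximately [0.0, 1.0]
```

Because the error is real-valued, the weights keep shrinking the residual smoothly instead of jumping in fixed increments as the perceptron rule does.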
Madaline: Multiple adaptive linear neurons
Many Adalines in parallel with a single output unit
Output is based on selection rule (e.g., max, AND)
𝑣𝑖 s are fixed, positive, and possess a common value
Training is like Adaline:
1. Let 𝑧𝑗 = 𝑓(𝑧in𝑗 ) denote the output of 𝑗 th Adaline unit
2. If the final output does not match the target:
𝑤𝑖𝑗 ← 𝑤𝑖𝑗 − 𝜂 (𝑧𝑗 − 𝑡) ∙ 𝑥𝑖
ANNs Based on Connections
Single-layer feed-forward network
Multilayer feed-forward network
Single node with its own feedback
Single-layer recurrent network
Multilayer recurrent network.
Single Layer Feed-Forward Network
Multi Layer Feed-Forward Network
It may or may not be fully connected
Single Node with Own Feedback
Lateral Feedback: feedback to the same layer
Recurrent Networks: feedback networks with closed loop
Single Layer Recurrent Neural Network
Multi Layer Recurrent Neural Network