Machine Learning
Lecture # 3
Single & Multilayer Perceptron
Artificial Neural Network - Perceptron
A (linear) decision boundary represented by one artificial neuron, called a "Perceptron"
• Low space complexity
• Low time complexity
Input signals are sent from other neurons. If sufficient signals accumulate, the neuron fires a signal. Connection strengths determine how the signals are accumulated.
What is ANN?
• The nucleus sums all these input values, which gives us the activation
• For n inputs and n weights, each weight is multiplied by its input and the products are summed:
a = x1w1 + x2w2 + x3w3 + ... + xnwn
Perceptron
• input signals ‘x’ and weights ‘w’ are multiplied
• weights correspond to connection strengths
• signals are added up – if they are enough, FIRE!
[Figure: a perceptron. Incoming signals x1, x2, x3 arrive on connections of strength w1, w2, w3. The activation level is a = Σ(i=1..M) xi wi; if (a > t) the neuron outputs signal 1, else it outputs 0.]
Calculation…
a = Σ(i=1..M) xi wi
Sum notation (just like a loop from 1 to M): multiply corresponding elements and add them up to get the activation, a.
double[] x = { 1.0, 0.5, 2.0 };                       // input signals
double[] w = { 0.2, 0.5, 0.5 };                       // connection weights
double a = 0.0;
for (int i = 0; i < x.length; i++) a += x[i] * w[i];  // multiply and sum
if (a > threshold) { /* FIRE! */ }
Perceptron Decision Rule
if Σ(i=1..M) xi wi > t then output 1, else output 0
[Figure: a linear decision boundary, with output = 0 on one side and output = 1 on the other]
Is this a good decision boundary?
if Σ(i=1..M) xi wi > t then output 1, else output 0
Each setting of the weights and threshold gives a different boundary:
w1 = 1.0, w2 = 0.2, t = 0.05
w1 = 2.1, w2 = 0.2, t = 0.05
w1 = 1.9, w2 = 0.02, t = 0.05
w1 = -0.8, w2 = 0.03, t = 0.05
Changing the weights/threshold makes the decision boundary move.
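To see why, set the activation equal to the threshold: in two dimensions the boundary is the line x1w1 + x2w2 = t. A worked rearrangement for the first setting above:
1.0·x1 + 0.2·x2 = 0.05
x2 = (0.05 - 1.0·x1) / 0.2 = 0.25 - 5·x1
so changing either weight or the threshold shifts or rotates this line.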
x = [ 1.0, 0.5, 2.0 ]
w = [ 0.2, 0.5, 0.5 ]
t = 1.0
a = Σ(i=1..M) xi wi
Q1. What is the activation, a, of the neuron?
Q2. Does the neuron fire?
Q3. What if we set the threshold at 0.5 and weight #3 to zero?
Q1. What is the activation, a, of the neuron?
a = Σ(i=1..M) xi wi = (1.0 × 0.2) + (0.5 × 0.5) + (2.0 × 0.5) = 0.2 + 0.25 + 1.0 = 1.45
Q2. Does the neuron fire?
if (activation > threshold) output=1 else output=0
1.45 > 1.0 …. so yes, it fires.
Q3. What if we set the threshold at 0.5 and weight #3 to zero?
a = Σ(i=1..M) xi wi = (1.0 × 0.2) + (0.5 × 0.5) + (2.0 × 0.0) = 0.45
if (activation > threshold) output=1 else output=0
0.45 < 0.5 …. so no, it does not fire.
We can rearrange the decision rule….
if Σ(i=1..M) xi wi > t then output 1, else output 0
if Σ(i=1..M) xi wi - t > 0 then output 1, else output 0
if Σ(i=1..M) xi wi + (-1)·t > 0 then output 1, else output 0
if Σ(i=0..M) xi wi > 0 then output 1, else output 0   (where x0 = -1 and w0 = t)
We now treat the threshold like any other weight with a permanent input of -1
The Bias
The permanent -1 acts as a "false input"; its weight w0 plays the role of the threshold and is called the bias.
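A minimal sketch of this trick in Java-like code, reusing the earlier example values (the augmented x[0] entry is the illustrative false input):
double[] x = { -1.0, 1.0, 0.5, 2.0 };   // x[0] is the permanent false input
double[] w = {  1.0, 0.2, 0.5, 0.5 };   // w[0] = t, the old threshold of 1.0
double a = 0.0;
for (int i = 0; i < x.length; i++) a += x[i] * w[i];
int output = (a > 0) ? 1 : 0;           // compare against zero, not t
Here a = -1.0 + 1.45 = 0.45 > 0, matching the earlier answer that the neuron fires.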
Perceptron Learning Algorithm
initialise weights (w)
Repeat until all points are correctly classified
    Repeat for each point i
        Calculate margin yi(w·Xi) for point i
        If margin > 0, point is correctly classified
        Else change the weights to increase the margin, such that
            Δw = ηyiXi and wnew = wold + Δw
    end
end
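A runnable sketch of this loop in Java, under the assumptions that labels yi are -1/+1, the bias is folded into w as above, and η is passed in as eta (the method name train is illustrative):
static double[] train(double[][] X, int[] y, double eta) {
    double[] w = new double[X[0].length];    // initialise weights to zero
    boolean converged = false;
    while (!converged) {                     // terminates if data is linearly separable
        converged = true;
        for (int i = 0; i < X.length; i++) {
            double a = 0.0;
            for (int j = 0; j < w.length; j++) a += w[j] * X[i][j];
            if (y[i] * a <= 0) {             // margin not positive: misclassified
                for (int j = 0; j < w.length; j++) w[j] += eta * y[i] * X[i][j];  // Δw = ηyiXi
                converged = false;
            }
        }
    }
    return w;
}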
Perceptron convergence theorem:
If the data is linearly separable, then application of the Perceptron learning rule will find a separating decision boundary within a finite number of iterations.
Decision Boundary Using Perceptron
Multiple Outputs
Criticism of the Perceptron
• Minsky and Papert criticised the perceptron (1969)
• Minsky and Papert's criticism was partly right
• We have no a priori reason to think that problems should be linearly separable
• However, it turns out that the world is full of problems that are at least close to linearly separable
Can a Perceptron solve this problem? ….. NO.
Perceptrons only solve
LINEARLY SEPARABLE
problems
With a perceptron…
the decision boundary is
LINEAR
[Figure: two classes, A and B, plotted on axes running from 0 to 1, arranged so that no straight line separates them]
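A standard concrete example (not necessarily the one pictured) is XOR: output 1 exactly when the two binary inputs differ. A single perceptron would need weights satisfying:
(0,0): 0 ≤ t, so t ≥ 0
(1,1): w1 + w2 ≤ t
(1,0): w1 > t
(0,1): w2 > t
Adding the last two gives w1 + w2 > 2t ≥ t, contradicting w1 + w2 ≤ t. No such weights exist, so a single perceptron cannot represent XOR.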
Overview of SVM w.r.t. Perceptron
Perceptron vs SVM
• The Perceptron does not try to optimise the separation distance. As long as it finds a hyperplane that separates the two sets, it is done. The SVM, on the other hand, tries to maximise the margin: the distance between the hyperplane and the closest sample points on either side (the support vectors).
• The SVM typically uses a "kernel function" to project the sample points into a high-dimensional space to make them linearly separable, while the perceptron assumes the sample points are already linearly separable. (A standard example follows the list below.)
• The SVM requires more parameters than the perceptron:
– choice of kernel
– selection of kernel parameters
– selection of the value of the margin parameter
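As a standard illustration of a kernel (not taken from these slides): the degree-2 polynomial kernel k(x, z) = (x·z)² corresponds to the feature map φ(x1, x2) = (x1², √2·x1x2, x2²), since φ(x)·φ(z) = x1²z1² + 2x1x2z1z2 + x2²z2² = (x·z)². Points that are not linearly separable in (x1, x2) can become separable in this three-dimensional feature space.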
SVM and Margins
SVM for Nonlinear Data
Acknowledgements
Material in these slides has been taken from the following resources:
Introduction to Machine Learning, E. Alpaydin
Statistical Pattern Recognition: A Review, A.K. Jain et al., IEEE PAMI (22), 2000
Pattern Recognition and Analysis Course, A.K. Jain, MSU
Pattern Classification, Duda et al., John Wiley & Sons