Classification Using a Single Perceptron
Let’s understand the term “deep learning”. The word “deep” refers to the number of layers. So deep
learning is about building complex hierarchical representations from simple building
blocks.
“The hierarchy of concepts allows the computer to learn complicated concepts by
building them out of simpler ones. If we draw a graph showing how these concepts are
built on top of each other, the graph is deep, with many layers. For this reason, we call
this approach to AI deep learning.” - Ian Goodfellow, the inventor of Generative
Adversarial Networks (GANs)
Deep learning models have the ability to perform automatic feature extraction from
raw data, also called feature learning.
Now that we understand deep learning is built using simple building blocks, let’s dive
in and understand what these building blocks are.
[email protected]
ZV0GDF798E
The perceptron is the basic building block of deep learning. Sometimes we also call it a
neuron, so a neural network is nothing but a network of layers of neurons, or
perceptrons.
Let’s recall what the perceptron is:
It’s a function that has several inputs and one output. Let’s say that it has n inputs
{x1, ..., xn}. Then the output of a perceptron is computed in two steps:
Step 1: We compute a linear function of the inputs; the coefficients of this linear
function are called weights. We initialize these weights with random values.
[email protected]
ZV0GDF798EStep 2: We take this linear combination as the output of the first step and compute a
threshold. This threshold takes any value above some cutoff tau and maps it to the
value +1, and maps everything below tau to -1. The second step is the only
non-linearity in the network.
So the above perceptron requires n weights corresponding to the n inputs and one value of
tau. In total, we need n + 1 parameters for each perceptron.
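As a minimal sketch of these two steps (the function and variable names here are illustrative, not from the original text, and NumPy is assumed):

```python
import numpy as np

def perceptron(x, w, tau):
    """Single perceptron: weighted sum of the inputs followed by a hard threshold."""
    z = np.dot(w, x)             # Step 1: linear combination of the inputs
    return 1 if z > tau else -1  # Step 2: threshold at the cutoff tau

# Example with n = 3 inputs and randomly initialized weights
rng = np.random.default_rng(0)
x = np.array([0.5, -1.2, 3.0])
w = rng.normal(size=3)           # n weights
tau = 0.0                        # plus one threshold -> n + 1 parameters
print(perceptron(x, w, tau))
```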
What's the connection to Support Vector Machines?
The perceptron only recognizes linear patterns; SVMs combat this by mapping the input
space to a new space using a kernel. In deep learning, we take a different approach from
SVMs: we layer perceptrons on top of each other to get our non-linearity.
Let’s start with a concrete example - the problem of image classification. Here we need
to classify which images contain a dog and which don’t.
These are color images, so each pixel is associated with three values - Red, Green,
and Blue (RGB). Suppose our input is a 256 x 256 image. The image dimensions will then
be (256, 256, 3), where 256 and 256 represent the height and width of the image, and 3
represents the number of color channels, or depth (R, G, and B - hence 3), for each
pixel. Hence, every image is a 256 x 256 x 3 array of values.
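As a quick sketch of this representation (a synthetic array stands in for a real image, which would normally be loaded with an image library):

```python
import numpy as np

# A synthetic 256 x 256 RGB image with values in [0, 255]
image = np.random.randint(0, 256, size=(256, 256, 3), dtype=np.uint8)

print(image.shape)     # (256, 256, 3) -> height, width, color channels
x = image.reshape(-1)  # flatten into one long input vector for a perceptron
print(x.shape)         # (196608,) = 256 * 256 * 3 inputs
```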
[email protected]
ZV0GDF798E
Now, we want the perceptron to solve our image classification problem. What we're
really asking for is a linear function of pixels that returns positive if the image contains
a dog and negative if it doesn't.
We might imagine a perceptron is just too simple a function to solve such a complex
task efficiently. There probably isn't a nice linear function of the pixels that decides
whether or not there is a dog in the picture. So the key takeaway here is that a simple
linear function cannot identify complex patterns, such as the presence of a dog in
an image. We need more complex functions or non-linearity in the functions to do this.
Here, non-linearity means the function tries to fit a curve instead of a line as the
decision boundary.
So how can we create more complex functions out of perceptrons as building blocks?
As we have discussed, we need to add more layers; in the simplest setup, imagine
that there are just two layers of perceptrons.
Our input is an n-dimensional vector representing a picture. Each perceptron in the
first layer has n inputs, each with its own weight, so each first-layer perceptron
computes its own weighted sum, applies its own threshold, and returns an output. These
outputs become the inputs to the perceptron in the next layer, which has its own set of
weights and its own threshold and returns the final output.
The above diagram can be divided into three regions:
[email protected]
ZV0GDF798EThe red one corresponds to the input layer and the brown one in the middle
corresponds to the hidden layer which is responsible for calculating complex patterns.
The orange one is the output layer.
So,
Total number of layers = Number of hidden layers + output layer
We don’t count the input layer in the total number of layers.
As we know, each perceptron has n + 1 parameters. So in the above network, the first
layer has m perceptrons, which take a total of (n + 1) x m parameters. Similarly, for the
next layer, we have one perceptron that takes m + 1 parameters. So the total number of
parameters this network takes is:

(n + 1) x m + (m + 1)
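As a quick sketch (the helper name is just for illustration), this count can be computed directly:

```python
def count_parameters(n, m):
    """Parameters of a 2-layer network: m hidden perceptrons over n inputs,
    plus one output perceptron over the m hidden outputs."""
    return (n + 1) * m + (m + 1)

print(count_parameters(6, 2))  # 17, matching the worked example below
```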
Let’s understand all these steps with a simple example:
Here we consider 2 perceptrons in the first layer and 1 perceptron in the second layer.
Suppose we have 6 input values: x1 = 10, x2 = 24, x3 = 15, x4 = 18, x5 = 5, x6 = 20.
We have two layers. Let’s start with layer 1, which has two perceptrons. Since we have 6
inputs, each perceptron takes 6 weights (random values for now), and let’s say the
threshold of the first perceptron is 10. So:
Perceptron1_layer_1 = x1w1 + x2w2 + x3w3 + x4w4 + x5w5 + x6w6
= (10 x 0.2) + (24 x 0.1) + (15 x 0.15) + (18 x 0.25) + (5 x 0.4) + (20 x 0.12)
= 15.55
Now if we compare this with the threshold value (which is 10), it’s higher than the
threshold, so the function returns +1 and not -1.
Now we need to calculate this for perceptron 2. It takes six weights, which are random
values different from those of the first perceptron, and let’s say the threshold here is 15.
Perceptron2_layer_1 = x1w'1 + x2w'2 + x3w'3 + x4w'4 + x5w'5 + x6w'6
= (10 x 0.14) + (24 x 0.23) + (15 x 0.2) + (18 x 0.3) + (5 x 0.28) + (20 x 0.1)
= 18.72
Now if we compare this with the threshold value (which is 15), it’s higher than the
threshold, so again, the function returns +1.
So the outputs of the first layer are {+1, +1}. These will be the inputs for the second
layer. For the second layer, which has one perceptron, we do a similar calculation. It
has two inputs, so we need two weights here along with a threshold: let’s say 0.5.
Perceptron1_layer_2 = x1w21 + x2w22 = (1 x 0.5) + (1 x 0.2) = 0.7
Comparing this with the threshold value (which is 0.5), it returns +1 as it’s higher.
The total number of parameters is (n+1)m + (m+1) = (6+1) x 2 + (2+1) = 17.
This is how the values are calculated and can be extended to multiple layers with
multiple perceptrons.
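The same computation can be reproduced in a short NumPy sketch (the weights, thresholds, and values come from the worked example above; the helper names are illustrative):

```python
import numpy as np

def threshold(z, tau):
    """Hard threshold: +1 if the weighted sum exceeds tau, otherwise -1."""
    return 1 if z > tau else -1

x = np.array([10, 24, 15, 18, 5, 20])              # the 6 inputs

# Layer 1: two perceptrons, each with 6 weights and its own threshold
w1 = np.array([0.2, 0.1, 0.15, 0.25, 0.4, 0.12])   # tau = 10
w2 = np.array([0.14, 0.23, 0.2, 0.3, 0.28, 0.1])   # tau = 15
h1 = threshold(np.dot(w1, x), 10)                  # 15.55 > 10 -> +1
h2 = threshold(np.dot(w2, x), 15)                  # 18.72 > 15 -> +1

# Layer 2: one perceptron over the two hidden outputs
w_out = np.array([0.5, 0.2])                       # tau = 0.5
y = threshold(np.dot(w_out, np.array([h1, h2])), 0.5)  # 0.7 > 0.5 -> +1

print(h1, h2, y)                                   # 1 1 1
```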
We can take this idea much further and have more than two layers - the functions that we
get are called deep neural networks, and in practice one usually sets them to have
between 6 and 8 layers with millions of perceptrons in between.
There are several fragments of intuition behind these types of functions. The hope
is that the lower-level layers of the network identify some basic features, like edges and
patterns, and that each layer on top of them builds on the previous layer to create
much more complex features.
Source - Andrew Ng
Let's take an instance where the model has to identify whether or not a dog is present
in an image. First, it would need to identify edges; then it would identify which
arrangements of edges represent the legs, the body, and the head; and then which
arrangements of these parts represent a dog.
Recall that in ImageNet, we want to correctly classify according to 1000 different
labels at once. Even though there are a million total images, that's not actually that
many examples considering the number of labels. So what's important is that the
features that are useful for identifying one breed of dog can be useful in identifying
other breeds of dog as well. In this sense, a deep network can have 1000 outputs, one
for each label, built on top of a common deep network underneath it, one which is
hopefully identifying useful high-level representations that are needed to understand
images.
Another rationalization for deep neural networks is that they parallel what happens in
the visual cortex. There's still a lot about the brain that we don't understand. But it
does seem that the visual cortex has a similar type of hierarchical structure, with
[email protected]
ZV0GDF798Eneurons in the lower layers recognizing lower level features like edges.
Moreover, we can measure how quickly a human can recognize an object and how
quickly a neuron fires. These measurements tell us that even though the visual cortex is
performing some hierarchical computation, it needs at most six to eight layers (with a
sufficient number of neurons in each layer) to solve even complex, high-level recognition
problems.
So let's conclude - first, the perceptron has a linear part and a non-linear threshold
function. If it weren't for the threshold, creating deeper networks would not buy us
anything. We would still be composing linear functions, and no matter how deep we make
the computation, the function we get would still be linear. It's the non-linearity that we
added that makes deep networks so functionally expressive. We have understood that
by using the threshold function we are introducing some kind of non-linearity into the
network, but why do we need non-linearity? It is hard to find any physical-world
phenomenon that follows linearity straightforwardly. We need a non-linear function
that can approximate the non-linear phenomena we observe in the real world. The
image below is an example of such non-linear patterns that need to be identified:
[email protected]
ZV0GDF798E
To introduce non-linearity we use some kind of function, for example, the threshold
function, and these functions are called activation functions. The purpose of the
activation function is to introduce non-linearity into the output of a neuron.
In fact, there are many other non-linear functions, or activation functions, that we could
have chosen instead of a threshold: a logistic sigmoid, a hyperbolic tangent, or any other
smooth approximation to a step function. This and many other aspects of the architecture
of deep neural networks are all valid design choices that have their own merits. There
are many research papers that grapple with
issues like which non-linear functions work best and how we should structure the
internal layers, for example using the convolution operation. We will talk more about
these issues, but it's good to know that they're very important.
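To make the first point concrete, here is a minimal NumPy sketch (the matrices and names are illustrative, not from the text) showing that stacking two layers without an activation collapses to a single linear map, while a threshold non-linearity does not:

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(4, 3))   # first layer of weights
W2 = rng.normal(size=(2, 4))   # second layer of weights
x = rng.normal(size=3)

# Two linear layers compose into one linear layer: W2 @ (W1 @ x) == (W2 @ W1) @ x
out_stacked = W2 @ (W1 @ x)
out_single = (W2 @ W1) @ x
print(np.allclose(out_stacked, out_single))    # True -> no extra expressive power

# Inserting a non-linearity (here a hard threshold) breaks this collapse
h = np.where(W1 @ x > 0, 1.0, -1.0)
out_nonlinear = W2 @ h
print(np.allclose(out_nonlinear, out_single))  # generally False
```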
Second, we've said nothing about actually finding the parameters of the deep network.
While calculating the perceptron outputs above, we used weights that we defined randomly;
these weights need to be learned by the network. Modern deep networks have millions of
parameters, which is a very large space to search. When we talked about the support
vector machine, we had the perceptron algorithm, which told us that if there is a linear
classifier, we can find it algorithmically. But even if there is a setting of the
parameters of a deep network that really can classify images accurately into, say,
different breeds of dog, how can we find it? There is no simple answer to this question.
There are approaches that seem to work in practice, but why they do is still very much a
mystery, perhaps a phenomenon that has to do with some of the strange properties of
searching in such a high-dimensional space.
Additional Content:
[email protected]
ZV0GDF798ETypes of Activation functions:
1) Sigmoid
The main reason why we use the sigmoid function is that the range of values it
outputs is between 0 and 1. Therefore, it is especially used for models where
we have to predict a probability as the output, since probabilities exist only
in the range 0 to 1 - in such contexts, Sigmoid is the right choice. However,
when used in the hidden layers, the logistic sigmoid function can cause a neural
network to get stuck during training due to the Vanishing / Exploding Gradient
problem. Therefore, the Sigmoid is mostly only used in the output layer,
especially in the case of binary classification.
2) Tanh
Tanh is a scaled and shifted version of the Sigmoid function, with an output range
between -1 and 1. The activations that come out of a hidden layer using Tanh have
a mean closer to zero, so the data is more centered, which makes learning for the
next layer easier and faster. One of the downsides of
both Sigmoid and Tanh is the Vanishing / Exploding Gradient problem; if our
weighted sum input is either very large or very small, then the gradient (also
called the derivative or slope) of this function becomes very small and ends up
being very close to zero. This can slow down learning, and this is why Sigmoid
and Tanh are not preferred in the hidden layers of deep neural networks.
3) ReLU
ReLU is increasingly the default choice of activation function in the hidden layers
of deep neural networks. If you are not sure what to use in the hidden layers,
just use the ReLU activation function or one of its variants. It is a bit faster to
compute than other activation functions, and gradient descent does not get
stuck as much on plateaus, thanks to the fact that it does not saturate for
large input values, unlike the logistic sigmoid function or the hyperbolic
tangent function.
One disadvantage of ReLU is that its derivative is equal to zero when the
weighted input is negative. This problem is known as the dying ReLU: if the
weights in the network always lead to negative inputs into a ReLU neuron, that
neuron won't be effectively contributing to the network's training. There is
another version of the ReLU activation function, called the Leaky ReLU, that
solves the dying ReLU problem. It usually works better than the ReLU activation
function.
4) LeakyReLU
The Leaky ReLU activation function usually works better than ReLU, but it is not
used that much in practice.
5) Softmax
The Softmax activation function is used in neural networks when we want to
build a multi-class classifier, i.e., one that solves the problem of assigning an
instance to one class when the number of possible classes is larger than two
(when there are only two classes, we can simply use Sigmoid). All five of these
functions are sketched in code right after this list.
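As a reference, here are minimal NumPy sketches of the five activation functions above (these are standard textbook definitions rather than code from the original material; the leaky-ReLU slope of 0.01 is a common but arbitrary choice):

```python
import numpy as np

def sigmoid(z):
    """Squashes any real value into (0, 1); often used for binary outputs."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Like sigmoid but with outputs in (-1, 1), so activations are zero-centered."""
    return np.tanh(z)

def relu(z):
    """max(0, z): cheap to compute and does not saturate for large positive inputs."""
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    """Like ReLU, but with a small slope alpha for negative inputs, so the
    gradient never becomes exactly zero (avoids the dying ReLU problem)."""
    return np.where(z > 0, z, alpha * z)

def softmax(z):
    """Turns a vector of scores into probabilities that sum to 1 (multi-class output).
    Subtracting the max is a standard trick for numerical stability."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), tanh(z), relu(z), leaky_relu(z), softmax(z), sep="\n")
```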
In the output layer, the activation functions are selected based on the problem
statement, for example, if it's:
● Regression: Linear / No activation function (because the values are unbounded)
● Classification:
○ Binary classification - Sigmoid
○ Multiclass classification - Softmax
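To illustrate these choices, here is a hedged sketch assuming a TensorFlow/Keras setup (which the text does not prescribe) and an arbitrary input size of 100 features; only the output layer's activation changes with the task:

```python
from tensorflow import keras
from tensorflow.keras import layers

n_features = 100  # illustrative input size

# Binary classification: one output unit with a sigmoid activation
binary_model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

# Multiclass classification (say 10 classes): softmax over 10 output units
multiclass_model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# Regression: a linear (no activation) output unit, since the values are unbounded
regression_model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(1),  # default activation is linear
])
```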
In the explanation above, we initialized the weights randomly, but we should be
careful and deliberate about how we define them. Weight initialization in
neural networks is an active research topic in its own right. You may refer to this link to
get a better idea about it > Weight Initialization.
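As a small illustration, here are two widely used initialization schemes, Xavier/Glorot and He, given as examples rather than as the scheme the text has in mind:

```python
import numpy as np

rng = np.random.default_rng(42)

def xavier_init(n_in, n_out):
    """Xavier/Glorot initialization: variance scaled by fan-in and fan-out,
    commonly paired with sigmoid or tanh activations."""
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_out, n_in))

def he_init(n_in, n_out):
    """He initialization: variance scaled by fan-in, commonly paired with ReLU."""
    return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_out, n_in))

W_hidden = he_init(6, 2)   # e.g. the 2-perceptron hidden layer from the worked example
print(W_hidden.shape)      # (2, 6)
```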
[email protected]
ZV0GDF798E
12
This file is meant for personal use by [email protected] only.
Sharing or publishing the contents in part or full is liable for legal action.