Introduction to Neural Networks
Department of Information
Technology
Ambo University
What are neural networks?
A neural network can mean
either a real biological neural
network such as the one in your
brain, or
an artificial neural network
simulated in a computer.
Neurons, cell bodies, and signals
A neural network,
either biological or artificial, consists of a
large number of simple units,
neurons, that receive and transmit signals to
each other.
The neurons are very simple processors
of information, consisting of
a cell body and
wires that connect the neurons to each
other.
Most of the time, they do nothing but sit
still and watch for signals coming in
through the wires.
Dendrites, axons, and synapses
In the biological lingo,
we call the wires that provide the input to
the neurons dendrites.
Sometimes, depending on the incoming
signals,
the neuron may fire and send a signal out
for the other neurons to receive.
The wire that transmits the outgoing
signal is called an axon.
Each axon may be connected to one or
more dendrites at intersections that are
called synapses.
Dendrites, axons, and synapses
(contd)
Isolated from its fellow-neurons,
a single neuron is quite unimpressive, and
capable of only a very restricted set of
behaviors.
When connected to each other,
however, the system resulting from their
concerted action can become extremely
complex.
The behavior of the system is
determined by the ways in which the
neurons are wired together.
Dendrites, axons, and synapses
(contd)
Each neuron reacts to the
incoming signals in a specific way
that can also adapt over time.
This adaptation is known to be the
key to functions such as memory
and learning.
Why develop artificial neural
networks?
The purpose of building artificial
models of the brain can be
neuroscience, the study of the brain
and the nervous system in general.
It is tempting to think that by
mapping the human brain in
enough detail,
we can discover the secrets of
human and animal cognition and
consciousness.
Why develop artificial neural
networks?
However, even while we seem to
be almost as far as ever from
understanding the mind and
consciousness,
there are clear milestones that have
been achieved in neuroscience.
Through a better understanding of the
structure and function of the brain,
we are already reaping some concrete
rewards.
Why develop artificial neural
networks?
We can, for instance,
identify abnormal functioning, try to
help the brain avoid it, and reinstate
normal operation.
This can lead to life-changing new
medical treatments for people
suffering from neurological
disorders:
epilepsy, Alzheimer's disease,
problems caused by developmental
disorders, or damage caused by injuries.
Why develop artificial neural
networks?
In fact, another main reason for building
artificial neural networks has
little to do with understanding biological
systems.
It is to use biological systems as
an inspiration to build better AI and
machine learning techniques.
The idea is very natural:
the brain is an amazingly complex
information processing system capable of a
wide range of intelligent behaviors and
therefore, it makes sense to look for
inspiration in it when we try to create
artificial intelligence.
Why develop artificial neural
networks?
Neural networks have been
a major trend in AI since the 1960s.
We'll return to the waves of
popularity in the history of AI in the
final part.
Currently neural networks are
again at the very top of the list
as deep learning is used to achieve
significant improvements in many
areas
such as natural language and image
processing.
What is so special about neural
networks?
The case for neural networks in general
as an approach to AI is
based on a similar argument to the one for
logic-based approaches.
In the latter case,
it was thought that in order to achieve
human-level intelligence,
we need to simulate higher-level thought
processes, and
in particular, manipulation of symbols
representing certain concrete or abstract
concepts using logical rules.
Neural network key feature
For one, in a traditional computer,
information is processed in a
central processor
which can only focus on doing one
thing at a time.
The CPU can retrieve data to be
processed from the computer's
memory,
and store the result in the memory.
Neural network key feature
(contd)
Thus, data storage and processing are
handled by two separate components of
the computer:
the memory and the CPU.
In neural networks, the system consists
of
a large number of neurons,
each of which can process information on its own
so that instead of having a CPU process each
piece of information one after the other,
the neurons process vast amounts of information
simultaneously.
Neural network key feature
(contd)
The second difference is that
data storage (memory) and
processing aren't separated like in
traditional computers.
The neurons both
store and process information
so that there is no need to retrieve
data from the memory for processing.
Neural network key feature
(contd)
The data can be stored short term
in the neurons themselves
(they either fire or not at any given
time) or
for longer-term storage, in the
connections between the neurons,
their so-called weights.
Neural network key feature
(contd)
Because of these two differences,
neural networks and traditional
computers are suited for somewhat
different tasks.
Even though it is entirely possible
to simulate neural networks in
traditional computers,
which was the way they were used for
a long time, their maximum capacity
is achieved only when we use special
hardware
Neural network key feature
(contd)
(computer devices) that can
process many pieces of
information at the same time.
This is called parallel processing.
Incidentally, graphics processors
(or graphics processing units,
GPUs) have this capability and
they have become a cost-effective
solution for running massive deep
learning methods.
How neural networks are
built
Weights and inputs
The basic artificial neuron model
involves
a set of adaptive parameters, called
weights, just like in linear and logistic
regression.
Just like in regression,
these weights are used as multipliers
on the inputs of the neuron, which are
added up.
Weights and inputs (contd)
The sum of the weights times the
inputs is called
the linear combination of the
inputs.
You can probably recall the
shopping bill analogy:
you multiply the amount of each item
by its price per unit and add up to get
the total.
Weights and inputs (contd)
If we have a neuron with six inputs
(analogous to the amounts of the
six shopping items:
potatoes, carrots, and so on),
input1, input2, input3, input4, input5,
and input6,
we also need six weights.
The weights are analogous to the
prices of the items.
Weights and inputs (contd)
We'll call them
weight1, weight2, weight3, weight4,
weight5, and weight6.
In addition,
well usually want to include an
intercept term like we did in linear
regression.
This can be thought of as
a fixed additional charge due to
processing a credit card payment, for
example.
Weights and inputs (contd)
We can then calculate the linear
combination like this:
linear combination =
intercept + weight1 × input1 + ... +
weight6 × input6
With some example numbers we
could then get:
10.0 + 5.4 × 8 + (-10.2) × 5 + (-0.1)
× 22 + 101.4 × (-5) + 0.0 × 2 + 12.0
× (-3) = -543.0
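To make the arithmetic concrete, here is a
minimal Python sketch that computes this
linear combination with the example numbers
above (the variable names are our own,
chosen for illustration):

intercept = 10.0
weights = [5.4, -10.2, -0.1, 101.4, 0.0, 12.0]
inputs = [8, 5, 22, -5, 2, -3]

# Multiply each input by its weight and add
# everything up, like totaling a shopping bill.
linear_combination = intercept + sum(
    w * x for w, x in zip(weights, inputs))
print(linear_combination)  # -543.0 (up to floating-point rounding)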
Weights and inputs (contd)
The weights are almost always
learned
from data using the same ideas as in
linear or logistic regression, as
discussed previously.
But before we discuss this in more
detail,
we'll introduce another important
stage that a neuron completes before
it sends out an output signal.
Activations and outputs
Once the linear combination has
been computed,
the neuron does one more operation.
It takes the linear combination and
puts it through a so-called
activation function.
Typical examples of the activation
function include:
identity function: do nothing and just
output the linear combination
Activations and outputs (contd)
step function: if the value of the linear
combination is greater than zero,
send a pulse (ON), otherwise do nothing
(OFF)
sigmoid function: a soft version of
the step function
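As a rough sketch, the three activation
functions above could be written in Python
like this (the sigmoid formula
1 / (1 + e^(-z)) is the standard one):

import math

def identity(z):
    # Do nothing: output the linear combination as is.
    return z

def step(z):
    # Send a pulse (1) if the linear combination is
    # greater than zero, otherwise do nothing (0).
    return 1 if z > 0 else 0

def sigmoid(z):
    # A soft version of the step function:
    # output varies smoothly between 0 and 1.
    return 1 / (1 + math.exp(-z))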
Activations and outputs (contd)
Note that with the first activation
function, the identity function,
the neuron is exactly the same as
linear regression.
This is why the identity function is
rarely used in neural networks:
it leads to nothing new and
interesting.
How neurons activate
Real, biological neurons
communicate
by sending out sharp, electrical
pulses called spikes,
so that at any given time, their
outgoing signal is either on or off (1
or 0).
The step function imitates this
behavior.
However, artificial neural networks
tend to use activation functions, such as
the sigmoid, that produce a continuous
activation level at all times.
How neurons activate (contd)
Thus, to use a somewhat awkward
figure of speech,
real neurons communicate by
something similar to the Morse code,
whereas artificial neurons
communicate by adjusting the pitch
of their voice as if yodeling.
How neurons activate (contd)
The output of the neuron,
determined by
the linear combination and the
activation function, can be used to
extract a prediction or a decision.
For example, if the network is
designed
to identify a stop sign in front of a
self-driving car,
the input can be the pixels of an image
captured by a camera, and the output can
be used to activate a braking procedure.
How neurons activate (contd)
Learning or adaptation in the
network occurs
when the weights are adjusted
so as to make the network produce the
correct outputs, just like in linear or
logistic regression.
Many neural networks are very
large, and the largest contain
hundreds of billions of weights.
Optimizing them all can be a
daunting task that requires massive
amounts of computing power.
Perceptron: the mother of all
ANNs
The perceptron is simply a fancy
name
for the simple neuron model with the
step activation function we discussed
above.
It was among the very first formal
models of neural computation and
because of its fundamental role in
the history of neural networks,
it wouldn't be unfair to call it the
mother of all artificial neural networks.
Perceptron: the mother of all
ANNs
It can be used as a simple
classifier in binary classification
tasks.
A method for learning the weights
of the perceptron from data, called
the Perceptron algorithm, was
introduced
by the psychologist Frank
Rosenblatt in 1957.
We will not study the Perceptron
algorithm in detail here.
Perceptron: the mother of all
ANNs
Suffice to say that it is just about
as simple as the nearest neighbor
classifier.
The basic principle is to feed the
network training data one example
at a time.
Each misclassification leads to an
update in the weights.
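As a minimal sketch of this idea (not the
full Perceptron algorithm), the classic
update rule could look as follows in Python;
the function name and the learning rate are
our own illustrative choices:

def perceptron_update(weights, inputs, label, learning_rate=0.1):
    # Predict with the step activation:
    # 1 if the linear combination is positive, else 0.
    prediction = 1 if sum(w * x for w, x in zip(weights, inputs)) > 0 else 0
    # A misclassification nudges each weight toward the correct label.
    if prediction != label:
        weights = [w + learning_rate * (label - prediction) * x
                   for w, x in zip(weights, inputs)]
    return weights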
Putting neurons together:
networks
A single neuron would be way
too simple to make decisions and
predictions reliably in most real-life
applications.
To unleash the full potential of
neural networks,
we can use the output of one neuron
as the input of other neurons, whose
outputs can be the input to yet other
neurons, and so on.
Putting neurons together:
networks
The output of the whole network is
obtained as
the output of a certain subset of the
neurons, which are called the output
layer.
We'll return to this in a bit,
after we've discussed the way neural
networks adapt to produce different
behaviors by learning their parameters
from data.
Layers
Often the network architecture is
composed of layers.
The input layer consists of neurons
that get their inputs directly from the
data.
So for example, in an image
recognition task,
the neurons of the input layer would
take the pixel values of the input
image as their inputs.
Layers
The network typically also has
hidden layers that use the other
neurons' outputs as their input,
and whose output is used as the input
to other layers of neurons.
Finally, the output layer produces
the output of the whole network.
All the neurons on a given layer
get inputs from neurons on the
previous layer and feed their
output to the next.
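To illustrate how layers feed into each
other, here is a rough numpy sketch of a
forward pass through one hidden layer and
an output layer; the layer sizes, the random
weights, and the use of the sigmoid are our
own illustrative assumptions:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
# Illustrative sizes: 25 inputs, 10 hidden neurons, 1 output neuron.
W_hidden = rng.normal(size=(10, 25))
W_output = rng.normal(size=(1, 10))

x = rng.random(25)                   # input layer: raw data, e.g. pixel values
hidden = sigmoid(W_hidden @ x)       # hidden layer: uses the input layer's outputs
output = sigmoid(W_output @ hidden)  # output layer: the whole network's output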
Layers
A classical example of a multilayer
network is the so-called
multilayer perceptron.
As we discussed above,
Rosenblatt's Perceptron algorithm
can be used to learn the weights of
a perceptron.
For the multilayer perceptron,
the corresponding learning problem is
much harder, and it took a long time
before a working solution was discovered.
Layers
But eventually, one was invented:
the backpropagation algorithm
led to a revival of neural networks in
the late 1980s.
It is still at the heart of many of the
most advanced deep learning
solutions.
A simple neural network
classifier
To give a relatively simple example
of using a neural network
classifier,
we'll consider a task that is very
similar to the MNIST digit recognition
task, namely classifying images in
two classes.
We will first create a classifier to
classify
whether an image shows a cross (x)
or a circle (o).
A simple neural network
classifier
Our images are represented here
as
pixels that are either colored or white,
and the pixels are arranged in a 5 × 5
grid.
In this format, our images of a cross
and a circle
(more like a diamond, to be honest)
can each be encoded as a list of 25
pixel values.
A simple neural network
classifier
In order to build a neural network
classifier,
we need to formalize the problem in a
way
where we can solve it using the
methods we have learned.
Our first step is to represent the
information in the pixels by
numerical values that can be used
as the input to a classifier.
A simple neural network
classifier
Let's use 1 if the square is colored,
and 0 if it is white.
Note that although the symbols in
the graphic are of different color
(green and blue),
our classifier will ignore the color
information and
use only the colored/white
information.
The 25 pixels in the image make up
the inputs of our classifier.
A simple neural network
classifier
To make sure that we know which
pixel is which in the numerical
representation,
we can decide to list the pixels in the
same order as you'd read text, so row
by row from the top, and reading
each row from left to right.
The first row of the cross, for
example, is represented as
1,0,0,0,1;
the second row as 0,1,0,1,0, and so on.
A simple neural network
classifier
The full input for the cross is then:
1,0,0,0,1,0,1,0,1,0,0,0,1,0,0,0,1,0,1,0,
1,0,0,0,1.
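A quick Python sketch of this row-by-row
flattening, using the cross image as an
example:

grid = [[1, 0, 0, 0, 1],
        [0, 1, 0, 1, 0],
        [0, 0, 1, 0, 0],
        [0, 1, 0, 1, 0],
        [1, 0, 0, 0, 1]]
# Read row by row from the top, each row from left to right.
pixels = [value for row in grid for value in row]
print(pixels)  # [1, 0, 0, 0, 1, 0, 1, 0, 1, 0, ...]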
We'll use the basic neuron model
where the first step is to compute
a linear combination of the inputs.
Thus we need a weight for each of the
input pixels, which means 25
weights in total.
A simple neural network
classifier
Finally, we use the step activation
function.
If the linear combination is
negative, the neuron activation is
zero,
which we decide to use to signify a
cross.
If the linear combination is
positive,
the neuron activation is one, which
we decide to signify a circle.
A simple neural network
classifier
Let's try
what happens when all the weights
take the same numerical value, 1.
With this setup,
our linear combination for the cross
image will be
9 (9 colored pixels, so 9 × 1, and
16 white pixels, 16 × 0), and
for the circle image it will be
8 (8 colored pixels, 8 × 1, and
17 white pixels, 17 × 0).
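We can check this with a short Python
sketch. The cross pixels are taken verbatim
from the encoding above; the circle pixels
are our reconstruction of the diamond shape
described in the text (colored in the middle
of each side, white in the center and the
corners):

cross = [1,0,0,0,1,
         0,1,0,1,0,
         0,0,1,0,0,
         0,1,0,1,0,
         1,0,0,0,1]
circle = [0,0,1,0,0,
          0,1,0,1,0,
          1,0,0,0,1,
          0,1,0,1,0,
          0,0,1,0,0]

weights = [1] * 25  # all weights set to 1
print(sum(w * x for w, x in zip(weights, cross)))   # 9
print(sum(w * x for w, x in zip(weights, circle)))  # 8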
A simple neural network
classifier
In other words,
the linear combination is positive for
both images
and they are thus classified as circles.
Not a very good result given that
there are only two images to
classify.
A simple neural network
classifier
To improve the result,
we need to adjust the weights in such
a way that
the linear combination will be negative
for a cross and positive for a circle.
If we think about what
differentiates images of crosses
and circles,
we can see that circles have no
colored pixels in the center of the
image, whereas crosses do.
A simple neural network
classifier
Likewise, the pixels at the corners of the
image are colored in the cross, but
white in the circle.
We can now adjust the weights.
There are an infinite number of weights
that do the job.
For example,
assign weight -1 to the center pixel (the
13th pixel), and
weight 1 to the pixels in the middle of each
of the four sides of the image, letting all the
other weights be 0.
A simple neural network
classifier
Now, for the cross input, the center
pixel produces the value -1, while for
all the other pixels either the pixel
value or the weight is 0,
so that -1 is also the total value.
This leads to activation 0, and the
cross is correctly classified.
How about the circle then?
A simple neural network
classifier
Each of the pixels in the middle of
the sides produces the value 1,
which makes 4 × 1 = 4 in total.
For all the other pixels either the
pixel value or the weight is zero,
so 4 is the total.
Since 4 is a positive value, the
activation is 1, and
the circle is correctly recognized as
well.
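Putting it all together, here is a minimal
sketch of the whole classifier with these
hand-picked weights. Pixel indices count row
by row from 0, so the center pixel is index
12 and the middles of the four sides are
indices 2, 10, 14, and 22:

cross = [1,0,0,0,1, 0,1,0,1,0, 0,0,1,0,0, 0,1,0,1,0, 1,0,0,0,1]
circle = [0,0,1,0,0, 0,1,0,1,0, 1,0,0,0,1, 0,1,0,1,0, 0,0,1,0,0]

def classify(pixels):
    # Hand-picked weights: -1 at the center,
    # 1 at the middle of each side, 0 elsewhere.
    weights = [0] * 25
    weights[12] = -1
    for i in (2, 10, 14, 22):
        weights[i] = 1
    linear_combination = sum(w * x for w, x in zip(weights, pixels))
    # Step activation: 0 signifies a cross, 1 a circle.
    return 1 if linear_combination > 0 else 0

print(classify(cross))   # 0: cross
print(classify(circle))  # 1: circle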