Artificial Intelligence
Dr. Tran Quang Huy
OUTLINE
Chapter 1: Overview of AI
Chapter 2: Artificial Neural Networks
Chapter 3: Searching, Knowledge, Reasoning, and Planning
Chapter 4: Machine Learning
Week:     W1   W2   W3   W4   W5   W6   W7   W8   W9   W10
Activity: L    L    L    I-T  L    L    L    L    P    P
L: Lesson; I-T: In-class Test; P: Project
Objectives
1. Understand the basics of Neural Networks
2. Be able to move on to the more advanced Convolutional Neural Networks
Main contents
1. Artificial Neural Networks (ANN) and their relation to biology
2. The seminal Perceptron algorithm
3. Backpropagation
4. How to train Neural Networks using the Keras library
What are Neural Networks?
Question:
- How does your family dog recognize you, the owner, versus a complete and total stranger?
- How does a small child learn to recognize the difference between a school bus and a transit bus?
- How do our own brains subconsciously perform complex pattern recognition tasks each and every day without us even noticing?
What are Neural Networks?
Answer: Each of us contains a real-life biological neural network that is connected to our nervous system – this network is made up of a large number of interconnected neurons (nerve cells).
The word “neural” is the adjective form of “neuron”, and “network” denotes a graph-like structure; therefore, an “Artificial Neural Network” is a computation system that attempts to mimic (or at least, is inspired by) the neural connections in our nervous system. Artificial neural networks are also referred to as “neural networks” or “artificial neural systems”.
It is common to abbreviate Artificial Neural Network and refer to them as “ANN” or simply “NN”.
ANN
A simple neural network architecture. Inputs are presented to the network. Each connection carries a signal through the two hidden layers in the network. A final function computes the output class label.
Read the following and explain the meaning of each part in the figure and equations
Activation Functions
What is an activation function?
How does an activation function work?
Why do we use activation functions?
List some types of popular activation functions.
What is an activation function?
How does an activation function work?
Why do we use activation functions?
1. To introduce non-linearity into the model.
2. To keep the output within a specific range, such as [0, 1] or [-1, 1].
Popular Activation Functions
Find the equation of each activation function.
Activation Functions
Step function: f(net) = 1 if net > 0, otherwise 0
Sigmoid function: f(net) = 1 / (1 + e^(-net))
ReLU function: f(net) = max(0, net)
Activation Functions
Step function: f(net) = 1 if net > 0, otherwise 0
This is a very simple threshold function. If the weighted sum net > 0, we output 1; otherwise, we output 0. The output of f is always zero when net is less than or equal to zero; if net is greater than zero, f returns one.
What are the problems of the step function?
Activation Functions
Sigmoid function: f(net) = 1 / (1 + e^(-net))
The sigmoid function has historically been one of the most commonly used activation functions in neural networks.
Activation Functions
Sigmoid function:
The sigmoid function has historically been one of the most commonly used activation functions in neural networks. Why?
The primary advantage here is that the smoothness of the sigmoid function makes it easier to devise learning algorithms.
The sigmoid function is a better choice for learning than the simple step function since it:
1. Is continuous and differentiable everywhere.
2. Is symmetric about its midpoint (0, 0.5).
3. Asymptotically approaches its saturation values.
Activation Functions
Sigmoid function:
Disadvantages of the sigmoid function:
1. The outputs of the sigmoid are not zero-centered.
2. Saturated neurons essentially kill the gradient, since the gradient will be extremely small.
Activation Functions
Tanh function: f(net) = tanh(net) = (e^net - e^(-net)) / (e^net + e^(-net))
The hyperbolic tangent, or tanh (with a similar shape to the sigmoid), was also heavily used as an activation function up until the late 1990s. The tanh function is zero-centered, but the gradients are still killed when neurons become saturated.
Activation Functions
ReLU function: f(net) = max(0, net)
Rectified Linear Units (ReLUs) are also called “ramp functions” due to how they look when plotted.
Activation Functions
ReLU function:
Note: the function is zero for negative inputs but then linearly increases for positive values. The ReLU function is not saturable and is also extremely computationally efficient. The ReLU activation function tends to outperform both the sigmoid and tanh functions in nearly all applications.
Activation Functions
ReLU function:
As of 2015, ReLU is the most popular activation function used in deep learning. However, a problem arises when the input is exactly zero – the gradient cannot be taken there.
Activation Functions
ReLU6 function: f(net) = min(max(0, net), 6)
This function limits the problem of exploding gradients.
Activation Functions
Leaky ReLU function: f(net) = net if net > 0, otherwise α·net (for a small constant α > 0)
Leaky ReLUs allow for a small, non-zero gradient when the unit is not active.
Activation Functions
Leaky ReLU function:
The function is indeed allowed to take on a negative value, unlike traditional ReLUs which “clamp” the function output at zero.
Parametric ReLUs build on Leaky ReLUs and allow the parameter α to be learned on an activation-by-activation basis, implying that each node in the network can learn a different “coefficient of leakage” separate from the other nodes.
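To make the preceding definitions concrete, here is a minimal NumPy sketch of the activation functions discussed above; the leak coefficient alpha is an assumed constant here (in a Parametric ReLU it would be learned):

import numpy as np

# Plain NumPy versions of the activation functions discussed above.
def step(net): return np.where(net > 0, 1.0, 0.0)
def sigmoid(net): return 1.0 / (1.0 + np.exp(-net))
def tanh(net): return np.tanh(net)
def relu(net): return np.maximum(0.0, net)
def relu6(net): return np.minimum(np.maximum(0.0, net), 6.0)
def leaky_relu(net, alpha=0.01): return np.where(net > 0, net, alpha * net)

# Evaluate each function on a few sample net inputs.
net = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
for f in (step, sigmoid, tanh, relu, relu6, leaky_relu):
    print(f.__name__, f(net))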
Feedforward Network Architectures
In this type of architecture, a connection between nodes is only allowed from nodes in layer i to nodes in layer i+1 (hence the term, feedforward). There are no backward or inter-layer connections allowed.
When feedforward networks include feedback connections (output connections that feed back into the inputs), they are called recurrent neural networks.
Feedforward Network Architectures
This figure is a 3-2-3-2 feedforward network.
Layer 0 contains 3 inputs, our xi values. These could be raw pixel intensities of an image or a feature vector extracted from the image.
Layers 1 and 2 are hidden layers containing 2 and 3 nodes, respectively.
Layer 3 is the output layer or the visible layer – this is where we obtain the overall output classification from our network. The output layer typically has as many nodes as class labels; one node for each potential output. For example, if we were to build an NN to classify handwritten digits, our output layer would consist of 10 nodes, one for each digit 0-9.
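A forward pass through this 3-2-3-2 network can be sketched in a few lines of NumPy; the weights and input below are made-up values purely for illustration, and sigmoid activations are an assumption:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)
x = rng.random(3)                 # layer 0: 3 inputs (e.g., image features)
W1 = rng.normal(size=(3, 2))      # layer 0 -> layer 1 (2 hidden nodes)
W2 = rng.normal(size=(2, 3))      # layer 1 -> layer 2 (3 hidden nodes)
W3 = rng.normal(size=(3, 2))      # layer 2 -> layer 3 (2 output nodes)

a1 = sigmoid(x @ W1)              # first hidden layer activations
a2 = sigmoid(a1 @ W2)             # second hidden layer activations
out = sigmoid(a2 @ W3)            # output layer: one score per class
print("predicted class:", out.argmax())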
PERCEPTRON ALGORITHM
The Perceptron was introduced by Frank Rosenblatt in 1957. He proposed a Perceptron learning rule based on the original MCP neuron. A Perceptron is an algorithm for supervised learning of binary classifiers. This algorithm enables neurons to learn and to process elements in the training set one at a time.
https://www.javatpoint.com/perceptron-in-machine-learning
TYPES OF PERCEPTRON
1. Single layer (a): A single-layer perceptron can learn only linearly separable patterns.
2. Multilayer (b): Multilayer perceptrons, built from two or more layers, have greater processing power.
https://www.javatpoint.com/perceptron-in-machine-learning
TYPES OF PERCEPTRON
A single-layer perceptron model consists of a feed-forward network and also includes a threshold transfer function inside the model. The main objective of the single-layer perceptron model is to analyze linearly separable objects with binary outcomes.
A single-layer perceptron model does not contain any recorded data, so it begins with randomly allocated values for the weight parameters. It then sums up all the weighted inputs. If the total sum is more than a pre-determined value, the model is activated and shows the output value as +1.
If the outcome matches the pre-determined (threshold) value, the performance of this model is considered satisfactory, and the weights are left unchanged. However, this model runs into discrepancies when multiple input values are fed into it; hence, to reach the desired output and minimize errors, some changes to the weights are necessary.
TYPES OF PERCEPTRON
The multi-layer perceptron model is also known as the Backpropagation algorithm, which executes in two stages as follows:
• Forward Stage: Activations start from the input layer in the forward stage and terminate on the output layer.
• Backward Stage: In the backward stage, weight and bias values are modified as per the model's requirement: the error between the actual and desired output is propagated backward, starting at the output layer and ending at the input layer.
A compact numerical sketch of these two stages is given below.
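Here is a minimal NumPy sketch of those two stages on a tiny 2-2-1 network; the network size, input, target, and learning rate are all illustrative assumptions:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)    # input -> hidden
W2, b2 = rng.normal(size=(2, 1)), np.zeros(1)    # hidden -> output
x = np.array([1.0, -1.0])                        # one training input (assumed)
t = np.array([1.0])                              # desired output (assumed)
lr = 0.5                                         # learning rate (assumed)

for _ in range(100):
    # Forward stage: activations flow from the input layer to the output layer.
    h = sigmoid(x @ W1 + b1)
    y = sigmoid(h @ W2 + b2)

    # Backward stage: the output error is propagated back toward the input
    # layer, and weights and biases are adjusted along the way.
    delta_out = (y - t) * y * (1 - y)               # output-layer error term
    delta_hid = (delta_out @ W2.T) * h * (1 - h)    # hidden-layer error term
    W2 -= lr * np.outer(h, delta_out)
    b2 -= lr * delta_out
    W1 -= lr * np.outer(x, delta_hid)
    b1 -= lr * delta_hid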
TYPES OF PERCEPTRON
Advantages of the Multi-Layer Perceptron:
• A multi-layered perceptron model can be used to solve complex non-linear problems.
• It works well with both small and large input data.
• It helps us to obtain quick predictions after the training.
• It helps to obtain the same accuracy ratio with large as well as small data.
Disadvantages of the Multi-Layer Perceptron:
• Computations are difficult and time-consuming.
• It is difficult to determine how much each independent variable affects the dependent variable.
• The model's functioning depends on the quality of the training.
Basic Components of Perceptron
Frank Rosenblatt invented the perceptron model as a binary classifier, which contains three main components:
• Input Nodes or Input Layer
• Weight and Bias
• Activation Function
https://www.javatpoint.com/perceptron-in-machine-learning
Basic Components of Perceptron
Types of Activation functions:
https://www.javatpoint.com/perceptron-in-machine-learning
How does the Perceptron work?
In Machine Learning, the Perceptron is considered a single-layer neural network that consists of four main parameters: input values (input nodes), weights and bias, net sum, and an activation function.
The perceptron model begins with the multiplication of all input values and their weights, then adds these values together to create the weighted sum. This weighted sum is then applied to the activation function ‘f’ to obtain the desired output. This activation function is also known as the step function and is represented by ‘f’.
Exercise: Write the final equation based on this information.
https://www.javatpoint.com/perceptron-in-machine-learning
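Restating that description as a single equation (with n inputs, and a bias term b, as listed among the perceptron's components above):

output = f(w1*x1 + w2*x2 + ... + wn*xn + b)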
How does the Perceptron work?
For example, suppose x1 = 2, x2 = 3, x3 = 1, the weights wn are certain numbers in the range [0, 1], and the step function is used. Estimate the output.
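The exercise leaves the weights unspecified, so the values below are one illustrative choice in [0, 1], not the intended answer; the bias is assumed to be 0 since none is given:

def step(net):
    return 1 if net > 0 else 0

x = [2, 3, 1]
w = [0.5, 0.2, 0.9]    # assumed weights in [0, 1]
b = 0                  # assumed bias
net = sum(wi * xi for wi, xi in zip(w, x)) + b   # 0.5*2 + 0.2*3 + 0.9*1 = 2.5
print(step(net))       # net = 2.5 > 0, so the output is 1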
Problem 1: The input to a single-input neuron is 2.0, its weight is 2.3 and its bias is -3.
i. What is the net input to the transfer function?
ii. What is the neuron output?
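A quick sketch of part i; part ii cannot be computed until a transfer function is specified, which Problem 2 works through:

# Problem 1, part i: the net input is n = w * p + b.
p, w, b = 2.0, 2.3, -3.0
n = w * p + b          # 2.3 * 2.0 - 3 = 1.6
print(n)               # 1.6
# Part ii: the output a = f(n) depends on the (unspecified) transfer function f.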
Problem 2: The input to a single-input neuron is 2.0, its weight is 2.3 and its bias is -3.
What is the output of the neuron if it has the following transfer functions?
i. Hard limit
ii. Linear
iii. Log-sigmoid
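Using the net input n = 1.6 from Problem 1, a short sketch of the three outputs (taking the hard limit as 1 for n >= 0 and 0 otherwise):

import math

n = 2.3 * 2.0 - 3.0                 # net input = 1.6
hardlim = 1 if n >= 0 else 0        # i.   hard limit -> 1
linear = n                          # ii.  linear -> 1.6
logsig = 1 / (1 + math.exp(-n))     # iii. log-sigmoid -> ~0.832
print(hardlim, linear, round(logsig, 3))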
Problem 3:
Given a two-input neuron with the following parameters: b = 1.2, W = [3 2], and p = [-5 6]^T, calculate the neuron output for the following transfer functions:
i. A symmetrical hard limit transfer function
ii. A saturating linear transfer function
iii. A hyperbolic tangent sigmoid (tansig) transfer function
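A short sketch of Problem 3, using the usual conventions for these transfer functions (symmetrical hard limit: outputs ±1; saturating linear: clamps to [0, 1]):

import math

W = [3, 2]
p = [-5, 6]
b = 1.2
n = sum(wi * pi for wi, pi in zip(W, p)) + b   # 3*(-5) + 2*6 + 1.2 = -1.8

hardlims = 1 if n >= 0 else -1     # i.   symmetrical hard limit -> -1
satlin = min(max(n, 0.0), 1.0)     # ii.  saturating linear -> 0
tansig = math.tanh(n)              # iii. tansig -> ~-0.947
print(n, hardlims, satlin, round(tansig, 3))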
An illustrative example
There is a conveyor belt on which fruit is loaded. This conveyor passes through a set of sensors, which measure three properties of the fruit: shape, texture, and weight.
Property   Value = 1    Value = -1
Shape      round        elliptical
Texture    smooth       rough
Weight     > 1 pound    <= 1 pound
The three sensor outputs will then be input to a neural network. The purpose of the network is to decide which kind of fruit is on the conveyor. Let’s assume that there are only two kinds of fruit on the conveyor: apples and oranges.
An illustrative example
Apply the following perceptron model to the previous problem in the case of two inputs.
An illustrative example
If w1,1 = -1 and w1,2 = 1, find a.
An illustrative example
Therefore, if the inner product of the weight matrix (a single row vector in this case) with the input vector is greater than or equal to -b, the output will be 1. If the inner product of the weight vector and the input is less than -b, the output will be -1.
This divides the input space into two parts. The figure illustrates this for the case where b = -1. The blue line in the figure represents all points for which the net input is equal to 0:
n = [-1 1]p - 1 = 0
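A quick numerical check of this boundary, assuming the symmetrical hard limit convention a = hardlims(n), with hardlims(n) = 1 for n >= 0 and -1 otherwise:

def hardlims(n):
    return 1 if n >= 0 else -1

W = [-1, 1]   # w1,1 = -1, w1,2 = 1
b = -1
for p in ([1, 1], [1, -1], [-1, 1], [-1, -1]):   # sample +/-1 sensor readings
    n = W[0] * p[0] + W[1] * p[1] + b
    print(p, "->", hardlims(n))   # output is 1 exactly when Wp >= -b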
An illustrative example
The decision boundary between the categories is determined by the equation:
Wp + b = 0
Because the boundary must be linear, the single-layer perceptron can only be used to recognize patterns that are linearly separable.
An illustrative example
Apply the following perceptron model to the previous problem in the case of three inputs. Find a.
An illustrative example
We want to choose the bias and the elements of the weight matrix so that the perceptron will be able to distinguish between apples and oranges. For example, we may want the output of the perceptron to be 1 when an apple is input and -1 when an orange is input.
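One choice that works is to key on the texture sensor alone. The ±1 prototype vectors below follow the classic version of this example and are an assumption, since the slides do not list them explicitly:

def hardlims(n):
    return 1 if n >= 0 else -1

W = [0, 1, 0]                 # only the texture sensor matters
b = 0
orange = [1, -1, -1]          # round, rough, <= 1 pound (assumed prototype)
apple = [1, 1, -1]            # round, smooth, <= 1 pound (assumed prototype)
for name, p in (("orange", orange), ("apple", apple)):
    n = sum(wi * pi for wi, pi in zip(W, p)) + b
    print(name, "->", hardlims(n))   # apple -> 1, orange -> -1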
AND, OR, and XOR Datasets
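A minimal construction of these three datasets as NumPy arrays; note that AND and OR are linearly separable while XOR is not, which is what makes XOR a useful test case:

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # all 2-bit inputs
y_and = np.array([0, 0, 0, 1])
y_or = np.array([0, 1, 1, 1])
y_xor = np.array([0, 1, 1, 0])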
Perceptron Training Procedure and the Delta Rule (step 2c)
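In outline, the training procedure loops over the dataset and, whenever a prediction is wrong, nudges the weights by the delta rule w := w + α(target - prediction)x. A minimal sketch follows; the step labels are an assumption about the numbering the slide title refers to:

def step(net):
    return 1 if net > 0 else 0

def train(X, y, alpha=0.1, epochs=10):
    w = [0.0] * (len(X[0]) + 1)            # weights plus a bias entry
    for _ in range(epochs):                # step 1: loop over epochs
        for x, target in zip(X, y):        # step 2: loop over data points
            x = list(x) + [1.0]            # step 2a: append the bias input
            pred = step(sum(wi * xi for wi, xi in zip(w, x)))   # step 2b: predict
            if pred != target:             # step 2c: delta rule update
                w = [wi + alpha * (target - pred) * xi for wi, xi in zip(w, x)]
    return w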
Implementing the Perceptron in Python
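Since this section is about a Python implementation, here is a minimal NumPy sketch of a Perceptron class; the class layout, learning rate, and initialization scheme are one reasonable choice, not the only one:

import numpy as np

class Perceptron:
    def __init__(self, N, alpha=0.1):
        # N weights plus one bias entry, randomly initialized
        self.W = np.random.randn(N + 1) / np.sqrt(N)
        self.alpha = alpha

    def step(self, x):
        return np.where(x > 0, 1, 0)

    def fit(self, X, y, epochs=10):
        X = np.c_[X, np.ones((X.shape[0]))]   # bias trick: append a column of 1s
        for _ in range(epochs):
            for x, target in zip(X, y):
                p = self.step(np.dot(x, self.W))
                if p != target:               # update weights only on a mistake
                    self.W += self.alpha * (target - p) * x

    def predict(self, X):
        X = np.atleast_2d(X)
        X = np.c_[X, np.ones((X.shape[0]))]
        return self.step(np.dot(X, self.W))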
Evaluating the Perceptron on Bitwise Datasets
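Reusing the Perceptron class sketched above, the bitwise datasets can be evaluated as follows; AND and OR are learned, while XOR cannot be, since no single line separates its classes:

import numpy as np

# Assumes the Perceptron class from the previous section is in scope.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
datasets = {
    "AND": np.array([0, 0, 0, 1]),
    "OR": np.array([0, 1, 1, 1]),
    "XOR": np.array([0, 1, 1, 0]),
}
for name, y in datasets.items():
    p = Perceptron(N=2, alpha=0.1)
    p.fit(X, y, epochs=20)
    print(name, "predictions:", p.predict(X), "targets:", y)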