Lecture 2

DEEP LEARNING

Dr. Felix Gonda
Assistant Prof. Computer Science
E-mail: [email protected]

© University of Juba 2024


Introduction to Deep Learning · Generative Models · Attention Mechanisms

[Figure: nested circles showing DL (deep learning) inside ML (machine learning) inside AI (artificial intelligence)]

• Deep Neural Networks
  • Artificial Neural Networks (ANNs)
  • Convolutional Neural Networks (CNNs)
  • Recurrent Neural Networks (RNNs)
• Transfer Learning and Fine-tuning
• Reinforcement Learning

© University of Juba 2024


Biological Neural Networks

© University of Juba 2024


The human brain has 100,000,000,000 (100 billion) neurons and over 100 trillion synapses.

© University of Juba 2024


Biological Neuron
A biological neuron, or nerve cell, is the fundamental unit of the nervous system. It consists of a cell body, dendrites, and an axon. Dendrites receive signals from other neurons,
while the axon transmits signals to other cells. The strength of a signal is determined by the frequency of action potentials, electrical impulses that travel down the axon.

[Figure: a biological neuron]
• Cell Body (Soma)
• Dendrites: receive messages from other neurons/cells
• Axon: passes messages away from the cell body to other neurons, muscles, or glands
• Axon Terminal Boutons: form junctions with other cells
• Synapses: junctions where signals are transmitted to other cells
• Action Potential: electrical signal that travels down the axon

© University of Juba 2024


Biological Neural Network
Biological neurons are interconnected through intricate networks of synapses, specialized junctions where electrical signals, known as action potentials, are transmitted
between neurons.

[Figure: interconnected biological neurons, labeled with the same parts as the previous figure: cell body (soma), dendrites, axon, axon terminal boutons, synapses, and the action potential traveling down the axon]

© University of Juba 2024


Artificial Neural Networks
Also known as deep feedforward neural networks or multilayer perceptrons

© University of Juba 2024


Artificial Neuron (Perceptron)
An artificial neuron, also known as a perceptron, is a simplified model inspired by a biological neuron. It consists of inputs, weights, a summing unit, an activation function, and an output.

[Figure: perceptron with its parts labeled by analogy to the biological neuron — inputs x1…xn (dendrites), weights w1…wn (synapses), a summing unit computing Σ_{i=1}^{n} x_i w_i (cell body), activation function g (axon), output y]

© University of Juba 2024


Artificial Neuron (Perceptron)

Bias: the bias ensures that the output is not forced to zero when the inputs and weights are zero.

• Inputs: x1, x2, …, xn are the values fed into the neuron. They can be individual features or a combination of features.
• Weights: w1, w2, …, wn are assigned to each input and represent its importance in determining the neuron's output.
• Bias: w0 is an additional term added to the weighted sum of inputs. It allows the neuron to adjust its output without changing the weights.
• Weighted sum: calculated by multiplying each input by its corresponding weight and then adding the bias.
• Activation function: g takes the weighted sum as input and produces the neuron's output.

[Figure: perceptron with bias w0 — inputs x1…xn, weights w1…wn, weighted sum z, non-linear activation function g, output y]

© University of Juba 2024



Artificial Neuron
(Mathematical Model of a Perceptron)

Bias: the bias w0 ensures that the output is not forced to zero when the inputs and weights are zero.

[Figure: perceptron — inputs x1…xn, weights w1…wn, bias w0, weighted sum z, non-linear activation function g, output y]

z = w0 + Σ_{i=1}^{n} x_i w_i
z = w0 + x1 w1 + x2 w2 + … + xn wn

y = g(z)

© University of Juba 2024
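
A minimal NumPy sketch of this computation; the particular input, weight, and bias values and the choice of sigmoid for g are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def perceptron_forward(x, w, w0, g):
    """Compute y = g(w0 + sum_i x_i * w_i) for a single artificial neuron."""
    z = w0 + np.dot(x, w)   # weighted sum of the inputs plus the bias
    return g(z)

# Illustrative values only; sigmoid assumed as the activation function g.
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
x  = np.array([1.0, 2.0, 3.0])
w  = np.array([0.5, -0.2, 0.1])
w0 = 0.3
print(perceptron_forward(x, w, w0, sigmoid))   # y = g(0.3 + 0.5 - 0.4 + 0.3) = g(0.7)
```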

Activation Function

[Figure: an activation curve mapping inputs (roughly -10 to 10) to outputs between 0 and 1]

An activation function plays a crucial role in neural networks by introducing non-linearity into the model. This non-linearity is essential for learning complex patterns and relationships in data. Without activation functions, neural networks would essentially be linear models, limiting their ability to solve complex problems.

© University of Juba 2024


Common Activation Functions

Sigmoid
[Figure: sigmoid curve]
Formula: g(z) = 1 / (1 + e^(-z))
Description: Outputs a value between 0 and 1.
Advantages: Smooth and differentiable.
Disadvantages: Can suffer from the vanishing gradient problem, especially for large negative inputs.
Usage: Typically used in the output layer of binary classification models.

Softmax
[Figure: softmax curve]
Formula: g(y_i) = e^(y_i) / Σ_j e^(y_j)
Description: Converts a vector of numbers into a probability distribution.
Advantages: Ensures that the outputs sum to 1, representing probabilities.
Disadvantages: Can be computationally expensive for large input vectors.
Usage: Used primarily in the output layer of classification models.

© University of Juba 2024
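
As a sketch, the two formulas above can be written directly in NumPy; the max-subtraction in softmax is a standard numerical-stability trick and not part of the slide's formula:

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)); maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(y):
    """g(y_i) = e^(y_i) / sum_j e^(y_j); turns a vector into a probability distribution."""
    e = np.exp(y - np.max(y))   # subtract the max for numerical stability
    return e / e.sum()

print(sigmoid(np.array([-10.0, 0.0, 10.0])))   # values near 0, 0.5, and 1
print(softmax(np.array([1.0, 2.0, 3.0])))      # non-negative values that sum to 1
```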


Common Activation Functions

Rectified Linear Unit (ReLU)
[Figure: ReLU curve]
Formula: g(y) = max(0, y)
Description: Outputs the input if it is positive, otherwise outputs 0.
Advantages: Efficient to compute; reduces the vanishing gradient problem.
Disadvantages: Can suffer from the "dying ReLU" problem, where neurons can become inactive.
Usage: Typically used in hidden layers of CNNs and RNNs.

Linear
[Figure: straight line through the origin]
Formula: g(y) = y
Description: The output is simply the input.
Advantages: Simple and computationally efficient.
Disadvantages: Does not introduce non-linearity, limiting the network's ability to learn complex patterns.
Usage: Rarely used in deep learning models due to its lack of non-linearity. It is more common in linear regression tasks.

© University of Juba 2024


Common Activation Functions

Hyperbolic Tangent (tanh)
[Figure: tanh curve]
Formula: g(z) = (e^z - e^(-z)) / (e^z + e^(-z))
Description: Outputs a value between -1 and 1.
Advantages: Similar to sigmoid but centered around 0, which can be beneficial in certain applications.
Disadvantages: Can still suffer from the vanishing gradient problem.
Usage: Often used in RNNs and in the hidden layers of certain feedforward networks.

Choosing the Right Activation Function:
The choice of activation function depends on the specific task and the characteristics of the data. For example, sigmoid and tanh are often used in older neural network architectures, but ReLU and its variants are more popular in modern deep learning models due to their computational efficiency and ability to avoid the vanishing gradient problem.

© University of Juba 2024
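
A corresponding sketch of ReLU, linear, and tanh in NumPy, evaluated on a few illustrative inputs:

```python
import numpy as np

def relu(y):
    """g(y) = max(0, y): passes positive inputs through, clips negatives to 0."""
    return np.maximum(0.0, y)

def linear(y):
    """g(y) = y: the identity; adds no non-linearity."""
    return y

def tanh(y):
    """g(y) = (e^y - e^(-y)) / (e^y + e^(-y)): output in (-1, 1), centered at 0."""
    return np.tanh(y)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))
print(linear(x))
print(tanh(x))
```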
Importance of Activation Functions
Why do we need an activation function?

• What if we want to create a neural network to distinguish blue from red points?
• A linear activation function would produce a linear decision boundary, irrespective of the network size.
• Non-linearities allow us to approximate arbitrarily complex functions.

© University of Juba 2024


Artificial Neuron (Perceptron)
An example using ReLU

Inputs: x1 = 5, x2 = 4, x3 = -1
Weights: w1 = 1, w2 = 0, w3 = 1
Bias: w0 = 0
Activation: g = ReLU

z = 0 + (5 × 1) + (4 × 0) + (-1 × 1)
z = 0 + 5 + 0 - 1
z = 4

y = g(z) = g(4) = max(0, 4) = 4

© University of Juba 2024
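
The same worked example, checked in NumPy with the values from the slide:

```python
import numpy as np

x  = np.array([5.0, 4.0, -1.0])   # inputs from the slide
w  = np.array([1.0, 0.0, 1.0])    # weights from the slide
w0 = 0.0                          # bias from the slide

z = w0 + np.dot(x, w)             # 0 + 5*1 + 4*0 + (-1)*1 = 4
y = np.maximum(0.0, z)            # ReLU activation
print(z, y)                       # 4.0 4.0
```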


Artificial Neural Networks
Single Neuron Network
(Note that a single neuron cannot have multiple outputs)

[Figure: inputs x1, x2, x3 (input layer) with weights w1…wn and bias w0 feeding a single neuron z (hidden layer) that produces the output y (output layer)]

© University of Juba 2024


Artificial Neural Networks
Single-Layer Neural Network
(Forward propagation of data through a single-layer neural network)

Forward Propagation:
• Calculate the weighted sum of inputs at each neuron
• Apply the activation function to the weighted sum
• Predict the output

[Figure: inputs x1, x2 (input layer) connect through weights W11, W12, W21, W22 to hidden neurons z1, z2 (hidden layer), which connect through weights W1k, W2k to the output y (output layer)]

z1 = (W11 × x1) + (W21 × x2),   a1 = g(z1)
z2 = (W12 × x1) + (W22 × x2),   a2 = g(z2)
z3 = (W1k × a1) + (W2k × a2),   y = g(z3)
© University of Juba 2024


Artificial Neural Networks
Single Layer Neural Network

[Figure: inputs x1, x2, x3 (input layer) connected to hidden neurons z1, z2, z3, z4 (hidden layer), which are connected to outputs y1, y2 (output layer)]

© University of Juba 2024


Artificial Neural Networks
Single Layer Neural Network

[Figure: inputs x1, x2, …, xm connect through weight matrix W^(1) to hidden neurons z1, z2, z3, …, zk with activations g(z1), …, g(zk), which connect through weight matrix W^(2) to outputs y1, y2]

z3 = W^(1)_{0,3} + Σ_{j=1}^{m} x_j W^(1)_{j,3}
   = W^(1)_{0,3} + x1 W^(1)_{1,3} + x2 W^(1)_{2,3} + … + xm W^(1)_{m,3}

© University of Juba 2024


Deep Neural Network

[Figure: inputs x1, x2, …, xm (input layer) feed a stack of hidden layers a, …, k, …, p, each with neurons z_{layer,1} … z_{layer,n}, followed by outputs y1, y2]

z_{k,i} = W^(k)_{0,i} + Σ_{j=1}^{n_{k-1}} g(z_{k-1,j}) W^(k)_{j,i}

© University of Juba 2024
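
A sketch of layer-by-layer forward propagation following this formula; the layer sizes, random weights, and choice of ReLU for g are illustrative assumptions (this sketch also applies g at the output layer):

```python
import numpy as np

def g(z):
    """Activation function; ReLU is assumed here for illustration."""
    return np.maximum(0.0, z)

def deep_forward(x, weights, biases):
    """Propagate x through stacked layers.
    weights[k] has shape (n_{k-1}, n_k) and biases[k] has shape (n_k,);
    each layer computes z_k = b_k + a_{k-1} @ W_k and a_k = g(z_k)."""
    a = x
    for W, b in zip(weights, biases):
        z = b + a @ W
        a = g(z)
    return a

# Tiny network with layer sizes 3 -> 4 -> 2 and random illustrative weights.
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(3, 4)), rng.normal(size=(4, 2))]
bs = [np.zeros(4), np.zeros(2)]
print(deep_forward(np.array([1.0, 0.5, -1.0]), Ws, bs))
```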


Example Problem
Will a student pass this course?

Suppose we want to predict whether a student will pass this course. We can build a model that takes course features that determine passing the class. For example, we can start with two features:

X1: Number of lectures a student attended
X2: Number of hours a student spends on their assignments

X1   X2   RESULT
 4    3   0
 2    1   0
22    9   1
 4    2   0
30    8   1
 2    2   0
12    6   1
20   10   1
17    8   1
24    7   1
28    9   1
21    6   1

© University of Juba 2024
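
The table can be written directly as arrays for later experiments; a minimal sketch using the values above:

```python
import numpy as np

# Features: [lectures attended (X1), hours spent on assignments (X2)];
# target: 1 = pass, 0 = fail.
X = np.array([[ 4,  3], [ 2,  1], [22,  9], [ 4,  2], [30,  8], [ 2,  2],
              [12,  6], [20, 10], [17,  8], [24,  7], [28,  9], [21,  6]], dtype=float)
y = np.array([0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1], dtype=float)
print(X.shape, y.shape)   # (12, 2) (12,)
```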


Example Problem
Will a student pass this course?

[Figure: scatter plot of the dataset — X1 (number of lectures attended) on the horizontal axis and X2 (number of hours spent on assignments) on the vertical axis, with points marked Pass (1) or Fail (0)]

© University of Juba 2024


Example Problem
Will a student pass this course?

[Figure: network for this problem — inputs x1, x2 (input layer), hidden neurons z1, z2, z3 (hidden layer), and a single output y (output layer)]

© University of Juba 2024


Example Problem
Will a student pass this course?

Forward Propagation:
• Calculate the weighted sum of inputs at each neuron
• Apply the activation function to the weighted sum
• Predict the output

Example input: X1 = 4, X2 = 3, target = 0
Predicted: 0.1    Actual: 0

[Figure: the same network — inputs x1, x2, hidden neurons z1, z2, z3, output y]

© University of Juba 2024


Example Problem
Will a student pass this course?

A loss function, also known as a cost function or objective function, is a crucial component of neural networks. It quantifies the "error" between the network's predicted output and the true target output. The goal of training a neural network is to minimize this loss function.

Forward Propagation:
• Calculate the weighted sum of inputs at each neuron
• Apply the activation function to the weighted sum
• Predict the output
• Calculate the loss (the difference between the predicted and actual values)

Example input: X1 = 4, X2 = 3; Predicted: 0.1, Actual: 0

Loss( f(x^(i); W), y^(i) )

where f(x^(i); W) is the predicted output and y^(i) is the actual (target) output.

© University of Juba 2024


Example Problem
Will a student pass this course?

The empirical loss of the network measures the total loss over the entire dataset.

X1   X2   PREDICTED   TARGET
 4    3      0.1        0
 2    1      0.8        0
 …    …       …         …
 4    2      0.9        0
21    6      0.7        1

J(W) = (1/n) Σ_{i=1}^{n} Loss( f(x^(i); W), y^(i) )

© University of Juba 2024
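
A sketch of the empirical loss as an average of per-example losses; the placeholder model and the squared-error loss used here are illustrative assumptions:

```python
import numpy as np

def empirical_loss(X, y, f, loss):
    """J(W) = (1/n) * sum_i Loss(f(x_i; W), y_i), averaged over the whole dataset."""
    return np.mean([loss(f(x), t) for x, t in zip(X, y)])

# Toy usage: a placeholder "model" that always predicts 0.5 and a squared-error loss.
f = lambda x: 0.5
squared_error = lambda pred, target: (target - pred) ** 2
X = np.array([[4, 3], [2, 1]], dtype=float)
y = np.array([0, 0], dtype=float)
print(empirical_loss(X, y, f, squared_error))   # 0.25
```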


Example Problem
Will a student pass this course?

Binary Cross-Entropy:
Binary cross-entropy loss is a specialized loss function used in neural networks for binary classification tasks. It measures the dissimilarity between the predicted probability distribution and the true probability distribution for binary outcomes (e.g., 0 or 1).

X1   X2   PREDICTED   TARGET
 4    3      0.1        0
 2    1      0.8        0
 …    …       …         …
 4    2      0.9        0
21    6      0.7        1

J(W) = -(1/n) Σ_{i=1}^{n} [ y^(i) log( f(x^(i); W) ) + (1 - y^(i)) log( 1 - f(x^(i); W) ) ]

© University of Juba 2024
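
A sketch of binary cross-entropy in NumPy, evaluated on the four fully shown rows of the table above; the clipping guards against log(0) and is not part of the slide's formula:

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """J(W) = -(1/n) * sum_i [ y_i*log(f(x_i; W)) + (1 - y_i)*log(1 - f(x_i; W)) ]."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

targets     = np.array([0.0, 0.0, 0.0, 1.0])    # from the table
predictions = np.array([0.1, 0.8, 0.9, 0.7])    # from the table
print(binary_cross_entropy(targets, predictions))
```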


Example Problem
Will a student pass this course?

Mean Squared Error (MSE):
Mean Squared Error (MSE) is a commonly used loss function in machine learning and neural networks. It measures the average squared difference between the predicted values and the actual values. MSE is often used to evaluate the performance of regression models, where the goal is to predict a continuous numerical value.

X1   X2   PREDICTED   TARGET
 4    3      30         40
 2    1      78         70
 …    …       …          …
 4    2      89         90
21    6      55         60

J(W) = (1/n) Σ_{i=1}^{n} ( y^(i) - f(x^(i); W) )^2

© University of Juba 2024
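
A sketch of MSE in NumPy, evaluated on the four fully shown rows of the table above:

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """J(W) = (1/n) * sum_i (y_i - f(x_i; W))^2."""
    return np.mean((y_true - y_pred) ** 2)

targets     = np.array([40.0, 70.0, 90.0, 60.0])   # from the table
predictions = np.array([30.0, 78.0, 89.0, 55.0])   # from the table
print(mean_squared_error(targets, predictions))     # (100 + 64 + 1 + 25) / 4 = 47.5
```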


Summary

[Figure: three panels — a biological neural model; an artificial neuron (perceptron) with inputs x1…xn, weights w1…wn (synapses), the weighted sum Σ_{i=1}^{n} x_i w_i, and output y; and a deep neural network with inputs, hidden layers, and outputs]

© University of Juba 2024


Thank You
Dr. Felix Gonda
Assistant Prof. Computer Science
E-mail: [email protected]

© University of Juba 2024

https://uojai.github.io/deeplearning
