
Introduction to Deep Learning


2024

Ando Ki, Ph.D.


[email protected]

Table of contents
 Modeling a neuron
 Perceptron
 How perceptron classifies hyperplane
 Perceptron: Boolean
 Perceptron: Boolean AND training
 Multi-layered perceptron
 Layer-wise organization
 Categories of ANN
 Brief history of neural network
 Popular frameworks
 Artificial neuron: Perceptron
 Artificial neuron: activation functions
 Artificial neural network: ANN
 Fully connected feed-forward network: FC-FFN
 Optional output layer: Softmax
 How to find a good or the best network: Loss/Cost
 How to find a good or the best network: Total Loss
 How to minimize total loss by changing [W] and [b]
 Optimization algorithm: gradient descent
 How to compute gradient
 Neural network
 Popular types of neural network
 Deep neural net
 NN categories by applications


Modeling a neuron
 Neuron (nerve cell)
► Dendrite
 input
► Axon
 output
 Branches of axon
 Terminals of axon (axon tip)
⚫ synaptic knob
► Synapse
 junction between two nerve cells

 Human
► whole brain
 ~86 billion neurons (Giga, 10^9)
 ~100 trillion synapses (Tera, 10^12)
► cerebral cortex
 19~23 billion neurons

https://www.quora.com/What-is-deep-learning

Modeling a neuron
https://en.wikipedia.org/wiki/Activation_function
 Activation functions

(Figure: artificial neuron model with a sigmoid activation function.)


Perceptron: single layer neural network


 Perceptron is a single artificial neuron that computes a weighted sum of its inputs (plus a bias b) and applies a threshold activation function.
► It is also called a TLU (threshold logic unit).
► It effectively separates the input space into two categories by the hyperplane W*X + b = 0.
► Perceptron is a linear classifier.
 Cannot deal with non-linear cases
► Perceptron refers to a particular supervised learning model trained with a simple error-driven weight-update rule (the perceptron learning rule).
► Perceptron is an algorithm for supervised learning of binary classifiers.
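A minimal sketch of this computation in Python with NumPy (the step activation and the particular weights and bias below are illustrative assumptions, not values from the slides):

import numpy as np

def perceptron(x, w, b):
    # Single perceptron: weighted input W*X + b followed by a threshold (step) activation
    z = np.dot(w, x) + b
    return 1 if z > 0 else 0

w = np.array([1.0, 1.0])   # illustrative weights
b = -1.5                   # illustrative bias
print(perceptron(np.array([1, 1]), w, b))   # 1 -> this point lies on one side of the hyperplane
print(perceptron(np.array([1, 0]), w, b))   # 0 -> this point lies on the other side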


How perceptron classifies hyperplane


► Two inputs (X1, X2): the decision boundary W1*x1 + W2*x2 + W0 = 0 is a line (y = ax + b) that splits the plane into the two output regions Y=a and Y=b.
► Three inputs (X1, X2, X3): the decision boundary W1*x1 + W2*x2 + W3*x3 + W0 = 0 is a plane (z = ax + by + c).
► Classes that cannot be separated by such a hyperplane need a multi-layer perceptron.

Perceptron: Boolean

► AND: W1=1, W2=1, threshold t=1.5
► OR: W1=1, W2=1, threshold t=0.5
► NOT: W1=-1, threshold t=-0.5
► XOR: not realizable by a single perceptron; the points (0,1) and (1,0) with Y=1 and the points (0,0) and (1,1) with Y=0 cannot be separated by one line.
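A quick check of these gates in Python, folding each threshold t into a bias b = -t (a convention assumed here, not stated on the slide):

import numpy as np

def perceptron(x, w, b):
    return 1 if np.dot(w, x) + b > 0 else 0

print("AND", [perceptron(np.array(x), np.array([1.0, 1.0]), -1.5) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 0, 0, 1]
print("OR",  [perceptron(np.array(x), np.array([1.0, 1.0]), -0.5) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 1]
print("NOT", [perceptron(np.array([x]), np.array([-1.0]), 0.5) for x in [0, 1]])                               # [1, 0]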


Perceptron: Boolean AND training


 Step 1: initialize the weights and the threshold.
► Weights may be initialized to 0 or to a small random value.
 Step 2: repeat until the error is less than a specific value.
► Calculate the output for the j-th training example: y = f(sum_i wi*xi), where f applies the threshold.
► Calculate the error: e = d - y (d is the desired or expected value).
► Update the weights (for the i-th input of the j-th example): wi = wi + e*xi (learning rate 1).

 Training set [{inputs: expected}]
► T0={0,0:0}, T1={0,1:0}, T2={1,0:0}, T3={1,1:1}
 for T0, T1 and T2 (assume all weights are 0)
► y = 0x0+0x0 = 0
► e = 0-0 = 0 (no error)
► No update since no error
 for T3
► y = 1x0+1x0 = 0
► e = 1-0 = 1
► w0 = 0 + (1-0) = 1
► w1 = 0 + (1-0) = 1
 After updating
► for T3, T2, T1 and T0
 y = 1x1+1x1 = 2 => above threshold 1.5 => output 1
⚫ e = 1-1 = 0
 y = 1x1+1x0 = 1 => below threshold 1.5 => output 0
⚫ e = 0-0 = 0
 y = 1x0+1x1 = 1 => below threshold 1.5 => output 0
⚫ e = 0-0 = 0
 y = 1x0+1x0 = 0 => below threshold 1.5 => output 0
⚫ e = 0-0 = 0
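A minimal sketch of this training loop in Python, assuming a learning rate of 1 and keeping the threshold fixed at 1.5 as on the slide:

import numpy as np

def train_perceptron(training_set, threshold, epochs=10, lr=1.0):
    # Perceptron learning rule: w_i <- w_i + lr * (d - y) * x_i
    w = np.zeros(2)                                    # Step 1: initialize weights to 0
    for _ in range(epochs):                            # Step 2: repeat
        total_error = 0
        for x, d in training_set:
            x = np.array(x, dtype=float)
            y = 1 if np.dot(w, x) >= threshold else 0  # calculate output
            e = d - y                                  # calculate error
            w += lr * e * x                            # update weights
            total_error += abs(e)
        if total_error == 0:                           # stop once every example is correct
            break
    return w

and_set = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(train_perceptron(and_set, threshold=1.5))        # [1. 1.]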


Perceptron: Boolean OR training


 Training set [{inputs: expected}]
► T0={0,0:0}, T1={0,1:1}, T2={1,0:1}, T3={1,1:1}
 for T0 (assume all weights are 0)
► y = 0x0+0x0 = 0
► e = 0-0 = 0 (no error)
► No update since no error
 for T1
► y = 0x0+0x1 = 0
► e = 1-0 = 1
► w0 = 0 + (1-0) = 1
► w1 = 0 + (1-0) = 1
► Update w0 and w1
 After updating
► for T2
 y = 1x1+1x0 = 1 => apply threshold => output 1
 e = 1-1 = 0
► No update since no error
► for T3
 y = 1x1+1x1 = 2 => apply threshold => output 1
 e = 1-1 = 0
► No update since no error
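The same train_perceptron sketch from the AND slide can be reused here; only the training set and the threshold (0.5, as on the Boolean slide) change:

or_set = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
print(train_perceptron(or_set, threshold=0.5))   # [1. 1.]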


MLP: Multi-layered perceptron


Multi-layered perceptron
 Two-unit network (two layers)

(Figure: a two-layer network in which inputs X1 and X2 feed hidden units H3 and H4, which in turn feed output unit O6, realizing XOR; in the XOR plot, Y=1 at (0,1) and (1,0) and Y=0 at (0,0) and (1,1).)

(from Pascal Vincent’s slides)
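A minimal sketch of one way a two-layer network can realize XOR, using hand-picked threshold units (these particular weights and thresholds are an illustrative assumption, not the ones in the figure): XOR(x1, x2) = AND(OR(x1, x2), NAND(x1, x2)).

import numpy as np

def unit(x, w, t):
    # Threshold unit: fires (1) when the weighted sum reaches threshold t
    return 1 if np.dot(w, x) >= t else 0

def xor_mlp(x1, x2):
    h3 = unit([x1, x2], [1, 1], 0.5)      # hidden unit H3: OR
    h4 = unit([x1, x2], [-1, -1], -1.5)   # hidden unit H4: NAND
    return unit([h3, h4], [1, 1], 1.5)    # output unit O6: AND of the two

print([xor_mlp(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])   # [0, 1, 1, 0]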



Layer-wise organization
 3 types of layers
► input layer: not counted in the number of layers
► hidden layer
► output layer

 For the picture on the left
► assume fully connected
► 4-layered, including 3 hidden layers
► 16 neurons: 5+4+5+2
► 65 weights: 3x5+5x4+4x5+5x2 (not including bias)
► 16 biases: 5+4+5+2
► 81 learnable parameters: 65+16 (see the counting sketch below)

 Modern neural network
► 10~20 layers, ~100 million parameters
► How about 125 layers?

(Figure: a fully-connected multi-layered neural network with an input layer of 3 input features plus a bias node, hidden layers of 5, 4 and 5 neurons, and an output layer of 2 output neurons (classes).)
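A small sketch of this parameter counting in Python; the layer sizes [3, 5, 4, 5, 2] are read off the weight and neuron counts above:

def count_parameters(layer_sizes):
    # layer_sizes = [inputs, hidden..., outputs] of a fully connected feed-forward net
    weights = sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))
    biases = sum(layer_sizes[1:])               # one bias per non-input neuron
    return weights, biases, weights + biases

print(count_parameters([3, 5, 4, 5, 2]))        # (65, 16, 81)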



Categories of ANN (Artificial Neural network)


 Fully-Connected NN
► feed forward
► Multi-Layer Perceptron (MLP)
 Convolutional NN (CNN)
► feed forward, sparsely-connected
► image recognition
► AlphaGo
 Recurrent NN (RNN)
► feedback
 Long Short-Term Memory (LSTM)
► feedback + storage
► Microsoft speech recognition
► Google neural machine translation (GNMT)

(Diagram: a taxonomy of ANNs. FNN (Feed-Forward Neural Network): single-layer perceptron, MLP (multi-layer perceptron), CNN (Convolutional Neural Network). RNN (Recurrent Neural Network): fully recurrent network, Hopfield network, simple recurrent network, Boltzmann machine, LSTM (Long Short-Term Memory network), and others.)

See neural network topology: http://www.asimovinstitute.org/neural-network-zoo/


Popular Frameworks
 Popular Frameworks with supported
interfaces
► Caffe
 Berkeley / BVLC (Berkeley Vision and Learning Center)
 C, C++, Python, Matlab
► TensorFlow
 Google Brain
 C++, Python
► PyTorch
► theano
 U. Montreal
 Python
► torch
 Facebook / NYU
 C, C++, Lua
► CNTK https://blogs.nvidia.com/blog/2016/01/12/accelerating-ai-artificial-intelligence-gpus/
 Microsoft
► MXNet
 Carnegie Mellon University / DMLC (Distributed
Machine Learning Community)
https://developer.nvidia.com/deep-learning-frameworks

Popularity

Deep Learning Framework Power Scores (by Jeff Hale) http://bit.ly/2GBa3tU

https://towardsdatascience.com/deep-learning-framework-power-scores-2018-23607ddf297a

Table of contents
 Artificial neuron: Perceptron
 Artificial neuron: activation functions
 Artificial neural network: ANN
 Fully connected feed-forward network: FC-FFN
 Optional output layer: Softmax
 How to find a good or the best network: Loss/Cost
 How to find a good or the best network: Total Loss
 How to minimize total loss by changing [W] and [b]
 Optimization algorithm: gradient descent
 How to compute gradient
 Neural network
 Popular types of neural network
 Deep neural net
 NN categories by applications
 Popular DNNs and Frameworks


Artificial neuron: Perceptron


 Artificial Neuron: Perceptron
► inputs: a1, a2, ..., aK
► weights: W1, W2, ..., WK
► bias: b
► weighted sum: z = W1*a1 + W2*a2 + ... + WK*aK + b
► activation function applied to z gives the output


Artificial neuron: activation functions


► Logistic ("soft step"), i.e. the sigmoid: y = 1/(1 + e^(-x))
► ReLU (rectified linear unit): y = max(x, 0)
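A minimal sketch of these two activation functions in Python with NumPy:

import numpy as np

def sigmoid(x):
    # Logistic ("soft step"): squashes any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Rectified linear unit: y = max(x, 0)
    return np.maximum(x, 0)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))   # [0.119... 0.5 0.880...]
print(relu(x))      # [0. 0. 2.]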


Artificial Neural Network: ANN


 Artificial Neural Network: ANN
► The network structure is defined by how the neurons are connected.
► Each neuron can have different values of weights and bias.
► Weights and biases are the network parameters.


Artificial Neural Network: ANN


► N: number of inputs; A: number of neurons in the layer
► Neuron a (a = 1..A) computes y_a = w_{a,1}*x_1 + w_{a,2}*x_2 + ... + w_{a,N}*x_N + b_a
► In matrix form: y = W^T * x + b, where x is an N-vector, W is the weight matrix (one column of N weights per neuron), and b and y are A-vectors


Fully connected feed-forward network: FC-FFN


 Activation function: E.g., Sigmoid – S-shaped function
 Worked example: inputs x1=1, x2=-1 pass through a fully connected network with three 2-neuron layers (1), (2), (3), whose biases are (1, 0), (0, 0) and (-2, 2):

► Layer (1): [1, -1] x [[1, -1], [-2, 1]] + [1, 0] = [4, -2] -> sigmoid -> [0.98, 0.12]
► Layer (2): [0.98, 0.12] x [[2, -2], [-1, -1]] + [0, 0] = [1.84, -2.08] -> sigmoid -> [0.86, 0.11]
► Layer (3): [0.86, 0.11] x [[3, -1], [-1, 4]] + [-2, 2] = [??, ??] -> sigmoid -> [??, ??]

 As a whole the network computes a function f:
► f([1, -1]) = [0.62, 0.83]
► f([0, 0]) = [0.51, 0.85]
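A minimal sketch of this forward pass in Python with NumPy, using the weights and biases above; it also fills in the values marked '??' for layer (3) (about [0.47, 1.58] before the sigmoid and [0.62, 0.83] after it):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

layers = [                                               # one (weight matrix, bias) pair per layer
    (np.array([[1., -1.], [-2., 1.]]), np.array([1., 0.])),
    (np.array([[2., -2.], [-1., -1.]]), np.array([0., 0.])),
    (np.array([[3., -1.], [-1., 4.]]), np.array([-2., 2.])),
]

def forward(x):
    a = np.array(x, dtype=float)
    for W, b in layers:
        a = sigmoid(a @ W + b)      # row-vector convention: next activation = sigmoid(a W + b)
        print(a.round(2))
    return a

forward([1, -1])   # prints [0.98 0.12], [0.86 0.11], [0.62 0.83]
forward([0, 0])    # ends at [0.51 0.85]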


Do it yourself
 Calculate the output

(Same network as on the previous slide, now with inputs x1=0 and x2=0.)


Optional output layer: Softmax


 In general, the outputs of an artificial neural network can be any values, from very small to very large, including negative.
► e.g., f([1, -1]) = [0.62, 0.83] and f([0, 0]) = [0.51, 0.85] from the previous example: the raw outputs do not sum to 1 and are hard to interpret as probabilities.

 Softmax for output layer
► Softmax is a function that transforms a set of values into values between 0 and 1 that sum to 1.
 Scores (-inf, inf) ==> probabilities [0, 1]
► Also called multinomial logistic or normalized exponential function
► y_j = exp(z_j - m) / sum_k exp(z_k - m), where 'm' is max{z1, ..., zK}; subtracting m keeps the exponentials from overflowing.
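A minimal sketch of the numerically stable softmax in Python with NumPy, subtracting m = max(z) before exponentiating:

import numpy as np

def softmax(z):
    # Normalized exponential: maps arbitrary scores to probabilities that sum to 1
    e = np.exp(z - np.max(z))       # subtract the maximum for numerical stability
    return e / e.sum()

z = np.array([3.0, 1.0, -3.0])
print(softmax(z).round(3))          # [0.879 0.119 0.002]
print(softmax(z).sum())             # ~1.0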


Optional output layer: Softmax


 Softmax converts scores to probabilities: scores (-inf, inf) ==> probabilities [0, 1]
► un-normalized probabilities (summation will not give 1): exp(z_j) for result j
► normalized probabilities (summation will give 1): exp(z_j) / sum_k exp(z_k)

(Figure: the network from the FC-FFN example with a softmax stage appended; each output z_j is exponentiated and then normalized.)


Probability and odds and logits


 Let's take an example of binary classification
► Classes: C1, C2
► Probability of C1 for given x: y = P(C1|x)
► Probability of C2 for given x: 1 - y = P(C2|x)
► Define 'odds' = y/(1-y) = P(C1|x)/(1-P(C1|x))
► Define 'logit' = ln(odds) ➔ the inverse of the 'sigmoid'.
 The logit maps a value in [0, 1] to a value in (-inf, +inf).
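A small Python sketch showing that the logit and the sigmoid undo each other:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    # ln(odds): maps a probability in (0, 1) to a value in (-inf, +inf)
    return np.log(p / (1.0 - p))

p = 0.9
z = logit(p)
print(z)            # ~2.197, i.e. ln(0.9/0.1)
print(sigmoid(z))   # ~0.9, the sigmoid inverts the logit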


Optional output layer: one-hot encoding and argmax


 One-hot encoding
► Encodes class labels as vectors in which exactly one element is 1 and all others are 0.
► Select one only among many.
 Argmax is an operation that finds the argument (index) that gives the maximum value of a target function.

(Figure: softmax outputs, e.g. index 0: 0.02, index 1: 0.9, ..., index 9: 0.01, are passed to argmax, which returns 1, the index of the largest value.)
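A minimal sketch in Python with NumPy of argmax over softmax outputs and the corresponding one-hot vector (the probability values are illustrative):

import numpy as np

probs = np.array([0.02, 0.90, 0.03, 0.01, 0.01, 0.01, 0.005, 0.005, 0.005, 0.005])
cls = int(np.argmax(probs))      # index of the largest probability
one_hot = np.zeros_like(probs)
one_hot[cls] = 1.0               # one-hot encoding of the predicted class
print(cls)                       # 1
print(one_hot)                   # [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]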


How to find a good or the best network: Loss/Cost


 The loss function is the distance between the network output and the target.
► Also called cost function or error function.
► It indicates how good the result is.
► There can be different loss functions.
 The simplest one is a summation of |t - y|.
⚫ A perfect match gives 0.

(Figure: a 16x16-pixel image gives 256 input values x1..x256; the network produces outputs y1..y10, one per class (dog, cat, ..., truck); the targets t1..t10 mark the correct class, e.g. t2=1 for "cat" and 0 elsewhere; loss = sum of distances between outputs and targets.)

• Training error: error on the training data set.
• Generalization error (test error): error on a test data set, used to evaluate the trained model.

How to find a good or the best network: Total Loss


 Total loss (L) is the sum of the per-example losses (l_r): L = l_1 + l_2 + ... + l_R over all R training examples.
► Make it as small as possible.
 Training means finding the network parameters that minimize the total loss L.
► This means we should modify the network parameters according to the total loss.

(Figure: each example r produces an output y_r, which is compared with its target t_r to give a loss l_r; the total loss sums l_1 ... l_R over all training data.)

Cost functions (error function)


• y: inference (calculated) value, t: target value

 Absolute error
► Sum of absolute errors
 sum(|t - y|)
► Mean absolute error (MAE)
 sum(|t - y|)/n

 Squared error loss
► Sum of squared errors
 sum((t - y)**2)
► Mean squared error (MSE)
 sum((t - y)**2)/n
► Root mean squared error (RMSE)
 (MSE)**(1/2)

 Cross-entropy loss
► For classification after Softmax
► Sum of cross-entropy loss
 -sum(t*log(y))
⚫ with one-hot targets, only the term where t=1 contributes
 or -sum[t*log(y) + (1-t)*log(1-y)]
⚫ the second term adds cost when t is 0 but y is large
► -log(y) becomes large when the softmax output y for the correct class is small; y=1 gives zero loss (correct), y<1 gives positive loss (error).

(Figure: plots of the error |t - y| against (t - y), and of -log(y) against y.)
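A minimal sketch of these cost functions in Python with NumPy; the small eps added inside the log (to avoid log(0)) is an implementation detail, not something from the slide:

import numpy as np

def mae(t, y):
    return np.mean(np.abs(t - y))            # mean absolute error

def mse(t, y):
    return np.mean((t - y) ** 2)             # mean squared error

def rmse(t, y):
    return np.sqrt(mse(t, y))                # root mean squared error

def cross_entropy(t, y, eps=1e-12):
    # -sum(t*log(y)) for one-hot targets t and softmax outputs y
    return -np.sum(t * np.log(y + eps))

t = np.array([0.0, 1.0, 0.0])                # one-hot target: class 1
y = np.array([0.1, 0.8, 0.1])                # softmax output
print(mae(t, y), mse(t, y), rmse(t, y))      # 0.133..., 0.02, 0.141...
print(cross_entropy(t, y))                   # ~0.223 = -log(0.8)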

Log plots
import numpy as np
from matplotlib import pyplot as plt

# Sample points; log() is undefined outside its domain, so NumPy returns NaN
# (with a RuntimeWarning) for those x values and matplotlib simply skips them.
y = np.linspace(-1.5, 1.5, 400)

plt.plot(y, np.log(y), color='blue')            # log(y): defined for y > 0
plt.text(0.3, -2, 'log(y)', fontsize=15, color='blue')

plt.plot(y, -np.log(y), color='black')          # -log(y): cross-entropy term for t = 1
plt.text(0.2, 2, '-log(y)', fontsize=15, color='black')

plt.plot(y, -np.log(-y), color='red')           # -log(-y): defined for y < 0
plt.text(-0.7, 2, "-log(-y)", fontsize=15, color='red')

plt.plot(y, -np.log(1 - y), color='green')      # -log(1-y): cross-entropy term for t = 0
plt.text(1.0, 2, "-log(1-y)", fontsize=15, color='green')

plt.grid()
plt.show()


How to minimize total loss by changing [W] and [b]


 If we can find how the network parameters affect the total loss, it may be possible to figure out how to minimize the total loss.
 However, the number of parameters is far too large to do this by hand.
► AlexNet: 650K neurons, 8 layers, 60 million parameters
 So we apply a gradual, step-by-step iterative method called 'gradient descent'. It is an optimization algorithm.
► Update rule: W(t+1) = W(t) - eta * dL/dW, where eta is the learning rate.
► Negative slope (dL/dW < 0) ➔ increase W (by an amount scaled by the learning rate)
► Positive slope (dL/dW > 0) ➔ decrease W
► Steep slope ➔ large change of W at the next step
► go on until the slope is small enough, i.e., near a minimum (a stationary point)

(Figure: two plots of total loss L versus W, one where the slope at W(t=0) is negative and W moves right, one where it is positive and W moves left.)
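A minimal one-parameter sketch of gradient descent in Python; the loss L(W) = (W - 3)^2 is just an illustrative stand-in for a real total loss:

def loss(W):
    return (W - 3.0) ** 2            # toy total loss with its minimum at W = 3

def grad(W):
    return 2.0 * (W - 3.0)           # dL/dW

W = 0.0                              # W at t=0
eta = 0.1                            # learning rate
for t in range(50):
    W = W - eta * grad(W)            # W(t+1) = W(t) - eta * dL/dW
print(W, loss(W))                    # W approaches 3 and the loss approaches 0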


Optimization algorithm: gradient descent


 Initial value problem
► a different initial point can lead to a different minimum
 Local minimum problem (getting stuck in local minima)
► convergence to the global minimum is never guaranteed
 Learning rate problem
► a large learning rate could cause oscillation
► a small learning rate results in slow learning
 Vanishing gradient problem
► If a change in a parameter's value causes only a very small change in the network's output, the network cannot learn that parameter effectively.
 Exploding gradient problem

(Figure: a loss surface L over parameters W1 and W2 with several local minima.)


Popular types of Neural Network (NN)


 DNN: Deep NN
► More general model
► fully connected
► feed-forward (i.e., MLP: multilayer perceptron)
► speech, image processing, natural language processing (NLP)
 CNN: Convolutional NN
► commonly used for (optimized for) images
► connected locally (i.e., sparsely-connected)
► feed-forward
► object/facial recognition
 RNN: Recurrent NN
► context driven, time-series optimization
► variable connectivity
► feed-back in addition to feed-forward
► NLP and speech recognition
► Long Short-Term Memory (LSTM)
 feed-back + storage


Deep neural net


 Any continuous function can be realized by a network with one hidden layer with sufficiently many neurons (universality theorem, universal approximation theorem).
► A one-hidden-layer network can represent any continuous function.
► Such a network is a shallow, fat (thick) neural net.

 A deep, thin neural net (deep NN) is better than a shallow, fat net.
► Using multiple layers of neurons to represent some functions is much simpler.
 Fewer parameters ➔ less computation


Neural network in brief
