Lecture 2
LEARNING
[Figure: nested fields: Deep Learning (DL) is a subset of Machine Learning (ML), which is a subset of Artificial Intelligence (AI)]
The human brain contains roughly 100 billion (100,000,000,000) neurons and over 100 trillion synapses.
[Figure: structure of a biological neuron]
Cell Body (Soma)
Dendrites: receive messages from other neurons/cells.
Axon: passes messages away from the cell body to other neurons, muscles, or glands.
Action Potential: the electrical signal that travels down the axon.
Terminal Boutons: form junctions with other cells.
Synapses: junctions where signals are transmitted to other cells.
[Figure: the perceptron. Inputs x1 … xn (dendrites) are scaled by weights w1 … wn (synapses), summed in the neuron (cell body) as $\sum_{i=1}^{n} x_i w_i$, passed through an activation function g, and emitted as the output y (axon)]

Bias: the bias ensures that there is no zero output in case the inputs and weights are zero.
Inputs: x1, x2, …, xn are the values fed into the neuron. They can be individual features or combinations of features.
Weights: w1, w2, …, wn are assigned to each input and represent its importance in determining the neuron's output.
Bias: w0 is an additional term added to the weighted sum of the inputs. It allows the neuron to adjust its output without changing the weights.
Weighted sum: calculated by multiplying each input by its corresponding weight and then adding the bias.
Activation function: g is a function that takes the weighted sum as input and produces the neuron's output y.

Artificial Neuron
(Mathematical Model of a Perceptron)
$z = w_0 + \sum_{i=1}^{n} x_i w_i = w_0 + x_1 w_1 + x_2 w_2 + \dots + x_n w_n$
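As a quick illustration, here is a minimal sketch of this weighted sum for a single neuron (Python with NumPy; the input and weight values are made up for the example):

```python
import numpy as np

# Hypothetical inputs, weights, and bias (illustrative values only).
x = np.array([1.0, 2.0, 3.0])   # inputs x1 .. xn
w = np.array([0.5, -0.2, 0.1])  # weights w1 .. wn
w0 = 0.3                        # bias

# Pre-activation: z = w0 + sum_i x_i * w_i
z = w0 + np.dot(x, w)
print(z)  # 0.3 + (0.5 - 0.4 + 0.3) = 0.7
```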
[Plot: an activation function mapping inputs in the range -10 to 10 to outputs between 0 and 1]
Activation Function
Activation functions play a crucial role in neural networks by introducing non-linearity into the model. This non-linearity is essential for learning complex patterns and relationships in data. Without activation functions, neural networks would essentially be linear models, limiting their ability to solve complex problems.
Sigmoid
Formula: $g(z) = \frac{1}{1 + e^{-z}}$
Description: Outputs a value between 0 and 1.
Advantages: Smooth and differentiable.
Disadvantages: Can suffer from the vanishing gradient problem, especially for large negative inputs.
Usage: Typically used in the output layer of binary classification models.

Softmax
Formula: $g(y_i) = \frac{e^{y_i}}{\sum_j e^{y_j}}$
Description: Converts a vector of numbers into a probability distribution.
Advantages: Ensures that the outputs sum to 1, representing probabilities.
Disadvantages: Can be computationally expensive for large input vectors.
Usage: Used primarily in the output layer of classification models.
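Both activations are short enough to sketch directly. Here is an illustrative Python/NumPy version (not code from the lecture; shifting by the maximum in softmax is a common numerical-stability trick added here):

```python
import numpy as np

def sigmoid(z):
    """Squash any real input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(y):
    """Convert a vector of scores into a probability distribution."""
    e = np.exp(y - np.max(y))  # subtract the max for numerical stability
    return e / e.sum()

print(sigmoid(0.0))                        # 0.5
print(softmax(np.array([1.0, 2.0, 3.0])))  # outputs sum to 1
```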
What if we want to create a neural network to distinguish blue points from red points? A linear activation function would produce a linear decision boundary regardless of the network's size. Non-linearities allow us to approximate arbitrarily complex functions.
Example: if the weighted sum is z = 4 and g is a linear (identity) activation function, then
$y = g(z) = g(4) = 4$
[Figure: a simple network. The input layer (x1, x2, x3) feeds a hidden layer that computes z using bias w0 and weights w1 … wn, and the output layer produces y]
[Figure: a fully connected network. Input layer: x1, x2, x3. Hidden layer: z1, z2, z3, z4. Output layer: y1, y2]
[Figure: a single-hidden-layer network. Inputs x1 … xm connect to hidden units z1 … zk through the weights $W^{(1)}$; the activations g(z1) … g(zk) connect to the outputs y1, y2 through the weights $W^{(2)}$]

For example, the pre-activation of hidden unit 3 is

$z_3 = W^{(1)}_{0,3} + \sum_{j=1}^{m} x_j W^{(1)}_{j,3} = W^{(1)}_{0,3} + x_1 W^{(1)}_{1,3} + x_2 W^{(1)}_{2,3} + \dots + x_m W^{(1)}_{m,3}$
[Figure: a deep network. Inputs x1 … xm pass through multiple stacked hidden layers before reaching the outputs y1, y2]

In general, the pre-activation of unit i in layer k is computed from the activations of layer k-1:

$z_{k,i} = W^{(k)}_{0,i} + \sum_{j=1}^{n_{k-1}} g(z_{k-1,j})\, W^{(k)}_{j,i}$
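To make the recursion concrete, here is a small layer-by-layer forward pass sketch (Python/NumPy, an illustration rather than lecture code; the 3-4-2 architecture and random weights are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Each layer is a (W, b) pair: W has shape (n_in, n_units), b has
    shape (n_units,). The line z = b + a @ W implements
    z_{k,i} = W0_i + sum_j g(z_{k-1,j}) * W_{j,i}."""
    a = x
    for W, b in layers:
        z = b + a @ W   # pre-activations of this layer
        a = sigmoid(z)  # activations passed to the next layer
    return a

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(3, 4)), rng.normal(size=4)),
          (rng.normal(size=(4, 2)), rng.normal(size=2))]
print(forward(np.array([1.0, 2.0, 3.0]), layers))
```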
Suppose we want to predict whether a student will pass this course or not. We can build a model that takes course features that determine passing the class. For example, we can start with two features.
X1: number of lectures attended. X2: number of hours spent on the assignments.

X1    X2    TARGET
 4     3    0
 2     1    0
22     9    1
 4     2    0
30     8    1
 2     2    0
12     6    1
20    10    1
17     8    1
24     7    1
28     9    1
21     6    1

[Scatter plot of the data. Legend: Pass (1), Fail (0)]
[Figure: the model. Input layer: x1, x2. Hidden layer: z1, z2, z3. Output layer: y]
Forward Propagation (example input: x1 = 4, x2 = 3, target 0):
• Calculate the weighted sum of inputs at each neuron.
• Apply the activation function to the weighted sum.
• Predict the output. Here the network predicts 0.1 while the actual value is 0.
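As an illustration, a forward pass through this 2-3-1 network might look like the sketch below (Python/NumPy; the weights are hypothetical, since the real values would come from training, so the output will not be exactly 0.1):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([4.0, 3.0])  # x1 = lectures attended, x2 = assignment hours

# Hypothetical weights and biases for a 2-3-1 network (illustrative only).
W1 = np.array([[ 0.2, -0.5, 0.1],
               [-0.3,  0.4, 0.2]])     # shape (2 inputs, 3 hidden units)
b1 = np.array([0.1, -0.2, 0.0])
W2 = np.array([[0.3], [-0.6], [0.5]])  # shape (3 hidden units, 1 output)
b2 = np.array([-0.4])

h = sigmoid(b1 + x @ W1)  # hidden activations g(z1), g(z2), g(z3)
y = sigmoid(b2 + h @ W2)  # predicted probability of passing
print(y)
```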
A loss function, also known as a cost function or objective function, is a crucial component of neural networks. It quantifies the "error"
between the network's predicted output and the true target output. The goal of training a neural network is to minimize this loss function.
Continuing the example, forward propagation ends with a loss calculation:
• Calculate the weighted sum of inputs at each neuron.
• Apply the activation function to the weighted sum.
• Predict the output (predicted: 0.1, actual: 0).
• Calculate the loss, the difference between the predicted and actual values:

$\text{Loss}\big(f(x^{(i)}; W),\, y^{(i)}\big)$

where $f(x^{(i)}; W)$ is the predicted output and $y^{(i)}$ is the actual target.
The empirical loss of the network measures the total loss over the entire dataset.
X1    X2    PREDICTED    TARGET
 4     3    0.1          0
 2     1    0.8          0
 …     …    …            …
 4     2    0.9          0
21     6    0.7          1

$J(W) = \frac{1}{n} \sum_{i=1}^{n} \text{Loss}\big(f(x^{(i)}; W),\, y^{(i)}\big)$
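A sketch of the empirical loss as a plain average over the dataset (Python/NumPy, illustrative; squared error stands in for an arbitrary per-example loss):

```python
import numpy as np

def empirical_loss(predictions, targets, loss_fn):
    """J(W): the mean of the per-example losses over the dataset."""
    losses = [loss_fn(p, t) for p, t in zip(predictions, targets)]
    return np.mean(losses)

# Values from the table above, with a squared-error per-example loss.
preds = np.array([0.1, 0.8, 0.9, 0.7])
targets = np.array([0, 0, 0, 1])
print(empirical_loss(preds, targets, lambda p, t: (t - p) ** 2))
```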
A binary cross-entropy loss is a specialized loss function used in neural networks for binary classification tasks. It measures the dissimilarity
between the predicted probability distribution and the true probability distribution for binary outcomes (e.g., 0 or 1).
For the predictions and targets in the table above, the binary cross-entropy loss is

$J(W) = -\frac{1}{n} \sum_{i=1}^{n} \Big[\, y^{(i)} \log f(x^{(i)}; W) + \big(1 - y^{(i)}\big) \log\big(1 - f(x^{(i)}; W)\big) \Big]$
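A minimal binary cross-entropy sketch (Python/NumPy; the clipping that guards against log(0) is an added implementation detail, not from the slides):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean BCE between true labels (0/1) and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred)
                    + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([0, 0, 0, 1])
y_pred = np.array([0.1, 0.8, 0.9, 0.7])
print(binary_cross_entropy(y_true, y_pred))
```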
Mean Squared Error (MSE) is a commonly used loss function in machine learning and neural networks. It measures the average squared
difference between the predicted values and the actual values. MSE is often used to evaluate the performance of regression models,
where the goal is to predict a continuous numerical value.
X1    X2    PREDICTED    TARGET
 4     3    30           40
 2     1    78           70
 …     …    …            …
 4     2    89           90
21     6    55           60

$J(W) = \frac{1}{n} \sum_{i=1}^{n} \big(y^{(i)} - f(x^{(i)}; W)\big)^2$
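And a matching MSE sketch (Python/NumPy, using the values from the regression table above):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between actual and predicted values."""
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([40, 70, 90, 60])
y_pred = np.array([30, 78, 89, 55])
print(mse(y_true, y_pred))  # (100 + 64 + 1 + 25) / 4 = 47.5
```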
[Figure: recap of the artificial neuron. Inputs x1 … xn enter through weights w1 … wn (synapses), the weighted sum $\sum_{i=1}^{n} x_i w_i$ is formed, and the output y is produced]
https://uojai.github.io/deeplearning