Lecture 3: Linear Neural Networks and Linear Regression: Part 1
Md. Shahriar Hussain
ECE Department, NSU
Linear Neural Networks (LNNs)
• The neuron aggregates the
weighted input data.
Linear Neural Networks (LNNs)
• A Linear Neural Network can address two different types of problems
– Regression Problem
– Classification Problem
[Figure: an LNN for a regression problem and an LNN for a classification problem]
Linear Neural Networks (LNNs)
• For Regression,
– There will be only aggregation
– No activation function is needed.
Linear Neural Networks (LNNs)
• For a Regression Problem, we need to
– Cast the Linear Regression technique as an LNN model
• For a Classification Problem, we need to
– Cast the Logistic and Softmax Regression techniques as LNN models
What is Linear Regression?
• Linear regression is an algorithm that models a linear relationship between an independent variable and a dependent variable in order to predict the outcome of future events
Linear Regression Example
A line of best fit (regression line) is a straight line that best approximates the trend of a scatter plot of data points.
Linear Regression Example
Estimated/predicted value: ŷ ("y-hat"). Actual/true value: y (ground truth).
Data Set Description
x^(1) = 2104, x^(2) = 1416
y^(1) = 460, y^(2) = 232
(x, y) = one training example
(x^(i), y^(i)) = the i-th training example
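To make the notation concrete, here is a minimal sketch in Python (assuming NumPy; the values are the ones from the data set above):

```python
import numpy as np

x = np.array([2104, 1416, 1534, 852], dtype=float)  # sizes in feet^2
y = np.array([460, 232, 315, 178], dtype=float)     # prices in $1000's

m = len(x)           # m = number of training examples
print(x[0], y[0])    # the slide's (x^(1), y^(1)) -> 2104.0 460.0 (0-indexed)
print(x[1], y[1])    # the slide's (x^(2), y^(2)) -> 1416.0 232.0
```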
Hypothesis
Training Set → Learning Algorithm → h (hypothesis)
New/unseen data x (size of house) → h → estimated price ŷ = h(x)
Hypothesis
• How do we represent h?
h_θ(x) = θ0 + θ1·x
• θ0 and θ1: parameters/weights that will be trained/determined by the ML model (not hyperparameters)
– θ0 = intercept/bias/constant
– θ1 = slope/coefficient/gradient
• Linear regression with one variable is called univariate linear regression.
Hypothesis
The goal is to choose θ0 and θ1 properly so that h_θ(x) is close to y.
• A cost function lets us figure out how to fit the best straight line to our data
Hypothesis
Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …
Hypothesis: h_θ(x) = θ0 + θ1·x
θ_i's: parameters
How do we choose the θ_i's?
Cost Function
• We need to choose θ0 and θ1 so that the value of this function, computed over all m training examples, is minimized. The function being minimized is called the cost function:
J(θ0, θ1) = (1/2m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²
Goal: minimize J(θ0, θ1) over θ0 and θ1
Cost Function
Cost Function: J(θ0, θ1) = (1/2m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²
Goal: minimize J(θ0, θ1) over θ0 and θ1
• Here the cost function is called the Squared Error cost function
• It minimizes the squared difference between the predicted house price and the actual house price
• 1/m means we take the average over the m examples
• The 2 in 1/2m makes the derivative math a bit easier and doesn't change the parameters we determine at all (i.e., half the smallest value is still the smallest value!)
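A minimal sketch of this cost function in Python (assuming NumPy; the function name and the trial parameter values are illustrative):

```python
import numpy as np

def compute_cost(x, y, theta0, theta1):
    """Squared-error cost J(theta0, theta1) = (1/2m) * sum((h(x) - y)^2)."""
    m = len(x)
    predictions = theta0 + theta1 * x        # h_theta(x^(i)) for every example
    return ((predictions - y) ** 2).sum() / (2 * m)

x = np.array([2104, 1416, 1534, 852], dtype=float)   # sizes from the table above
y = np.array([460, 232, 315, 178], dtype=float)      # prices in $1000's
print(compute_cost(x, y, 0.0, 0.2))                  # cost of one trial hypothesis
```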
Cost Function Calculation
• For simplicity, assume θ0 = 0, so h_θ(x) = θ1·x
• Find the best value of θ1 so that J(θ1) is minimum
Cost Function Calculation
[Plots: training data points (1, 1), (2, 2), (3, 3) with the line h_θ(x) = θ1·x on the left, and J(θ1) vs. θ1 on the right]
For θ1 = 1:
J(θ1) = 1/(2·3) · [0² + 0² + 0²] = 0
Cost Function Calculation
For θ1 = 0.5:
J(θ1) = ?
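A worked value, assuming the three training points (1, 1), (2, 2), (3, 3) shown in the plot above:

$$J(0.5) = \frac{1}{2 \cdot 3}\left[(0.5 - 1)^2 + (1 - 2)^2 + (1.5 - 3)^2\right] = \frac{0.25 + 1 + 2.25}{6} = \frac{3.5}{6} \approx 0.58$$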
Cost Function Calculation
For θ1 = 0:
J(θ1) = ?
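Under the same assumed training points (1, 1), (2, 2), (3, 3):

$$J(0) = \frac{1}{2 \cdot 3}\left[(0 - 1)^2 + (0 - 2)^2 + (0 - 3)^2\right] = \frac{1 + 4 + 9}{6} = \frac{14}{6} \approx 2.33$$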
Cost Function Calculation
• If we compute J(θ1) over a range of values and plot J(θ1) vs. θ1, we get a quadratic curve (a parabola)
• The optimization objective for the learning algorithm is to find the value of θ1 which minimizes J(θ1)
• So, here θ1 = 1 is the best value for θ1
• The line which has the least sum of squared errors is the best-fit line
Important Equations
Hypothesis: h_θ(x) = θ0 + θ1·x
Parameters: θ0, θ1
Cost Function: J(θ0, θ1) = (1/2m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²
Goal: minimize J(θ0, θ1) over θ0 and θ1
Cost Function for two parameters
(for fixed θ0, θ1, h_θ(x) is a function of x)   (J(θ0, θ1) is a function of the parameters θ0, θ1)
[Plot: housing data, Price ($) in 1000's on the y-axis vs. Size in feet² (x) on the x-axis, with a candidate hypothesis line]
Cost Function for two parameters
• Previously we plotted our cost function by plotting
– θ1 vs. J(θ1)
• Now we have two parameters
– The plot becomes a bit more complicated
– It generates a 3D surface plot where the axes are
• X = θ1
• Z = θ0
• Y = J(θ0, θ1)
Cost Function for two parameters
• We can see that the height (y) of the surface indicates the value of the cost function
• We need to find where y is at a minimum
Cost Function for two parameters
• A contour plot is a graphical technique for representing a 3-dimensional surface by plotting constant-z slices, called contours, in a 2-dimensional format
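A minimal sketch of how such a contour plot can be generated (assuming NumPy and Matplotlib; the grid ranges are illustrative choices, not from the slides):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([2104, 1416, 1534, 852], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)
m = len(x)

# Grid of candidate (theta0, theta1) values.
T0, T1 = np.meshgrid(np.linspace(-200, 400, 200), np.linspace(-0.2, 0.5, 200))

# Evaluate J(theta0, theta1) at every grid point.
J = np.zeros_like(T0)
for i in range(m):
    J += (T0 + T1 * x[i] - y[i]) ** 2
J /= 2 * m

plt.contour(T0, T1, J, levels=30)   # each contour is a constant-z slice of J
plt.xlabel("theta0")
plt.ylabel("theta1")
plt.show()
```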
Gradient descent
• We want to find min J(θ0, θ1)
• Gradient descent
– Used all over machine learning for minimization
• Outline:
• Start with some initial values for θ0 and θ1
• Keep changing θ0 and θ1 to reduce J(θ0, θ1) until we hopefully end up at a minimum
Gradient descent
• Start with initial guesses
– Start at (0, 0), or any other values
• Keep changing θ0 and θ1 a little bit to try to reduce J(θ0, θ1)
• Each time you change the parameters, step in the direction that reduces J(θ0, θ1) the most (the negative gradient)
• Repeat until you converge to a local minimum
• Gradient descent has an interesting property
– Where you start can determine which minimum you end up in
– Here we can see one initialization point led to one local minimum
– The other led to a different one
Gradient descent
• One initialization point led to one local minimum.
The other led to a different one
Gradient Descent Algorithm
• Gradient descent is used to minimize the MSE by calculating the gradient of the cost function and stepping against it:
repeat until convergence: θ_j := θ_j − α · ∂J(θ0, θ1)/∂θ_j (simultaneously for j = 0 and j = 1)
• Correct (simultaneous update):
temp0 := θ0 − α · ∂J(θ0, θ1)/∂θ0
temp1 := θ1 − α · ∂J(θ0, θ1)/∂θ1
θ0 := temp0
θ1 := temp1
• Incorrect: updating θ0 first and then using the new θ0 when computing θ1's update
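A minimal sketch of this update loop in Python (assuming NumPy; the feature scaling, α, and iteration count are illustrative choices, not from the slides):

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iterations=2000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0                # start at (0, 0)
    for _ in range(iterations):
        error = theta0 + theta1 * x - y      # h_theta(x^(i)) - y^(i) for all i
        grad0 = error.sum() / m              # dJ/dtheta0
        grad1 = (error * x).sum() / m        # dJ/dtheta1
        # Simultaneous update: both gradients are computed before either
        # parameter changes.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

x = np.array([2104, 1416, 1534, 852], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)
x_scaled = (x - x.mean()) / x.std()          # scaling keeps alpha well behaved
print(gradient_descent(x_scaled, y))
```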
Learning Rate
• Here, α is the learning rate, a hyperparameter
• It controls how big a step we take on each update
• If α is small, we take tiny steps
• If α is big, we get an aggressive gradient descent
Learning Rate
• If α is too small, gradient descent can be slow → higher training time
• If α is too large, gradient descent can overshoot the minimum; it may fail to converge, or even diverge
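A tiny numeric illustration of both failure modes, using the stand-in objective J(θ) = θ² (an assumed toy function whose minimum is at 0, not from the slides):

```python
def descend(alpha, steps=10, theta=1.0):
    """Run a few gradient steps on J(theta) = theta**2 (gradient = 2*theta)."""
    for _ in range(steps):
        theta -= alpha * 2 * theta
    return theta

print(descend(alpha=0.01))  # ~0.82: alpha too small -> barely moves (slow)
print(descend(alpha=0.5))   # 0.0:   well-chosen alpha -> reaches the minimum
print(descend(alpha=1.1))   # ~6.19: alpha too large -> overshoots and diverges
```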
Local Minima
• Local minimum: the value of the loss function is minimal at that point within a local region
• Global minimum: the value of the loss function is minimal across the entire domain of the loss function
Local Minima
[Figure: a loss surface with several local minima, where the gradient is zero, and one global minimum]
Gradient Descent Calculation
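The worked figures from these slides are not reproduced here; the partial derivatives they compute for the squared-error cost are the standard ones:

$$\frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)$$

$$\frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x^{(i)}$$

Substituting these into the update rule gives the concrete gradient descent steps for linear regression.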
• Reference:
– Andrew Ng, Machine Learning lectures, Stanford University