W3 - Linear Regression

The document outlines a lecture on Linear Regression in a Machine Learning course, focusing on estimating motorcycle prices based on engine size. It discusses the process of collecting data, plotting it to find relationships, and utilizing optimization algorithms like Gradient Descent to determine the best predictor line. Additionally, it emphasizes the importance of considering multiple factors, such as brand, in price estimation.


Machine Learning (CS-245)

Spring – 2025

Dr. Mehwish Fatima


Assistant Professor,
AI & DS | SEECS | NUST, Islamabad
Overview of this week’s lecture

Linear Regression

- Estimating motorcycle price using linear regression

- Nomenclature of linear regression

- Optimization Algorithm

Let’s buy a motorcycle

- Suppose you want to buy a new motorcycle, and you want to estimate the price of the motorcycle using the engine size.

- The exact relationship between the engine size and the price is unknown to you.

𝑃𝑟𝑖𝑐𝑒 = 𝑓(𝑒𝑛𝑔𝑖𝑛𝑒 𝑠𝑖𝑧𝑒)

- So, you go to the market and take price quotations for different motorcycles (you collected some data).

- Heading back home, you plot the data on a graph and try to discover the relationship between engine size and the price.

[Figure: scatter plot of price (x1000) against engine displacement (cc)]

- You notice that the price increases when engine size increases. So, as a first estimation, you draw a line that fits your data.

𝑃𝑟𝑖𝑐𝑒 = 𝑤 × (𝑒𝑛𝑔𝑖𝑛𝑒 𝑠𝑖𝑧𝑒)

How can you determine the suitability of a predictor line for the given data?

- Find the difference between actual function values and the predicted function values.

- What’s the value of 𝑤 that minimises ℒ(𝑤)?

- Evaluate ℒ(𝑤) on various values of 𝑤.

- Why minimise ℒ(𝑤)?

- Minimising ℒ(𝑤) means the predictor function is closer to the actual function.

[Figures: ℒ(𝑤) plotted against 𝑤; price (x1000) against engine displacement (cc)]

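Evaluating ℒ(𝑤) on various values of 𝑤 could be sketched as below. This is only a minimal sketch: it assumes ℒ is the mean squared difference between actual and predicted values, and the data, the function name `loss`, and the grid of candidate slopes are illustrative choices rather than anything taken from the lecture.

```python
import numpy as np

# Illustrative data only (not from the lecture): engine size in cc, price in thousands.
engine_cc = np.array([50.0, 100.0, 150.0, 200.0])
price_k = np.array([100.0, 200.0, 300.0, 400.0])

def loss(w, x, y):
    """Mean squared difference between actual values y and predictions w * x."""
    return float(np.mean((y - w * x) ** 2))

# Evaluate the loss on a grid of candidate slopes, as on a w-L(w) plot.
candidates = np.arange(0.25, 2.75, 0.25)
losses = [loss(w, engine_cc, price_k) for w in candidates]

# The candidate with the smallest loss is the best slope on this grid.
best_w = float(candidates[int(np.argmin(losses))])
print(best_w)
```

The grid only finds the best of the candidates that were tried, which is one reason a search procedure becomes attractive.
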
How to find the optimal value of 𝑤?

- Use the graph (or table) that shows ℒ(𝑤) for all 𝑤 and pick the optimal value.

- Tedious and time-consuming.

- Use an optimisation algorithm, for example, Gradient Descent.

- Start with a random 𝑤 in solution space (𝑤-ℒ(𝑤) plot).

- Calculate the gradient of ℒ(𝑤) with respect to 𝑤.

- Take a step in the direction of the steepest descent (or ascent?).

- Calculate the gradient again at the new location and take another step.

- Continue until you reach the optimal point.

[Figure: ℒ(𝑤) plotted against 𝑤]

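The steps above might look like the following for a single weight. It is a sketch under stated assumptions: ℒ is taken to be the mean squared error, the data are made up, and `gradient_descent_1d`, the learning rate `lr`, and the step count are hypothetical choices. A fixed starting value is used instead of a random one so the run is repeatable. Note that the update subtracts the gradient, which is what makes it a descent step.

```python
import numpy as np

def gradient_descent_1d(x, y, w0=0.0, lr=1e-5, steps=500):
    """Minimise L(w) = mean((y - w*x)^2) by repeated gradient steps."""
    w = w0
    for _ in range(steps):
        # dL/dw evaluated at the current w
        grad = float(np.mean(-2.0 * x * (y - w * x)))
        # Step against the gradient: the direction of steepest descent.
        w = w - lr * grad
    return w

x = np.array([50.0, 100.0, 150.0, 200.0])   # illustrative engine sizes (cc)
y = np.array([100.0, 200.0, 300.0, 400.0])  # illustrative prices (thousands)
w_hat = gradient_descent_1d(x, y)
print(w_hat)
```
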
How accurately can you estimate motorcycle price by engine size only?

- There are many other factors besides engine size that determine the price of a motorcycle.

- Let’s also consider who manufactured the bike.

- Before, 𝑝𝑟𝑖𝑐𝑒 = 𝑤 × 𝑒𝑛𝑔𝑖𝑛𝑒 𝑠𝑖𝑧𝑒

- Now, 𝑝𝑟𝑖𝑐𝑒 = 𝑤1 × 𝑒𝑛𝑔𝑖𝑛𝑒 𝑠𝑖𝑧𝑒 + 𝑤2 × 𝑏𝑟𝑎𝑛𝑑

- Before,

- Now,

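A two-feature version of the model could be sketched as follows. Several things here are assumptions and not part of the lecture: the brand is encoded as a hypothetical 0/1 indicator, the prices are synthetic values generated from known weights, and the fit uses NumPy's least-squares solver as a stand-in for whatever optimiser is preferred.

```python
import numpy as np

# Illustrative data only: engine size (cc) and a hypothetical 0/1 brand indicator.
engine_cc = np.array([70.0, 100.0, 125.0, 150.0])
brand = np.array([0.0, 1.0, 0.0, 1.0])

# Synthetic prices generated from known weights, so the fit can be checked.
price_k = 2.0 * engine_cc + 50.0 * brand

# One row per motorcycle, one column per feature.
X = np.column_stack([engine_cc, brand])

# Least-squares fit of price = w1 * engine_size + w2 * brand.
w, *_ = np.linalg.lstsq(X, price_k, rcond=None)
w1, w2 = w
print(w1, w2)
```

How a categorical feature such as brand is turned into numbers is a modelling decision in its own right; the 0/1 indicator only works for two brands.
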
Let’s establish some common terminology

- Data: (mostly) Real-world observations. May consist of input-output pairs or inputs only. In the case of input-output pairs, the outputs are called Ground Truth or True Targets.

- Features: Components of input. May be engineered manually or learnt automatically.

- Hypothesis Function: An approximation of the actual function (mostly unknown or intractable) that governs the relationship between input and output. Also called the predictor function or the model.

- Parameters: Coefficients of the predictor function. They are learnt through optimisation, operate on the input features, and consist of weights and biases.

- Prediction: Output of the predictor function, obtained by applying the weights to the features.

- Loss: The difference between prediction and ground truth. Also known as residual.

Let’s establish some common terminology (continued)

- Loss Function: A function used to calculate the loss. Many variants are available. Some are preferred over others depending on the task.

- Cost Function: Average of all losses over all input samples. Represents how good or bad a predictor function (model) is, given some training data. Also called objective function.

- Optimisation: The process of finding optimal values of parameters (weights) that result in the minimum value of the cost function. (A comment about fairness of model). Mathematically,

$\min_{\theta} J(\theta)$

- Gradient Descent: An optimisation algorithm that uses the gradient of the cost function with respect to the parameters to update them and eventually find the combination of parameters that minimises the cost function.

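To make the distinction between loss and cost concrete, here is a small sketch. It assumes a squared-error loss and a one-feature model with parameters 𝜃 = (𝑤, 𝑏); the function names and the numbers are illustrative only.

```python
import numpy as np

def squared_loss(y_true, y_pred):
    """Per-sample loss: squared difference between ground truth and prediction."""
    return (y_true - y_pred) ** 2

def cost(theta, x, y_true):
    """J(theta): average of the per-sample losses over all input samples."""
    w, b = theta
    y_pred = w * x + b
    return float(np.mean(squared_loss(y_true, y_pred)))

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])     # illustrative targets

j_good = cost((2.0, 0.0), x, y)   # parameters that fit the data
j_bad = cost((1.0, 0.0), x, y)    # parameters that do not
print(j_good, j_bad)
```

Optimisation then amounts to searching for the 𝜃 that gives the smallest value of `cost`.
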
There are different types of loss functions

- 𝐿1-Loss: Also called absolute error loss.

$L_1\text{-Loss} = |y_{\text{true}} - y_{\text{pred}}|$

- Resistant to outliers, therefore robust.

- The plot of 𝐿1-loss is convex but not differentiable at zero (its gradient is discontinuous there), which makes it harder to optimise.

- 𝐿2-Loss: Also called squared error loss.

$L_2\text{-Loss} = (y_{\text{true}} - y_{\text{pred}})^2$

- 𝐿2-loss is more sensitive to outliers because the differences are squared. The greater the blunder, the higher the penalty.

- The plot is convex and easy to optimise.

- Which of the two is better?

- Depends on the data and the task.

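The difference in outlier sensitivity can be seen on a small example. The numbers below are made up for illustration, with the last target playing the role of an outlier; the function names are likewise illustrative.

```python
import numpy as np

def l1_loss(y_true, y_pred):
    """Absolute error per sample."""
    return np.abs(y_true - y_pred)

def l2_loss(y_true, y_pred):
    """Squared error per sample."""
    return (y_true - y_pred) ** 2

y_true = np.array([10.0, 12.0, 11.0, 50.0])  # the last value is an outlier
y_pred = np.array([11.0, 11.0, 11.0, 11.0])

l1 = l1_loss(y_true, y_pred)
l2 = l2_loss(y_true, y_pred)

# Fraction of the total loss contributed by the outlier alone.
l1_share = l1[-1] / l1.sum()
l2_share = l2[-1] / l2.sum()
print(l1_share, l2_share)
```

Under the squared error the single outlier accounts for a larger share of the total, so a fit that minimises it is pulled harder towards that point.
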
Learnable parameters in regression consist of weights and biases

- For each input $x$ of size $n_x$:

- Learnable parameters are weights $w \in \mathbb{R}^{n_x}$ and bias $b \in \mathbb{R}$.

- The predicted output can be calculated as

$\hat{y} = w^\top x + b$

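Assuming the usual linear form, in which the prediction is the dot product of the weights with the features plus the bias, the computation could be sketched as below. The feature values, weights, and bias are hypothetical numbers chosen for illustration.

```python
import numpy as np

def predict(x, w, b):
    """Linear prediction: dot product of weights and features, plus the bias."""
    return float(np.dot(w, x) + b)

x = np.array([125.0, 1.0])   # hypothetical features: engine size (cc), brand indicator
w = np.array([2.0, 50.0])    # hypothetical weights, one per feature (n_x = 2)
b = 10.0                     # hypothetical bias

y_hat = predict(x, w, b)     # 2*125 + 50*1 + 10
print(y_hat)
```
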
Optimisation means learning optimal model parameters

- The objective of an optimisation algorithm is to find the optimal parameters that minimise the cost function.

𝛼 represents the learning rate. It controls how big (or small) a step is taken at each iteration.

- All learnable parameters can be represented by 𝜃.

- Repeat the process until the algorithm converges.

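One common way to write such a procedure is to update 𝜃 by subtracting 𝛼 times the gradient of the cost, and to stop once the updates become negligible. The sketch below assumes a mean squared error cost and stacks the weights and the bias into a single vector `theta`; the stopping tolerance, the zero initialisation, and the data are illustrative assumptions.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, tol=1e-10, max_iters=10_000):
    """Minimise the mean squared error cost J(theta) for y_hat = X @ w + b."""
    n_samples, n_features = X.shape
    theta = np.zeros(n_features + 1)               # [w_1..w_n, b], zero start for repeatability
    Xb = np.column_stack([X, np.ones(n_samples)])  # column of ones carries the bias
    for _ in range(max_iters):
        residual = Xb @ theta - y
        grad = 2.0 * Xb.T @ residual / n_samples   # gradient of J with respect to theta
        new_theta = theta - alpha * grad           # step of size alpha against the gradient
        if np.linalg.norm(new_theta - theta) < tol:
            return new_theta                       # converged: the step is negligible
        theta = new_theta
    return theta

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])   # illustrative targets generated from y = 2x + 1
theta = gradient_descent(X, y)
print(theta)
```

If `alpha` is too large the updates overshoot and diverge; if it is too small, convergence takes many more iterations.
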
There are different versions of the gradient descent algorithm

- Classical gradient descent is an old, accurate, and time-tested optimisation algorithm, but it is very slow.

- Each update requires computing the gradient over all datapoints.

- Stochastic Gradient Descent (SGD) is sloppy but fast.

- Can do more with the same resources.

- Updates are made based on a randomly selected datapoint.

- Minibatch SGD can help smooth the optimisation path of SGD and take advantage of parallelisation on GPUs.

- A middle ground between GD and SGD.

- Updates are made based on a randomly selected minibatch of datapoints.

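The three variants differ only in how many datapoints feed each update, which a single function with a `batch_size` argument can illustrate. This is a sketch under assumptions: a one-weight model, a mean squared error cost, noise-free made-up data with small feature values so a fixed learning rate stays stable, and a reshuffle of the data every epoch as the way of selecting datapoints at random.

```python
import numpy as np

def minibatch_sgd(x, y, batch_size=2, alpha=0.05, epochs=200, seed=0):
    """Fit y_hat = w * x with updates computed on random minibatches."""
    rng = np.random.default_rng(seed)
    w = 0.0
    n = len(x)
    for _ in range(epochs):
        order = rng.permutation(n)                 # reshuffle the datapoints each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]  # one randomly selected minibatch
            grad = np.mean(-2.0 * x[idx] * (y[idx] - w * x[idx]))
            w -= alpha * grad                      # update from this minibatch only
    return float(w)

x = np.array([0.5, 1.0, 1.5, 2.0])
y = 2.0 * x                                         # illustrative, noise-free targets

w_gd = minibatch_sgd(x, y, batch_size=len(x))       # full batch: classical gradient descent
w_sgd = minibatch_sgd(x, y, batch_size=1)           # one datapoint per update: SGD
w_mini = minibatch_sgd(x, y, batch_size=2)          # minibatch: in between
print(w_gd, w_sgd, w_mini)
```

Because the data here contain no noise, all three settings arrive at the same slope; with noisy data the single-datapoint updates would wander more around the minimum.
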
Every AI design paradigm consists of three pillars

- Modelling: Simplification of real-world problems yet mathematically correct.

- Simplification might require losing some information. (Consider only engine size and ignore other features.)

- Prediction: Ask questions from the model and record the answers.

- The model returns the estimated price of the motorcycle by multiplying the slope with engine size.

- Learning: Learn the values of the model parameter that fit the training data more accurately.

- Requires training data and some optimisation algorithm.

- (The model learns that the best value of 𝑤 is 1.17 instead of the initial estimate, 𝑤 = 1.)

[Figure: price (x1000) against engine displacement (cc), in two panels: "Model without optimal parameters" and, under "Learning" with training data, "Model with optimal parameters"; both labelled 𝑝𝑟𝑖𝑐𝑒 = 2000 × 𝑒𝑛𝑔𝑖𝑛𝑒 𝑠𝑖𝑧𝑒]

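The three pillars can be lined up in a few lines of code. This sketch makes its own choices: the learning step uses the closed-form least-squares slope for a line through the origin in place of an iterative optimiser, and the data and function names are invented for illustration, so the learnt value differs from the one quoted in the lecture.

```python
import numpy as np

# Modelling: price = w * engine_size, a single-parameter line through the origin.
def predict(w, engine_size):
    """Prediction: the slope multiplied by the engine size."""
    return w * engine_size

def mse(w, engine_size, price):
    """Average squared difference between actual and predicted prices."""
    return float(np.mean((price - predict(w, engine_size)) ** 2))

# Learning: closed-form least-squares slope (an assumption standing in for an optimiser).
def learn(engine_size, price):
    return float(np.sum(engine_size * price) / np.sum(engine_size ** 2))

engine = np.array([1.0, 2.0, 3.0])   # illustrative, rescaled engine sizes
price = np.array([1.5, 3.0, 4.5])    # illustrative prices

w_initial = 1.0                      # initial estimate before learning
w_learnt = learn(engine, price)
print(w_learnt, mse(w_initial, engine, price), mse(w_learnt, engine, price))
```
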
Machine Learning has its foundation in statistics

Regression Analysis: Predicting future outcome based on past knowledge

- Linear Regression (outcome is continuous variable)
  - Simple Linear Regression: One input variable
  - Multiple Linear Regression: Multiple input variables

- Logistic Regression (outcome is categorical variable)
  - Binary Logistic Regression: Binary output
  - Multinomial Logistic Regression: More than two outputs without order
  - Ordinal Logistic Regression: More than two outputs with order

https://www.ibm.com/topics/logistic-regression

Summary of today’s lecture

- We discussed linear regression starting from single feature to multifeatured input for the prediction of motorcycle prices. (𝒙 → Model → 𝑦 ∈ ℝ)

- Made a model and initialised it randomly. [Modelling]

- Had the model predict the price of a known motorcycle engine size. [Prediction]

- Calculated the difference between the predicted price and the actual price to calculate the loss. [Learning]

- Used an optimisation algorithm to update the model (weights) to minimise the difference. [Learning]

- Next Lecture:

- Binary classification: Spam filter, Anomaly detection, etc. (𝒙 → Model → 𝑦 ∈ {1, 0})

Do you have any problem?

Some material (images, tables, text etc.) in this presentation has been borrowed from different books, lecture notes, and the web. The original contents solely belong to their owners and are used in this presentation only for clarifying various educational concepts. Any copyright infringement is not at all intended.
