
Data Mining

Principal Component Analysis

Linear Regression

CS 584 :: Fall 2024


Ziwei Zhu
Department of Computer Science
George Mason University
Some of these slides are adapted from Dr. Theodora Chaspari.
• HW1 is due next Monday, 09/23!
• For the PCA part, be careful about whether 𝑿 ∈ ℝ^(𝑁×𝐷) or 𝑿 ∈ ℝ^(𝐷×𝑁).
• We will have the second quiz next week!
Outline

• Linear Regression definition
• Optimization: closed-form solution via Ordinary Least Squares
• Optimization: numerical solution via Gradient Descent
• Non-linear basis functions for regression
• Overfitting
Example: Rent Price Prediction
(Source: apartments.com)

The price is a linear combination of features. Starting with a single feature:

RentPrice = w0 + w1 × Size

More generally, with more features:

RentPrice = w0 + w1 × Size + w2 × DistanceFromGMU + . . .

with weights w0, w1, w2, ... corresponding to the features.
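The weighted sum above can be sketched as a dot product. The weights and feature values below are made-up numbers purely for illustration:

```python
import numpy as np

# Hypothetical learned weights: [bias w0, w1 for size, w2 for distance]
w = np.array([500.0, 1.2, -100.0])

def predict_rent(size_sqft, distance_miles):
    # Prepend a constant 1 so the bias w0 is handled uniformly
    x = np.array([1.0, size_sqft, distance_miles])
    return w @ x

print(predict_rent(800, 2.0))  # 500 + 1.2*800 - 100*2.0, roughly 1260
```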
Linear Regression: Definition

How do we determine what a good w is?
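The model equation on the definition slide did not survive extraction; in standard notation, with the bias w0 paired with a constant feature, the model is:

```latex
\hat{y} = \mathbf{w}^{\top}\mathbf{x} = w_0 + w_1 x_1 + \dots + w_D x_D
```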
Linear Regression: Evaluation
Minimize the difference between predicted and actual labels (i.e., the prediction error).

Linear Regression: Objective Function
The objective/loss function is the Residual Sum of Squares (RSS).
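The RSS formula itself is missing from the extracted text; in standard notation over N training samples it is:

```latex
\mathrm{RSS}(\mathbf{w}) = \sum_{n=1}^{N}\left(y_n - \mathbf{w}^{\top}\mathbf{x}_n\right)^2
```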
Linear Regression: Objective Function

Our goal is to find the solution w* that minimizes the objective/loss function.

Next question: how do we solve this optimization problem?
Linear Regression: Optimization

[figure: a convex function vs. a non-convex function]

When the 1st-order derivative is 0, we find a local minimum (the global minimum in the convex case).
Linear Regression: Optimization

The RSS objective is convex, so setting its 1st-order derivative to 0 yields the global minimum. The resulting closed-form solution is called Ordinary Least Squares (OLS).
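The derivation on the slides did not survive extraction, but the resulting closed form is standard: w* = (XᵀX)⁻¹Xᵀy. A minimal sketch on synthetic, noise-free data (all numbers are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: N = 100 samples, D = 2 features plus a bias column of ones
N, D = 100, 2
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, D))])
true_w = np.array([3.0, 1.5, -2.0])
y = X @ true_w  # noise-free, so OLS recovers the weights exactly

# OLS closed form: w* = (X^T X)^{-1} X^T y
# (np.linalg.solve is preferred over an explicit matrix inverse for stability)
w_star = np.linalg.solve(X.T @ X, X.T @ y)

print(w_star)  # close to [3.0, 1.5, -2.0]
```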


Computational Complexity
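The slide body is missing; the standard cost analysis of the OLS closed form, which motivates the gradient-based alternative in the next section, is:

```latex
O(ND^2) \text{ to form } \mathbf{X}^{\top}\mathbf{X}, \qquad
O(D^3) \text{ to solve the } D \times D \text{ linear system}, \qquad
O(ND) \text{ to form } \mathbf{X}^{\top}\mathbf{y}
```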
Gradient Descent

[figure: iterative steps down the loss surface toward the global loss minimum]
Gradient Descent: Algorithm Outline

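The outline itself is not in the extracted text; a generic sketch of the loop, with a simple one-dimensional quadratic loss as a stand-in example:

```python
import numpy as np

def gradient_descent(grad, w0, lr=0.1, n_steps=100):
    """Generic gradient descent: repeatedly step against the gradient."""
    w = np.asarray(w0, dtype=float)
    for _ in range(n_steps):
        w = w - lr * grad(w)  # update rule: w <- w - eta * dL/dw
    return w

# Example: minimize L(w) = (w - 3)^2, whose gradient is 2(w - 3)
w_min = gradient_descent(lambda w: 2 * (w - 3.0), w0=[0.0])
print(w_min)  # approaches 3.0
```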
Gradient Descent: Two Hyperparameters

The two hyperparameters are the step size (learning rate) and the stopping rule.
Gradient Descent: Step Size/Learning Rate
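The slide's plots are missing; a small numeric illustration of why the step size matters, using L(w) = w² (gradient 2w) as a stand-in loss:

```python
def run_gd(lr, w=1.0, n_steps=20):
    # Minimize L(w) = w^2; the gradient is 2w
    for _ in range(n_steps):
        w = w - lr * 2 * w
    return w

print(run_gd(lr=0.1))  # |w| shrinks each step: converges toward 0
print(run_gd(lr=1.1))  # |w| grows each step: the iterates diverge
```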
Gradient Descent: Stopping Rule

• In practice, we can also directly set the number of training epochs as the stopping rule and tune it as a hyper-parameter.
• Or, we can evaluate model performance on a validation set after each epoch and stop training once validation performance no longer improves.
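The second bullet (early stopping) can be sketched as follows; `train_one_epoch` and `validation_loss` are hypothetical placeholders standing in for real training and evaluation code:

```python
def train_with_early_stopping(train_one_epoch, validation_loss,
                              max_epochs=100, patience=3):
    """Stop once validation loss hasn't improved for `patience` epochs."""
    best_loss, epochs_without_improvement = float("inf"), 0
    for epoch in range(max_epochs):
        train_one_epoch()
        loss = validation_loss()
        if loss < best_loss:
            best_loss, epochs_without_improvement = loss, 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # early stop
    return epoch + 1, best_loss

# Toy stand-in: validation loss improves for 5 epochs, then plateaus
losses = iter([5.0, 4.0, 3.0, 2.0, 1.0] + [1.0] * 95)
epochs_run, best = train_with_early_stopping(lambda: None, lambda: next(losses))
print(epochs_run, best)  # stops after 8 epochs with best loss 1.0
```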

Gradient Descent in Linear Regression

Computing the gradient over the full training set at every step requires intensive computation.
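The slides' derivation is missing; the standard RSS gradient and a full-batch update loop, sketched on synthetic noise-free data (sizes and learning rate are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 1000, 3
X = rng.normal(size=(N, D))
y = X @ np.array([1.0, -2.0, 0.5])

def rss_gradient(w, X, y):
    # d/dw of sum_n (y_n - x_n^T w)^2  =  -2 X^T (y - X w)
    return -2 * X.T @ (y - X @ w)

w = np.zeros(D)
for _ in range(200):
    w -= 0.0001 * rss_gradient(w, X, y)  # each step touches all N samples

print(w)  # approaches [1.0, -2.0, 0.5]
```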
Gradient Descent in Linear Regression

In practice, in each epoch, we randomly split the whole dataset into mini-batches and iterate over all of them, performing one update (iteration) per mini-batch.
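A sketch of that mini-batch loop, reusing the standard RSS gradient on synthetic noise-free data (batch size and learning rate are made-up illustrative values):

```python
import numpy as np

rng = np.random.default_rng(2)
N, D, batch_size = 1000, 3, 50
X = rng.normal(size=(N, D))
y = X @ np.array([1.0, -2.0, 0.5])

w = np.zeros(D)
for epoch in range(50):
    order = rng.permutation(N)            # reshuffle once per epoch
    for start in range(0, N, batch_size):
        idx = order[start:start + batch_size]
        Xb, yb = X[idx], y[idx]           # one mini-batch
        grad = -2 * Xb.T @ (yb - Xb @ w)  # gradient on the batch only
        w -= 0.001 * grad                 # one update per mini-batch

print(w)  # approaches [1.0, -2.0, 0.5]
```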
Non-Linear Regression
Non-Linear Basis Function

Transform the input with a basis function 𝜙 and apply OLS to 𝜙(𝑥) instead of 𝑥.

Polynomial basis: 𝜙(𝑥) = [1, 𝑥, 𝑥², 𝑥³, …, 𝑥^𝑀]

For example: 𝜙(𝑥) = [1] (M = 0), 𝜙(𝑥) = [1, 𝑥] (M = 1), [1, 𝑥, 𝑥², 𝑥³] (M = 3), and [1, 𝑥, 𝑥², 𝑥³, …, 𝑥⁹] (M = 9).
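A sketch of polynomial basis expansion followed by the same OLS solve, on synthetic 1-D data with a quadratic ground truth (all values made up for illustration):

```python
import numpy as np

def poly_features(x, M):
    # phi(x) = [1, x, x^2, ..., x^M] for each scalar sample
    return np.vander(x, M + 1, increasing=True)

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, size=50)
y = 2.0 - x + 3.0 * x**2              # quadratic ground truth, no noise

Phi = poly_features(x, M=2)           # N x (M+1) design matrix
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)  # ordinary least squares

print(w)  # recovers the coefficients [2.0, -1.0, 3.0]
```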
Non-Linear Basis Function

[figure: the low-degree fits underfit, while the highest-degree fit overfits]
Overfitting

Overfitting: the model cannot work well for unseen data.
How to Avoid Overfitting?
Regularization

[figure: polynomial fits with 𝜆 = 0, 𝜆 = 𝑒⁻¹⁸, and 𝜆 = 1]

Tune 𝝀 as a hyper-parameter.
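The regularized objective itself is missing from the extracted text; assuming the standard L2 (ridge) penalty RSS(w) + 𝜆‖w‖², the closed form becomes w* = (XᵀX + 𝜆I)⁻¹Xᵀy, sketched below on synthetic data:

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Closed form for RSS(w) + lam * ||w||^2:  w* = (X^T X + lam I)^{-1} X^T y
    D = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ y)

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])

print(ridge_fit(X, y, lam=0.0))    # lam = 0 recovers plain OLS
print(ridge_fit(X, y, lam=100.0))  # larger lam shrinks the weights toward 0
```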
Moreover …
The number of training steps is another important factor influencing overfitting: too few steps underfit and too many overfit, so the stopping condition must be chosen carefully.
What have we learned so far

• Non-linear basis, underfitting and overfitting (regularization)
