Data Mining
Principal Component Analysis
Linear Regression
CS 584 :: Fall 2024
Ziwei Zhu
Department of Computer Science
George Mason University
Part of these slides is adapted from Dr. Theodora Chaspari.
• HW1 is due next Monday 09/23!
• For the PCA part, be careful about whether 𝑿 ∈ ℝ^(N×D) or 𝑿 ∈ ℝ^(D×N).
• We will have the second quiz next week!
Outline
• Linear Regression definition
• Optimization: closed form solution via ordinary least squares
• Optimization: numerical solution via Gradient Descent
• Non-linear basis function for regression
• Overfitting
Example: Rent Price Prediction
Source: apartments.com
The price is modeled as a linear combination of features:
RentPrice = w0 + w1 × Size + w2 × DistanceFromGMU + …
In the simplest case, with a single feature:
RentPrice = w0 + w1 × Size
More generally, with more features,
RentPrice = w0 + w1 × Size + w2 × DistanceFromGMU + …
with weights w0, w1, w2, … corresponding to the features.
[Figure: rent price as a function of Size and DistanceFromGMU]
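To make this concrete, here is a minimal Python sketch of the prediction step; the weight and feature values are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical weights, for illustration only: intercept ($),
# $ per square foot, and $ per mile from GMU.
w = np.array([500.0, 1.2, -30.0])

# One apartment: [1 (bias feature), Size = 900 sq ft, DistanceFromGMU = 2 miles].
x = np.array([1.0, 900.0, 2.0])

# RentPrice = w0 + w1 * Size + w2 * DistanceFromGMU, i.e., a dot product.
print(f"Predicted rent: ${w @ x:.2f}")  # 500 + 1080 - 60 = $1520.00
```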
Linear Regression: Definition
Given a feature vector x ∈ ℝ^D and weights w, the model predicts ŷ = wᵀx (with a constant feature 1 absorbing the intercept w0).
How do we determine what a good w is?
Linear Regression: Evaluation
A good w minimizes the difference between the predicted and actual labels (i.e., the prediction error).
Linear Regression: Objective Function
We quantify the prediction error with the Residual Sum of Squares (the objective/loss function). Our goal is to find the solution w* that minimizes the objective/loss function.
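Written out, this is the standard RSS form, with x_i the i-th feature vector and y_i its label:

```latex
% Residual Sum of Squares: squared prediction error summed over N samples
\mathrm{RSS}(\mathbf{w})
  = \sum_{i=1}^{N} \bigl( y_i - \mathbf{w}^{\top}\mathbf{x}_i \bigr)^2
  = \lVert \mathbf{y} - \mathbf{X}\mathbf{w} \rVert_2^2
```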
Next question: how do we solve this optimization problem?
Outline
• Linear Regression definition
• Optimization: closed form solution via ordinary least squares
• Optimization: numerical solution via Gradient Descent
• Non-linear basis function for regression
• Overfitting
Linear Regression: Optimization
[Figure: a convex function vs. a non-convex function]
Where the first-order derivative is 0, we find a local minimum (the global minimum in the convex case). The RSS objective is convex, so setting its gradient with respect to w to zero yields the global minimum. Solving that equation gives the closed-form solution
w* = (XᵀX)⁻¹ Xᵀ y
known as Ordinary Least Squares (OLS).
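A minimal NumPy sketch of this closed-form solution on synthetic data (the data and variable names are mine, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: N samples, D features; the first column is the constant 1.
N, D = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, D - 1))])
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=N)

# OLS: solve the normal equations (X^T X) w = X^T y.
# np.linalg.solve is more stable than explicitly inverting X^T X.
w_star = np.linalg.solve(X.T @ X, X.T @ y)
print(w_star)  # close to [2.0, -1.0, 0.5]
```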
Computational Complexity
For X ∈ ℝ^(N×D), forming XᵀX costs O(ND²) and solving the resulting D×D system costs O(D³), so the closed form becomes expensive for high-dimensional data. This motivates a numerical alternative: gradient descent.
Outline
• Linear Regression definition
• Optimization: closed form solution via ordinary least squares
• Optimization: numerical solution via Gradient Descent
• Non-linear basis function for regression
• Overfitting
Gradient Descent
Start from an initial guess w⁽⁰⁾ and repeatedly step in the direction of the negative gradient, moving downhill on the loss surface toward the global loss minimum.
[Figure: gradient descent steps descending a convex loss curve to the global loss minimum]
Gradient Descent: Algorithm Outline
• Initialize w⁽⁰⁾.
• Repeat: w⁽ᵗ⁺¹⁾ = w⁽ᵗ⁾ − 𝜂 ∇L(w⁽ᵗ⁾).
• Stop when a stopping rule is met.
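A short Python sketch of this outline; the quadratic example function is my own choice for illustration:

```python
import numpy as np

def gradient_descent(grad, w0, lr=0.1, eps=1e-6, max_iters=10_000):
    """Repeat w <- w - lr * grad(w) until the gradient is tiny."""
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iters):
        g = grad(w)
        if np.linalg.norm(g) < eps:  # one possible stopping rule (see below)
            break
        w = w - lr * g
    return w

# Example: minimize f(w) = (w - 3)^2, whose gradient is 2(w - 3).
print(gradient_descent(lambda w: 2 * (w - 3), w0=[0.0]))  # ~[3.0]
```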
Gradient Descent: Two Hyperparameters
Gradient descent has two hyperparameters: the step size (learning rate) 𝜂 and the stopping rule.
Gradient Descent: Step Size/Learning Rate
If 𝜂 is too small, convergence is slow; if 𝜂 is too large, the updates can overshoot the minimum and oscillate or diverge.
Gradient Descent: Stopping Rule
• A common rule: stop when the decrease in the loss (or the norm of the gradient) falls below a small threshold.
• In practice, we can also directly set the number of training epochs as the stopping rule and tune it as a hyper-parameter.
• Or, we can evaluate the model on a validation set after each epoch and stop training once the validation performance no longer improves (sketched below).
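A schematic sketch of that validation-based rule; `train_one_epoch` and `validation_loss` are hypothetical caller-supplied callables, not names from the slides:

```python
def train_with_early_stopping(train_one_epoch, validation_loss,
                              max_epochs=500, patience=10):
    """Stop when validation performance has not improved for `patience` epochs."""
    best, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        train_one_epoch()         # one pass over the training data
        loss = validation_loss()  # e.g., RSS on held-out data
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break                 # no improvement for `patience` epochs
    return best
```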
Gradient Descent in Linear Regression
For RSS, the gradient is ∇L(w) = 2 Xᵀ(Xw − y), which sums over all N training samples, so every update touches the whole dataset: intensive computation when N is large. Instead, we can estimate the gradient from a small random subset (a mini-batch) of the data at each step.
In practice, in each epoch we randomly split the whole dataset into mini-batches and iterate over all of them, performing one update per mini-batch (one iteration).
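A minimal NumPy sketch of mini-batch gradient descent for linear regression on synthetic data (the hyperparameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (first column of X is the constant 1).
N, D = 1000, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, D - 1))])
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=N)

w = np.zeros(D)
lr, batch_size, n_epochs = 0.05, 32, 50

for epoch in range(n_epochs):
    perm = rng.permutation(N)              # new random split each epoch
    for start in range(0, N, batch_size):  # one iteration per mini-batch
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)  # RSS gradient on the batch
        w -= lr * grad

print(w)  # approximately [2.0, -1.0, 0.5]
```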
Outline
• Linear Regression definition
• Optimization: closed form solution via ordinary least squares
• Optimization: numerical solution via Gradient Descent
• Non-linear basis function for regression
• Overfitting
Non-Linear Regression
What if the relationship between the features and the label is not linear? A straight line cannot fit such data well, but we can keep the linear regression machinery by transforming the features.
Non-Linear Basis Function
Map each input x to a vector of basis functions
𝜙(x) = [1, x, x², x³, …, x^M]
and fit the weights with OLS on the transformed features. The model wᵀ𝜙(x) is non-linear in x but still linear in w, so the same OLS solution applies.
[Figure: polynomial fits with 𝜙(x) = [1], 𝜙(x) = [1, x], 𝜙(x) = [1, x, x², x³], and 𝜙(x) = [1, x, x², x³, …, x⁹]]
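A minimal NumPy sketch of fitting a polynomial basis with OLS; the sine-shaped target is my own illustrative choice:

```python
import numpy as np

def poly_features(x, M):
    """phi(x) = [1, x, x^2, ..., x^M] for each scalar input in x."""
    return np.vander(np.asarray(x, dtype=float), M + 1, increasing=True)

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=30)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=30)  # a non-linear target

# OLS on the transformed features; lstsq is numerically safer than
# solving the normal equations for high-degree polynomials.
Phi = poly_features(x, M=3)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(w)
```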
Non-Linear Basis Function
[Figure: the low-degree fits underfit, while the degree-9 fit overfits]
Outline
• Linear Regression definition
• Optimization: closed form solution via ordinary least squares
• Optimization: numerical solution via Gradient Descent
• Non-linear basis function for regression
• Overfitting
Overfitting
Overfitting: the model fits the training data (almost) perfectly but cannot work well for unseen data.
How to Avoid Overfitting?
A widely used remedy, developed next, is regularization: discourage overly complex models by penalizing large weights.
Regularization
Add a penalty on the size of the weights to the objective function, controlled by a coefficient 𝜆; larger 𝜆 pushes the weights toward zero and yields a smoother fit.
[Figure: degree-9 polynomial fits with 𝜆 = 0, 𝜆 = e⁻¹⁸, and 𝜆 = 1]
Tune 𝝀 as a hyper-parameter.
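The slides do not show the exact penalty, so this sketch assumes the common L2 (ridge) form RSS(w) + 𝜆‖w‖², which has a closed-form solution; note how the weight norm shrinks as 𝜆 grows:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=30)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=30)
Phi = np.vander(x, 10, increasing=True)  # degree-9 polynomial basis

def ridge_fit(X, y, lam):
    """Closed form for RSS(w) + lam * ||w||^2:
    w* = (X^T X + lam * I)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

for lam in (0.0, np.exp(-18), 1.0):  # the lambda values shown on the slide
    w = ridge_fit(Phi, y, lam)
    print(f"lambda = {lam:.0e}: ||w|| = {np.linalg.norm(w):.2f}")
```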
Moreover …
The number of training steps is another important factor influencing overfitting, so the stopping condition must be chosen carefully.
[Figure: too few training steps underfit, too many overfit]
What have we learned so far
• Linear regression: definition and the RSS objective
• Optimization: the OLS closed-form solution and gradient descent
• Non-linear basis functions, underfitting and overfitting (regularization)