
Regularized regression
Chapter 5

EEE 485/585 Statistical Learning and Data Analytics
Outline: Regularization, Ridge regression, Lasso

Cem Tekin
Bilkent University

Cannot be distributed outside this class without the permission of the instructor.
Regularization

Properties of the least squares estimate:
- When the relation between Y and X = [X_1, ..., X_p]^T is almost linear, the least squares estimate $\hat{\beta} = (X^T X)^{-1} X^T y$ has low bias.
- But it can have high variance, e.g., when $p \approx n$ or $p > n$.
- Shrinking the regression coefficients can result in a better fit.

Regularization: reducing the complexity of linear regression.
Two methods for regularization

Ordinary least squares:

$$\mathrm{RSS}(\beta) = \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2$$

Ridge regression:

$$\mathrm{Loss}_R(\beta, \lambda) = \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 = \mathrm{RSS}(\beta) + \lambda \sum_{j=1}^{p} \beta_j^2$$

(Note: the intercept $\beta_0$ is not penalized.)

Lasso:

$$\mathrm{Loss}_L(\beta, \lambda) = \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| = \mathrm{RSS}(\beta) + \lambda \sum_{j=1}^{p} |\beta_j|$$

(The lasso penalizes the absolute value of the parameters. A NumPy sketch of these objectives follows below.)
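As a quick reference, the three objectives above can be computed in a few lines of NumPy. This is a minimal sketch; the function and variable names are ours, not from the slides.

```python
import numpy as np

def rss(beta0, beta, X, y):
    """Residual sum of squares for the linear model y ~ beta0 + X @ beta."""
    residuals = y - beta0 - X @ beta
    return np.sum(residuals ** 2)

def ridge_loss(beta0, beta, X, y, lam):
    """RSS plus the ridge penalty lam * sum(beta_j^2); beta0 is not penalized."""
    return rss(beta0, beta, X, y) + lam * np.sum(beta ** 2)

def lasso_loss(beta0, beta, X, y, lam):
    """RSS plus the lasso penalty lam * sum(|beta_j|); beta0 is not penalized."""
    return rss(beta0, beta, X, y) + lam * np.sum(np.abs(beta))
```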
Ridge regression

$$\mathrm{Loss}_R(\beta, \lambda) = \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 + \underbrace{\lambda}_{\text{tuning parameter}} \underbrace{\sum_{j=1}^{p} \beta_j^2}_{\text{regularization penalty}}$$

$$\hat{\beta}^R_\lambda = \arg\min_{\beta} \mathrm{Loss}_R(\beta, \lambda)$$

What happens when
- $\lambda \to 0$: the penalty vanishes, so $\hat{\beta}^R_\lambda$ approaches the ordinary least squares estimate.
- $\lambda \to \infty$: the penalized coefficients shrink toward zero ($\beta_0$ is not penalized).

How to select $\lambda$? Use CV; a cross-validation sketch follows below.
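A minimal sketch of selecting $\lambda$ by cross-validation on synthetic data, assuming scikit-learn is available; note that scikit-learn names the tuning parameter alpha rather than lambda.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # toy design matrix
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 0.0]) + rng.normal(size=100)

# Search a log-spaced grid of penalty strengths with 5-fold CV.
alphas = np.logspace(-3, 3, 25)
model = RidgeCV(alphas=alphas, cv=5).fit(X, y)
print("selected lambda:", model.alpha_)
print("coefficients:", model.coef_)
```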
Example - Credit card balance prediction

Y = card balance
X = (income, limit, rating, student, ...)

Lines show the regression coefficients $\hat{\beta}^R_\lambda$ estimated by ridge regression.

[Figure: standardized coefficients for Income, Limit, Rating, and Student plotted against $\lambda$ (left panel) and against $\|\hat{\beta}^R_\lambda\|_2 / \|\hat{\beta}\|_2$ (right panel).]

Figure from "An introduction to statistical learning" by James et al.
Scale invariance

Least squares linear regression is scale invariant.
Is ridge regression scale invariant? No. Rescaling a predictor (e.g., reporting income in thousands, so that $x_i^{\text{new}} = 0.001\, x_i^{\text{old}}$) changes the fitted model, because the ridge penalty treats all coefficients on a common scale.

Making ridge regression fair: standardize the predictors:

$$\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_j}{\sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2}}, \qquad \text{where } \bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}$$

Properties of standardized predictors:
- $\frac{1}{n} \sum_{i=1}^{n} \tilde{x}_{ij} = 0$ (zero mean)
- $\frac{1}{n} \sum_{i=1}^{n} \tilde{x}_{ij}^2 = 1$ (unit variance)

Also center the response: $\tilde{y}_i = y_i - \bar{y}$, where $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$. A standardization sketch follows below.
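A minimal sketch of the standardization step, assuming NumPy; it uses the population (1/n) standard deviation to match the formula above, whereas some libraries default to the 1/(n-1) version.

```python
import numpy as np

def standardize(X, y):
    """Center y and rescale each column of X to zero mean and unit variance.

    Matches the slide's formula: population (1/n) standard deviation.
    """
    x_bar = X.mean(axis=0)
    sd = np.sqrt(((X - x_bar) ** 2).mean(axis=0))
    X_tilde = (X - x_bar) / sd
    y_tilde = y - y.mean()
    return X_tilde, y_tilde
```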
Bias-variance tradeoff

[Figure: squared bias (black), variance (green), and test MSE (red) for ridge regression, plotted against $\lambda$ (left panel) and against $\|\hat{\beta}^R_\lambda\|_2 / \|\hat{\beta}\|_2$ (right panel). Variance falls and bias rises as $\lambda$ grows: small $\lambda$ overfits, large $\lambda$ underfits, and the MSE is minimized at an intermediate $\lambda$.]

$$\mathrm{MSE} := \frac{1}{n} \sum_{i=1}^{n} \big( y_i - \hat{f}(x_i) \big)^2$$

Figure from "An introduction to statistical learning" by James et al.


How to solve ridge regression?

$$\mathrm{Loss}_R(\beta, \lambda) = \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$

$$\hat{\beta}^R_\lambda = \arg\min_{\beta} \mathrm{Loss}_R(\beta, \lambda)$$

- Center the predictors and the response (centering makes the intercept $\hat{\beta}^R_0 = 0$).
- Standardize the predictors.

On centered data the loss has the matrix form

$$\mathrm{Loss}_R(\beta, \lambda) = (y - X\beta)^T (y - X\beta) + \lambda \beta^T \beta,$$

and setting its gradient to zero,

$$\nabla_\beta \mathrm{Loss}_R = -2 X^T y + 2 X^T X \beta + 2\lambda \beta = 0,$$

gives $(X^T X + \lambda I)\beta = X^T y$.
How to solve ridge regression?

Notation ($y$ and $X$ centered):

$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{bmatrix}, \quad X = \begin{bmatrix} x_{11} & x_{12} & \dots & x_{1p} \\ x_{21} & x_{22} & \dots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \dots & x_{np} \end{bmatrix}$$

Linear algebra and matrix calculus give:

$$\hat{\beta}^R_\lambda = (X^T X + \lambda I)^{-1} X^T y$$

Hence, given a new (centered and scaled) input $x$, the (centered) prediction is $\hat{y} = x^T \hat{\beta}^R_\lambda$.

Compare with the least squares solution:

$$\hat{\beta}^{RSS} = (X^T X)^{-1} X^T y$$

(Ridge can be viewed as a shrunken version of the least squares coefficients. A NumPy sketch follows below.)
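A minimal NumPy sketch of the closed-form solution above; solving the linear system is generally preferred to forming the matrix inverse explicitly.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution on centered/standardized data.

    Solves (X^T X + lam * I) beta = X^T y via np.linalg.solve, which is
    more numerically stable than computing the inverse.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def ridge_predict(X_new, beta):
    """Centered prediction for new (centered and scaled) inputs."""
    return X_new @ beta
```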
Advantage of ridge regression

- Reduces variance.
- $X^T X + \lambda I$ with $\lambda > 0$ is invertible even when $X^T X$ is not invertible (e.g., when $p > n$); a toy check follows below.

Figure from http://stats.stackexchange.com
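A toy check of the invertibility claim, assuming NumPy: with more predictors than observations, $X^T X$ is rank deficient, but adding $\lambda I$ restores full rank.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))            # n = 3 observations, p = 5 predictors
gram = X.T @ X                         # 5x5 matrix of rank at most 3

print(np.linalg.matrix_rank(gram))     # 3 < 5, so X^T X is singular
lam = 0.1
print(np.linalg.matrix_rank(gram + lam * np.eye(5)))  # 5: full rank
```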


Disadvantage of ridge regression

Coefficients will be small, but almost all of them will still be nonzero: ridge shrinks coefficients toward zero without setting them exactly to zero, so it does not perform variable selection.
Lasso (least absolute shrinkage and selection operator)

$$\mathrm{Loss}_L(\beta, \lambda) = \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$

$$\hat{\beta}^L_\lambda = \arg\min_{\beta} \mathrm{Loss}_L(\beta, \lambda)$$

No closed-form solution (in general); in practice it is solved iteratively (see the solver sketch below).

What happens when
- $\lambda \to 0$: $\hat{\beta}^L_\lambda$ approaches the ordinary least squares estimate.
- $\lambda \to \infty$: all penalized coefficients are driven to zero, $\|\hat{\beta}^L_\lambda\|_1 \to 0$.
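A minimal sketch using scikit-learn's coordinate-descent lasso solver on synthetic data; note that scikit-learn's objective scales the RSS by $1/(2n)$, so its alpha corresponds to $\lambda/(2n)$ in the notation above.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 0.0]) + rng.normal(size=100)

# Coordinate descent; coefficients of irrelevant predictors are
# typically set exactly to zero (variable selection).
model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)
```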
Example - Credit card balance prediction

Y = card balance
X = (income, limit, rating, student, ...)

Lines show the regression coefficients $\hat{\beta}^L_\lambda$ estimated by the lasso.
Lasso performs variable selection (results in a sparse model); $\lambda$ can be selected by CV.

[Figure: standardized coefficients for Income, Limit, Rating, and Student plotted against $\lambda$ (left panel) and against $\|\hat{\beta}^L_\lambda\|_1 / \|\hat{\beta}\|_1$ (right panel). Unlike ridge, the lasso sets coefficients exactly to zero as $\lambda$ grows.]

Figure from "An introduction to statistical learning" by James et al.
Ridge regression and lasso as constrained minimization problems

Ridge:

$$\min_{\beta} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} \beta_j^2 \le s$$

Lasso:

$$\min_{\beta} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le s$$

The two problems differ only in the geometry of the constraint region. For each $s$ in the constrained minimization problem there is a corresponding $\lambda$ in the equivalent unconstrained minimization problem; a small numeric illustration follows below.
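A small numeric illustration of the $\lambda \leftrightarrow s$ correspondence for ridge, using the closed form from earlier: solving the penalized problem at a given $\lambda$ yields a coefficient vector whose squared norm is the constraint level $s$ attained by the equivalent constrained problem.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X = (X - X.mean(axis=0)) / X.std(axis=0)    # standardize predictors
y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=50)
y = y - y.mean()                            # center response

# Penalized ridge solution at lambda = 2; the constrained problem with
# s = sum(beta_j^2) has the same solution.
lam = 2.0
beta = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
print("lambda =", lam, "corresponds to s =", np.sum(beta ** 2))
```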
Geometric interpretation

[Figure: contours of the RSS together with the lasso constraint region $|\beta_1| + |\beta_2| \le s$ (diamond) and the ridge constraint region $\beta_1^2 + \beta_2^2 \le s$ (disk), for $p = 2$.]

- Red lines: error contours for RSS (same error for all $\beta$ values on the same contour).
- $\hat{\beta}$: least squares solution.
- Blue areas: regions for which $|\beta_1| + |\beta_2| \le s$ (lasso) or $\beta_1^2 + \beta_2^2 \le s$ (ridge).

Because the lasso region has corners on the coordinate axes, the constrained solution often lies at a point where some coefficients are exactly zero, which is why the lasso produces sparse models.

Figure from "An introduction to statistical learning" by James et al.
