San José State University
Math 261A: Regression Theory & Methods
Multiple Linear Regression
Dr. Guangliang Chen
This lecture is based on the following textbook sections:
• Chapter 3: 3.1 - 3.5, 3.8 - 3.10
Outline of this presentation:
• The multiple linear regression problem
• Least-squares estimation
• Inference
• Some issues
The multiple linear regression problem
Consider the body data again. To construct a more accurate model for
predicting the weight of an individual (y), we may want to add other
body measurements, such as head and waist circumferences, as additional
predictors besides height (x1 ), leading to multiple linear regression:
y = β0 + β1x1 + β2x2 + · · · + βkxk + ε    (1)
where
• y: response, x1 , . . . , xk : predictors
• β0 , β1 , . . . , βk : coefficients
• ε: error term
[Figure: an example of a regression model with k = 2 predictors.]
Remark. Some of the new predictors in the model could be powers of the
original ones
y = β0 + β1x + β2x² + · · · + βkx^k + ε
or interactions of them,
y = β0 + β1x1 + β2x2 + β12x1x2 + ε
or even a mixture of powers and interactions of them
y = β0 + β1x1 + β2x2 + β11x1² + β22x2² + β12x1x2 + ε
These are still linear models (in terms of the regression coefficients).
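In R, all of these are still fit with lm(), since each model is linear in its coefficients; a minimal sketch (the data frame dat and the column names y, x1, x2 are hypothetical):

```r
# `dat` is a hypothetical data frame with columns y, x1, x2
fit_poly  <- lm(y ~ x1 + I(x1^2), data = dat)                          # polynomial terms
fit_inter <- lm(y ~ x1 + x2 + x1:x2, data = dat)                       # interaction term
fit_quad  <- lm(y ~ x1 + x2 + I(x1^2) + I(x2^2) + x1:x2, data = dat)   # full quadratic model
```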
[Figure: an example of a full quadratic model.]
The sample version of (1) is
yi = β0 + β1xi1 + β2xi2 + · · · + βkxik + εi,   1 ≤ i ≤ n    (2)
where the εi are assumed for now to be uncorrelated:
Cov(εi, εj) = 0,   i ≠ j
and to have the same mean zero and variance σ²:
E(εi) = 0,  Var(εi) = σ²,   for all i
(Like in simple linear regression, we will add the normality and independence
assumptions when we get to the inference part)
Letting
y = (y1, y2, . . . , yn)′,   β = (β0, β1, . . . , βk)′,   ε = (ε1, ε2, . . . , εn)′,
and X be the n × p matrix whose ith row is (1, xi1, xi2, . . . , xik),
we can rewrite the sample regression model in matrix form
y = Xβ + ε    (3)
with dimensions y: n × 1, X: n × p, β: p × 1, ε: n × 1,
where p = k + 1 represents the number of regression parameters (note
that k is the number of predictors in the model).
Least squares (LS) estimation
The LS criterion can still be used to
fit a multiple regression model
ŷ = β̂0 + β̂1 x1 + · · · + β̂k xk
to the data as follows:
min over β̂ of  S(β̂) = Σ_{i=1}^n (yi − ŷi)² = Σ_{i=1}^n ei²
where for each 1 ≤ i ≤ n,
ŷi = β̂0 + β̂1 xi1 + · · · + β̂k xik
Let e = (ei ) ∈ Rn and ŷ = (ŷi ) = Xβ̂ ∈ Rn . Then e = y − ŷ.
Correspondingly the above problem becomes
min over β̂ of  S(β̂) = ‖e‖² = ‖y − Xβ̂‖²
Theorem 0.1. If X′X is nonsingular, then the LS estimator of β is
β̂ = (X′X)⁻¹X′y
Remark. The nonsingular condition holds true if and only if all the columns
of X are linearly independent (i.e. X is of full column rank).
Remark. This is the same formula as for β̂ = (β̂0, β̂1)′ in simple linear regression. To demonstrate it, consider the toy data set of 3 points used before: (0, 1), (1, 0), (2, 2). The design matrix and response are
X = [1 0; 1 1; 1 2] (rows listed),   y = (1, 0, 2)′,
so that
X′X = [3 3; 3 5],   X′y = (3, 4)′,
and the new formula gives
β̂ = (X′X)⁻¹X′y = [3 3; 3 5]⁻¹ (3, 4)′ = (0.5, 0.5)′.
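The same calculation can be checked numerically in R; a minimal sketch for the toy data above:

```r
X <- cbind(1, c(0, 1, 2))                 # design matrix: intercept column and x
y <- c(1, 0, 2)

beta_hat <- solve(t(X) %*% X, t(X) %*% y) # solve the normal equations X'X b = X'y
beta_hat                                  # both entries are 0.5

coef(lm(y ~ X[, 2]))                      # same estimates from lm()
```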
Proof. We first need to derive some formulas about the gradient of a
function of multiple variables:
∂/∂x (a′x) = ∂/∂x (x′a) = a
∂/∂x ‖x‖² = ∂/∂x (x′x) = 2x
∂/∂x (x′Ax) = 2Ax   (for symmetric A)
∂/∂x ‖Bx‖² = ∂/∂x (x′B′Bx) = 2B′Bx
Using the identity ‖u − v‖² = ‖u‖² + ‖v‖² − 2u′v, we write
S(β̂) = ‖y‖² + ‖Xβ̂‖² − 2(Xβ̂)′y = y′y + β̂′X′Xβ̂ − 2β̂′X′y
Applying the formulas on the preceding slide, we obtain
∂S/∂β̂ = 0 + 2X′Xβ̂ − 2X′y
Setting the gradient equal to zero,
X′Xβ̂ = X′y   ←− least squares normal equations
and solving for β̂ completes the proof.
Remark. The very first normal equation in the system
X′Xβ̂ = X′y
is
nβ̂0 + β̂1 Σᵢ xi1 + β̂2 Σᵢ xi2 + · · · + β̂k Σᵢ xik = Σᵢ yi
which simplifies to
β̂0 + β̂1 x̄1 + β̂2 x̄2 + · · · + β̂k x̄k = ȳ
This indicates that the centroid of the data, i.e., (x̄1 , . . . , x̄k , ȳ), is on the
least squares regression plane.
Remark. The fitted values of the least squares model are
ŷ = Xβ̂ = X(X′X)⁻¹X′y = Hy,   where H = X(X′X)⁻¹X′,
and the residuals are
e = y − ŷ = (I − H)y.
The matrix H ∈ Rⁿˣⁿ is called the hat matrix, satisfying
H′ = H (symmetric),   H² = H (idempotent),   H(I − H) = O
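A quick numerical check of these properties in R, reusing X and y from the toy example above:

```r
H <- X %*% solve(t(X) %*% X) %*% t(X)     # hat matrix for the toy design matrix X

all.equal(H, t(H))                        # symmetric
all.equal(H, H %*% H)                     # idempotent
y_hat <- H %*% y                          # fitted values
e     <- (diag(nrow(X)) - H) %*% y        # residuals
crossprod(y_hat, e)                       # essentially zero: fitted values orthogonal to residuals
```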
Geometrically, it is the orthogonal projection matrix onto the column space
of X (subspace spanned by the columns of X):
ŷ = Hy = X[(X′X)⁻¹X′y] = Xβ̂ ∈ Col(X)
ŷ′(y − ŷ) = (Hy)′(I − H)y = y′ H(I − H) y = y′ O y = 0.
[Figure: y is projected orthogonally onto Col(X); the projection is ŷ = Hy and the residual e = (I − H)y is perpendicular to Col(X).]
Example 0.1 (body dimensions data[1]). Besides the predictor Height, we include Waist Girth as a second predictor to perform multiple linear regression for predicting Weight.
(R demonstration in class.)
[1] http://jse.amstat.org/v11n2/datasets.heinz.html
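A sketch of the R demonstration (the data frame name bdims and the column names Weight, Height, Waist.Girth are assumptions; they depend on how the data set is imported):

```r
# bdims: data frame read from the linked data set, with columns Weight, Height, Waist.Girth
fit2 <- lm(Weight ~ Height + Waist.Girth, data = bdims)
summary(fit2)   # coefficient estimates, standard errors, t tests, R^2, overall F test
```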
Inference in multiple linear regression
• Model parameters: β = (β0, β1, . . . , βk)′ (intercept and slopes), σ² (noise variance)
• Inference tasks (for the parameters above): point estimation, interval estimation*, hypothesis testing*
• Inference for the mean response at x0 = (1, x01, . . . , x0k)′:
E(y | x0) = β0 + β1x01 + · · · + βkx0k = x0′β
*To perform these two inference tasks, we will additionally assume that the model errors εi are normally and independently distributed with mean
0 and variance σ², i.e., ε1, . . . , εn ~ iid N(0, σ²).
Expectation and variance of a vector-valued random variable
Let X⃗ = (X1, . . . , Xn)′ ∈ Rⁿ be a vector-valued random variable. Define
• Expectation: E(X⃗) = (E(X1), . . . , E(Xn))′
• Variance (also called covariance matrix): Var(X⃗) = (Cov(Xi, Xj))1≤i,j≤n, the n × n matrix with Var(Xi) on the diagonal and Cov(Xi, Xj) in the (i, j) off-diagonal position
Point estimation in multiple linear regression
First, like in simple linear regression, the least squares estimator β̂ is an
unbiased linear estimator for β.
Theorem 0.2. Under the assumptions of multiple linear regression,
E(β̂) = β.
That is, β̂ is a (componentwise) unbiased estimator for β:
E(β̂i ) = βi , for all i = 0, 1, . . . , k
Proof. We have
β̂ = (X′X)⁻¹X′y
= (X′X)⁻¹X′(Xβ + ε)
= (X′X)⁻¹X′ · Xβ + (X′X)⁻¹X′ · ε
= β + (X′X)⁻¹X′ε.
It follows that
E(β̂) = β + (X′X)⁻¹X′ E(ε) = β,   since E(ε) = 0.
Next, we derive the variance of β̂:
Var(β̂) = (Cov(β̂i , β̂j ))0≤i,j≤k .
Theorem 0.3. Let C = (X′X)⁻¹ = (Cij)0≤i,j≤k. Then
Var(β̂) = σ 2 C.
That is,
Var(β̂i ) = σ 2 Cii and Cov(β̂i , β̂j ) = σ 2 Cij .
Proof. Using the formula
Var(Ay) = A · Var(y) · A′,
with A = (X′X)⁻¹X′ and Var(y) = σ²I, we have
Var(β̂) = Var((X′X)⁻¹X′y)
= (X′X)⁻¹X′ · Var(y) · X(X′X)⁻¹
= σ²(X′X)⁻¹.
Lastly, we can derive an estimator of σ² from the residual sum of squares
SSRes = Σᵢ ei² = ‖e‖² = ‖y − Xβ̂‖²
Theorem 0.4. We have
E(SSRes ) = (n − p)σ 2 .
This implies that
MSRes = SSRes/(n − p)
is an unbiased estimator of σ 2 .
Remark. The total and regression sums of squares are defined in the same
way as before:
SSR = Σ(ŷi − ȳ)² = Σŷi² − nȳ² = ‖ŷ‖² − nȳ²
SST = Σ(yi − ȳ)² = Σyi² − nȳ² = ‖y‖² − nȳ²
They can be used to assess the adequacy of the model through the
coefficient of determination
R² = SSR/SST = 1 − SSRes/SST
The larger R2 (i.e., the smaller SSRes ), the better the model.
Example 0.2 (Weight ∼ Height + Waist Girth). For this model,
MSRes = 4.529² = 20.512.
In contrast, for the simple linear regression model (Weight ∼ Height),
MSRes = 9.308² = 86.639.
Therefore, the multiple linear regression model has a smaller total fitting error SSRes = (n − p)MSRes. The coefficient of determination of this model is R² = 0.8853, which is much higher than that of the smaller model.
Adjusted R2
R2 measures the goodness of fit of a single model and is not a fair criterion
for comparing models with different sizes k (e.g., nested models)
The adjusted R² criterion is more suitable for such comparisons:
R²Adj = 1 − [SSRes/(n − p)] / [SST/(n − 1)]
The larger the R²Adj, the better the model.
[Figure: R² and R²Adj plotted against k (# predictors): R² never decreases as predictors are added, while R²Adj can decrease.]
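Both quantities are reported by summary() for a fitted lm object; a sketch continuing the earlier fit2 example (data frame and column names assumed as before):

```r
s <- summary(fit2)
s$r.squared                # R^2
s$adj.r.squared            # adjusted R^2
# direct computation, using the fact that SST/(n - 1) is just var(y):
1 - (sum(resid(fit2)^2) / df.residual(fit2)) / var(bdims$Weight)
```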
Remark.
• As p (i.e., k) increases, SSRes will either decrease or stay the same:
– If SSRes does not change (or decreases by very little), then R²Adj will decrease. ←− The smaller model is better
– If SSRes decreases relatively more than n − p does, then R²Adj will increase. ←− The larger model is better
• We can write instead
R²Adj = 1 − [(n − 1)/(n − p)](1 − R²)
This implies that R²Adj < R².
Summary: Point estimation in multiple linear regression
• Parameter β: point estimator β̂ = (X′X)⁻¹X′y (unbiased), with Var(β̂) = σ²(X′X)⁻¹
• Parameter σ²: point estimator MSRes = SSRes/(n − p) (unbiased)
Remark. For the mean response at x0 = (1, x01, . . . , x0k)′:
E(y | x0) = β0 + β1x01 + · · · + βkx0k = x0′β
an unbiased point estimator is
β̂0 + β̂1x01 + · · · + β̂kx0k = x0′β̂
Next
We consider the following inference tasks in multiple linear regression:
• Hypothesis testing
• Interval estimation
For both tasks, we need to additionally assume that the model errors εi are iid N(0, σ²).
Hypothesis testing in multiple linear regression
Depending on how many regression coefficients are being tested together,
we have
• ANOVA F Tests for Significance of Regression on All Regression
Coefficients
• Partial F Tests on Subsets of Regression Coefficients
• Marginal t Tests on Individual Regression Coefficients
ANOVA for Testing Significance of Regression
In multiple linear regression, the significance of regression test is
H0 : β1 = · · · = βk = 0
H1 : βj ≠ 0 for at least one j
The ANOVA test works very similarly: the test statistic is
F0 = MSR/MSRes = [SSR/k] / [SSRes/(n − p)] ~ Fk,n−p under H0,
and we reject H0 if
F0 > Fα,k,n−p
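The overall F statistic and its p-value can also be recovered from a fitted lm object; a sketch (fit2 as before):

```r
f <- summary(fit2)$fstatistic                                # named vector: value, numdf, dendf
f
pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)   # p-value of the overall F test
```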
Example 0.3 (Weight ∼ Height + Waist Girth). For this multiple linear regression model, regression is significant because the ANOVA F statistic is F0 = 1945 and the p-value is less than 2.2e-16.
Note that the p-values of the individual coefficients can no longer be used for conducting the significance of regression test.
Marginal Tests on Individual Regression Coefficients
The hypothesis for testing the significance of an individual predictor xj, given all the other predictors in the model, is
H0 : βj = 0   vs   H1 : βj ≠ 0
If H0 is not rejected, then the regressor xj is insignificant and can be deleted from the model (while preserving all other regressors).
To conduct the test, we use the point estimator β̂j (which is linear and unbiased) and its distribution
β̂j ~ N(βj, σ²Cjj),   j = 0, 1, . . . , k
The test statistic is
t0 = (β̂j − 0)/se(β̂j) = β̂j / √(σ̂²Cjj) ~ tn−p under H0,   where σ̂² = MSRes,
and we reject H0 if
|t0 | > tα/2, n−p
Example 0.4 (Weight ∼ Height + Waist Girth). Based on the previous R
output, both predictors are significant when the other is already included
in the model:
• Height: t0 = 17.30, p-value < 2e-16
• Waist Girth: t0 = 40.36, p-value < 2e-16
Partial F Tests on Subsets of Regression Coefficients
Consider the full regression model with k regressors
y = Xβ + ε
Suppose there is a partition of the regression coefficients in β into two groups (the last r and the preceding p − r):
β = [β1; β2] ∈ R^p,   β1 = (β0, β1, . . . , βk−r)′ ∈ R^(p−r),   β2 = (βk−r+1, . . . , βk)′ ∈ R^r
We wish to test
H0 : β2 = 0 (i.e., βk−r+1 = · · · = βk = 0)   vs   H1 : β2 ≠ 0
to determine if the last r predictors may be deleted from the model.
Corresponding to the partition of β, we partition X in a conformal way:
X = [X1 X2],   X1 ∈ R^(n×(p−r)),   X2 ∈ R^(n×r),
such that
y = Xβ + ε = [X1 X2][β1; β2] + ε = X1β1 + X2β2 + ε
We compare two contrasting models:
(Full model)     y = Xβ + ε
(Reduced model)  y = X1β1 + ε
The corresponding regression sums of squares are
(df = k)      SSR(β) = ‖Xβ̂‖² − nȳ²,     β̂ = (X′X)⁻¹X′y
(df = k − r)  SSR(β1) = ‖X1β̂1‖² − nȳ²,  β̂1 = (X1′X1)⁻¹X1′y
Thus, the regression sum of squares due to β2 given that β1 is already in the model, called the extra sum of squares, is
(df = r)      SSR(β2 | β1) = SSR(β) − SSR(β1)
Note that with the residual sums of squares
SSRes(β) = ‖y − Xβ̂‖²,       β̂ = (X′X)⁻¹X′y
SSRes(β1) = ‖y − X1β̂1‖²,    β̂1 = (X1′X1)⁻¹X1′y
we also have
SSR(β2 | β1) = SSRes(β1) − SSRes(β)
Finally, the (partial F) test statistic is
F0 = [SSR(β2 | β1)/r] / [SSRes(β)/(n − p)] ~ Fr,n−p under H0,
and we reject H0 if
F0 > Fα,r,n−p
Example 0.5 (Weight ∼ Height + Waist Girth). We use the extra sum of
squares method to compare it with the reduced model (Weight ∼ Height):
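A sketch of the comparison in R: fit both models and pass them to anova(), which carries out the partial F test (data frame and column names assumed as before):

```r
fit_reduced <- lm(Weight ~ Height, data = bdims)
fit_full    <- lm(Weight ~ Height + Waist.Girth, data = bdims)
anova(fit_reduced, fit_full)    # partial F test for adding Waist Girth
```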
Remark. The partial F test on a single predictor xj, with β = [β(j); βj] and based on the extra sum of squares
SSR(βj | β(j)) = SSR(β) − SSR(β(j)),
can be shown to be equivalent to the marginal t test for βj.
For example, for Waist Girth,
• marginal t test: t0 = 40.36
• partial F test: F0 = 1629.2
Note that F0 = t0² (thus the same test).
Remark. There is a decomposition of the regression sum of squares SSR = SSR(β1, . . . , βk | β0) into a sequence of marginal extra sums of squares, each corresponding to a single predictor:
SSR(β1, . . . , βk | β0) = SSR(β1 | β0) + SSR(β2 | β1, β0) + · · · + SSR(βk | βk−1, . . . , β1, β0)
From the corresponding R output:
– SSR(β1 | β0) = 46370: the predictor height is significant
– SSR(β2 | β1, β0) = 33416: waist girth is significant given that height is already in the model
– SSR(β1, β2 | β0) = 79786
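These sequential extra sums of squares are what anova() reports for a single fitted model, one row per predictor in the order entered; a sketch:

```r
anova(fit_full)                                         # SSR(b1 | b0), then SSR(b2 | b1, b0)
anova(lm(Weight ~ Waist.Girth + Height, data = bdims))  # order matters for sequential sums of squares
```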
Summary: hypothesis testing in regression
• ANOVA F test: H0 : β1 = · · · = βk = 0. Reject H0 if
F0 = MSR/MSRes = [SSR/k] / [SSRes/(n − p)] > Fα,k,n−p
• Marginal t tests: H0 : βj = 0. Reject H0 if
|t0| > tα/2, n−p,   where t0 = (β̂j − 0)/se(β̂j) = β̂j / √(σ̂²Cjj)
• Partial F test: H0 : β2 = 0. Reject H0 if
F0 = [SSR(β2 | β1)/r] / [SSRes(β)/(n − p)] > Fα,r,n−p
Interval estimation in multiple linear regression
We construct the following
• Confidence intervals for the individual regression coefficients βj
• Confidence interval for the mean response
• Prediction interval
under the additional assumption that the errors εi are independently and normally distributed with zero mean and constant variance σ².
Confidence intervals for individual regression coefficients
Theorem 0.5. Under the normality assumption, a 1 − α confidence interval
for the regression coefficient βj , 0 ≤ j ≤ k is
β̂j ± tα/2,n−p √(σ̂²Cjj)
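In R these intervals are returned by confint(); a sketch with the earlier fit2:

```r
confint(fit2, level = 0.95)   # 95% confidence intervals for the intercept and each slope
```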
Confidence interval for the mean response
In the setting of multiple linear regression, the mean response at a given
point x0 = (1, x01, . . . , x0k)′ is
E(y | x0) = x0′β = β0 + β1x01 + · · · + βkx0k
A natural point estimator for E(y | x0) is the following:
ŷ0 = x0′β̂ = β̂0 + β̂1x01 + · · · + β̂kx0k.
Furthermore, we can construct a confidence interval for E(y | x0 ).
Since ŷ0 is a linear combination of the responses, it is normally distributed
with
E(ŷ0) = x0′E(β̂) = x0′β
and
Var(ŷ0) = x0′Var(β̂)x0 = σ²x0′(X′X)⁻¹x0
We can thus obtain the following result.
Theorem 0.6. Under the normality assumption on the model errors, a 1 − α confidence interval for the mean response E(y | x0) is
ŷ0 ± tα/2, n−p √(σ̂² x0′(X′X)⁻¹x0)
Prediction intervals for new observations
Given a new location x0 , we would like to form a prediction interval on
the future observation of the response at that location
y0 = x0′β + ε0
where ε0 ~ N(0, σ²) is the error.
We have the following result.
Theorem 0.7. Under the normality assumption on the model errors, a 1 − α prediction interval for the future observation y0 at the point x0 is
ŷ0 ± tα/2, n−p √(σ̂²(1 + x0′(X′X)⁻¹x0))
Proof. First, note that the mean of the response y0 at x0, i.e., x0′β, is estimated by ŷ0 = x0′β̂.
Let Ψ = y0 − ŷ0 be the difference between the true response and the point estimator of its mean. Then Ψ (as a linear combination of y0, y1, . . . , yn) is normally distributed with mean
E(Ψ) = E(y0) − E(ŷ0) = x0′β − x0′β = 0
and variance (the new observation y0 is independent of ŷ0, which depends only on the training responses)
Var(Ψ) = Var(y0) + Var(ŷ0) = σ² + σ²x0′(X′X)⁻¹x0
It follows that
(y0 − ŷ0) / √(σ²(1 + x0′(X′X)⁻¹x0)) ~ N(0, 1)
and correspondingly,
(y0 − ŷ0) / √(MSRes(1 + x0′(X′X)⁻¹x0)) ~ tn−p
Accordingly, a 1 − α prediction interval on a future observation y0 at x0 is
ŷ0 ± tα/2, n−p √(MSRes(1 + x0′(X′X)⁻¹x0))
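Both intervals are available from predict() on a fitted lm object; a sketch (the new predictor values are made up for illustration):

```r
x_new <- data.frame(Height = 175, Waist.Girth = 85)                     # hypothetical new point
predict(fit2, newdata = x_new, interval = "confidence", level = 0.95)   # CI for the mean response
predict(fit2, newdata = x_new, interval = "prediction", level = 0.95)   # PI for a new observation
```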
Summary: interval estimation in regression
• βj (for each 0 ≤ j ≤ k): β̂j ± tα/2,n−p √(MSRes Cjj)
• σ²: ( (n − p)MSRes / χ²α/2,n−p ,  (n − p)MSRes / χ²1−α/2,n−p )
• E(y | x0): ŷ0 ± tα/2, n−p √(MSRes x0′(X′X)⁻¹x0)
• y0 (at x0): ŷ0 ± tα/2, n−p √(MSRes(1 + x0′(X′X)⁻¹x0))
Some issues in multiple linear regression
• Hidden extrapolation
• Units of measurements
• Multicollinearity
Hidden extrapolation
In multiple linear regression, extrapolation may occur even when all predictor values are within their ranges.
We can use the hat matrix
H = X(X′X)⁻¹X′
to detect hidden extrapolation: let
hmax = max hii.
Then x0 is an extrapolation point if
x0′(X′X)⁻¹x0 > hmax
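A sketch of this check in R, using the model matrix of the fitted model (the new point x0 is hypothetical):

```r
X    <- model.matrix(fit2)
hmax <- max(hatvalues(fit2))                         # largest leverage among the training points

x0  <- c(1, 175, 85)                                 # hypothetical new point (1, Height, Waist.Girth)
h00 <- drop(t(x0) %*% solve(crossprod(X)) %*% x0)
h00 > hmax                                           # TRUE would flag hidden extrapolation
```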
Units of measurements
The choices of the units of the predictors in a linear model may cause their
regression coefficients to have very different magnitudes, e.g.,
y = 3 − 20x1 + 0.01x2
In order to directly compare regression coefficients, we need to rescale the regressors and the response to a comparable magnitude.
Two common scaling methods:
• Unit Normal Scaling
• Unit Length Scaling
Unit Normal Scaling: For each regressor xj (and the response), rescale
the observations of xj (or y) to have zero mean and unit variance.
Let
x̄j = (1/n) Σᵢ xij,   sj² = (1/(n−1)) Σᵢ (xij − x̄j)²,   sy² = (1/(n−1)) Σᵢ (yi − ȳ)²,
where Σᵢ (xij − x̄j)² = Sjj and Σᵢ (yi − ȳ)² = SST.
Then the normalized predictors and response are
zij = (xij − x̄j)/sj,   yi∗ = (yi − ȳ)/sy
This leads to a linear regression model without intercept: y∗ = Zb̂.
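A sketch of unit normal scaling in R via scale(), followed by a no-intercept fit (data frame and column names assumed as before):

```r
Z      <- scale(bdims[, c("Height", "Waist.Girth")])  # zero mean, unit variance columns
y_star <- scale(bdims$Weight)
fit_std <- lm(y_star ~ Z - 1)                         # no intercept; coefficients are the standardized slopes
coef(fit_std)
```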
Unit Length Scaling: For each regressor xj (and the response), rescale
the observations of xj (or y) to have zero mean and unit length.
wij = (xij − x̄j)/√Sjj = zij/√(n − 1),   yi⁰ = (yi − ȳ)/√SST = yi∗/√(n − 1)
This also leads to a linear regression model without intercept: y⁰ = Wb̂.
Remark.
• W = (1/√(n − 1)) Z and y⁰ = (1/√(n − 1)) y∗. Thus, the two scaling methods will yield the same standardized regression coefficients b̂.
• The entries of W′W are the correlations between the regressors.
Proof: We examine the (j, ℓ) entry of W′W:
(W′W)jℓ = Σ_{i=1}^n wij wiℓ
= Σᵢ [(xij − x̄j)/√Sjj] · [(xiℓ − x̄ℓ)/√Sℓℓ]
= Σᵢ (xij − x̄j)(xiℓ − x̄ℓ) / (√Sjj √Sℓℓ)
= [(1/(n−1)) Σᵢ (xij − x̄j)(xiℓ − x̄ℓ)] / [√((1/(n−1)) Σᵢ (xij − x̄j)²) · √((1/(n−1)) Σᵢ (xiℓ − x̄ℓ)²)]
= Corr(xj, xℓ)
Multicollinearity
A serious issue in multiple linear regression is multicollinearity, or near-linear dependence among the regression variables, e.g., x3 ≈ 2x1 + 5x2.
• X will be (nearly) rank deficient, making X′X singular or nearly singular.
• The redundant predictors contribute no new information about the response.
• The estimated slopes in the regression model will be arbitrary (or highly unstable).
We will discuss in more detail how to diagnose (and fix) the issue of
multicollinearity in Chapter 9.
Further learning
• 3.3.3 The Case of Orthogonal Columns in X
• 3.3.4 Testing the General Linear Hypothesis H0 : Tβ = 0
• Projection matrices
– Concepts
– Computing via SVD