The Simple Linear Regression Model (Part 2)
Goodness of Fit

Although the line given by the OLS estimates of the slope and intercept is the "optimal choice" in that it minimizes SSR, we still need a way to compare fit across several models, which raw SSR alone does not provide.
• 1. The sum of squared residuals (SSR) represents the degree to which our model missed the data. A lower SSR means a "better" fit.

$$SSR = \sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{N} \hat{u}_i^2$$

• However, the value of SSR depends on the scale of the data, so it does not allow for consistent comparison across equations.
• 2. What represents the degree to which our model succeeds?
– For each observation, we are trying to explain the deviation from the mean of the dependent variable.
– For the sample as a whole, we look at the sum of squared deviations from the mean, which is the Total Sum of Squares, or SST.

$$SST = \sum_{i=1}^{N} (Y_i - \bar{Y})^2 \propto \widehat{var}(Y_i)$$
• 3. The Explained Sum of Squares (SSE) represents the deviation of the fitted values from the mean.

$$SSE = \sum_{i=1}^{N} (\hat{Y}_i - \bar{Y})^2$$

• A "perfect fit" would happen when SSE = SST and the fitted Ŷi is the same as the observed Yi.
• Note that SST = SSE + SSR for OLS.
• The COEFFICIENT OF DETERMINATION, or R², is the percentage of the total variation (SST) that is explained by the model (SSE).
– The ratio of the explained variation to the total variation
– A measure of goodness of fit
Coefficient of Determination

$$R^2 = \frac{SSE}{SST} = \frac{\sum_{i=1}^{N} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{N} (Y_i - \bar{Y})^2} = 1 - \frac{SSR}{SST} = 1 - \frac{\sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}$$

• SSE/SST is the fraction of the sample variation in Y that is explained by X.
• The closer the value is to 1, the better the fit.
Venn Diagram of R²

[Venn diagram: a circle "Variation in Yi" (regions A and B) overlapping a circle "Variation in Xi" (regions B and C).]

$A + B \leftrightarrow \sum (Y_i - \bar{Y})^2 = SST$
$B + C \leftrightarrow \sum (X_i - \bar{X})^2$
$B \leftrightarrow \sum (Y_i - \bar{Y})(X_i - \bar{X})$
$A \leftrightarrow \sum (Y_i - \hat{Y}_i)^2 = SSR$, so $B \leftrightarrow SSE$
$B/(A + B) = R^2$
$B/(B + C) \leftrightarrow \hat{\beta}_1$

The greater B (the overlap), the better the fit.
• 5. By definition, R² will be between zero and 1, simply because SST will never be less than SSR, and SSE can be no greater than SST.
– An R² = 1 indicates that all observations lie exactly on the regression line: OLS provides a perfect fit to the data. This never happens, and if you see it, there is something wrong.
• A value of R² that is nearly equal to zero indicates a poor fit of the OLS line: very little of the variation in Yi is captured by the variation in Ŷi. However, in some instances a value of R² = 0.07 is acceptable, as long as the coefficients make sense.
• In panel and cross-section data, R² tends to be lower.
• In time-series data, R² tends to be higher.
Back to Our Example – SSR

i     Yi     Xi     Ŷi       Yi − Ŷi    (Yi − Ŷi)²
1    1050   1100    880.8     169.2     28615.19
2    1900   2550   2195.1    −295.1     87102.18
3    1560   1700   1424.7     135.3     18310.33
4    2760   3400   2965.6    −205.6     42262.00
5    6500   7200   6409.9      90.1      8113.30
6    5000   5600   4959.7      40.3      1626.19
7    3400   3900   3418.8     −18.8       352.73
8    4000   4500   3962.6      37.4      1396.85
9    1200   1400   1152.8      47.2      2231.42
Mean 3041                     SSR =    190,010.19
Calculate SST

i     Yi     Xi    Yi − Ȳ    (Yi − Ȳ)²
1    1050   1100   −1991      3964523
2    1900   2550   −1141      1302135
3    1560   1700   −1481      2193690
4    2760   3400    −281        79023
5    6500   7200    3459     11963912
6    5000   5600    1959      3837246
7    3400   3900     359       128801
8    4000   4500     959       919468
9    1200   1400   −1841      3389690
Mean 3041          SST =   27,778,489
Calculate SSE

i     Yi     Xi     Ŷi      Ŷi − Ȳ    (Ŷi − Ȳ)²
1    1050   1100    880.6   −2160.3    4666772.31
2    1900   2550   2194      −846.0     715682.72
3    1560   1700   1424     −1616.4    2612835.57
4    2760   3400   2964       −75.5       5705.37
5    6500   7200   6407      3368.8   11348914.56
6    5000   5600   4958      1918.6    3680883.41
7    3400   3900   3417       377.7     142634.58
8    4000   4500   3961       921.5     849188.95
9    1200   1400   1152     −1888.3    3565862.21
Mean 3041                   SSE =   27,588,479.68
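The three tables can be reproduced from the nine (Yi, Xi) pairs alone; a minimal sketch (totals may differ slightly from the tables above because the slides round the fitted values):

```python
ys = [1050, 1900, 1560, 2760, 6500, 5000, 3400, 4000, 1200]
xs = [1100, 2550, 1700, 3400, 7200, 5600, 3900, 4500, 1400]
n = len(ys)
y_bar, x_bar = sum(ys) / n, sum(xs) / n

# OLS slope and intercept from the usual formulas.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * x for x in xs]                      # fitted values
ssr = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))   # residual sum of squares
sst = sum((y - y_bar) ** 2 for y in ys)                # total sum of squares
sse = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained sum of squares

print(round(sst), round(sse), round(ssr))  # SST = SSE + SSR holds for OLS
```

The decomposition SST = SSE + SSR comes out exact (up to floating point), not approximate, because it is an algebraic identity for OLS.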
Example R²

$$R^2 = \frac{SSE}{SST} = \frac{\sum_{i=1}^{N} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{N} (Y_i - \bar{Y})^2} = \frac{27{,}588{,}479.68}{27{,}778{,}489} \approx 0.993$$

$$R^2 = 1 - \frac{SSR}{SST} = 1 - \frac{\sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{N} (Y_i - \bar{Y})^2} = 1 - \frac{190{,}010.19}{27{,}778{,}489} \approx 0.993$$

• 100·R² is the percentage of the sample variation in Y that is explained by X: variation in X explains 99.3% of the variation in Y.
Example R² Graph

[Scatter plot of Y against X with the fitted line, which crosses the Y axis at about −116. At observation 2 (X₂ = 2550): Ȳ = 3041, Ŷ₂ = 2194, Y₂ = 1900. The gap between Ŷ₂ and Ȳ illustrates "SSE", the gap between Y₂ and Ȳ illustrates "SST", and the gap between Y₂ and Ŷ₂ illustrates "SSR".]
More on R²

• Another way to think about R² is as a measure of how well your model performs relative to the simplest model, wherein the values of Yi are predicted using only the sample mean and no explanatory variables.
• Note that if you have no explanatory variables, the least squares estimate of b0 will be the mean of Yi.
• Let a be an unknown value, and minimize the sum of squared deviations of Yi from a:

$$\min_a \sum_i (Y_i - a)^2$$

$$\frac{\partial}{\partial a} \sum_i (Y_i - a)^2 = -2 \sum_i (Y_i - a)(1) = 0 \;\Rightarrow\; \sum_i Y_i - Na = 0 \;\Rightarrow\; a = \frac{\sum_i Y_i}{N} = \bar{Y}$$
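This result can be checked numerically with the Yi from the running example: the sum of squared deviations is smallest when a equals the sample mean.

```python
ys = [1050, 1900, 1560, 2760, 6500, 5000, 3400, 4000, 1200]  # Yi from the example
y_bar = sum(ys) / len(ys)

def ssd(a):
    """Sum of squared deviations of the Yi from a candidate value a."""
    return sum((y - a) ** 2 for y in ys)

# The SSD at the mean beats every other candidate value we try.
candidates = [y_bar + step for step in (-500, -100, -1, 0, 1, 100, 500)]
best = min(candidates, key=ssd)
print(best == y_bar, round(ssd(y_bar)))  # the mean minimizes the SSD, and ssd(y_bar) = SST
```

Note that `ssd(y_bar)` reproduces the SST total from the table above, since SST is exactly the SSD evaluated at the sample mean.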
Standard Error of the Regression (SER) –
another measure of goodness of fit

The SER measures the spread of the distribution of u. The SER is (almost) the sample standard deviation of the OLS residuals:

$$SER = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} (\hat{u}_i - \bar{\hat{u}})^2} = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2}$$

The n − 2 adjusts for the number of estimated coefficients (slope and intercept), and the expression under the root is (almost) the average squared residual across the sample. The second equality holds because $\bar{\hat{u}} = \frac{1}{n} \sum_{i=1}^{n} \hat{u}_i = 0$.
$$SER = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2}$$

The SER:
• has the units of u, which are the units of Y
• measures the average "size" of the OLS residual (the average "mistake" made by the OLS regression line)

The root mean squared error (RMSE) is closely related to the SER:

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \hat{u}_i^2}$$

This measures the same thing as the SER – the minor difference is division by 1/n instead of 1/(n−2).
SER from previous example:

• Since we already calculated SSR = 190,010.19 and n = 9, we can find the SER by dividing SSR by n − 2 and taking the square root:

$$SER = \sqrt{\frac{1}{n-2} SSR} = \sqrt{\frac{1}{9-2} (190{,}010.19)} \approx 164.76$$

This means that the average deviation of the predicted value from the actual value of Yi is about $164.
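The arithmetic can be checked directly (SSR is taken from the table earlier in the example):

```python
import math

SSR = 190_010.19  # sum of squared residuals from the example table
n = 9             # number of observations

ser = math.sqrt(SSR / (n - 2))  # n - 2: two coefficients were estimated
rmse = math.sqrt(SSR / n)       # RMSE divides by n instead of n - 2

print(round(ser, 2), round(rmse, 2))  # SER is about 164.76; RMSE is somewhat smaller
```

With only nine observations, the gap between SER and RMSE is noticeable; it shrinks as n grows.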
Initial OLS Assumptions

• In order to draw any specific conclusions from our OLS estimates (i.e., run hypothesis tests with known distributions), we must make some assumptions about the mathematical properties of the estimates and the OLS estimator.
Assumptions

• SLR.1) The relationship between Xi and Yi is linear in parameters.
• SLR.2) Xi and Yi are independent and identically distributed (iid) draws from a joint distribution.
• SLR.3) The error term ui has a zero mean conditional on Xi.
• SLR.4) The independent variable Xi varies across observations.
Assumption SLR.1: Linear in Parameters

Yi is a linear function of b0 and b1, but not necessarily of Xi.

$Y_i = b_0 + b_1 X_i^2 + u_i$ or $f(Y_i) = b_0 + b_1 \, g(X_i) + u_i$ are OK, but . . .

$Y_i = b_0 + b_1^2 X_i + u_i$ is not.
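To see why the first kind of model is "OK": a model that is nonlinear in Xi but linear in the parameters can still be estimated by ordinary OLS after transforming the regressor. A minimal sketch (the data here are invented for illustration):

```python
# Fit Y = b0 + b1 * X**2 + u by OLS on the transformed regressor Z = X**2.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.7, 7.3, 15.5, 26.5, 41.0]  # invented data, roughly 1 + 1.6 * x**2

zs = [x ** 2 for x in xs]          # the model is linear in b0, b1 given Z
z_bar = sum(zs) / len(zs)
y_bar = sum(ys) / len(ys)

b1 = sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys)) / \
     sum((z - z_bar) ** 2 for z in zs)
b0 = y_bar - b1 * z_bar
print(b0, b1)  # close to the values used to generate the data
```

By contrast, Y = b0 + b1²·X + u cannot be handled this way: no transformation of X makes that model linear in the parameter b1.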
Assumption SLR.2

• 2. (Xi, Yi) are i.i.d.
• This essentially means that observations are randomly drawn from a population. Think of well-designed survey data.
• If this assumption is violated, we cannot extrapolate sample findings to the overall population.
Assumption SLR.3

E(ui | Xi) = 0.

– To make sense of this, we need to remember that each observation of ui is a single draw from an underlying distribution.
– This assumption simply states that this distribution, associated with each value of Xi, is centered around zero.
• What if this assumption does not hold?

[Scatter plot with E(ui) > 0: the OLS line no longer tracks the data, and another line labeled "better than OLS" fits the points more closely.]
Assumption SLR.3

Another important implication of the zero conditional mean assumption is that E(ui | Xi) = 0 implies COV(Xi, ui) = 0.
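One caveat worth keeping in mind: in the sample, the OLS residuals are uncorrelated with Xi by construction (the first-order conditions force it), whether or not SLR.3 holds in the population, so this sample moment cannot be used to test the assumption. A quick check with the example data:

```python
ys = [1050, 1900, 1560, 2760, 6500, 5000, 3400, 4000, 1200]
xs = [1100, 2550, 1700, 3400, 7200, 5600, 3900, 4500, 1400]
n = len(ys)
y_bar, x_bar = sum(ys) / n, sum(xs) / n

b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]  # OLS residuals
u_bar = sum(resid) / n
cov_xu = sum((x - x_bar) * u for x, u in zip(xs, resid)) / n
print(u_bar, cov_xu)  # both are zero up to floating-point error
```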
Assumption SLR.4

• Xi is not constant across observations.
• A mathematical necessity, given our formula for the OLS estimate:

$$\hat{B}_1 = \frac{\sum_i (Y_i - \bar{Y})(X_i - \bar{X})}{\sum_i (X_i - \bar{X})^2}$$

is not defined if $\sum_i (X_i - \bar{X})^2 = 0$.
Sampling Distribution of the OLS Estimators

• A key concept of estimation is the idea that B̂0 and B̂1 are random variables, whose randomness derives from the sampling distribution of the error term ui.
• We have to imagine the data that we observe (Yi and Xi) to be the result of one of an infinite number of possible outcomes.
Sampling Distribution

Each potential set of observations will come with a new "best fit" line, and new estimates of B0 and B1.
So, we observe one possible estimate of the true underlying (population) parameter. We don't know what the value of that parameter is, but we do know something about its relationship to the distribution from which our estimate arose. . .
Properties of the OLS Sampling Distribution

• 1. The distribution of B̂1 is approximately normal in large samples.
• 2. The distribution of B̂1 is centered about the true value of β1.
• 3. The variance of the distribution of B̂1 decreases as the sample size increases.
– That is, the distribution becomes more tightly concentrated about its mean.
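These three properties can be illustrated with a small Monte Carlo sketch; the "true" parameters, regressor distribution, and error distribution below are all invented for illustration:

```python
import random

random.seed(0)
B0, B1 = 2.0, 0.5  # hypothetical "true" population parameters

def ols_slope(n):
    """Draw one sample of size n from the population and return the OLS slope."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [B0 + B1 * x + random.gauss(0, 1) for x in xs]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
           sum((x - x_bar) ** 2 for x in xs)

# Re-estimate the slope on 2,000 independent samples of size 100.
slopes = [ols_slope(100) for _ in range(2000)]
mean = sum(slopes) / len(slopes)
var = sum((b - mean) ** 2 for b in slopes) / len(slopes)
print(mean, var)  # the mean of the estimates is close to the true B1 = 0.5
```

A histogram of `slopes` would look approximately normal, and repeating the exercise with a larger sample size per draw shrinks `var`.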
Properties

• 1. It can be shown that the Central Limit Theorem applies to the OLS estimates, and therefore we may assume that when n > 100, B̂1 is approximately normally distributed.
– Therefore, the distribution will be symmetric about its mean (see next slide), with a known probability density function.
Properties

• 2. Saying that the distribution of B̂1 is centered about the true value of β1 is another way of saying that B̂1 is an unbiased estimate of β1:

$$\text{Mean of } \hat{B}_1 = E(\hat{B}_1) = \beta_1$$