The Simple Linear Regression Model (Part 2)
Goodness of Fit

Although the line given by the OLS estimates of the slope and intercept is the "optimal choice" in that it minimizes SSR, we still need a way to compare fit across several models, which raw SSR alone does not provide.
• 1. The sum of squared residuals (SSR) represents the degree to which our model missed the data. A lower SSR means a "better" fit.

$$SSR = \sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{N} \hat{u}_i^2$$

• However, the value of SSR depends on the scale of the data, so it does not allow for consistent comparison across equations.
• 2. What represents the degree to which our model succeeds?
– For each observation, we are trying to explain the deviation from the mean of the dependent variable.
– For the sample as a whole, we look at the sum of squared deviations from the mean, which is the Total Sum of Squares, or SST.

$$SST = \sum_{i=1}^{N} (Y_i - \bar{Y})^2 \propto \widehat{var}(Y_i)$$
• 3. The Explained Sum of Squares (SSE) represents the deviation of the fitted values from the mean.

$$SSE = \sum_{i=1}^{N} (\hat{Y}_i - \bar{Y})^2$$

• A "perfect fit" would happen when SSE = SST and the fitted Ŷi is the same as the observed Yi.
• Note that SST = SSE + SSR for OLS.
• The COEFFICIENT OF DETERMINATION, or R², is the percentage of the total variation (SST) that is explained by the model (SSE).
– The ratio of the explained variation to the total variation
– A measure of goodness of fit
Coefficient of Determination

$$R^2 = \frac{SSE}{SST} = \frac{\sum_{i=1}^{N} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{N} (Y_i - \bar{Y})^2} = 1 - \frac{SSR}{SST} = 1 - \frac{\sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}$$

• SSE/SST is the fraction of the sample variation in Y that is explained by X.
• The closer the value is to 1, the better the fit.
Venn Diagram of R²

[Venn diagram: a circle "Variation in Yi" (regions A and B) overlapping a circle "Variation in Xi" (regions B and C).]

$A + B \leftrightarrow \sum (Y_i - \bar{Y})^2 = SST$
$B + C \leftrightarrow \sum (X_i - \bar{X})^2$
$B \leftrightarrow \sum (Y_i - \bar{Y})(X_i - \bar{X})$
$A \leftrightarrow \sum (Y_i - \hat{Y}_i)^2 = SSR$, so $B \leftrightarrow SSE$
$B/(A + B) = R^2$
$B/(B + C) \leftrightarrow \hat{\beta}_1$

The greater B (the overlap), the better the fit.
• 5. By definition, R² will be between zero and 1, simply because SST will never be less than SSR, and SSE can be no greater than SST.
– An R² = 1 indicates that all observations lie exactly on the regression line: OLS provides a perfect fit to the data. This never happens, and if you see it, there is something wrong.
• A value of R² that is nearly equal to zero indicates a poor fit of the OLS line: very little of the variation in Yi is captured by the variation in Ŷi. However, in some instances a value of R² = 0.07 is acceptable, as long as the coefficients make sense.
• In panel and cross-section data, R² tends to be lower.
• In time-series data, R² tends to be higher.
Back to Our Example – SSR

i     Yi     Xi     Ŷi       Yi − Ŷi    (Yi − Ŷi)²
1    1050   1100    880.8     169.2     28615.19
2    1900   2550   2195.1    −295.1     87102.18
3    1560   1700   1424.7     135.3     18310.33
4    2760   3400   2965.6    −205.6     42262.00
5    6500   7200   6409.9      90.1      8113.30
6    5000   5600   4959.7      40.3      1626.19
7    3400   3900   3418.8     −18.8       352.73
8    4000   4500   3962.6      37.4      1396.85
9    1200   1400   1152.8      47.2      2231.42
Mean 3041                     SSR =    190,010.19
Calculate SST

i     Yi     Xi    Yi − Ȳ    (Yi − Ȳ)²
1    1050   1100   −1991      3964523
2    1900   2550   −1141      1302135
3    1560   1700   −1481      2193690
4    2760   3400    −281        79023
5    6500   7200    3459     11963912
6    5000   5600    1959      3837246
7    3400   3900     359       128801
8    4000   4500     959       919468
9    1200   1400   −1841      3389690
Mean 3041          SST =   27,778,489
Calculate SSE

i     Yi     Xi     Ŷi      Ŷi − Ȳ    (Ŷi − Ȳ)²
1    1050   1100    880.6   −2160.3    4666772.31
2    1900   2550   2194      −846.0     715682.72
3    1560   1700   1424     −1616.4    2612835.57
4    2760   3400   2964       −75.5       5705.37
5    6500   7200   6407      3368.8   11348914.56
6    5000   5600   4958      1918.6    3680883.41
7    3400   3900   3417       377.7     142634.58
8    4000   4500   3961       921.5     849188.95
9    1200   1400   1152     −1888.3    3565862.21
Mean 3041                   SSE =   27,588,479.68
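The three tables can be reproduced from the nine (Yi, Xi) pairs alone; a minimal sketch (totals may differ slightly from the tables above because the slides round the fitted values):

```python
ys = [1050, 1900, 1560, 2760, 6500, 5000, 3400, 4000, 1200]
xs = [1100, 2550, 1700, 3400, 7200, 5600, 3900, 4500, 1400]
n = len(ys)
y_bar, x_bar = sum(ys) / n, sum(xs) / n

# OLS slope and intercept from the usual formulas.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * x for x in xs]                      # fitted values
ssr = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))   # residual sum of squares
sst = sum((y - y_bar) ** 2 for y in ys)                # total sum of squares
sse = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained sum of squares

print(round(sst), round(sse), round(ssr))  # SST = SSE + SSR holds for OLS
```

The decomposition SST = SSE + SSR comes out exact (up to floating point), not approximate, because it is an algebraic identity for OLS.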
Example R²

$$R^2 = \frac{SSE}{SST} = \frac{\sum_{i=1}^{N} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{N} (Y_i - \bar{Y})^2} = \frac{27{,}588{,}479.68}{27{,}778{,}489} \approx 0.993$$

$$R^2 = 1 - \frac{SSR}{SST} = 1 - \frac{\sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{N} (Y_i - \bar{Y})^2} = 1 - \frac{190{,}010.19}{27{,}778{,}489} \approx 0.993$$

• 100·R² is the percentage of the sample variation in Y that is explained by X: variation in X explains 99.3% of the variation in Y.
Example R² Graph

[Scatter plot of Y against X with the fitted line, which crosses the Y axis at about −116. At observation 2 (X₂ = 2550): Ȳ = 3041, Ŷ₂ = 2194, Y₂ = 1900. The gap between Ŷ₂ and Ȳ illustrates "SSE", the gap between Y₂ and Ȳ illustrates "SST", and the gap between Y₂ and Ŷ₂ illustrates "SSR".]
More on R²

• Another way to think about R² is as a measure of how well your model performs relative to the simplest model, wherein the values of Yi are predicted using only the sample mean and no explanatory variables.
• Note that if you have no explanatory variables, the least squares estimate of b0 will be the mean of Yi.
• Let a be an unknown value, and minimize the sum of squared deviations of Yi from a:

$$\min_a \sum_i (Y_i - a)^2$$

$$\frac{\partial}{\partial a} \sum_i (Y_i - a)^2 = -2 \sum_i (Y_i - a)(1) = 0 \;\Rightarrow\; \sum_i Y_i - Na = 0 \;\Rightarrow\; a = \frac{\sum_i Y_i}{N} = \bar{Y}$$
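This result can be checked numerically with the Yi from the running example: the sum of squared deviations is smallest when a equals the sample mean.

```python
ys = [1050, 1900, 1560, 2760, 6500, 5000, 3400, 4000, 1200]  # Yi from the example
y_bar = sum(ys) / len(ys)

def ssd(a):
    """Sum of squared deviations of the Yi from a candidate value a."""
    return sum((y - a) ** 2 for y in ys)

# The SSD at the mean beats every other candidate value we try.
candidates = [y_bar + step for step in (-500, -100, -1, 0, 1, 100, 500)]
best = min(candidates, key=ssd)
print(best == y_bar, round(ssd(y_bar)))  # the mean minimizes the SSD, and ssd(y_bar) = SST
```

Note that `ssd(y_bar)` reproduces the SST total from the table above, since SST is exactly the SSD evaluated at the sample mean.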
Standard Error of the Regression (SER) –
another measure of goodness of fit

The SER measures the spread of the distribution of u. The SER is (almost) the sample standard deviation of the OLS residuals:

$$SER = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} (\hat{u}_i - \bar{\hat{u}})^2} = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2}$$

The n − 2 adjusts for the number of estimated coefficients (slope and intercept), and the expression under the root is (almost) the average squared residual across the sample. The second equality holds because $\bar{\hat{u}} = \frac{1}{n} \sum_{i=1}^{n} \hat{u}_i = 0$.
$$SER = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2}$$

The SER:
• has the units of u, which are the units of Y
• measures the average "size" of the OLS residual (the average "mistake" made by the OLS regression line)

The root mean squared error (RMSE) is closely related to the SER:

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \hat{u}_i^2}$$

This measures the same thing as the SER – the minor difference is division by 1/n instead of 1/(n−2).
SER from previous example:

• Since we already calculated SSR = 190,010.19 and n = 9, we can find the SER by dividing SSR by n − 2 and taking the square root:

$$SER = \sqrt{\frac{1}{n-2} SSR} = \sqrt{\frac{1}{9-2} (190{,}010.19)} \approx 164.76$$

This means that the average deviation of the predicted value from the actual value of Yi is about $164.
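The arithmetic can be checked directly (SSR is taken from the table earlier in the example):

```python
import math

SSR = 190_010.19  # sum of squared residuals from the example table
n = 9             # number of observations

ser = math.sqrt(SSR / (n - 2))  # n - 2: two coefficients were estimated
rmse = math.sqrt(SSR / n)       # RMSE divides by n instead of n - 2

print(round(ser, 2), round(rmse, 2))  # SER is about 164.76; RMSE is somewhat smaller
```

With only nine observations, the gap between SER and RMSE is noticeable; it shrinks as n grows.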
Initial OLS Assumptions

• In order to draw any specific conclusions from our OLS estimates (i.e., run hypothesis tests with known distributions), we must make some assumptions about the mathematical properties of the estimates and the OLS estimator.
Assumptions

• SLR.1) The relationship between Xi and Yi is linear in parameters.
• SLR.2) Xi and Yi are independent and identically distributed (iid) draws from a joint distribution.
• SLR.3) The error term ui has a zero mean conditional on Xi.
• SLR.4) The independent variable Xi varies across observations.
Assumption SLR.1: Linear in Parameters

Yi is a linear function of b0 and b1, but not necessarily of Xi.

$Y_i = b_0 + b_1 X_i^2 + u_i$ or $f(Y_i) = b_0 + b_1 \, g(X_i) + u_i$ are OK, but . . .

$Y_i = b_0 + b_1^2 X_i + u_i$ is not.
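To see why the first kind of model is "OK": a model that is nonlinear in Xi but linear in the parameters can still be estimated by ordinary OLS after transforming the regressor. A minimal sketch (the data here are invented for illustration):

```python
# Fit Y = b0 + b1 * X**2 + u by OLS on the transformed regressor Z = X**2.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.7, 7.3, 15.5, 26.5, 41.0]  # invented data, roughly 1 + 1.6 * x**2

zs = [x ** 2 for x in xs]          # the model is linear in b0, b1 given Z
z_bar = sum(zs) / len(zs)
y_bar = sum(ys) / len(ys)

b1 = sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys)) / \
     sum((z - z_bar) ** 2 for z in zs)
b0 = y_bar - b1 * z_bar
print(b0, b1)  # close to the values used to generate the data
```

By contrast, Y = b0 + b1²·X + u cannot be handled this way: no transformation of X makes that model linear in the parameter b1.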
Assumption SLR.2

• 2. (Xi, Yi) are i.i.d.
• This essentially means that observations are randomly drawn from a population. Think of well-designed survey data.
• If this assumption is violated, we cannot extrapolate sample findings to the overall population.
Assumption SLR.3

E(ui | Xi) = 0.

– To make sense of this, we need to remember that each observation of ui is a single draw from an underlying distribution.
– This assumption simply states that this distribution, associated with each value of Xi, is centered around zero.
• What if this assumption does not hold?

[Scatter plot with E(ui) > 0: the OLS line no longer tracks the data, and another line labeled "better than OLS" fits the points more closely.]
Assumption SLR.3

Another important implication of the zero conditional mean assumption is that E(ui | Xi) = 0 implies COV(Xi, ui) = 0.
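One caveat worth keeping in mind: in the sample, the OLS residuals are uncorrelated with Xi by construction (the first-order conditions force it), whether or not SLR.3 holds in the population, so this sample moment cannot be used to test the assumption. A quick check with the example data:

```python
ys = [1050, 1900, 1560, 2760, 6500, 5000, 3400, 4000, 1200]
xs = [1100, 2550, 1700, 3400, 7200, 5600, 3900, 4500, 1400]
n = len(ys)
y_bar, x_bar = sum(ys) / n, sum(xs) / n

b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]  # OLS residuals
u_bar = sum(resid) / n
cov_xu = sum((x - x_bar) * u for x, u in zip(xs, resid)) / n
print(u_bar, cov_xu)  # both are zero up to floating-point error
```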
Assumption SLR.4

• Xi is not constant across observations.
• A mathematical necessity, given our formula for the OLS estimate:

$$\hat{B}_1 = \frac{\sum_i (Y_i - \bar{Y})(X_i - \bar{X})}{\sum_i (X_i - \bar{X})^2}$$

is not defined if $\sum_i (X_i - \bar{X})^2 = 0$.
Sampling Distribution of the OLS Estimators

• A key concept of estimation is the idea that B̂0 and B̂1 are random variables, whose randomness derives from the sampling distribution of the error term ui.
• We have to imagine the data that we observe (Yi and Xi) to be the result of one of an infinite number of possible outcomes.
Sampling Distribution

Each potential set of observations will come with a new "best fit" line, and new estimates of B0 and B1.
So, we observe one possible estimate of the true underlying (population) parameter. We don't know what the value of that parameter is, but we do know something about its relationship to the distribution from which our estimate arose. . .
Properties of the OLS Sampling Distribution

• 1. The distribution of B̂1 is approximately normal in large samples.
• 2. The distribution of B̂1 is centered about the true value of β1.
• 3. The variance of the distribution of B̂1 decreases as the sample size increases.
– That is, the distribution becomes more tightly concentrated about its mean.
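These three properties can be illustrated with a small Monte Carlo sketch; the "true" parameters, regressor distribution, and error distribution below are all invented for illustration:

```python
import random

random.seed(0)
B0, B1 = 2.0, 0.5  # hypothetical "true" population parameters

def ols_slope(n):
    """Draw one sample of size n from the population and return the OLS slope."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [B0 + B1 * x + random.gauss(0, 1) for x in xs]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
           sum((x - x_bar) ** 2 for x in xs)

# Re-estimate the slope on 2,000 independent samples of size 100.
slopes = [ols_slope(100) for _ in range(2000)]
mean = sum(slopes) / len(slopes)
var = sum((b - mean) ** 2 for b in slopes) / len(slopes)
print(mean, var)  # the mean of the estimates is close to the true B1 = 0.5
```

A histogram of `slopes` would look approximately normal, and repeating the exercise with a larger sample size per draw shrinks `var`.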
Properties

• 1. It can be shown that the Central Limit Theorem applies to the OLS estimates, and therefore we may assume that when n > 100, B̂1 is approximately normally distributed.
– Therefore, the distribution will be symmetric about its mean (see next slide), with a known probability density function.
Properties

• 2. Saying that the distribution of B̂1 is centered about the true value of β1 is another way of saying that B̂1 is an unbiased estimate of β1:

$$\text{Mean of } \hat{B}_1 = E(\hat{B}_1) = \beta_1$$