ECON4150 - Introductory Econometrics
Lecture 4: Linear Regression with One
Regressor
Monique de Haan
([email protected])
Stock and Watson Chapter 4
Lecture outline
• The OLS estimators
• The effect of class size on test scores
• The Least Squares Assumptions
• E(ui | Xi) = 0
• (Xi, Yi) are i.i.d.
• Large outliers are unlikely
• Properties of the OLS estimators
• unbiasedness
• consistency
• large sample distribution
• The compulsory term paper
The OLS estimators
Question of interest: What is the effect of a change in Xi on Yi ?
Yi = β0 + β1 Xi + ui
Last week we derived the OLS estimators of β0 and β1 :
β̂0 = Ȳ − β̂1 X̄

β̂1 = [ (1/(n−1)) Σ(Xi − X̄)(Yi − Ȳ) ] / [ (1/(n−1)) Σ(Xi − X̄)² ] = sXY / sX²

(sums run over i = 1, ..., n)
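As a check on these formulas, here is a minimal Stata sketch that computes β̂1 and β̂0 from sample moments, assuming the test score data used on the next slide (with variables test_score and class_size) is in memory:

* compute the OLS slope and intercept from sample moments
quietly correlate class_size test_score, covariance
scalar b1 = r(cov_12)/r(Var_1)      // b1 = sxy / sx^2
quietly summarize class_size
scalar xbar = r(mean)
quietly summarize test_score
scalar ybar = r(mean)
scalar b0 = ybar - b1*xbar          // b0 = Ybar - b1*Xbar
display "b1 = " b1 "   b0 = " b0    // should match the regress output below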
OLS estimates: The effect of class size on test scores

Question of interest: What is the effect of a change in class size on test scores?

TestScorei = β0 + β1 ClassSizei + ui

regress test_score class_size, robust
Linear regression Number of obs = 420
F(1, 418) = 19.26
Prob > F = 0.0000
R-squared = 0.0512
Root MSE = 18.581
Robust
test_score Coef. Std. Err. t P>|t| [95% Conf. Interval]
class_size -2.279808 .5194892 -4.39 0.000 -3.300945 -1.258671
_cons 698.933 10.36436 67.44 0.000 678.5602 719.3057
TestScorêi = 698.93 − 2.28 · ClassSizei
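A small usage note (standard Stata syntax, not shown on the slide): after regress, the fitted values from this line can be computed for every district with predict:

predict test_score_hat, xb    // fitted values: 698.93 - 2.28*class_size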
The Least Squares assumptions
Yi = β0 + β1 Xi + ui
Under what assumptions does the method of ordinary least squares provide appropriate estimators of β0 and β1?
Under what assumptions does the method of ordinary least squares provide
an appropriate estimator of the effect of class size on test scores?
The Least Squares assumptions:
Assumption 1: The conditional mean of ui given Xi is zero
E(ui | Xi) = 0

Assumption 2: (Yi, Xi) for i = 1, ..., n are independently and identically distributed (i.i.d.)

Assumption 3: Large outliers are unlikely
0 < E(Xi⁴) < ∞  and  0 < E(Yi⁴) < ∞
The Least Squares assumptions: Assumption 1

E(ui | Xi) = 0
The first OLS assumption states that:
All other factors that affect the dependent variable Yi (contained in ui ) are
unrelated to Xi in the sense that, given a value of Xi , the mean of these other
factors equals zero.
In the class size example:
All the other factors affecting test scores should be unrelated to class size in
the sense that, given a value of class size, the mean of these other factors
equals zero.
The Least Squares assumptions: Assumption 1
The first OLS assumption can also be written as:
E(Yi | Xi) = E(β0 + β1 Xi + ui | Xi)
           = β0 + β1 E(Xi | Xi) + E(ui | Xi)     [expectation rules]
           = β0 + β1 Xi                          [Assumption 1: E(ui | Xi) = 0]
The Least Squares assumptions: Assumption 1

E(Yi | Xi) = β0 + β1 Xi
The Least Squares assumptions: Assumption 1
Example of a violation of assumption 1:
Suppose that
• districts with wealthy inhabitants have small classes and good teachers; these districts have a lot of money, which they can use to hire more and better teachers
• districts with poor inhabitants have large classes and bad teachers; these districts have little money and can hire only few and not very good teachers
In this case class size is related to teacher quality.
Since teacher quality likely affects test scores, it is contained in ui.
This implies a violation of assumption 1:
E(ui | ClassSizei = small) ≠ E(ui | ClassSizei = large) ≠ 0
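To see the consequence, here is a minimal, hypothetical Stata simulation (all names and numbers invented) in which teacher quality sits in ui and is related to class size, so OLS misses the true class-size effect of −1:

* teacher quality is part of u and is related to class size
clear
set obs 1000
gen class_size = 15 + 10*uniform()
gen teach_qual = 30 - class_size + invnorm(uniform())
gen u = 5*teach_qual + invnorm(uniform())
gen test_score = 700 - 1*class_size + u
regress test_score class_size
* the true effect is -1, but the estimate is close to -6
* because E(u | class_size) != 0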
The Least Squares assumptions: Assumption 2
(Yi, Xi) for i = 1, ..., n are i.i.d.

• If the sample is drawn by simple random sampling, assumption 2 will hold

Example: What is the effect of mother's education (Xi) on child's education (Yi)?

Example of simple random sampling:
• randomly draw a sample of mothers with information on her education and the education of one randomly selected child
• (Yi, Xi) for i = 1, ..., n are i.i.d.

Example of a violation of simple random sampling:
• randomly draw a sample of mothers with information on her education and the education of all of her children
• (Yi, Xi) for i = 1, ..., n are NOT i.i.d.
• Observations on children from the same mother are not independent!
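A minimal sketch of this violation (hypothetical names and numbers): with two children per mother who share an unobserved family component, the observations are not independent:

* two children per mother share an unobserved family component
clear
set obs 500                           // 500 mothers
gen mother_id = _n
gen mother_educ = 10 + 4*invnorm(uniform())
gen family = invnorm(uniform())       // shared by siblings, part of u
expand 2                              // two children per mother
gen child_educ = 6 + 0.5*mother_educ + family + invnorm(uniform())
* the errors of two children with the same mother_id both contain
* 'family', so (Yi, Xi) are not independent across observations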
The Least Squares assumptions: Assumption 3
Large outliers are unlikely

0 < E(Xi⁴) < ∞  and  0 < E(Yi⁴) < ∞

• Outliers are observations that have values far outside the usual range of the data
• Large outliers can make OLS regression results misleading
• Another way to state the assumption is that X and Y have finite kurtosis
• The assumption is necessary to justify the large sample approximation to the sampling distribution of the OLS estimators
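A minimal Stata sketch (hypothetical data) of how a single large outlier can distort the OLS results:

* one planted outlier can move the OLS slope noticeably
clear
set obs 100
gen x = invnorm(uniform())
gen y = 1 + 2*x + invnorm(uniform())
regress y x               // slope close to the true value 2
replace y = 100 in 1      // plant a single large outlier
regress y x               // slope and R-squared can change markedly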
Use of the Least Squares assumptions
Yi = β0 + β1 Xi + ui
Assumption 1: E(ui | Xi) = 0
Assumption 2: (Yi, Xi) for i = 1, ..., n are i.i.d.
Assumption 3: Large outliers are unlikely

If the 3 least squares assumptions hold, the OLS estimators β̂0 and β̂1
• Are unbiased estimators of β0 and β1
• Are consistent estimators of β0 and β1
• Have a jointly normal sampling distribution
Properties of the OLS estimator: unbiasedness
Yi = β0 + β1 Xi + ui    and    Ȳ = β0 + β1 X̄ + ū

E[β̂1] = E[ Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² ]

substitute for Yi and Ȳ:

       = E[ Σ(Xi − X̄)(β0 + β1 Xi + ui − (β0 + β1 X̄ + ū)) / Σ(Xi − X̄)² ]

rewrite (β0 drops out):

       = E[ Σ(Xi − X̄)(β1 (Xi − X̄) + (ui − ū)) / Σ(Xi − X̄)² ]

rewrite & use expectation rules:

       = E[ β1 Σ(Xi − X̄)² / Σ(Xi − X̄)² ] + E[ Σ(Xi − X̄)(ui − ū) / Σ(Xi − X̄)² ]

(sums run over i = 1, ..., n)
Properties of the OLS estimator: unbiasedness
E[β̂1] = E[ β1 Σ(Xi − X̄)² / Σ(Xi − X̄)² ] + E[ Σ(Xi − X̄)(ui − ū) / Σ(Xi − X̄)² ]

take β1 out of the 1st expectation & use the algebra trick (next slide) on the 2nd:

       = β1 + E[ Σ(Xi − X̄) ui / Σ(Xi − X̄)² ]

law of iterated expectations:

       = β1 + E[ Σ(Xi − X̄) E[ui | Xi] / Σ(Xi − X̄)² ]

E[β̂1] = β1   if   E[ui | Xi] = 0
Algebra trick
Σ(Xi − X̄)(ui − ū) = Σ Xi ui − Σ Xi ū − Σ X̄ ui + Σ X̄ ū
                  = Σ Xi ui − n · ((1/n) Σ Xi) ū − Σ X̄ ui + n X̄ ū
                  = Σ Xi ui − n X̄ ū − Σ X̄ ui + n X̄ ū
                  = Σ Xi ui − Σ X̄ ui
                  = Σ (Xi − X̄) ui
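A quick numeric check of this identity in Stata (hypothetical simulated data); the two sums should coincide up to rounding:

* numeric check of the algebra trick
clear
set obs 50
gen x = invnorm(uniform())
gen u = invnorm(uniform())
quietly summarize x
gen xd = x - r(mean)               // Xi - Xbar
quietly summarize u
gen lhs = xd*(u - r(mean))         // (Xi - Xbar)(ui - ubar)
gen rhs = xd*u                     // (Xi - Xbar) ui
quietly summarize lhs
display "sum lhs = " r(sum)
quietly summarize rhs
display "sum rhs = " r(sum)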
Consistency
Consistency: β̂1 → β1 in probability, i.e. plim β̂1 = β1

plim β̂1 = plim[ Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² ]

        = plim[ (1/(n−1)) Σ(Xi − X̄)(Yi − Ȳ) ] / plim[ (1/(n−1)) Σ(Xi − X̄)² ] = plim sXY / plim sX²

law of large numbers (using OLS assumptions 2 and 3):

        = Cov(Xi, Yi) / Var(Xi)

substitute for Yi:

        = Cov(Xi, β0 + β1 Xi + ui) / Var(Xi)

see Key Concept 2.3:

        = ( β1 Var(Xi) + Cov(Xi, ui) ) / Var(Xi)
Consistency
plim β̂1 = ( β1 Var(Xi) + Cov(Xi, ui) ) / Var(Xi)

        = β1 · Var(Xi)/Var(Xi) + Cov(Xi, ui)/Var(Xi)

substitute the covariance expression:

        = β1 + E[ (Xi − µX)(ui − µu) ] / Var(Xi)

algebra trick:

        = β1 + E[ (Xi − µX) ui ] / Var(Xi)

law of iterated expectations:

        = β1 + E[ (Xi − µX) E[ui | Xi] ] / Var(Xi)

so

plim β̂1 = β1   if   E[ui | Xi] = 0
Unbiasedness vs Consistency
• Unbiasedness & consistency both rely on E[ui | Xi] = 0
• Unbiasedness implies that E[β̂1] = β1 for a given sample size n
• Consistency implies that the sampling distribution becomes more and more tightly concentrated around β1 as the sample size n becomes larger and larger
Consistency: A simulation example

• Let's create a data set with 100 observations
• Xi ∼ N(0, 1)
• ui ∼ N(0, 1)
• We define Y to depend on X as: Yi = 1 + 2Xi + ui
set obs 100
gen x=invnorm(uniform())
gen y=1+2*x+invnorm(uniform())

sum y x
Variable Obs Mean Std. Dev. Min Max
y 100 .6123606 2.211365 -5.05828 5.462746
x 100 -.1479108 .9928607 -2.633841 1.80305
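The scatter plot with the fitted line on the next slide can be reproduced with something like the following (assumed plotting command, not part of the original log):

twoway (scatter y x) (lfit y x), xtitle(X) ytitle(Y)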
A simulation example

[Figure: scatter plot of y against x for the simulated sample, with the fitted regression line; x roughly from −3 to 2, y from −5 to 5]

regress y x
Source SS df MS Number of obs = 100
F( 1, 98) = 385.45
Model 385.987671 1 385.987671 Prob > F = 0.0000
Residual 98.1357149 98 1.00138485 R-squared = 0.7973
Adj R-squared = 0.7952
Total 484.123386 99 4.89013521 Root MSE = 1.0007
y Coef. Std. Err. t P>|t| [95% Conf. Interval]
x 1.988753 .1012965 19.63 0.000 1.787733 2.189772
_cons .9065187 .1011847 8.96 0.000 .705721 1.107316
A simulation example n=100

We can create 999 of these data sets with 100 observations and use OLS to estimate

Yi = β0 + β1 Xi + ui

program define ols, rclass
    drop _all
    set obs 100
    gen x=invnorm(uniform())
    gen y=1+2*x+invnorm(uniform())
    regress y x
end

simulate _b, reps(999) nodots : ols

sum
Variable Obs Mean Std. Dev. Min Max
_b_x 999 1.997521 .1018595 1.67569 2.308795
_b_cons 999 1.003246 .1019056 .6844429 1.285363
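The histogram on the next slide can be reproduced with something like the following (assumed plotting command, not part of the original log):

histogram _b_x, title("OLS estimates of B1 in 999 samples with n=100") xtitle("OLS estimates of B1")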
A simulation example n=100

[Figure: histogram of the 999 OLS estimates of β1 (n=100), centered around 2; horizontal axis "OLS estimates of B1", from 1.6 to 2.4]
A simulation example n=1000

program define ols, rclass
    drop _all
    set obs 1000
    gen x=invnorm(uniform())
    gen y=1+2*x+invnorm(uniform())
    regress y x
end

simulate _b, reps(999) nodots : ols

sum
Variable Obs Mean Std. Dev. Min Max
_b_x 999 2.000035 .030417 1.908725 2.112585
_b_cons 999 1.000791 .0311526 .8970624 1.088724
A simulation example n=1000

[Figure: histogram of the 999 OLS estimates of β1 (n=1000); horizontal axis "OLS estimates of B1", from 1.6 to 2.4]
A simulation example n=10000

program define ols, rclass
    drop _all
    set obs 10000
    gen x=invnorm(uniform())
    gen y=1+2*x+invnorm(uniform())
    regress y x
end

simulate _b, reps(999) nodots : ols

sum
Variable Obs Mean Std. Dev. Min Max
_b_x 999 1.999748 .0099715 1.969678 2.034566
_b_cons 999 1.000391 .0100135 .9699681 1.033458
A simulation example n=10000

[Figure: histogram of the 999 OLS estimates of β1 (n=10000); horizontal axis "OLS estimates of B1", from 1.6 to 2.4]
Consistency of the OLS estimator β̂1

True model: Yi = 1 + 2Xi + ui     Estimated model: Yi = β0 + β1 Xi + ui

[Figure: overlaid histograms of the 999 OLS estimates of β1 for n=100, n=1000 and n=10000; horizontal axis "OLS estimates of B1" from 1.6 to 2.4; the sampling distribution concentrates around the true value 2 as n grows]
Sampling distribution of β̂0 and β̂1

We discussed the sampling distribution of the sample average Ȳ:

• the sampling distribution is complicated for small n, but if Y1, ..., Yn are i.i.d. we know that E(Ȳ) = µY

• by the Central Limit theorem the large sample distribution can be approximated by the normal distribution:

Ȳ ∼ N(µY, σ²Y/n)

If the 3 least squares assumptions hold we can make similar statements about the OLS estimators β̂0 and β̂1.
Large-sample distribution of β̂0 and β̂1

• Technically the Central Limit theorem concerns the large sample distribution of averages (like Ȳ)

• Examining the formulas of the OLS estimators shows that these are functions of sample averages:

β̂0 = Ȳ − β̂1 X̄

β̂1 = [ (1/n) Σ(Xi − X̄)(Yi − Ȳ) ] / [ (1/n) Σ(Xi − X̄)² ]
• It turns out that the Central Limit theorem also applies to these functions
of sample averages.
Sampling distribution of β̂0 and β̂1

If the first least squares assumption holds:

• The OLS estimators are unbiased, which implies that (for any sample size n)

E(β̂0) = β0   and   E(β̂1) = β1

In addition, if all 3 least squares assumptions hold:

• The Central Limit theorem implies that β̂0 and β̂1 are approximately jointly normally distributed in large samples:

β̂0 ∼ N(β0, σ²_β̂0)
β̂1 ∼ N(β1, σ²_β̂1)
Large-sample distribution of β̂0 and β̂1

In large samples

β̂0 ∼ N(β0, σ²_β̂0)
β̂1 ∼ N(β1, σ²_β̂1)

where it can be shown that

σ²_β̂0 = (1/n) · Var(Hi ui) / [E(Hi²)]²    with Hi = 1 − (µX / E(Xi²)) · Xi

σ²_β̂1 = (1/n) · Var[(Xi − µX) ui] / [Var(Xi)]²

The expression for σ²_β̂1 shows that the larger the variation in the regressor Xi, the smaller the variance of β̂1.
Large-sample distribution of β̂0 and β̂1

• When Var(Xi) is low, it is difficult to obtain an accurate estimate of the effect of X on Y, which implies that Var(β̂1) = σ²_β̂1 is high.
• If there is more variation in X, then there is more information in the data
that you can use to fit the regression line.
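A minimal Stata sketch of this point (hypothetical simulated data): with the same error term, a regressor with four times the standard deviation gives roughly a quarter of the standard error on the slope:

* more variation in x -> smaller standard error for the OLS slope
clear
set obs 1000
gen u = invnorm(uniform())
gen x_low = 0.5*invnorm(uniform())     // low-variance regressor
gen x_high = 2*invnorm(uniform())      // high-variance regressor
gen y_low = 1 + 2*x_low + u
gen y_high = 1 + 2*x_high + u
regress y_low x_low                    // larger Std. Err. on the slope
regress y_high x_high                  // smaller Std. Err. on the slope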
Compulsory term paper
• Traffic fatalities are the leading cause of death for Americans between
the ages of 5 and 32.
• The government wants to decrease the number of traffic fatalities by
increasing seat belt usage.
• If many people wear seat belts, the chance that people die in a car crash is likely smaller.
• People who wear seat belts might however be more careful drivers.
• Regions with many seat belt users might have fewer traffic fatalities not
because of the seat belt usage but because the drivers are more careful.
Compulsory term paper
• In the term paper you are going to investigate the following research question:
What is the causal effect of seat belt usage on traffic fatalities?
• This research question can be addressed by using the data set
seatbelts.dta.
• The data consist of a panel of 50 U.S. states, plus the District of Columbia, for the years 1983-1997.
• The data set can be downloaded from the course website.
• In analyzing this data you may consider the use of panel data methods
on top of a pure cross-section analysis.
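A minimal first step in Stata, assuming seatbelts.dta has been downloaded to the working directory:

use seatbelts, clear
describe       // variable names and labels
summarize      // basic descriptive statistics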
Compulsory term paper
The term paper should consist of the following sections:
• Introduction
• Empirical approach
• Data
• Results
• Conclusion
• References
• Appendix with Stata code & output
The term paper should be at most 10 pages including tables and figures (but excluding the Stata code and output).
The quality (and not the quantity) of the content of the term paper will
determine your grade.
Compulsory term paper
You are expected to work in a group of two students.
• You can form a group of two students yourself
• Register this group before 29 January 2017 00:00, using the link in the email you will receive today
• If you are unable to form a group, please let me know before 29 January 2017; you will be randomly assigned to another student
Important dates:
• 25 January 2017 – Hand-out of term paper
• 22 March 2017 – Hand-in of term paper on Fronter
• 11 April 2017 – Notification of grade (pass/fail)
• 21 April 2017 – Hand-in of improved term paper for those who failed
• 4 May 2017 – Everyone is informed about final grade for term paper