Solutions to lab 2 Statistical Inference
Nataliia Ostapenko
24 November 2017
Question 1 from Homework
I 1. E(y) = E(z − x) = E(z) − E(x) = 30 − 5 = 25, since x = z − y implies y = z − x
I 2. Var(x) = E(x^2) − (E(x))^2 = 80 − 5^2 = 55
I 3. Cov(z, y) = E(zy) − E(z) ∗ E(y) = 1500 − 750 = 750
I 4. Var(x) = Var(z) + Var(y) − 2 ∗ Cov(z, y)
I I 55 = Var(z) + 50 − 1500
I I Var(z) = 1505
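I A quick arithmetic check of these moments in Stata (an illustrative sketch, not part of the original solution):
. * Cov(z,y) = E(zy) - E(z)*E(y)
. display 1500 - 30*25
. * Var(z) = Var(x) - Var(y) + 2*Cov(z,y)
. display 55 - 50 + 2*750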
Question 1 from Homework. Linear projection
I β = Cov(x, y) / Var(y), since we are projecting x on y!
I Cov(x, y) = E(xy) − E(x)E(y), but we don't have E(xy), so let's rewrite it
I = E((z − y)y) − E(z − y)E(y) by the definition of x
I = E(zy) − E(y^2) − E(z)E(y) + (E(y))^2, where the first and third terms give Cov(z, y) and the second and fourth give −Var(y)
I Cov(x, y) = Cov(z, y) − Var(y)
Question 1 from Homework. Linear projection continued
I Therefore β = (Cov(z, y) − Var(y)) / Var(y)
I β = Cov(z, y)/Var(y) − 1 = 750/50 − 1 = 14
I β0 = E (x ) − E (y ) ∗ β = 5 − 25 ∗ 14 = −345
I Linear projection is L(x |1, y ) = β0 + y ∗ β = −345 + y ∗ 14
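I The projection coefficients can be verified with the same kind of Stata arithmetic (sketch):
. * slope: beta = Cov(z,y)/Var(y) - 1
. display 750/50 - 1
. * intercept: beta0 = E(x) - beta*E(y)
. display 5 - 14*25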
Question 3 from Homework (a). T-test
I 1. H0 : β0 = 0, H1 : β0 ≠ 0
I t-stat = (βˆ0 − β0 ) / s.e.(βˆ0 )
I t-stat = (−12.95 − 0)/14.23 = −0.91
I t95%critical = 1.987 from the statistical table
I |t| is not greater than t95%critical, so we cannot reject H0
Question 3 from Homework (a). T-test
I 2. H0 : β1 = 1, H1 : β1 ≠ 1
I t-stat = (βˆ1 − β1 ) / s.e.(βˆ1 )
I t-stat = (0.886 − 1)/0.085 = −1.34
I t95%critical = 1.987 from the statistical table
I |t| is not greater than t95%critical, so we cannot reject H0
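I Both t-statistics and the critical value can be reproduced in Stata (a sketch; the 92 residual degrees of freedom are those implied by part (c)):
. * t-statistics for H0: beta0 = 0 and H0: beta1 = 1
. display (-12.95 - 0)/14.23
. display (0.886 - 1)/0.085
. * two-sided 5% critical value with 92 degrees of freedom
. display invttail(92, 0.025)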
Question 3 from Homework (b). Direction of the Bias
I When we have omitted variable bias, the direction of the bias is
I Bias = (effect of the omitted variable on the dependent variable) ∗ (correlation between the omitted variable and the independent variable included in the regression)
I Bias = β2 ∗ corr(assess, sqrft)
I Bias = (+) ∗ (+) = +, so we have a positive bias here
Question 3 from Homework (c). Testing Joint Hypothesis
I H0 : β0 = 0 and β1 = 1; H1 : β0 ≠ 0 or β1 ≠ 1
I F = [(SSRr − SSRur)/number of restrictions] / [SSRur/(n − k − 1)]
I = [(208349.11 − 144323.88)/(94 − 92)] / [144323.88/92] = 32012.615/1568.7378 = 20.41
I F(2, 92)95%critical = 3.10 from the statistical table
I F > F(2, 92)95%critical, so we reject H0
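I The F statistic and its critical value can be checked in Stata (a sketch using the SSR figures above):
. * F = [(SSRr - SSRur)/q] / [SSRur/(n-k-1)]
. display ((208349.11 - 144323.88)/2) / (144323.88/92)
. * 5% critical value of F(2, 92)
. display invFtail(2, 92, 0.05)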
Question 3 from Homework (d). Testing Joint Hypothesis
I NB: the unrestricted model is now the model presented in part (d)
I H0 : β2 = 0 and β3 = 0 and β4 = 0
I H1 : at least one of them is non-zero
I F = [(R^2_ur − R^2_r)/number of restrictions] / [(1 − R^2_ur)/(n − k − 1)]
I = [(0.896 − 0.871)/3] / [(1 − 0.896)/89] = (0.025/3)/(0.104/89) = 7.13
I F(3, 89)95%critical = 2.71 from the statistical table
I F > F(3, 89)95%critical, so we reject H0
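I The same kind of check for the R-squared form of the test (sketch):
. * F = [(R2_ur - R2_r)/q] / [(1 - R2_ur)/(n-k-1)]
. display ((0.896 - 0.871)/3) / ((1 - 0.896)/89)
. * 5% critical value of F(3, 89)
. display invFtail(3, 89, 0.05)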
Question 3 from Homework (e). Heteroscedastic errors
I The F statistic is no longer F-distributed, just as the t statistic is not t-distributed!
I Hypothesis testing is invalid
I But the coefficient estimates are still unbiased
Question 3 from Homework (f). Multicollinearity
I if there is imperfect (but high) multicollinearity, it increases the standard errors in the regression
I t-statistics for individual coefficients might be insignificant while the F-test is significant
I if there is perfect multicollinearity, we cannot identify the coefficients
Question 4 from Homework
I ResidualMS = RSS/df = 4381.53/28094 = 0.15596
I RootMSE = √ResidualMS = √0.15596 = 0.3949
I R^2 = ModelSS/TotalSS = 2033.3/6414.82 = 0.3170
I F(4, 28094) = (R^2/k) / ((1 − R^2)/(n − k − 1)) = (0.3170/4)/(0.6830/28094) = 3259.32
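I These ANOVA-table entries can be reproduced with display (an illustrative sketch):
. * Residual MS and Root MSE
. display 4381.53/28094
. display sqrt(4381.53/28094)
. * R-squared and the overall F statistic (Model MS / Residual MS)
. display 2033.3/6414.82
. display (2033.3/4)/(4381.53/28094)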
Question 4 from Homework
Figure 1: Answers
Exercise 3.14
. use hprice1.dta, clear
. regress price sqrft bdrms
Source | SS df MS Number of obs = 88
-------------+---------------------------------- F(2, 85) = 72.96
Model | 580009.152 2 290004.576 Prob > F = 0.0000
Residual | 337845.354 85 3974.65122 R-squared = 0.6319
-------------+---------------------------------- Adj R-squared = 0.6233
Total | 917854.506 87 10550.0518 Root MSE = 63.045
------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
sqrft | .1284362 .0138245 9.29 0.000 .1009495 .1559229
bdrms | 15.19819 9.483517 1.60 0.113 -3.657582 34.05396
_cons | -19.315 31.04662 -0.62 0.536 -81.04399 42.414
------------------------------------------------------------------------------
Exercise 3.14 (a)
I NB: the house price is measured in thousands of dollars ($1000s)
I price = −19.3 + 0.13 ∗ sqrft + 15.2 ∗ bdrms
. describe price
storage display value
variable name type format label variable label
------------------------------------------------------------------------------------------
price float %9.0g house price, $1000s
. regress price sqrft bdrms
Source | SS df MS Number of obs = 88
-------------+---------------------------------- F(2, 85) = 72.96
Model | 580009.152 2 290004.576 Prob > F = 0.0000
Residual | 337845.354 85 3974.65122 R-squared = 0.6319
-------------+---------------------------------- Adj R-squared = 0.6233
Total | 917854.506 87 10550.0518 Root MSE = 63.045
------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
sqrft | .1284362 .0138245 9.29 0.000 .1009495 .1559229
bdrms | 15.19819 9.483517 1.60 0.113 -3.657582 34.05396
_cons | -19.315 31.04662 -0.62 0.536 -81.04399 42.414
------------------------------------------------------------------------------
Exercise 3.14 (b), (c), (d)
I 2. 15200$ is the estimated increase in price for a house with one more bedroom, holding square footage fixed.
I 3. the estimated increase in price for a house with one more bedroom that is 140 square feet larger is (15.2 + 0.13 ∗ 140) ∗ 1000$ = 33400$
I I the estimated increase in price is about 2.2 times higher than in (b)
I 4. 63.2% of the variation in price is explained by sqrft and bdrms (R-squared = 0.6319)
. display -19.3+0.13*2438+15.2*4
358.44
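I Similarly, the part (c) increase can be checked with display (sketch):
. * increase for one more bedroom that is 140 sq ft larger, in dollars
. display (15.2 + 0.13*140)*1000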
Exercise 3.14 (e), (f)
I price = (−19.3 + 0.13 ∗ 2438 + 15.2 ∗ 4) ∗ 1000$ = 358440$
I 5. price=358440$
I 6. the residual (actual price minus predicted price) is 300000 − 358440 = −58440$
I 6. i.e. the buyer underpaid by 58440$ relative to the predicted price
I But there are many other features of a house (omitted variables?) that affect price, and we have not controlled for these.
. display 358440-300000
58440
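I Right after running the regression, the prediction can also be computed from the stored coefficients (a sketch; _b[] refers to the estimates last in memory). The result differs slightly from 358.44 because the slide uses rounded coefficients:
. * predicted price (in $1000s) for sqrft = 2438 and bdrms = 4
. display _b[_cons] + _b[sqrft]*2438 + _b[bdrms]*4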
Exercise 4.14 (a)
I lprice = 4.77 + 0.00038 ∗ sqrft + 0.029 ∗ bdrms
I θ = 150 ∗ β1 + β2 = 150 ∗ 0.000379 + 0.0289 = 0.0858
I which means that an additional 150-square-foot bedroom
increases the predicted price by about 8.6%.
. regress lprice sqrft bdrms
Source | SS df MS Number of obs = 88
-------------+---------------------------------- F(2, 85) = 60.73
Model | 4.71671468 2 2.35835734 Prob > F = 0.0000
Residual | 3.30088884 85 .038833986 R-squared = 0.5883
-------------+---------------------------------- Adj R-squared = 0.5786
Total | 8.01760352 87 .092156362 Root MSE = .19706
------------------------------------------------------------------------------
lprice | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
sqrft | .0003794 .0000432 8.78 0.000 .0002935 .0004654
bdrms | .0288844 .0296433 0.97 0.333 -.0300543 .0878232
_cons | 4.766027 .0970445 49.11 0.000 4.573077 4.958978
------------------------------------------------------------------------------
. display 150*0.000379 + 0.0289
.08575
Exercise 4.14 (b)
I β2 = θ1 − 150 ∗ β1
I lprice = β0 + β1 ∗ sqrft + (θ1 − 150 ∗ β1 ) ∗ bdrms
I lprice = β0 + β1 ∗ sqrft + θ1 ∗ bdrms − 150 ∗ β1 ∗ bdrms
I lprice = β0 + β1 ∗ (sqrft − 150 ∗ bdrms) + θ1 ∗ bdrms
I we need to generate sqrft-150*bdrms and insert it into the
regression
. generate new=sqrft-150*bdrms
. regress lprice new bdrms
Source | SS df MS Number of obs = 88
-------------+---------------------------------- F(2, 85) = 60.73
Model | 4.71671468 2 2.35835734 Prob > F = 0.0000
Residual | 3.30088884 85 .038833986 R-squared = 0.5883
-------------+---------------------------------- Adj R-squared = 0.5786
Total | 8.01760352 87 .092156362 Root MSE = .19706
------------------------------------------------------------------------------
lprice | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
new | .0003794 .0000432 8.78 0.000 .0002935 .0004654
bdrms | .0858013 .0267675 3.21 0.002 .0325804 .1390223
_cons | 4.766027 .0970445 49.11 0.000 4.573077 4.958978
------------------------------------------------------------------------------
Exercise 4.14 (c)
I θ = 0.0858
I NB: θ is not the coefficient on the new variable but the coefficient on bdrms; see the equation!
I s.e. = 0.027
I CI = 0.0858 ± 0.027 ∗ 1.987
I CI = [0.032151; 0.139449]
. display 0.0858+0.027*1.987
.139449
. display 0.0858-0.027*1.987
.032151
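I The same estimate and confidence interval can also be obtained directly after the part (a) regression with lincom (a sketch, assuming regress lprice sqrft bdrms was the last estimation run):
. * theta = 150*beta1 + beta2 and its 95% confidence interval
. lincom 150*sqrft + bdrms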
Exercise 4.17
. use WAGE2.dta, clear
. regress lwage educ exper tenure
Source | SS df MS Number of obs = 935
-------------+---------------------------------- F(3, 931) = 56.97
Model | 25.6953242 3 8.56510806 Prob > F = 0.0000
Residual | 139.960959 931 .150334005 R-squared = 0.1551
-------------+---------------------------------- Adj R-squared = 0.1524
Total | 165.656283 934 .177362188 Root MSE = .38773
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .0748638 .0065124 11.50 0.000 .062083 .0876446
exper | .0153285 .0033696 4.55 0.000 .0087156 .0219413
tenure | .0133748 .0025872 5.17 0.000 .0082974 .0184522
_cons | 5.496696 .1105282 49.73 0.000 5.279782 5.713609
------------------------------------------------------------------------------
Exercise 4.17
I we need to test H0 : β2 = β3
I let's rewrite it as H0 : θ = β2 − β3 = 0
I we test it against H1 : θ ≠ 0
I we can express β2 = θ + β3
I then rewrite the equation as
I lwage = β0 + β1 ∗ educ + (θ + β3 ) ∗ exper + β3 ∗ tenure
I lwage = β0 + β1 ∗ educ + θ ∗ exper + β3 ∗ (tenure + exper )
I we need to generate a new variable = tenure + exper and insert it
into the regression
. generate new=tenure+exper
. regress lwage educ exper new
Source | SS df MS Number of obs = 935
-------------+---------------------------------- F(3, 931) = 56.97
Model | 25.6953242 3 8.56510806 Prob > F = 0.0000
Residual | 139.960959 931 .150334005 R-squared = 0.1551
-------------+---------------------------------- Adj R-squared = 0.1524
Total | 165.656283 934 .177362188 Root MSE = .38773
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .0748638 .0065124 11.50 0.000 .062083 .0876446
exper | .0019537 .0047434 0.41 0.681 -.0073554 .0112627
new | .0133748 .0025872 5.17 0.000 .0082974 .0184522
Exercise 4.17
I θ = 0.0019537
I NB: θ is not the coefficient on the new variable but the coefficient on exper; see the equation!
I s.e. = 0.0047434
I t = (0.0019537 − 0)/0.0047434 = 0.41
I t95%critical = 1.96 from the statistical table
I |t| is not greater than t95%critical -> we cannot reject H0 -> θ is not statistically significantly different from 0
I CI also includes 0!
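I The same hypothesis can also be tested directly after the original regression with the test command (a sketch, assuming regress lwage educ exper tenure was the last estimation run); it reports an F statistic equal to the square of the t statistic above:
. * Wald test of H0: coefficient on exper equals coefficient on tenure
. test exper = tenure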