Solutions, Chapter 2/HL
ANSWERS TO CHAPTER 2
The Simple Regression Model
Econometrics
Economics of Innovation and Growth
A = Problems
B = Examples (from chapter 2)
C = Computer Exercises
A: Problems
2.1
Let kids denote the number of children born to a woman, and let educ denote years of
education for the woman. A simple model relating fertility to years of education is
$kids = \beta_0 + \beta_1 educ + u$,
where u is the unobserved error.
(i) What kinds of factors are contained in u? Are these likely to be correlated with
level of education?
(ii) Will a simple regression analysis uncover the ceteris paribus effect of education on
fertility? Explain.
(i) Income, age, and family background (such as number of siblings) are just a few
possibilities. It seems that each of these could be correlated with years of education. (Income
and education are probably positively correlated; age and education may be negatively
correlated because women in more recent cohorts have, on average, more education; and
number of siblings and education are probably negatively correlated.)
(ii) Not if the factors we listed in part (i) are correlated with educ. Because we would like to
hold these factors fixed, they are part of the error term. But if u is correlated with educ then
$E(u \mid educ) \neq 0$, and so SLR.3 fails.
--------------------------------------------------------------------------------------------------------------
2.2
In the simple linear regression model $y = \beta_0 + \beta_1 x + u$, suppose that $E(u) \neq 0$. Letting
$\alpha_0 = E(u)$, show that the model can always be rewritten with the same slope, but a new
intercept and error, where the new error has a zero expected value.
Answers
In the equation $y = \beta_0 + \beta_1 x + u$, add and subtract $\alpha_0$ on the right-hand side to get
$y = (\alpha_0 + \beta_0) + \beta_1 x + (u - \alpha_0)$. Call the new error $e = u - \alpha_0$, so that $E(e) = 0$.
The new intercept is $\alpha_0 + \beta_0$, but the slope is still $\beta_1$.
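The rewriting above can be checked numerically. The following is a minimal sketch, not part of the original solution: the parameter values, the sample size, and the normal and uniform distributions are all hypothetical choices made only for illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical values chosen only for illustration
beta0, beta1 = 1.0, 0.5
alpha0 = 2.0  # nonzero mean of the original error, E(u)

x = [random.uniform(0, 10) for _ in range(10000)]
u = [random.gauss(alpha0, 1.0) for _ in x]  # error with mean alpha0
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

# Rewritten model: new intercept alpha0 + beta0, new error e = u - alpha0
new_intercept = alpha0 + beta0
e = [ui - alpha0 for ui in u]
y_rewritten = [new_intercept + beta1 * xi + ei for xi, ei in zip(x, e)]

# Both forms give the same y (up to floating-point error); e averages near zero
max_gap = max(abs(a - b) for a, b in zip(y, y_rewritten))
mean_e = sum(e) / len(e)
print(max_gap, mean_e)
```

The slope `beta1` appears unchanged in both forms; only the intercept and the error are relabeled.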
2.3
The following table contains the ACT scores and the GPA (grade point average) for 8
college students. Grade point average is based on a four-point scale and has been rounded
to one digit after the decimal.
Student   GPA   ACT
1         2.8   21
2         3.4   24
3         3.0   26
4         3.5   27
5         3.6   29
6         3.0   25
7         2.7   25
8         3.7   30
(i) Estimate the relationship between GPA and ACT using OLS; that is, obtain the
intercept and slope in the equation
$\widehat{GPA} = \hat{\beta}_0 + \hat{\beta}_1 ACT$
Comment on the direction of the relationship. Does the intercept have a useful
interpretation here? Explain. How much higher is GPA predicted to be if the ACT
score is increased by 5 points?
(ii) Compute the fitted values and the residuals for each observation, and verify that
the residuals (approximately) sum to zero.
(iii) What is the predicted value of GPA when ACT = 20?
(iv) How much of the variation in GPA for the 8 students is explained by ACT?
Explain.
2.3 (i) Let $y_i = GPA_i$, $x_i = ACT_i$, and $n = 8$. Then $\bar{x} = 25.875$, $\bar{y} = 3.2125$,
$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = 5.8125$, and $\sum_{i=1}^{n} (x_i - \bar{x})^2 = 56.875$.
From equation (2.9), we obtain the slope as $\hat{\beta}_1 = 5.8125/56.875 \approx .1022$, rounded to four
places after the decimal. From (2.17), $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \approx 3.2125 - (.1022)25.875 \approx .5681$.
So we can write
$\widehat{GPA} = .5681 + .1022\,ACT$
$n = 8$.
The intercept does not have a useful interpretation because ACT is not close to zero for the
population of interest. If ACT is 5 points higher, $\widehat{GPA}$ increases by .1022(5) = .511.
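The slope and intercept in part (i) can be reproduced with a few lines of code. This is a minimal sketch using only the data from the problem's table; student 5's ACT score is taken to be 29, the value consistent with the reported mean of 25.875, and the variable names are illustrative.

```python
# Data from the table in Problem 2.3.
# Assumption: student 5's ACT is 29, which matches the stated mean 25.875.
act = [21, 24, 26, 27, 29, 25, 25, 30]
gpa = [2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7]

n = len(act)
x_bar = sum(act) / n
y_bar = sum(gpa) / n

# Sums of cross-products and of squared deviations from the means
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(act, gpa))
s_xx = sum((x - x_bar) ** 2 for x in act)

slope = s_xy / s_xx                # OLS slope estimate
intercept = y_bar - slope * x_bar  # OLS intercept estimate
effect_of_5_points = 5 * slope     # predicted change in GPA for ACT + 5

print(round(slope, 4), round(intercept, 4), round(effect_of_5_points, 3))
```

Working from unrounded intermediate values, as here, gives the same four-decimal estimates as the hand calculation.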
(ii) The fitted values and residuals, rounded to four decimal places, are given along
with the observation number i and GPA in the following table:
i   GPA   $\widehat{GPA}$   $\hat{u}$
1   2.8   2.7143    .0857
2   3.4   3.0209    .3791
3   3.0   3.2253   -.2253
4   3.5   3.3275    .1725
5   3.6   3.5319    .0681
6   3.0   3.1231   -.1231
7   2.7   3.1231   -.4231
8   3.7   3.6341    .0659
You can verify that the residuals, as reported in the table, sum to -.0002, which is pretty close
to zero given the inherent rounding error.
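The table in part (ii) can be rebuilt from the rounded coefficients. This is a minimal sketch: it uses the four-decimal estimates .5681 and .1022 from part (i) rather than the unrounded ones, which is why the residuals do not sum to exactly zero, and it takes student 5's ACT score to be 29, the value consistent with the reported mean of 25.875.

```python
# Data from Problem 2.3 (assumption: student 5's ACT is 29)
act = [21, 24, 26, 27, 29, 25, 25, 30]
gpa = [2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7]

# Rounded OLS estimates reported in part (i)
b0, b1 = 0.5681, 0.1022

# Fitted values and residuals, each rounded to four decimal places
fitted = [round(b0 + b1 * x, 4) for x in act]
residuals = [round(y - f, 4) for y, f in zip(gpa, fitted)]
residual_sum = round(sum(residuals), 4)

for i, (y, f, r) in enumerate(zip(gpa, fitted, residuals), start=1):
    print(f"{i}  {y:.1f}  {f:.4f}  {r: .4f}")
print("sum of residuals:", residual_sum)
```

With unrounded coefficients the residuals would sum to zero up to floating-point error, since that is a property of OLS with an intercept.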
(iii) When ACT = 20, $\widehat{GPA} = .5681 + .1022(20) \approx 2.61$.
(iv) The sum of squared residuals, $\sum_{i=1}^{n} \hat{u}_i^2$, is about .4347 (rounded to four decimal places),
and the total sum of squares, $\sum_{i=1}^{n} (y_i - \bar{y})^2$, is about 1.0288. So the R-squared from the
regression is
$R^2 = 1 - SSR/SST \approx 1 - (.4347/1.0288) \approx .577$.
Therefore, about 57.7% of the variation in GPA is explained by ACT in this small sample of
students.
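The R-squared calculation in part (iv) can be sketched as follows. The code recomputes the OLS fit from the problem's data (with student 5's ACT score taken to be 29, the value consistent with the reported mean of 25.875) and then forms SSR and SST; the variable names are illustrative.

```python
# Data from Problem 2.3 (assumption: student 5's ACT is 29)
act = [21, 24, 26, 27, 29, 25, 25, 30]
gpa = [2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7]

n = len(act)
x_bar = sum(act) / n
y_bar = sum(gpa) / n

# OLS slope and intercept from the usual simple-regression formulas
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(act, gpa))
         / sum((x - x_bar) ** 2 for x in act))
intercept = y_bar - slope * x_bar

# SSR: sum of squared residuals; SST: total sum of squares
residuals = [y - (intercept + slope * x) for x, y in zip(act, gpa)]
ssr = sum(r ** 2 for r in residuals)
sst = sum((y - y_bar) ** 2 for y in gpa)

r_squared = 1 - ssr / sst
print(round(ssr, 4), round(sst, 4), round(r_squared, 3))
```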
--------------------------------------------------------------------------------------------------------------
2.4
The data set BWGHT.DTA contains data on births to women in the United States. Two
variables of interest are the dependent variable, infant birth weight in ounces (bwght), and an
explanatory variable, average number of cigarettes the mother smoked per day during
pregnancy (cigs). The following simple regression was estimated using data on n = 1,388 births:
$\widehat{bwght} = 119.77 - 0.514\,cigs$
(i) What is the predicted birth weight when cigs = 0? What about when cigs = 20?
Comment on the difference.
(ii) Does the simple regression necessarily capture a causal relationship between the
child's birth weight and the mother's smoking habits? Explain.
(iii) To predict a birth weight of 125 ounces, what would cigs have to be? Comment.
(iv) What fraction of the women in the sample do not smoke while pregnant? Does this
help reconcile your finding from part (iii)?