Solution for Midterm Exam
Econ3005, Spring 2024
Problem 1 Multiple Choice (24 points, 3 each)
Please choose the answer(s) that you think is(are) appropriate.
1.1 Interpreting the intercept in a sample regression function is
a. not reasonable because you never observe values of the explanatory variables around the origin.
b. reasonable because under certain conditions the estimator is the Best Linear Unbiased Estimator.
c. reasonable if your sample contains values of Xi around the origin.
d. not reasonable because economists are interested in the effect of a change in X on the change in Y.
Answer: c
1.2 Finding a p-value of 0.0245
a. indicates evidence in favor of the null hypothesis
b. implies that the t-statistic is less than 1.96
c. indicates evidence against the null hypothesis
d. will only happen roughly one in twenty samples
Answer: c
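As a quick check on option b, the sketch below converts the p-value back into a t-statistic. It assumes a two-sided test and the large-sample standard normal approximation; the helper name `t_from_p` is hypothetical.

```python
from statistics import NormalDist

def t_from_p(p_two_sided):
    """|t| implied by a two-sided p-value under the standard normal approximation."""
    return NormalDist().inv_cdf(1 - p_two_sided / 2)

t = t_from_p(0.0245)
print(f"|t| = {t:.2f}")  # about 2.25, which exceeds 1.96, so option b is wrong
```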
1.3 The construction of the t-statistic for a one- and a two-sided hypothesis
a. depends on the critical value from the appropriate distribution
b. is the same.
c. is different since the critical value must be 1.645 for the one-sided hypothesis, but 1.96 for the two-sided hypothesis (using a 5% probability for the Type I error).
d. uses ±1.96 for the two-sided test, but only +1.96 for the one-sided test.
Answer: b
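The distinction behind answer b can be sketched in a few lines: the t-statistic is computed the same way in both cases, and only the critical value used to decide changes. The estimate, null value, and standard error below are hypothetical numbers chosen for illustration.

```python
from statistics import NormalDist

def t_statistic(estimate, null_value, se):
    """(estimate - hypothesized value) / SE: identical for one- and two-sided tests."""
    return (estimate - null_value) / se

std_normal = NormalDist()
crit_one_sided = std_normal.inv_cdf(0.95)   # about 1.645: all 5% in one tail
crit_two_sided = std_normal.inv_cdf(0.975)  # about 1.960: 2.5% in each tail

t = t_statistic(estimate=1.8, null_value=0.0, se=1.0)  # hypothetical numbers
print(t > crit_one_sided, abs(t) > crit_two_sided)     # True False
```

The same t = 1.8 rejects against a one-sided alternative but not against a two-sided one; the statistic itself never changed.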
1.4 Using the textbook example of 420 California school districts and the regression of test scores on the student-teacher ratio, you find that the standard error on the slope coefficient is 0.51 when using the heteroskedasticity-robust formula, while it is 0.48 when employing the homoskedasticity-only formula. When calculating the t-statistic, the recommended procedure is to
a. use the homoskedasticity-only formula because the t-statistic becomes larger
b. first test for homoskedasticity of the errors and then make a decision
c. use the heteroskedasticity-robust formula
d. make a decision depending on how much different the estimate of the slope is under the two procedures
Answer: c
1.5 When there are omitted variables in the regression, which are determinants of the dependent
variable, then
a. you cannot measure the effect of the omitted variable, but the estimator of your included variable(s) is (are) unaffected.
b. this has no effect on the estimator of your included variable because the other variable is not included.
c. this will always bias the OLS estimator of the included variable.
d. the OLS estimator is biased if the omitted variable is correlated with the included variable.
Answer: d
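A small simulation makes option d concrete. It is a sketch under assumed data: the true model Y = 1 + 2X + 3W + e, the parameter `link`, and the helper names are hypothetical, not from the exam. Regressing Y on X alone recovers the true slope 2 only when the omitted W is unrelated to X; when W = 0.5X + noise, the short-regression slope converges to 2 + 3 × 0.5 = 3.5.

```python
import random

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def short_regression_slope(link, n=100_000, seed=0):
    """Simulate Y = 1 + 2X + 3W + e and regress Y on X only (W omitted).

    `link` sets how strongly the omitted W depends on X: W = link * X + noise.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    w = [link * xi + rng.gauss(0, 1) for xi in x]
    y = [1 + 2 * xi + 3 * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]
    return ols_slope(x, y)

print(short_regression_slope(link=0.0))  # close to 2: W omitted but uncorrelated with X
print(short_regression_slope(link=0.5))  # close to 3.5: biased upward by 3 * 0.5
```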
1.6 Using 143 observations, assume that you had estimated a simple regression function and that your estimate for the slope was 0.65, with a standard error of 0.20. You want to test whether or not the estimate is statistically significant. Which of the following decisions is the only correct one:
a. you decide that the coefficient is small and hence most likely is zero in the population
b. the slope is statistically significant since it is three standard errors away from zero
c. the response of Y given a change in X must be economically important since it is statistically significant
d. since the slope is very small, so must be the regression R².
Answer: b
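The arithmetic behind option b, as a short sketch (two-sided test, large-sample normal approximation, which is reasonable with 143 observations):

```python
from statistics import NormalDist

estimate, se = 0.65, 0.20
t = estimate / se                              # 3.25 standard errors from zero
p_value = 2 * (1 - NormalDist().cdf(abs(t)))   # two-sided p-value, normal approximation
print(f"t = {t:.2f}, p = {p_value:.4f}")
```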
1.7 If the estimates of the coefficients of interest change substantially across specifications,
a. then this can be expected from sample variation
b. then you should change the scale of the variables to make the changes appear to be smaller
c. then this often provides evidence that the original specification had omitted variable bias
d. then choose the specification for which your coefficient of interest is most significant
Answer: c
1.8 If you reject a joint null hypothesis using the F-test in a multiple hypothesis setting, then
a. a series of t-tests may or may not give you the same conclusion.
b. the regression is always significant.
c. all of the hypotheses are always simultaneously rejected.
d. the F-statistic must be negative.
Answer: a
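Option a can be illustrated with the standard large-sample formula that writes the F-statistic for two restrictions in terms of the two t-statistics. The t-values below are hypothetical and assumed uncorrelated (rho = 0); the critical values 1.96 and 3.00 are the usual 5% values for a single t-test and for F(2, ∞).

```python
def f_two_restrictions(t1, t2, rho):
    """Large-sample F-statistic for q = 2 restrictions, written via the two t-statistics.

    rho is the estimated correlation between the two t-statistics.
    """
    return 0.5 * (t1 ** 2 + t2 ** 2 - 2 * rho * t1 * t2) / (1 - rho ** 2)

F_CRIT_5PCT = 3.00   # 5% critical value of F(2, infinity)
T_CRIT_5PCT = 1.96   # 5% two-sided critical value for a single t-test

t1 = t2 = 1.8        # hypothetical t-statistics
F = f_two_restrictions(t1, t2, rho=0.0)
print(abs(t1) > T_CRIT_5PCT, abs(t2) > T_CRIT_5PCT, F > F_CRIT_5PCT)  # False False True
```

Neither individual t-test rejects, yet the joint F-test (F = 3.24) does, so a series of t-tests need not agree with the F-test.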
Problem 2 Short Questions (27 points in total):
Note: for each sub-question, the answer should not be longer than 7 lines.
2.1 Data were collected from a random sample of 220 home sales from a community in 2003. Let Price denote the selling price (in $1000), BDR denote the number of bedrooms, Bath denote the number of bathrooms, Hsize denote the size of the house (in square feet), Lsize denote the lot size (in square feet), Age denote the age of the house (in years), and Poor denote a binary variable that is equal to 1 if the condition of the house is reported as poor. An estimated regression yields
$$\widehat{Price} = 115 + 0.478\,BDR + 16.7\,Bath + 0.179\,Hsize + 0.004\,Lsize + 0.068\,Age - 64.2\,Poor, \qquad R^2 = 0.69$$
(6 points) (i) Suppose Dr. Qin omits the variable Hsize. Which coefficient(s) tend to be affected, and how do you expect the coefficient estimates to change? Please provide reasoning.
Answer: The coefficients of BDR, Bath, and Lsize tend to be affected. Larger houses tend to have more bedrooms, more bathrooms, and larger lots, so the correlation between Hsize and BDR/Bath/Lsize is positive. At the same time, larger houses tend to sell for higher prices: corr(Price, Hsize) > 0, and the coefficient on Hsize is positive. The omitted variable bias from dropping Hsize is therefore positive, so we expect the three coefficients to be biased upward (more positive). (As long as you answer for one of the three coefficients correctly and completely, you will get full marks.)
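The sign argument above is an instance of the omitted-variable formula (notation in the style of Wooldridge). In the sample, the coefficient on regressor x_j from the regression without Hsize equals its full-regression coefficient plus the Hsize coefficient times the slope from an auxiliary regression of Hsize on the remaining regressors. The auxiliary coefficients are not reported in the exam, so this is only a sketch of the logic.

```latex
\tilde{\beta}_j = \hat{\beta}_j + \hat{\beta}_{Hsize}\,\tilde{\delta}_j,
\qquad j \in \{BDR,\ Bath,\ Lsize,\ Age,\ Poor\}

\widehat{Hsize} = \tilde{\delta}_0 + \tilde{\delta}_{BDR}\,BDR + \tilde{\delta}_{Bath}\,Bath
  + \tilde{\delta}_{Lsize}\,Lsize + \tilde{\delta}_{Age}\,Age + \tilde{\delta}_{Poor}\,Poor
```

Since the estimated Hsize coefficient is 0.179 > 0, each coefficient moves up when its auxiliary coefficient is positive. Strictly, it is this partial association (holding the other regressors fixed) that determines the direction; the answer's argument uses the simple correlations as an approximation.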
(2 points) (ii) Suppose that a homeowner converts part of an existing family room in her house
into a new bathroom. What is the expected increase in the value of the house?
Answer: Converting an existing room leaves Hsize unchanged, so only Bath increases by 1. The expected increase in the price is 16.7 × $1000 = $16,700 (recall that the price is measured in $1000s).
(3 points) (iii) Suppose that a homeowner adds a new bathroom to her house, which increases the
size of the house by 100 square feet. What is the expected increase in the value of the house?
Answer: In this case ∆Bath = 1 and ∆Hsize = 100, so the expected change in price is (16.7 + 0.179 × 100) × $1000 = $34,600, or 34.6 thousand dollars.
(2 points) (iv) What is the loss in value if a homeowner lets his house run down so that its condition
becomes poor?
Answer: The loss is $64,200.
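The arithmetic in parts (ii) to (iv) can be checked with a few lines of Python. This is a sketch: the dictionary simply copies the estimated coefficients above, and `predicted_change` is a hypothetical helper name. Because Price is measured in $1000s, each change is multiplied by 1000.

```python
# Estimated coefficients from the 2.1 regression (Price measured in $1000s)
COEF = {"BDR": 0.478, "Bath": 16.7, "Hsize": 0.179,
        "Lsize": 0.004, "Age": 0.068, "Poor": -64.2}

def predicted_change(changes):
    """Predicted change in Price, in dollars, for the given changes in the regressors."""
    return 1000 * sum(COEF[name] * delta for name, delta in changes.items())

part_ii = predicted_change({"Bath": 1})                 # convert a room: Hsize unchanged
part_iii = predicted_change({"Bath": 1, "Hsize": 100})  # add a 100 sq ft bathroom
part_iv = predicted_change({"Poor": 1})                 # condition becomes poor

for label, value in [("(ii)", part_ii), ("(iii)", part_iii), ("(iv)", part_iv)]:
    print(f"{label} {value:,.0f} dollars")
```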
2.2 The cost of attending your college has once again gone up. Although you have been told that education is an investment in human capital, which carries a return of roughly 10% a year, you (and
your parents) are not pleased. One of the administrators at your university/college does not make
the situation better by telling you that you pay more because the reputation of your institution is
better than that of others. To investigate this hypothesis, you collect data randomly for 100 national
universities and liberal arts colleges from the 2000-2001 U.S. News and World Report annual rankings.
Next you perform the following regression
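$$\widehat{Cost} = \underset{(2058.63)}{7311.17} + \underset{(664.58)}{3985.20} \times Reputation - \underset{(0.13)}{0.2} \times Size + \underset{(2154.85)}{8406.79} \times Dpriv - \underset{(1121.92)}{416.38} \times Dlibart - \underset{(1007.86)}{2376.51} \times Dreligion$$
$$R^2 = 0.72, \quad SER = 2773.35$$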
where Cost is Tuition, Fees, Room and Board in dollars, Reputation is the index used in U.S. News and World Report (based on a survey of university presidents and chief academic officers), which ranges from 1 (marginal) to 5 (distinguished), Size is the number of undergraduate students, and Dpriv, Dlibart, and Dreligion are binary variables indicating whether the institution is private, a liberal arts college, and has a religious affiliation. The numbers in parentheses are heteroskedasticity-robust standard errors.
(5 points) (i) Indicate whether or not the coefficients are significantly different from zero.
Answer: The coefficient on liberal arts colleges (Dlibart) is not significantly different from zero (t = −416.38/1121.92 ≈ −0.37). All other coefficients are statistically significant at conventional levels, with the exception of the Size coefficient, which carries a t-statistic of about −1.54 and hence is not statistically significant at the 5% level (using either a one-sided or a two-sided alternative hypothesis).
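The t-statistics behind this answer can be reproduced directly from the reported estimates and robust standard errors. A sketch, using the large-sample two-sided 5% critical value of 1.96; the variable names are illustrative.

```python
# Coefficients and heteroskedasticity-robust SEs copied from the 2.2 regression
results = {
    "Intercept": (7311.17, 2058.63),
    "Reputation": (3985.20, 664.58),
    "Size": (-0.2, 0.13),
    "Dpriv": (8406.79, 2154.85),
    "Dlibart": (-416.38, 1121.92),
    "Dreligion": (-2376.51, 1007.86),
}
CRIT_5PCT = 1.96  # two-sided, large-sample normal critical value

t_stats = {name: coef / se for name, (coef, se) in results.items()}
for name, t in t_stats.items():
    verdict = "significant" if abs(t) > CRIT_5PCT else "not significant"
    print(f"{name:<10} t = {t:6.2f}  ({verdict} at 5%)")
```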
(4 points) (ii) What is the t-statistic for the null hypothesis that the coefficient on Size is equal to zero? Based on this, should you eliminate the variable from the regression? Why or why not?
Answer: The t-statistic is −0.2/0.13 ≈ −1.54 (1.54 in absolute value), which is not statistically significant at the 5% level. However, variables should not be eliminated simply on the grounds of a statistical test. The sign of the coefficient is as expected, and its magnitude makes it important. It is best to leave the variable in the regression and let the reader decide whether or not this is convincing evidence that the size of the university matters. Dropping it could also cause omitted variable bias.
(2 points) (iii) You want to test simultaneously the hypotheses that $\beta_{Size} = 0$ and $\beta_{Dlibart} = 0$. Your regression package returns an F-statistic of 1.23. Can you reject the null hypothesis?
Answer: With q = 2 restrictions, the critical value of the $F_{2,\infty}$ distribution is 3.00 (5% level) and 4.61 (1% level). Since 1.23 < 3.00, you cannot reject the null hypothesis.
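For two restrictions the large-sample F distribution has a closed form, which makes the critical values and the p-value easy to sketch: F(2, ∞) is a chi-square with 2 degrees of freedom divided by 2, and P(χ²₂ > x) = exp(−x/2), so P(F > f) = exp(−f). The helper name `f2_pvalue` is hypothetical.

```python
import math

def f2_pvalue(F):
    """Large-sample p-value for an F-statistic with q = 2 restrictions: exp(-F)."""
    return math.exp(-F)

crit_5 = -math.log(0.05)  # 5% critical value, about 3.00
crit_1 = -math.log(0.01)  # 1% critical value, about 4.61
p = f2_pvalue(1.23)       # p-value for the reported F-statistic, about 0.29
print(f"5%: {crit_5:.2f}, 1%: {crit_1:.2f}, p-value: {p:.3f}")
```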
(3 points) (iv) Eliminating the Size and Dlibart variables from your regression, the estimated regression becomes
$$\widehat{Cost} = \underset{(1772.35)}{5450.35} + \underset{(590.49)}{3538.84} \times Reputation + \underset{(875.51)}{10935.7} \times Dpriv - \underset{(1180.57)}{2783.31} \times Dreligion$$
$$R^2 = 0.72, \quad SER = 3792.68$$
Why do you think that the effect of attending a private institution has increased now?
Answer: There is now a problem of omitted variable bias. Private institutions are usually smaller, on average, and some of them are liberal arts colleges. Both of these factors are determinants of cost (both variables had negative coefficients in the original regression model). The two omitted variables can bias the Dpriv coefficient in different directions, but the aggregate bias is positive. Because both are correlated with Dpriv, omitting them biases its coefficient. (As long as your answer discusses omitted variable bias, it will be marked as correct.)
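The sign reasoning can be written as a tiny helper. It is a sketch: `ovb_direction` is a hypothetical name, the coefficient values come from regression (i), and the correlation signs are the assumptions stated in the answer, not estimates. Applying the single-variable rule to each omitted variable separately is a rough decomposition; the observed net change in the Dpriv coefficient (10935.7 − 8406.79 ≈ 2529) indicates that the upward component dominates.

```python
def ovb_direction(beta_omitted, corr_sign):
    """Direction of the bias on the included regressor from omitting one variable.

    beta_omitted: coefficient of the omitted variable in the full regression.
    corr_sign: assumed sign (+1 or -1) of its correlation with the included regressor.
    """
    product = beta_omitted * corr_sign
    if product > 0:
        return "upward"
    if product < 0:
        return "downward"
    return "none"

# Coefficients from regression (i); correlation signs are the answer's assumptions
size_bias = ovb_direction(-0.2, corr_sign=-1)       # private schools tend to be smaller
libart_bias = ovb_direction(-416.38, corr_sign=+1)  # private schools are more often liberal arts colleges
print(size_bias, libart_bias)                       # upward downward

change_in_dpriv = 10935.7 - 8406.79                 # observed net change, about +2529
```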
Problem 3 Long Questions (49 points in total)
Note: for each sub-question, the answer should not be longer than 10 lines.
3.1 An empirical problem that has appeared frequently in the class is the relationship between earnings and education. Below you will find the results of estimating a simple earnings model using education (educ) and experience (exper) as explanatory variables, both measured in years. The dependent variable (wage) is monthly wage, measured in dollars. The estimated model is
$$wage_i = \beta_0 + \beta_1\, educ_i + \beta_2\, exper_i + u_i \qquad (1)$$
where $u_i$ is the error term of the model. The estimation is based on a sample of 935 individuals.
(5 points) (i) Before estimating the model, Dr. Qin first plots wage against educ:
[Figure: scatter plot of wage (vertical axis, ticks from 10,000 to 50,000) against educ (horizontal axis, 10 to 18 years)]
Dr. Qin then estimates equation (1) and obtains the following regression output.
Based on the graph, does there appear to be a relationship between the variables? Please interpret
the regression outcome. Is there any particular problem with the data that might drive your regression
results?
Answer: The scatter plot does not show any clear relationship between the variables. The regression output suggests a weak positive relationship between wage and education and a weak negative relationship between wage and experience, as neither coefficient is statistically significant. The full sample includes observations whose wages are far above those of the majority. These outliers make the regression results imprecise; specifically, they may bias the coefficients towards 0.
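To see how a few extreme wages can make estimates imprecise, the sketch below uses made-up data (not the exam's 935 observations): nine points lying close to a line with slope 76, then the same points plus one wage of 50,000 placed at the sample mean of educ. Placing the outlier at the mean leaves the OLS slope unchanged, so the example isolates the effect on precision. `ols_simple` is a hypothetical helper that reports the homoskedasticity-only standard error.

```python
import math

def ols_simple(x, y):
    """Simple OLS of y on x with an intercept.

    Returns (slope, homoskedasticity-only standard error of the slope).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(ssr / (n - 2) / sxx)
    return slope, se

# Hypothetical data: wage = 500 + 76 * educ plus alternating +/-100 noise
educ = list(range(10, 19))
noise = [100 if i % 2 == 0 else -100 for i in range(len(educ))]
wage = [500 + 76 * e + u for e, u in zip(educ, noise)]
slope_clean, se_clean = ols_simple(educ, wage)

# Add one extreme wage at educ = 14, the sample mean of educ
slope_out, se_out = ols_simple(educ + [14], wage + [50000])

print(f"without outlier: slope {slope_clean:.1f}, t {slope_clean / se_clean:.2f}")
print(f"with outlier:    slope {slope_out:.1f}, t {slope_out / se_out:.2f}")
```

The slope stays at 76, but the standard error grows by a factor of more than 100, so the t-statistic falls far below 1.96.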
(4 points) (ii) Dr. Qin corrects the data problem and then reruns regression (1). She obtains the following output:
Interpret carefully the estimated coefficient corresponding to educ.
Answer: Holding experience constant, an additional year of schooling increases wages by around 76 dollars per month, and the estimate is statistically significant at the 1% level.
(4 points) (iii) Evaluate the null hypothesis H0: β2 = 0. Explain carefully and interpret your result.
Answer: β2 is the coefficient on exper. The p-value is 0.000, so we reject the null hypothesis at the 1% significance level: holding education constant, experience has a statistically significant effect on monthly wages.
(6 points) (iv) Suppose that you learn that place of residence is an important factor, and that people in the sample live in the south, central, or north part of a city. Propose a modification to the model that would allow you to estimate the effect of place of residence on wages.
Answer: It would be a model with two dummy variables, say, one for south and another one for central (in this setup, the north part is the comparison group; you may choose your own comparison group). Only two dummies are used because including all three together with the intercept would create perfect multicollinearity. The regression model looks like the following (educ and exper are included to avoid OVB; it is also fine if you do not include them in this question):
$$wage_i = \beta_0 + \beta_1\, South_i + \beta_2\, Central_i + \beta_3\, educ_i + \beta_4\, exper_i + u_i$$
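Constructing the two region dummies might look like the following sketch. The region labels and the function name are hypothetical; the point is that, with an intercept in the model, one category (north here) must be left out, because the three dummies always sum to 1.

```python
CATEGORIES = ("south", "central", "north")

def region_dummies(regions, base="north"):
    """Return one dict of 0/1 dummies per observation, omitting the `base` group."""
    included = [c for c in CATEGORIES if c != base]
    rows = []
    for r in regions:
        if r not in CATEGORIES:
            raise ValueError(f"unknown region: {r!r}")
        rows.append({f"D_{c}": int(r == c) for c in included})
    return rows

print(region_dummies(["south", "north", "central"]))
# [{'D_south': 1, 'D_central': 0}, {'D_south': 0, 'D_central': 0}, {'D_south': 0, 'D_central': 1}]
```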
(4 points) (v) Suppose you are only interested in the impact of education on wage, is it necessary
to include the variable exper ? Please explain.
Answer: Yes, it is necessary to include exper, because experience is one of the determinants of wages and is usually correlated with education: people who receive more education tend to start working later and have less experience. Omitting exper would therefore cause omitted variable bias in the coefficient on educ.
(6 points) (vi) A colleague makes the following comment: 'People with more education have a tendency to have less experience, so your results are invalid, since one important assumption underlying the estimation method used is that explanatory variables must be uncorrelated.' Briefly reply to this comment.
Answer: The comment concerns the collinearity between two explanatory variables. The OLS assumptions rule out only perfect multicollinearity, so the results are valid unless educ and exper are perfectly collinear. With imperfect multicollinearity, the OLS estimators remain unbiased and the results are valid; a high (but imperfect) correlation only makes the estimates less precise. (Additionally, variables that are correlated with the variable of interest and at the same time affect the dependent variable should be included in the regression; otherwise, there will be omitted variable bias. This part is not a necessary part of the answer, but it is good if your answer also includes this point.)
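A simulation can illustrate the reply. The sketch below uses hypothetical data and helper names: it draws two regressors with correlation of about −0.8 (standing in for educ and exper), fits the two-regressor OLS through the normal equations on demeaned data, and recovers the true coefficients 2 and 3. Only when one regressor is an exact linear function of the other does the system break down.

```python
import random

def ols_two_regressors(x1, x2, y):
    """OLS slopes of y on x1 and x2 (with an intercept), via demeaned normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [a - m1 for a in x1]
    d2 = [b - m2 for b in x2]
    dy = [c - my for c in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    if abs(det) <= 1e-12 * s11 * s22:
        raise ValueError("perfect multicollinearity: slopes are not identified")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

rng = random.Random(1)
n = 50_000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [-0.8 * a + 0.6 * rng.gauss(0, 1) for a in x1]   # corr(x1, x2) is about -0.8
y = [1 + 2 * a + 3 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

print(ols_two_regressors(x1, x2, y))  # close to (2, 3) despite the correlation
```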
3.2 (20 points) (Wooldridge, Chapter 6, Question 6.1; revised from the 2015-16 midterm and the Econ3005 2021 Spring midterm; revised data: Kielmc_2024.dta) You have the data for houses that sold during 1981 in North Andover, Massachusetts; 1981 was the year construction began on a local garbage incinerator. We are interested in whether the construction of the garbage incinerator affects housing prices. Note: for this question, a significance level is accepted only if it is equal to or higher than 5%.
(4 points) (i) To study the effects of the incinerator location on housing price, we first run the simple regression model
$$price = \beta_0 + \beta_1\, dist + u$$
where price is housing price in dollars and dist is distance from the house to the incinerator measured in feet. We obtain the following output from R. Please interpret the results according to the output.
Answer: The estimated effect of dist on price is positive ($\hat{\beta}_1 > 0$): it seems better to have a home farther away from the incinerator. A one-foot increase in distance from the incinerator is associated with a predicted price that is 1.26 dollars higher, and by the t-test we reject $\beta_1 = 0$ at the 1% significance level.
(10 points) (ii) To the simple regression model in part (i), we add the variables intst, larea, lland, rooms, baths, and age, where intst is distance from the home to the interstate, larea is the logarithm of the square footage of the house, lland is the logarithm of the lot size in square feet, rooms is the total number of rooms, baths is the number of bathrooms, and age is the age of the house in years. The regression output is given below. Now, what do you conclude about the effects of the incinerator? Explain why (i) and (ii) give different results.
Answer: When the variables intst, larea, lland, rooms, baths, and age are added to the regression, the coefficient on dist becomes smaller and the t-test can no longer reject that it is equal to 0. This is because we have explicitly controlled for several other factors that determine the quality of a home (such as its size and number of baths) and its location (distance to the interstate). These housing characteristics clearly affect the housing price and at the same time may be correlated with the distance between the house and the incinerator. For example, developers may build less attractive houses in less desirable locations, while the government might also choose a less desirable location for the incinerator. Therefore, these are variables that cause bias if omitted. The result is consistent with the hypothesis that the incinerator was located near less desirable homes to begin with.
(6 points) (iii) Now we add dist2 and dist3 to the regression, where dist2 = dist × dist and dist3 = dist × dist × dist. The output is listed below. What happens? After the regression, we also run a joint test with the R command linearHypothesis; its output is also listed below. What does this test suggest? What do you conclude about the importance of functional form?
Answer: When the polynomial terms are added to the regression, the coefficients on dist and its polynomial terms are now highly statistically significant. The F-test suggests that the polynomial terms dist2 and dist3 are not jointly zero. Both the change in the estimates and the F-test suggest that the relationship between the distance to the incinerator and the housing price is nonlinear. Getting the functional form right is important for obtaining the right estimates and conclusions about the relation between variables: if the relation between Y and X is nonlinear, the estimator of the effect of X on Y based on a linear regression is biased.
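With the cubic specification, the effect of distance on price is no longer a single number. Differentiating the fitted polynomial gives the marginal effect β1 + 2β2·dist + 3β3·dist², which varies with distance. The sketch below evaluates it; the coefficient values are made up for illustration and are not the estimates in the R output, and the function names are hypothetical.

```python
def cubic_price(b0, b1, b2, b3, dist):
    """Predicted price from price = b0 + b1*dist + b2*dist^2 + b3*dist^3."""
    return b0 + b1 * dist + b2 * dist ** 2 + b3 * dist ** 3

def marginal_effect(b1, b2, b3, dist):
    """d(price)/d(dist) implied by the cubic, evaluated at a given distance (feet)."""
    return b1 + 2 * b2 * dist + 3 * b3 * dist ** 2

# Hypothetical coefficients, NOT the estimates from the exam's R output
b = dict(b0=50_000.0, b1=6.0, b2=-2.0e-4, b3=2.0e-9)

for d in (5_000, 15_000, 30_000):
    me = marginal_effect(b["b1"], b["b2"], b["b3"], d)
    print(f"dist = {d:>6} ft: one extra foot changes predicted price by {me:.2f} dollars")
```

Under these assumed coefficients the effect shrinks with distance and even changes sign, which a single linear slope cannot capture.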