Midterm Solution 2024spring

The document contains a midterm exam for an economics course with multiple choice and short answer questions. It tests concepts like interpreting regression coefficients, hypothesis testing, and omitted variable bias. The exam has two main sections, with the first being multiple choice questions and the second being short answer questions related to analyzing regression outputs.

Solution for Midterm Exam

Econ3005, Spring 2024

Problem 1 Multiple Choice (24 points, 3 each)

Please choose the answer(s) that you think is(are) appropriate.

1.1 Interpreting the intercept in a sample regression function is

a. not reasonable because you never observe values of the explanatory variables around the origin.

b. reasonable because under certain conditions the estimator is the Best Linear Unbiased Estimator.

c. reasonable if your sample contains values of Xi around the origin.

d. not reasonable because economists are interested in the effect of a change in X on the change in Y.

Answer: c

1.2 Finding a p-value as 0.0245

a. indicates evidence in favor of the null hypothesis

b. implies that the t-statistic is less than 1.96

c. indicates evidence against the null hypothesis

d. will only happen roughly one in twenty samples

Answer: c
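
As a rough check on 1.2, the sketch below converts a two-sided p-value into the matching |t| under the large-sample normal approximation. The two-sided convention and the helper names `two_sided_p` and `two_sided_z` are assumptions made for illustration; they are not part of the exam.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def two_sided_p(t_stat: float) -> float:
    """Two-sided p-value for a t-statistic, using the normal approximation."""
    return 2 * (1 - STD_NORMAL.cdf(abs(t_stat)))


def two_sided_z(p_value: float) -> float:
    """Absolute t-statistic whose two-sided normal p-value equals p_value."""
    return STD_NORMAL.inv_cdf(1 - p_value / 2)


z = two_sided_z(0.0245)
print(f"|t| implied by p = 0.0245: {z:.3f}")
print(f"p-value at |t| = 1.96:     {two_sided_p(1.96):.4f}")
```

Under these assumptions the implied |t| is about 2.25, which exceeds 1.96. That is why option b fails and the p-value counts as evidence against the null hypothesis.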

1.3 The construction of the t-statistic for a one- and a two-sided hypothesis

a. depends on the critical value from the appropriate distribution

b. is the same.

c. is different since the critical value must be 1.645 for the one-sided hypothesis, but 1.96 for the two-sided hypothesis (using a 5% probability for the Type I error).

d. uses ±1.96 for the two-sided test, but only +1.96 for the one-sided test.

Answer: b

1.4 Using the textbook example of 420 California school districts and the regression of test scores on the student-teacher ratio, you find that the standard error on the slope coefficient is 0.51 when using the heteroskedasticity-robust formula, while it is 0.48 when employing the homoskedasticity-only formula. When calculating the t-statistic, the recommended procedure is to

a. use the homoskedasticity-only formula because the t-statistic becomes larger

b. first test for homoskedasticity of the errors and then make a decision

c. use the heteroskedasticity-robust formula

d. make a decision depending on how much different the estimate of the slope is under the two procedures

Answer: c

1.5 When there are omitted variables in the regression, which are determinants of the dependent

variable, then

a. you cannot measure the effect of the omitted variable, but the estimator of your included variable(s) is (are) unaffected.

b. this has no effect on the estimator of your included variable because the other variable is not included.

c. this will always bias the OLS estimator of the included variable.

d. the OLS estimator is biased if the omitted variable is correlated with the included variable.

Answer: d
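
The difference between options c and d in 1.5 can be illustrated with a small simulation. This is only a sketch under assumed values (true coefficients 1.0 and 2.0, standard normal regressors, a fixed seed), none of which come from the exam; `ols_slope` and `short_regression_slope` are hypothetical helper names.

```python
import random


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with an intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def short_regression_slope(rho, n=20000, beta1=1.0, beta2=2.0, seed=0):
    """Regress y on x1 only, omitting x2, where corr(x1, x2) = rho."""
    rng = random.Random(seed)
    x1, y = [], []
    for _ in range(n):
        a = rng.gauss(0, 1)
        # Build x2 with unit variance and correlation rho with x1.
        x2 = rho * a + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        x1.append(a)
        y.append(beta1 * a + beta2 * x2 + rng.gauss(0, 1))
    return ols_slope(x1, y)


slope_uncorrelated = short_regression_slope(rho=0.0)
slope_correlated = short_regression_slope(rho=0.5)
print(f"omitted variable uncorrelated with x1: slope = {slope_uncorrelated:.3f}")
print(f"omitted variable correlated with x1:   slope = {slope_correlated:.3f}")
```

With these assumed values the slope stays near the true 1.0 when the omitted variable is uncorrelated with the included one, and moves toward 2.0 when the correlation is 0.5, so the bias appears only in the correlated case.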

1.6 Using 143 observations, assume that you had estimated a simple regression function and that your estimate for the slope was 0.65, with a standard error of 0.20. You want to test whether or not the estimate is statistically significant. Which of the following decisions is the only correct one:

a. you decide that the coefficient is small and hence most likely is zero in the population

b. the slope is statistically significant since it is three standard errors away from zero

c. the response of Y given a change in X must be economically important since it is statistically significant

d. since the slope is very small, so must be the regression R².

Answer: b

1.7 If the estimates of the coefficients of interest change substantially across specifications,

a. then this can be expected from sample variation

b. then you should change the scale of the variables to make the changes appear to be smaller

c. then this often provides evidence that the original specification had omitted variable bias

d. then choose the specification for which your coefficient of interest is most significant

Answer: c

1.8 If you reject a joint null hypothesis using the F-test in a multiple hypothesis setting, then

a. a series of t-tests may or may not give you the same conclusion.

b. the regression is always significant.

c. all of the hypotheses are always simultaneously rejected.

d. the F-statistic must be negative.

Answer: a

Problem 2 Short Questions (27 points in total):

Note: for each sub-question, the answer should not be longer than 7 lines.

2.1 Data were collected from a random sample of 220 home sales from a community in 2003. Let Price denote the selling price (in $1000), BDR denote the number of bedrooms, Bath denote the number of bathrooms, Hsize denote the size of the house (in square feet), Lsize denote the lot size (in square feet), Age denote the age of the house (in years), and Poor denote a binary variable that is equal to 1 if the condition of the house is reported as poor. An estimated regression yields

\[
\widehat{Price} = 115 + 0.478\,BDR + 16.7\,Bath + 0.179\,Hsize + 0.004\,Lsize + 0.068\,Age - 64.2\,Poor, \qquad R^2 = 0.69
\]

(6 points) (i) Suppose Dr. Qin omits the variable Hsize. Which coefficient(s) tend to be affected, and how do you expect the coefficient estimates to change? Please provide reasoning.

Answer: The coefficients of BDR, Bath, and Lsize tend to be affected. Larger houses tend to have more bedrooms, more bathrooms, and larger lots, so the correlation between Hsize and BDR/Bath/Lsize is positive. At the same time, larger houses tend to sell for higher prices, corr(Price, Hsize) > 0. The bias from omitting Hsize is therefore positive, so we expect the three coefficients to be biased upward (more positive). (As long as you answer one of the three coefficients correctly and completely, you will get full marks.)
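
The direction-of-bias argument can be summarized with the standard omitted variable bias formula for the simplified case of one included regressor and one omitted regressor. This is only a sketch: the exam's regression has several included regressors, so the two-variable formula captures the logic rather than the exact bias, and the symbols X_1, X_2, \beta_1, \beta_2 are labels chosen here, not notation from the exam.

```latex
\[
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + u,
\qquad
\hat{\beta}_1^{\,\text{short}} \xrightarrow{\;p\;} \beta_1 + \beta_2\,\frac{\operatorname{cov}(X_1, X_2)}{\operatorname{var}(X_1)}
\]
```

Taking X_1 to be BDR (or Bath, or Lsize) and X_2 to be Hsize, the estimated coefficient on Hsize (0.179) is positive and the covariance is argued to be positive, so the bias term is positive.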

(2 points) (ii) Suppose that a homeowner converts part of an existing family room in her house

into a new bathroom. What is the expected increase in the value of the house?

Answer: The expected increase in the price is $16,700 (recall that the price is measured in $1000s)

(3 points) (iii) Suppose that a homeowner adds a new bathroom to her house, which increases the

size of the house by 100 square feet. What is the expected increase in the value of the house?

Answer: In this case ∆Bath = 1 and ∆Hsize = 100, so the resulting expected change in price is (16.7 + 0.179 × 100) × 1000 = $34,600, or 34.6 thousand dollars.

(2 points) (iv) What is the loss in value if a homeowner lets his house run down so that its condition

becomes poor?

Answer: The loss is $64,200.
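
The arithmetic in parts (ii) to (iv) follows one pattern: multiply each coefficient by the change in its variable, sum, and convert from thousands of dollars. A minimal sketch using the coefficients of the estimated regression above; the function name `price_change_dollars` is hypothetical.

```python
# Coefficients from the estimated regression; Price is in thousands of dollars.
COEF = {
    "BDR": 0.478,
    "Bath": 16.7,
    "Hsize": 0.179,
    "Lsize": 0.004,
    "Age": 0.068,
    "Poor": -64.2,
}


def price_change_dollars(**deltas):
    """Predicted change in price (in dollars) for the given regressor changes."""
    change_in_thousands = sum(COEF[name] * delta for name, delta in deltas.items())
    return change_in_thousands * 1000


part_ii = price_change_dollars(Bath=1)              # new bathroom, house size unchanged
part_iii = price_change_dollars(Bath=1, Hsize=100)  # new bathroom plus 100 square feet
part_iv = price_change_dollars(Poor=1)              # condition becomes poor

for label, value in [("(ii)", part_ii), ("(iii)", part_iii), ("(iv)", part_iv)]:
    print(f"{label:>5}: {value:,.0f} dollars")
```

Each call holds all unlisted regressors fixed, which is the sense in which a multiple-regression coefficient is interpreted.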

2.2 The cost of attending your college has once again gone up. Although you have been told that

education is investment in human capital, which carries a return of roughly 10% a year, you (and

your parents) are not pleased. One of the administrators at your university/college does not make

the situation better by telling you that you pay more because the reputation of your institution is

better than that of others. To investigate this hypothesis, you collect data randomly for 100 national

universities and liberal arts colleges from the 2000-2001 U.S. News and World Report annual rankings.

Next you perform the following regression


\[
\begin{aligned}
\widehat{Cost} ={} & \underset{(2058.63)}{7311.17} + \underset{(664.58)}{3985.20} \times Reputation - \underset{(0.13)}{0.2} \times Size + \underset{(2154.85)}{8406.79} \times Dpriv \\
& - \underset{(1121.92)}{416.38} \times Dlibart - \underset{(1007.86)}{2376.51} \times Dreligion, \qquad R^2 = 0.72, \quad SER = 2773.35
\end{aligned}
\]

where Cost is Tuition, Fees, Room and Board in dollars, Reputation is the index used in U.S. News and World Report (based on a survey of university presidents and chief academic officers), which ranges from 1 (marginal) to 5 (distinguished), Size is the number of undergraduate students, and Dpriv, Dlibart, and Dreligion are binary variables indicating whether the institution is private, a liberal arts college, and has a religious affiliation. The numbers in parentheses are heteroskedasticity-robust standard errors.

(5 points) (i) Indicate whether or not the coefficients are significantly different from zero.

Answer: The coefficient on liberal arts colleges is not significantly different from zero. All other coefficients are statistically significant at conventional levels, with the exception of the Size coefficient, which carries a t-statistic of 1.54 in absolute value and hence is not statistically significant at the 5% level (using either a one-sided or a two-sided alternative hypothesis).
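
The t-statistics behind this answer are each coefficient divided by its standard error. The sketch below reproduces them from the reported regression, assuming the standard errors are listed in the same order as the coefficients and using the two-sided 5% critical value of 1.96 from the large-sample normal approximation.

```python
# (coefficient, heteroskedasticity-robust standard error) from the reported regression
ESTIMATES = {
    "Intercept": (7311.17, 2058.63),
    "Reputation": (3985.20, 664.58),
    "Size": (-0.2, 0.13),
    "Dpriv": (8406.79, 2154.85),
    "Dlibart": (-416.38, 1121.92),
    "Dreligion": (-2376.51, 1007.86),
}
CRITICAL_5PCT = 1.96  # two-sided, large-sample normal approximation

t_stats = {name: coef / se for name, (coef, se) in ESTIMATES.items()}
significant = {name: abs(t) > CRITICAL_5PCT for name, t in t_stats.items()}

for name, t in t_stats.items():
    verdict = "significant" if significant[name] else "not significant"
    print(f"{name:<11} t = {t:6.2f}  {verdict} at 5%")
```

Only Size and Dlibart fall below the critical value in absolute terms, which matches the answer.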

(4 points) (ii) What is the t-statistic for the null hypothesis that the coefficient on Size is equal to zero? Based on this, should you eliminate the variable from the regression? Why or why not?

Answer: The t-statistic is 1.54 in absolute value and is not statistically significant. However, variables should not be eliminated simply on the grounds of a statistical test. The sign of the coefficient is as expected, and its magnitude makes it important. It is best to leave the variable in the regression and let the reader decide whether or not this is convincing evidence that the size of the university matters. Dropping it could also introduce omitted variable bias.

(2 points) (iii) You want to test simultaneously the hypotheses that βSize = 0 and βDlibart = 0. Your regression package returns an F-statistic of 1.23. Can you reject the null hypothesis?

Answer: With two restrictions, the large-sample critical value of the F-statistic is 3.00 (5% level) and 4.61 (1% level). Since 1.23 is below both, you cannot reject the null hypothesis in this case.
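
The two critical values can be checked in closed form. With q = 2 restrictions, the large-sample F-statistic is distributed as a chi-squared variable with 2 degrees of freedom divided by 2, and a chi-squared(2) variable exceeds x with probability exp(-x/2). The sketch below uses that fact; the function names are hypothetical, and the shortcut applies only to the two-restriction, large-sample case.

```python
import math


def f_critical_two_restrictions(alpha):
    """Large-sample critical value of the F-statistic with q = 2 restrictions.

    P(F > f) = P(chi2(2) > 2f) = exp(-f), so the critical value is -ln(alpha).
    """
    return -math.log(alpha)


def f_pvalue_two_restrictions(f_stat):
    """Large-sample p-value of an F-statistic with q = 2 restrictions."""
    return math.exp(-f_stat)


crit_5 = f_critical_two_restrictions(0.05)
crit_1 = f_critical_two_restrictions(0.01)
p_value = f_pvalue_two_restrictions(1.23)

print(f"5% critical value: {crit_5:.2f}")
print(f"1% critical value: {crit_1:.2f}")
print(f"p-value for F = 1.23: {p_value:.3f}")
```

Under these assumptions the p-value for F = 1.23 is roughly 0.29, well above any conventional significance level.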

(3 points) (iv) Eliminating the Size and Dlibart variables from your regression, the estimated regression becomes

\[
\widehat{Cost} = \underset{(1772.35)}{5450.35} + \underset{(590.49)}{3538.84} \times Reputation + \underset{(875.51)}{10935.7} \times Dpriv - \underset{(1180.57)}{2783.31} \times Dreligion, \qquad R^2 = 0.72, \quad SER = 3792.68
\]

Why do you think that the effect of attending a private institution has increased now?

Answer: There is now a problem of omitted variable bias. Private institutions are usually smaller, on average, and some of them are liberal arts colleges. Both of these factors are determinants of the cost (both variables had negative coefficients in the regression model). (It is possible for the two variables to have different directions of bias, but the aggregate bias is positive.) Therefore, omitting them causes omitted variable bias. (As long as your answer discusses omitted variable bias, it will be marked as correct.)

Problem 3 Long Questions (49 points in total)

Note: for each sub-question, the answer should not be longer than 10 lines.

3.1 An empirical problem that has appeared frequently in the class is the relationship between earnings and education. Below you will find the results of estimating a simple earnings model using education (educ) and experience (exper) as explanatory variables, both measured in years. The dependent variable (wage) is monthly wage, measured in dollars. The estimated model is

\[
wage_i = \beta_0 + \beta_1\, educ_i + \beta_2\, exper_i + u_i \tag{1}
\]

where u_i is the error term of the model. The estimation is based on a sample of 935 individuals.

(5 points) (i) Before estimating the model, Dr. Qin first plots wage against educ, as follows:

[Figure: scatter plot of wage (vertical axis, ticks from 10000 to 50000) against educ (horizontal axis, ticks from 10 to 18)]

Dr. Qin then estimates equation (1) and obtains the following regression output.

Based on the graph, does there appear to be a relationship between the variables? Please interpret the regression outcome. Is there any particular problem with the data that might drive your regression results?

Answer: The scatter plot does not show any clear relationship between the variables. The regression outcome suggests a weak positive relationship between wage and education and a weak negative relationship between wage and experience, and neither coefficient is statistically significant. The full sample includes observations whose wages are far beyond those of the majority. These outliers make the regression results imprecise. Specifically, they may bias the coefficients towards 0.

(4 points) (ii) Dr. Qin corrects the data problem and then re-runs regression (1). She obtains the following output:

Interpret carefully the estimated coefficient corresponding to educ.

Answer: Holding experience constant, an additional year of schooling increases the wage by around 76 dollars per month, and the estimate is statistically significant at the 1% level.

(4 points) (iii) Evaluate the null hypothesis H0 : β2 = 0. Explain carefully and interpret your

result.

Answer: β2 is the coefficient on exper. The p-value is 0.000, so at the 1% significance level we reject the null hypothesis.

(6 points) (iv) Suppose that you learn that place of residence is an important factor, and that people in the sample live in the south, central, or north part of a city. Propose a modification to the model that would allow you to estimate the effect of place of residence on wages.

Answer: It would be a model with two dummy variables, say, one for south and another for central. (In this setup, the north part is the comparison group. You may choose a different comparison group.) The regression model looks like the following (educ and exper are included to avoid OVB; it is also fine if you do not include them in this question):

\[
wage_i = \beta_0 + \beta_1\, South_i + \beta_2\, Central_i + \beta_3\, educ_i + \beta_4\, exper_i + u_i
\]
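
Building the two dummies from a categorical residence variable might look like the sketch below. The region labels, the sample values, and the function name `residence_dummies` are assumptions made for illustration; the exam's data set is not shown.

```python
REGIONS = ("south", "central", "north")  # assumed labels; north is the comparison group


def residence_dummies(region):
    """Return the South and Central dummies for one observation."""
    if region not in REGIONS:
        raise ValueError(f"unknown region: {region!r}")
    return {"South": int(region == "south"), "Central": int(region == "central")}


sample = ["north", "south", "central", "south"]  # hypothetical observations
rows = [residence_dummies(region) for region in sample]

# Adding a third dummy for north would make the three dummies sum to 1 in every
# row, which duplicates the intercept column (perfect multicollinearity).
sums_with_north = [
    row["South"] + row["Central"] + int(region == "north")
    for row, region in zip(rows, sample)
]

print(rows)
print(sums_with_north)
```

Leaving one category out is what makes the remaining coefficients interpretable as differences relative to the comparison group.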

(4 points) (v) Suppose you are only interested in the impact of education on wage. Is it necessary to include the variable exper? Please explain.

Answer: Yes, it is necessary to include exper. Experience is one of the determinants of the wage and is usually correlated with education: people who receive more education are more likely to start work later and have less experience. Omitting it would therefore cause omitted variable bias in the coefficient on educ.

(6 points) (vi) A colleague makes the following comment: 'People with more education have a tendency to have less experience, so your results are invalid, since one important assumption underlying the estimation method used is that explanatory variables must be uncorrelated.' Briefly reply to this comment.

Answer: The comment concerns collinearity between two explanatory variables. The results are valid unless there is perfect multicollinearity. If the multicollinearity is imperfect and the correlation is not high, the results are valid and reliable. (Additionally, variables that are correlated with the variable of interest and at the same time affect the dependent variable should be included in the regression. Otherwise, there will be omitted variable bias. This point is not a required part of the answer, but it is good if your answer also includes it.)

(20 points) (Wooldridge, chapter 6, question 6.1; revised from the 2015-16 midterm; Econ3005 2021 spring midterm; revised data use Kielmc_2024.dta) You have data for houses that sold during 1981 in North Andover, Massachusetts; 1981 was the year construction began on a local garbage incinerator. We are interested in whether the construction of the garbage incinerator affects the housing price. Note: for this question, significance level is accepted only if it is equal to or higher than 5%.

(4 points) (i) To study the effects of the incinerator location on housing price, we first run the simple regression model

\[
price = \beta_0 + \beta_1\, dist + u
\]

where price is the housing price in dollars and dist is the distance from the house to the incinerator, measured in feet. We obtain the following output from R. Please interpret the results according to the output.

Answer: The estimated effect of dist on price is positive (the estimate of β1 is greater than zero), so it seems better to have a home farther away from the incinerator. A one-foot increase in distance from the incinerator is associated with a predicted price that is 1.26 dollars higher. By the t-test, we reject β1 = 0 at the 1% significance level.

(10 points) (ii) To the simple regression model in part (i), we add the variables intst, larea, lland, rooms, baths, and age, where intst is the distance from the home to the interstate, larea is the logarithm of the square footage of the house, lland is the logarithm of the lot size in square feet, rooms is the total number of rooms, baths is the number of bathrooms, and age is the age of the house in years. The regression outcome is given below. Now, what do you conclude about the effects of the incinerator? Explain why (i) and (ii) give different results.

Answer: When the variables intst, larea, lland, rooms, baths, and age are added to the regression, the coefficient on dist becomes smaller, and the t-test cannot reject the hypothesis that the coefficient is equal to 0. This is because we have explicitly controlled for several other factors that determine the quality of a home (such as its size and number of baths) and its location (distance to the interstate). These housing characteristics affect the housing price and at the same time may be correlated with the distance between the house and the incinerator. For example, a developer would choose a less desirable location to build houses with less attractive properties, while the government might also choose a less desirable location to build the incinerator. Therefore, these are variables that cause bias if omitted. The result is consistent with the hypothesis that the incinerator was located near less desirable homes to begin with.

(6 points) (iii) Now we add dist2 and dist3 to the regression, where dist2 = dist × dist and dist3 = dist × dist × dist. The output is listed below. What happens? After the regression, we also run a test with the R command linearHypothesis, and its output is also listed below. What does this test suggest? What do you conclude about the importance of functional form?

Answer: When the polynomial terms are added to the regression, the coefficients on dist and its polynomial terms become highly statistically significant. The F-test suggests that the coefficients on the polynomial terms dist2 and dist3 are not jointly zero. The change in the estimates and the F-test both suggest that the relationship between the distance to the incinerator and the housing price is nonlinear. Getting the functional form right is important for obtaining the right estimates and conclusions about the relationship between variables. If the relationship between Y and X is nonlinear, the estimator of the effect of X on Y based on a linear regression is biased.
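
To make the functional-form point concrete, the cubic specification and the joint hypothesis being tested can be written out as below. This is a sketch: the labels \beta_2 and \beta_3 for the coefficients on dist2 and dist3 are chosen here for illustration, and "controls" stands for the additional regressors from part (ii).

```latex
\[
price = \beta_0 + \beta_1\, dist + \beta_2\, dist^2 + \beta_3\, dist^3 + (\text{controls}) + u
\]
\[
\frac{\partial\, price}{\partial\, dist} = \beta_1 + 2\beta_2\, dist + 3\beta_3\, dist^2,
\qquad
H_0:\ \beta_2 = \beta_3 = 0
\]
```

Under this specification the effect of one extra foot of distance depends on how far the house already is from the incinerator, and rejecting H_0 is evidence against the constant effect imposed by the linear model.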
