Econometrics
University of Milan-Bicocca
Course lecturer:
Maryam Ahmadi
[email protected]
Course evaluation
(20%) In-class activity
Including your presence and homework
(30%) Take-home exam
In the take-home exam, you will be asked to analyze datasets using Stata and to write up the empirical analysis.
Materials covered in this exam are the same we covered in the lectures.
The take-home exam is to be completed individually.
(50%) In-class written exam
At the end of the semester.
• The weights of the three parts might change, depending on the special situation this semester.
Linear Regression Model
Sampling Distributions of the OLS Estimators
Problem & Answer 7
True or False? Explain.
a) The covariance between the residuals from an OLS regression and the explanatory variable is zero: Cov(x, û) = 0.
True. The regression decomposes y as y = ŷ + û, where the fitted value ŷ is the part of y explained by x and the residual û = y − ŷ is the part of y not explained by x.
By construction, OLS makes the residuals orthogonal to the explanatory variable, so there is no (linear) relationship between the residuals and the explanatory variables.
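This orthogonality can be checked numerically. A minimal sketch in Python using NumPy with simulated data (the variable names and the data-generating process are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
u = rng.normal(size=n)
y = 1.0 + 2.0 * x + u  # simulated "true" model

# OLS fit of y = b0 + b1*x via least squares
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

y_fit = X @ beta_hat   # part of y explained by x
resid = y - y_fit      # part of y not explained by x

# Sample covariance between x and the residuals is zero up to rounding
print(np.cov(x, resid)[0, 1])
```

The covariance printed is zero to machine precision: the OLS normal equations force the residuals to be uncorrelated with every regressor in the sample.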
b) If a variable in a model is significant at the 10% level, it is also significant at the 5% level.
This statement is false. It actually works the other way around: if a variable is significant at the 5% level, it is also significant at the 10% level.
c) The hypothesis that the OLS estimator is equal to zero can be tested by means of a t-test.
False. We never test hypotheses about the estimators themselves. Estimators are random variables and could take many different values depending on the sample at hand. Instead, we always test hypotheses about the true but unknown coefficients. The statement should therefore be formulated as: the hypothesis that a (beta) coefficient in the model is equal to zero can be tested by means of a t-test.
d) If the absolute t-ratio of a coefficient is smaller than the critical value, we accept the null hypothesis that the coefficient is zero.
We say "we reject the null hypothesis" (at the 1%, 5%, or 10% significance level) or "we do not reject the null hypothesis". We typically do not say "we accept the null hypothesis".
Why? Two mutually exclusive hypotheses (e.g. β2 = 0 and β2 = 0.01) may both fail to be rejected by the data, but it would be silly to accept both hypotheses.
e) Because OLS provides the best linear approximation of a variable y from a set of regressors, OLS also gives best linear unbiased estimators for the coefficients of these regressors.
"Best" refers to different things in the two parts of the statement: in the first part it refers to the fact that the OLS estimated line has the minimum sum of squared residuals (the smallest estimation error in the sample). In the second part it refers to the fact that, under assumptions A1-A4, the OLS estimators have the minimum variance among linear unbiased estimators.
f) Estimators cannot be BLUE unless the error terms are all normally distributed.
False. Estimators are BLUE whenever assumptions A1-A4 hold.
Normality of the error terms is assumption A5, which we make in order to construct the test statistics for hypothesis testing about the true but unknown values of the model parameters.
p-values for t-tests
• If the significance level is made smaller and smaller, there will be a point where the null hypothesis cannot be rejected anymore
• The smallest significance level at which the null hypothesis is still rejected is called the p-value of the hypothesis test
• The p-value summarizes the strength of the empirical evidence against the null
• A small p-value is evidence against the null hypothesis, because one would reject the null hypothesis even at small significance levels
• A large p-value is evidence in favor of the null hypothesis
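A two-sided p-value can be computed directly from a t-statistic. A minimal sketch in Python; to stay within the standard library it uses the normal approximation to the t distribution, which is accurate for the large degrees of freedom typical in these examples (an assumption, not part of the lecture):

```python
from statistics import NormalDist

def two_sided_p_value(t_stat: float) -> float:
    """Two-sided p-value for a t-statistic, using the normal
    approximation to the t distribution (good for large df)."""
    return 2 * (1 - NormalDist().cdf(abs(t_stat)))

# A large |t| gives a small p-value: strong evidence against the null
print(two_sided_p_value(2.5))
# A small |t| gives a large p-value: weak evidence against the null
print(two_sided_p_value(0.8))
```

For t = 2.5 the p-value is about 0.012, so the null would be rejected at the 5% level; for t = 0.8 it is about 0.42, so the null would not be rejected even at the 10% level.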
In summary
❖ p-value ≤ 0.01: the null is rejected at the 1% significance level
❖ p-value ≤ 0.05: the null is rejected at the 5% significance level
❖ p-value ≤ 0.1: the null is rejected at the 10% significance level
❖ p-value > 0.1: we fail to reject the null even at the 10% significance level
For example,
• a p-value of 0.08 indicates that the null is rejected at the 10% level (0.08 < 0.1)
but not rejected at the 5% level (0.08 ≮ 0.05).
• a p-value of 0.02 indicates that the null is rejected at the 5% level (also at
10%) but not at the 1% level.
Many modern software packages provide p-values with their tests. This allows
you to perform the test without checking tables of critical values.
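The decision rules above can be sketched as a small helper function (illustrative only, not part of any package's API):

```python
def rejection_levels(p_value: float) -> list:
    """Return the conventional significance levels (1%, 5%, 10%)
    at which the null hypothesis is rejected for a given p-value."""
    return [level for level in (0.01, 0.05, 0.10) if p_value <= level]

print(rejection_levels(0.08))  # [0.1] -> rejected at the 10% level only
print(rejection_levels(0.02))  # [0.05, 0.1] -> rejected at 5% and 10%
print(rejection_levels(0.20))  # [] -> not rejected even at 10%
```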
Example in Stata using the dataset wage1.
Confidence Interval (CI)
A confidence interval is an interval estimate for βj, and can be calculated as
β̂j ± c · se(β̂j)
where c is the critical value for the chosen confidence level from the t(n−k−1) distribution.
The interval collects exactly the values of βj for which −c < t-statistic < c, i.e. the values for which we do not reject H0.
• Consequently, using a t(120) distribution, a 95% confidence interval for βj is:
[ β̂j − 1.96·se(β̂j), β̂j + 1.96·se(β̂j) ]
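The interval can be computed directly from an estimate and its standard error. A minimal sketch (the estimate and standard error below are hypothetical numbers, not taken from the lecture's output):

```python
def confidence_interval(beta_hat: float, se: float, c: float = 1.96):
    """Confidence interval beta_hat +/- c * se(beta_hat);
    c = 1.96 is the 95% critical value for large df (e.g. t(120))."""
    return (beta_hat - c * se, beta_hat + c * se)

# Hypothetical estimate: return to education of 9.2% with se 0.7%
lo, hi = confidence_interval(beta_hat=0.092, se=0.007)
print(f"95% CI: [{lo:.4f}, {hi:.4f}]")
```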
Meaning of CI:
• If random samples were obtained over and over again (repeated sampling), we expect that 95% of these intervals will contain the true value of βj, which is a fixed but unknown number (and thus not stochastic).
• For the single sample that we use to construct the CI, we can say with 95% confidence that βj is contained in the interval that we obtained based on our sample.
This means that with 95% confidence, we expect that, over the entire population,
the return to education is between 7.8% and 10.6% of the individual’s wage.
• CI and hypothesis testing
A CI corresponds to a two-sided hypothesis test. With the hypotheses
H0: βj = aj
H1: βj ≠ aj
• The null is rejected at the 5% level if and only if aj is NOT in the 95% CI.
• The null is rejected at the 1% level if and only if aj is NOT in the 99% CI.
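The duality can be checked programmatically: rejecting H0: βj = aj at the 5% level is equivalent to aj falling outside the 95% CI. A sketch with hypothetical numbers:

```python
def reject_at_5pct(beta_hat: float, se: float, a_j: float,
                   c: float = 1.96) -> bool:
    """Two-sided t-test of H0: beta_j = a_j at the 5% level."""
    t_stat = (beta_hat - a_j) / se
    return abs(t_stat) > c

def inside_95_ci(beta_hat: float, se: float, a_j: float,
                 c: float = 1.96) -> bool:
    """Is a_j inside the 95% confidence interval?"""
    return beta_hat - c * se <= a_j <= beta_hat + c * se

beta_hat, se = 0.092, 0.007  # hypothetical estimate and standard error
for a_j in (0.0, 0.09):
    # The two criteria always agree: reject <=> a_j outside the CI
    print(a_j, reject_at_5pct(beta_hat, se, a_j),
          not inside_95_ci(beta_hat, se, a_j))
```

Here aj = 0 is rejected (the t-ratio 0.092/0.007 is far above 1.96, and 0 lies outside the interval), while aj = 0.09 is not rejected (it lies inside the interval).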
Problem 8
Consider the model that relates the lwage of individuals to their education, gender and years of
experience.
lwage = 𝛽0 + 𝛽1 𝑒𝑑𝑢 + 𝛽2 𝑚𝑎𝑙𝑒 + 𝛽3 𝑒𝑥𝑝𝑒𝑟 + 𝑢
The estimation results are shown in the output table below:
[Output table: shows the sums of squares Σ(ŷᵢ − ȳ)² (explained), Σ(yᵢ − ŷᵢ)² (residual), and Σ(yᵢ − ȳ)² (total), with blanks (1) to (4) to be filled in.]
a) Fill in the blanks (1) to (4) in the output table. (Explain how you calculate those numbers.)
b) Compute the estimated error variance σ̂².
c) Test the hypothesis that lwage in the population does not depend on the gender of the workers.
d) How does the confidence interval reported in the table confirm the result of your test?
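A sketch of how the requested quantities can be computed, using simulated data in place of the actual wage dataset (the coefficients and sample size below are invented for illustration; nothing here reproduces the real output table):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
edu = rng.integers(8, 20, size=n).astype(float)
male = rng.integers(0, 2, size=n).astype(float)
exper = rng.integers(0, 30, size=n).astype(float)
u = rng.normal(scale=0.4, size=n)
lwage = 0.5 + 0.08 * edu + 0.2 * male + 0.01 * exper + u  # simulated model

X = np.column_stack([np.ones(n), edu, male, exper])
k = X.shape[1] - 1  # number of regressors (3)
beta_hat = np.linalg.solve(X.T @ X, X.T @ lwage)
y_fit = X @ beta_hat
resid = lwage - y_fit

SSE = np.sum((y_fit - lwage.mean()) ** 2)   # explained sum of squares
SSR = np.sum(resid ** 2)                    # residual sum of squares
SST = np.sum((lwage - lwage.mean()) ** 2)   # total sum of squares

sigma2_hat = SSR / (n - k - 1)              # estimated error variance
# Standard errors from the diagonal of sigma2_hat * (X'X)^(-1)
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))

t_male = beta_hat[2] / se[2]  # t-test of H0: beta_2 = 0 (no gender effect)
print("beta_hat:", beta_hat)
print("sigma2_hat:", sigma2_hat, " t(male):", t_male)
```

With an intercept in the model, SSE + SSR = SST holds exactly, which is what makes it possible to fill in missing entries of a sums-of-squares table from the others.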