Key expressions & concepts

- Bad control problem
  o Bias in the OLS estimator that arises from including a "bad control" Z
  o Bad controls (Z): regressors that could themselves be outcomes of the variable of interest X
  o E.g. earnings = α + β·(college degree) + ui; we add a relevant regressor Z that is
    - correlated with X -> OLS estimator is biased & inconsistent
    - able to explain Y
  o Problem: Y as well as Z can be outcomes of X (see the sketch below)
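
A minimal simulation sketch of the bad control problem (plain numpy, hypothetical numbers, not from the notes): regressing Y on X alone recovers the true effect, while also controlling for Z, which is itself driven by X, distorts it.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)                        # treatment variable
v = rng.normal(size=n)                        # unobserved factor driving both Z and Y
z = x + v                                     # "bad control": Z is itself an outcome of X
y = 1.0 * x + 0.8 * v + rng.normal(size=n)    # true causal effect of X on Y is 1.0

# short regression (Y on X only): coefficient on X is close to the true 1.0
b_short = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)[0]
# long regression (Y on X and the bad control Z): coefficient on X is badly distorted
b_long = np.linalg.lstsq(np.column_stack([np.ones(n), x, z]), y, rcond=None)[0]
print(b_short[1], b_long[1])                  # roughly 1.0 vs roughly 0.2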

- Causal inference
  o Usually we cannot give a regression coefficient a causal interpretation
  o Why not?
    - Omitted variables / omitted individual characteristics that could also cause Y
    - Reverse causality – Y causes X
  o Both threats violate the zero conditional mean assumption E(ui|Xi) = 0

- Central limit theorem
  o When n is large, averages of random variables are approximately normally distributed
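
An illustrative sketch of the theorem (plain numpy, made-up numbers): sample means of a strongly skewed distribution look approximately normal once n is large.

import numpy as np

rng = np.random.default_rng(0)
n, reps = 500, 10_000
# draws from an exponential distribution, which is far from normal
samples = rng.exponential(scale=1.0, size=(reps, n))
means = samples.mean(axis=1)                  # one sample mean per replication

# by the CLT the means are roughly N(1, 1/n): mean near 1, std near 1/sqrt(n)
print(means.mean(), means.std(), 1 / np.sqrt(n))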

- Cross-sectional data
  o Data on different entities (e.g. workers, consumers, firms, etc.) for a single time period
  o E.g. data on test scores in California -> data for 420 entities (school districts) for a single time period (1999)

- Errors-in-variables bias in the OLS estimator
  o Arises when an independent variable (X) is measured imprecisely
  o This bias persists even in large samples (see the sketch below)
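
A small simulation sketch of this bias (plain numpy, hypothetical numbers): measuring X with noise attenuates the OLS slope toward zero, and a huge n does not fix it.

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000                                   # very large sample: the bias still remains
x_true = rng.normal(size=n)
y = 2.0 * x_true + rng.normal(size=n)           # true slope is 2.0
x_obs = x_true + rng.normal(size=n)             # X is observed with measurement error

X = np.column_stack([np.ones(n), x_obs])
slope = np.linalg.lstsq(X, y, rcond=None)[0][1]
# attenuation: slope ~ 2.0 * var(x_true) / (var(x_true) + var(meas. error)) = 1.0
print(slope)
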
- Error term ui
  o All factors, other than Xi, that are determinants of Yi
- Endogeneity
  o X is correlated with the error term (e.g. Y = income, X = education, u = skill -> X is correlated with u)
- Exogeneity
  o X is not correlated with the error term
  o X is determined by other factors outside the model
- Hausman Test
  o Test for endogeneity of the regressor X (a regression-based sketch follows below)
    - Why? Using an instrument is only necessary if Xi is endogenous (correlated with u)
  o Test: H0: E[u|X] = 0
  o Under H0, both the TSLS (FE) and OLS (RE) estimators are consistent, but OLS (RE) is more efficient
  o Under H1, only the TSLS (FE) estimator is consistent
  o If the H-statistic > the (e.g. 5%) critical value, then H0 is rejected -> X is endogenous (correlated with u)
  o If we reject H0: we prefer the FE model
  o Panel-data version: H0: RE is appropriate; H1: FE is appropriate
  o Result: Prob>chi2 = 0 -> reject H0
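
One common regression-based way to carry out this endogeneity test (the control-function / Durbin-Wu-Hausman version) can be sketched as follows, assuming statsmodels is available and using simulated, hypothetical data:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5_000
z = rng.normal(size=n)                        # instrument
u = rng.normal(size=n)
x = 0.8 * z + 0.6 * u + rng.normal(size=n)    # X is endogenous (correlated with u)
y = 1.0 * x + u

# stage 1: regress X on Z and keep the residuals
stage1 = sm.OLS(x, sm.add_constant(z)).fit()
v_hat = stage1.resid

# control-function regression: Y on X and the first-stage residuals
cf = sm.OLS(y, sm.add_constant(np.column_stack([x, v_hat]))).fit()
# the last p-value corresponds to v_hat; a small value -> reject H0 of exogeneity -> X is endogenous
print(cf.pvalues)
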
- Homoskedasticity
  o Variance: how far the points are scattered away from the regression line
  o Var(ui|Xi) = constant (the variance does not vary systematically with X)

- i.i.d. – independently & identically distributed
  o the sample is randomly drawn from the population (independent)
  o all observations in the sample are drawn from the same distribution (identically distributed)
- Multicollinearity
  o High intercorrelations among two or more independent variables in a multiple regression model
  o One of the four assumptions in multiple regression: "no perfect multicollinearity"
    - Perfect multicollinearity: one of the regressors is a perfect linear function of the other regressors
    - E.g. you want to estimate the coefficient on STR in a regression of TestScorei on STRi and PctELi, but you make a typo and accidentally type STRi a second time instead of PctELi -> you now regress TestScorei on STRi and STRi -> perfect multicollinearity
    - Then: it is impossible to compute the OLS estimator (see the sketch below)
    - Solution: usually just modify the regressors to eliminate the problem
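
A minimal sketch of why the estimator cannot be computed under perfect multicollinearity (plain numpy, made-up data): with a duplicated regressor, X'X is singular, so the OLS formula b = (X'X)^(-1) X'y cannot be evaluated.

import numpy as np

rng = np.random.default_rng(0)
n = 100
str_ = rng.normal(20, 2, size=n)              # student-teacher ratio (made-up data)

# the same regressor entered twice -> perfect multicollinearity
X = np.column_stack([np.ones(n), str_, str_])
XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))             # rank 2, but XtX is 3x3 -> singular
# (X'X)^(-1) does not exist, so the OLS estimator is not defined
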
- Multiple regression
  o A method that can eliminate omitted variable bias
    - How? If we have data on the omitted variables, we can include them as additional regressors and thereby estimate the causal effect of one regressor while holding the other variables constant
  o Also a method to make better predictions than a single regression by using multiple variables as predictors
- OLS estimator
  o A method to estimate the unknown parameters of a linear regression model
- Omitted variables
  o Variables that are left out of the regression
- Omitted variable bias
  o If the regressor (X) is correlated with a variable that has been omitted from the analysis (a variable in u) and that omitted variable determines, in part, the dependent variable (Y), then the OLS estimator has omitted variable bias
  o (1) X and u are correlated
  o (2) the omitted variable is also a determinant of Y
  o The first least squares assumption E(ui|Xi) = 0 does not hold -> the OLS estimator is biased & inconsistent (see the sketch below)
  o Solutions
    - instrumental variables (IV) regression
    - panel data estimation
    - randomized controlled experiments
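
A small simulation sketch of omitted variable bias (plain numpy; the wage/education/ability variables are hypothetical): leaving out a variable that is correlated with X and determines Y biases the coefficient, and adding it as a regressor removes the bias.

import numpy as np

rng = np.random.default_rng(0)
n = 50_000
ability = rng.normal(size=n)                             # the omitted variable
educ = 0.7 * ability + rng.normal(size=n)                # X is correlated with it
wage = 1.0 * educ + 2.0 * ability + rng.normal(size=n)   # true effect of educ is 1.0

short = np.linalg.lstsq(np.column_stack([np.ones(n), educ]), wage, rcond=None)[0]
long = np.linalg.lstsq(np.column_stack([np.ones(n), educ, ability]), wage, rcond=None)[0]
print(short[1])   # biased upward: it picks up part of the ability effect
print(long[1])    # close to the true 1.0 once ability is included as a regressor
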
- Overfitting
  o When the model is too complex -> it begins to describe the random error in the data rather than the relationships between the variables
  o Leads to misleading R2 values, regression coefficients and p-values (see the sketch below)
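
An illustrative sketch of overfitting (plain numpy, made-up data): a high-degree polynomial fits the estimation sample almost perfectly but typically predicts fresh data from the same process much worse than a simple linear fit.

import numpy as np

rng = np.random.default_rng(0)
n = 20
x = rng.uniform(-1, 1, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)          # the true relationship is linear

x_new = rng.uniform(-1, 1, size=n)                          # fresh data from the same process
y_new = 1.0 + 2.0 * x_new + rng.normal(scale=0.5, size=n)

for degree in (1, 12):
    coefs = np.polyfit(x, y, deg=degree)                    # fit a polynomial of this degree
    mse_in = np.mean((np.polyval(coefs, x) - y) ** 2)       # in-sample error shrinks with degree
    mse_out = np.mean((np.polyval(coefs, x_new) - y_new) ** 2)  # out-of-sample error typically grows
    print(degree, round(mse_in, 3), round(mse_out, 3))
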
- Panel structure
  o Allows us to control for unobserved heterogeneity
  o Mitigates omitted variable bias
- Randomized controlled experiment
  o Controlled: there is both a control group that receives no treatment and a treatment group that receives the treatment
  o Randomized: the treatment is assigned randomly; we randomly pick who gets the treatment
- Sargan Test (J-Test)
  o Tests the exogeneity of the instruments
  o If we have more instruments than endogenous regressors (m > k), the coefficients are overidentified
  o In case of overidentification (m > k): if we want to test the instruments' validity (exogeneity), we can do so with a J-Test (see the sketch below)
  o H0: the instruments are exogenous
  o Results of the J-Test
    - J-statistic > the (e.g. 5%) critical value: reject H0 -> at least one instrument is endogenous
    - If the TSLS estimates based on different instruments are consistent & close to each other -> all tested instruments are exogenous
    - If one instrument produces very different estimates -> one or both instruments are probably not exogenous
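
A sketch of the homoskedasticity-only J-statistic (one common textbook formulation, assuming numpy and scipy, with simulated data and two valid instruments): regress the TSLS residuals on the instruments and compare n * R^2 to a chi-squared(m - k) critical value.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 5_000
z1, z2 = rng.normal(size=n), rng.normal(size=n)   # m = 2 instruments, k = 1 endogenous regressor
u = rng.normal(size=n)
x = 0.6 * z1 + 0.6 * z2 + 0.5 * u + rng.normal(size=n)
y = 1.0 * x + u

# TSLS by hand: stage 1 projects X on the instruments, stage 2 uses the fitted values
Z = np.column_stack([np.ones(n), z1, z2])
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
beta_tsls = np.linalg.lstsq(np.column_stack([np.ones(n), x_hat]), y, rcond=None)[0]
resid = y - np.column_stack([np.ones(n), x]) @ beta_tsls   # residuals use the actual X

# auxiliary regression: TSLS residuals on the instruments
fitted = Z @ np.linalg.lstsq(Z, resid, rcond=None)[0]
r2 = 1 - np.sum((resid - fitted) ** 2) / np.sum((resid - resid.mean()) ** 2)
J = n * r2                                        # overidentification (Sargan) statistic
df = 2 - 1                                        # m instruments minus k endogenous regressors
print(J, stats.chi2.ppf(0.95, df))                # both instruments are valid here, so J should be small
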
- Two stage least squares (TSLS)
  o If the instrument Z satisfies the conditions of instrument relevance and exogeneity, the coefficient β1 can be estimated using an IV estimator (TSLS); see the sketch below
  o Stage (1)
    - Decompose X into two components: a problematic component that may be correlated with the regression error, and a problem-free component that is uncorrelated with the error
  o Stage (2)
    - Use the problem-free component to estimate β1
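
A minimal sketch of the two stages (plain numpy, simulated data in which X is endogenous and Z is a valid instrument):

import numpy as np

rng = np.random.default_rng(0)
n = 20_000
z = rng.normal(size=n)                           # instrument: relevant and exogenous
u = rng.normal(size=n)
x = 0.7 * z + 0.7 * u + rng.normal(size=n)       # endogenous regressor (correlated with u)
y = 1.0 * x + u                                  # true beta_1 = 1.0

# stage 1: regress X on Z and keep the fitted values (the "problem-free" component of X)
Z = np.column_stack([np.ones(n), z])
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]

# stage 2: regress Y on the fitted values to estimate beta_1
b_ols = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)[0][1]
b_tsls = np.linalg.lstsq(np.column_stack([np.ones(n), x_hat]), y, rcond=None)[0][1]
print(b_ols, b_tsls)                             # OLS is biased away from 1.0; TSLS is close to 1.0
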
- Validity – internal
  o Statistical inferences about causal effects are valid for the population being studied
  o Conditions
    - (1) the OLS estimator must be unbiased and consistent
    - (2) hypothesis tests should have the desired significance level, and confidence intervals should have the desired confidence level (computed from the standard errors – the SEs should be consistent)
- Validity – external
  o Statistical inferences about causal effects can be generalized from the population and setting studied to other populations and settings
- Weak instruments
  o Instrumental variables that have low predictive power for the endogenous regressor X
  o Valid instruments (Z) should be
    - (1) relevant – Z is highly correlated with X
    - (2) exogenous – Z is correlated with Y solely through its correlation with X, so Z is uncorrelated with the error term u
  o Test for instrument relevance (see the sketch below)
    - Investigate the first-stage F-statistic (we want at least one Z with a coefficient ≠ 0 in the 1st stage – then the instrument is not weak)
    - If F > 10, the instrument is good (relevant) (rule of thumb)
  o Test for exogeneity
    - Difficult to test directly (use the J-Test)
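
A sketch of the relevance check (assuming statsmodels, with simulated data): fit the first-stage regression of X on the instrument(s) and compare its F-statistic to the rule-of-thumb value of 10.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1_000
z = rng.normal(size=n)
x = 0.1 * z + rng.normal(size=n)     # try 0.1 (weak-ish) vs 1.0 (strong) to see the difference

# first stage: regress the endogenous X on a constant and the instrument(s)
first_stage = sm.OLS(x, sm.add_constant(z)).fit()
print(first_stage.fvalue)            # F-statistic on the instrument coefficients
print(first_stage.fvalue > 10)       # rule of thumb: F > 10 -> the instrument is not weak
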
- Within estimator
  o Exploits the within-individual variation (over time); see the sketch below
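
A minimal sketch of the within transformation (assuming pandas and numpy, with made-up panel data): demean Y and X within each entity and run OLS on the demeaned data; the entity fixed effects drop out.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
entities, periods = 200, 5
ids = np.repeat(np.arange(entities), periods)
alpha = np.repeat(rng.normal(scale=2.0, size=entities), periods)   # unobserved entity effects
x = 0.5 * alpha + rng.normal(size=entities * periods)              # X is correlated with them
y = 1.0 * x + alpha + rng.normal(size=entities * periods)          # true effect of X is 1.0

df = pd.DataFrame({"id": ids, "x": x, "y": y})
# within transformation: subtract each entity's time average
x_w = df["x"] - df.groupby("id")["x"].transform("mean")
y_w = df["y"] - df.groupby("id")["y"].transform("mean")

beta_within = np.sum(x_w * y_w) / np.sum(x_w ** 2)    # OLS slope on the demeaned data
print(beta_within)                                    # close to 1.0; pooled OLS would be biased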
