Econometrics
Prof. Dr. Salmai Qari
Hochschule für Wirtschaft und Recht Berlin
Summer term 2025
Qari Summer 2025
Econometrics?
Econometric Society (established in 1930):
The Econometric Society is an international society for the advancement
of economic theory in its relation to statistics and mathematics. The
Society shall operate as a completely disinterested scientific organization
without political, social, financial, or nationalistic bias. Its main object
shall be to promote studies that aim at a unification of the theoretical-
quantitative and empirical-quantitative approach to economic problems
and that are penetrated by constructive and rigorous thinking similar to
that which has come to dominate in the natural sciences. Any activity
which promises ultimately to further such unification of theoretical and
factual studies in economics shall be within the sphere of interest of the
Society.
Econometrics: Economics + Metrics
Econometric Society
Linking economic theory / reasoning with statistical methods
Further example: Biometrics
International Biometric Society (established in 1947)
First President of the International Biometric Society: Ronald Fisher
George W. Snedecor named the F-test after Fisher
F-distribution / 'Snedecor's F distribution' / 'Fisher-Snedecor F-distribution'
Why choose this module?
Some examples of questions suitable for econometric analysis:
What is the effect of smoking bans in restaurants and bars on the businesses' revenue and employment? (Causal effect)
What is the effect of a minimum wage on employment? (Causal effect)
What will the unemployment rate be next year? (Forecasting)
What is the likelihood that a client of a financial institution will not be able to repay the loan? (Forecasting)
Aim of the course
Acquire the necessary skills to understand and assess empirical studies
The skills are also very valuable for almost any policy discussion
One recent example in the area of health¹:
"Eating breakfast was associated with significantly lower CHD (Coronary Heart Disease) risk in this cohort of male health professionals"
There are several 'usual' problems in this study, in particular the difficulty of controlling for unobserved factors when comparing those who have breakfast with those who skip it
¹ Cahill et al., Prospective Study of Breakfast Eating and Incident Coronary Heart Disease in a Cohort of Male US Health Professionals, Circulation. 2013; 128: 337-343. doi: 10.1161/CIRCULATIONAHA.113.001474
Outline
Introduction (Wooldridge Ch. 1)
The simple linear model (Wooldridge Ch. 2)
The multiple linear regression model: estimation and inference (Wooldridge Ch. 3/4)
Functional form (e.g. logarithmic equations, Wooldridge Ch. 2/6)
Qualitative information (e.g. dummy variables, Wooldridge Ch. 7)
Heteroscedasticity, specification issues (Wooldridge Ch. 8/9)
The Nature of Econometrics
and Economic Data
Chapter 1
Wooldridge: Introductory Econometrics:
A Modern Approach, 5e
Wooldridge (2013), Chapter 1. 1
What is econometrics?
Econometrics = use of statistical methods to analyze economic data
Econometricians typically analyze nonexperimental data
Typical goals of econometric analysis
Estimating relationships between economic variables
Testing economic theories and hypotheses
Forecasting economic variables
Evaluating and implementing government and business policy
Steps in econometric analysis
1) Economic model (this step is often skipped)
2) Econometric model
Economic models
May be micro- or macro-models
Often use optimizing behaviour, equilibrium modeling, …
Establish relationships between economic variables
Examples: demand equations, pricing equations, …
Model of job training and worker productivity
What is the effect of additional training on worker productivity?
Formal economic theory is not really needed to derive the equation:

  wage = f(educ, exper, training)

where wage is the hourly wage, educ the years of formal education, exper the years of workforce experience, and training the weeks spent in job training
Other factors may be relevant, but these are the most important (?)
Econometric model of job training and worker productivity

  wage = β0 + β1 educ + β2 exper + β3 training + u

where u represents unobserved determinants of the wage, e.g. innate ability, quality of education, family background, ...
Most of econometrics deals with the specification of the error term u
Econometric models may be used for hypothesis testing
For example, the parameter β3 represents the effect of training on the wage
How large is this effect? Is it different from zero?
Econometric analysis requires data
Different kinds of economic data sets
Cross-sectional data
Time series data
Pooled cross sections
Panel/Longitudinal data
Econometric methods depend on the nature of the data used
Use of inappropriate methods may lead to misleading results
Cross-sectional data sets
Sample of individuals, households, firms, cities, states, countries,
or other units of interest at a given point of time/in a given period
Cross-sectional observations are more or less independent
For example, pure random sampling from a population
Sometimes pure random sampling is violated, e.g. units refuse to
respond in surveys, or if sampling is characterized by clustering
Cross-sectional data typically encountered in applied microeconomics
Time series data
Observations of a variable or several variables over time
For example, stock prices, money supply, consumer price index,
gross domestic product, annual homicide rates, automobile sales, …
Time series observations are typically serially correlated
Ordering of observations conveys important information
Data frequency: daily, weekly, monthly, quarterly, annually, …
Typical features of time series: trends and seasonality
Typical applications: applied macroeconomics and finance
Pooled cross sections
Two or more cross sections are combined in one data set
Cross sections are drawn independently of each other
Pooled cross sections often used to evaluate policy changes
Example:
• Evaluate effect of change in property taxes on house prices
• Random sample of house prices for the year 1993
• A new random sample of house prices for the year 1995
• Compare before/after (1993: before reform, 1995: after reform)
Panel or longitudinal data
The same cross-sectional units are followed over time
Panel data have a cross-sectional and a time series dimension
Panel data can be used to account for time-invariant unobservables
Panel data can be used to model lagged responses
Example:
• City crime statistics; each city is observed in two years
• Time-invariant unobserved city characteristics may be modeled
• Effect of police on crime rates may exhibit time lag
Two-year panel data on city crime statistics
Each city has two time-series observations
[Table: for each city, e.g. the number of police in 1986 and the number of police in 1990]
Causality and the notion of ceteris paribus
Definition of the causal effect of x on y:
"How does variable y change if variable x is changed, but all other relevant factors are held constant?"
Most economic questions are ceteris paribus questions
It is important to define which causal effect one is interested in
It is useful to describe how an experiment would have to be designed to infer the causal effect in question
Causal effect of fertilizer on crop yield
"By how much will the production of soybeans increase if one
increases the amount of fertilizer applied to the ground"
Implicit assumption: all other factors that influence crop yield such
as quality of land, rainfall, presence of parasites etc. are held fixed
Experiment:
Choose several one-acre plots of land; randomly assign different
amounts of fertilizer to the different plots; compare yields
Experiment works because amount of fertilizer applied is unrelated
to other factors influencing crop yields
Measuring the return to education
"If a person is chosen from the population and given another
year of education, by how much will his or her wage increase? "
Implicit assumption: all other factors that influence wages such as
experience, family background, intelligence etc. are held fixed
Experiment:
Choose a group of people; randomly assign different amounts of
education to them (infeasible!); compare wage outcomes
Problem without random assignment: amount of education is related
to other factors that influence wages (e.g. intelligence)
Effect of law enforcement on city crime level
"If a city is randomly chosen and given ten additional police officers,
by how much would its crime rate fall? "
Alternatively: "If two cities are the same in all respects, except that
city A has ten more police officers, by how much would the two cities'
crime rates differ?"
Experiment:
Randomly assign number of police officers to a large number of cities
In reality, number of police officers will be determined by crime rate
(simultaneous determination of crime and number of police)
Effect of the minimum wage on unemployment
"By how much (if at all) will unemployment increase if the minimum
wage is increased by a certain amount (holding other things fixed)? "
Experiment:
Government randomly chooses minimum wage each year and
observes unemployment outcomes
Experiment will work because level of minimum wage is unrelated
to other factors determining unemployment
In reality, the level of the minimum wage will depend on political
and economic factors that also influence unemployment
The Simple
Regression Model
Chapter 2
Wooldridge: Introductory Econometrics:
A Modern Approach, 5e
Definition of the simple linear regression model
"Explains variable y in terms of variable x":

  y = β0 + β1 x + u

β0: intercept; β1: slope parameter
y: dependent variable, explained variable, response variable, ...
x: independent variable, explanatory variable, regressor, ...
u: error term, disturbance, unobservables, ...
Interpretation of the simple linear regression model
"Studies how y varies with changes in x":

  Δy = β1 Δx   as long as   Δu = 0

By how much does the dependent variable change if the independent variable is increased by one unit? The interpretation is only correct if all other things remain equal when the independent variable is increased by one unit
The simple linear regression model is rarely applicable in practice, but its discussion is useful for pedagogical reasons
Example: Soybean yield and fertilizer

  yield = β0 + β1 fertilizer + u

u: rainfall, land quality, presence of parasites, ...
β1 measures the effect of fertilizer on yield, holding all other factors fixed
Example: A simple wage equation

  wage = β0 + β1 educ + u

u: labor force experience, tenure with current employer, work ethic, intelligence, ...
β1 measures the change in hourly wage given another year of education, holding all other factors fixed
When is there a causal interpretation?
Conditional mean independence assumption: the explanatory variable must not contain information about the mean of the unobserved factors:

  E(u|x) = E(u) = 0

Example: wage equation, where u includes e.g. intelligence, ...
The conditional mean independence assumption is unlikely to hold here because individuals with more education will also be more intelligent on average.
Population regression function (PRF)
The conditional mean independence assumption implies that

  E(y|x) = β0 + β1 x

This means that the average value of the dependent variable can be expressed as a linear function of the explanatory variable
Population regression function
[Figure: for individuals with x = xi, the average value of y is β0 + β1 xi]
In order to estimate the regression model one needs data
A random sample of n observations:

  {(xi, yi) : i = 1, ..., n}

where yi is the value of the dependent variable and xi the value of the explanatory variable for the i-th observation
Fit as good as possible a regression line through the data points:
[Figure: scatter of data points (xi, yi), e.g. the i-th data point, with the fitted regression line]
Deriving the ordinary least squares (OLS) estimates
(1) Method of moments
(2) Define the sum of squared residuals as a loss / penalty function and minimize this function
Deriving the ordinary least squares (OLS) estimates - I
Define a fitted value for y when x = xi as:

  ŷi = β̂0 + β̂1 xi   (1)

Define a residual for observation i as the difference between the actual yi and its fitted value:

  ûi = yi − ŷi = yi − β̂0 − β̂1 xi   (2)

Now choose β̂0 and β̂1 such that the sum of squared residuals

  ∑_{i=1}^n ûi² = ∑_{i=1}^n (yi − β̂0 − β̂1 xi)²   (3)

is as small as possible, i.e.:

  min_{β̂0, β̂1} ∑_{i=1}^n (yi − β̂0 − β̂1 xi)²   (4)
Deriving the OLS estimates - II
In order to solve this minimization problem, the partial derivatives of (3) with respect to β̂0 and β̂1 must be zero:

  ∂(∑_{i=1}^n ûi²)/∂β̂0 = −2 ∑_{i=1}^n (yi − β̂0 − β̂1 xi) = 0   (5)

  ∂(∑_{i=1}^n ûi²)/∂β̂1 = −2 ∑_{i=1}^n (yi − β̂0 − β̂1 xi) xi = 0   (6)

Note that (5) can be written as

  ȳ = β̂0 + β̂1 x̄   (7)

where ȳ = n⁻¹ ∑_{i=1}^n yi and x̄ = n⁻¹ ∑_{i=1}^n xi
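Solving the first-order conditions (5) and (6) yields the familiar closed-form estimates β̂1 = ∑(xi − x̄)(yi − ȳ) / ∑(xi − x̄)² and β̂0 = ȳ − β̂1 x̄. A minimal sketch in Python (the data here are illustrative, not from the course):

```python
import numpy as np

def ols_simple(x, y):
    """Closed-form OLS estimates for the simple regression y = b0 + b1*x + u."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# Data lying exactly on the line y = 1 + 2x are recovered exactly
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x
b0, b1 = ols_simple(x, y)
print(b0, b1)  # → 1.0 2.0
```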
Graphical illustration of the OLS estimate
Figure: OLS minimizes the sum of the squared residuals
What does "as good as possible" mean?
Regression residuals: ûi = yi − ŷi
Minimize the sum of squared regression residuals, ∑ ûi² → Ordinary Least Squares (OLS) estimates
CEO salary and return on equity
[Fitted regression of salary (in thousands of dollars) on roe, the return on equity of the CEO's firm]
If the return on equity increases by 1 unit, then salary is predicted to change by $18,501
Causal interpretation?
[Figure: the fitted regression line (which depends on the sample) versus the unknown population regression line]
Wage and education
[Fitted regression of hourly wage (in dollars) on years of education]
In the sample, one more year of education was associated with an increase in hourly wage of $0.54
Causal interpretation?
Voting outcomes and campaign expenditures (two parties)
[Fitted regression of the percentage of the vote for candidate A on candidate A's percentage of campaign expenditures]
If candidate A's share of spending increases by one percentage point, he or she receives 0.464 percentage points more of the total vote
Causal interpretation?
Properties of OLS on any sample of data
Fitted or predicted values: ŷi = β̂0 + β̂1 xi
Residuals (deviations from the regression line): ûi = yi − ŷi
Algebraic properties of OLS regression:
(1) The residuals sum to zero: ∑ ûi = 0
(2) The sample covariance between the residuals and the regressor is zero: ∑ xi ûi = 0
(3) The sample averages of y and x lie on the regression line: ȳ = β̂0 + β̂1 x̄
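These algebraic properties hold on any sample, because they are exactly the first-order conditions of the minimization. A small numerical check in Python (the data are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 1.0 + 2.0 * x + rng.normal(size=50)  # illustrative data

# Closed-form OLS estimates
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
u_hat = y - (b0 + b1 * x)  # residuals

print(np.isclose(u_hat.sum(), 0.0))              # (1) residuals sum to zero
print(np.isclose(np.sum(x * u_hat), 0.0))        # (2) residuals uncorrelated with x
print(np.isclose(y.mean(), b0 + b1 * x.mean()))  # (3) (x̄, ȳ) lies on the line
```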
For example, CEO number 12's salary was $526,023 lower than predicted using the information on his firm's return on equity
Goodness-of-Fit
"How well does the explanatory variable explain the dependent variable?"
Measures of variation:
Total sum of squares, SST = ∑ (yi − ȳ)²: the total variation in the dependent variable
Explained sum of squares, SSE = ∑ (ŷi − ȳ)²: the variation explained by the regression
Residual sum of squares, SSR = ∑ ûi²: the variation not explained by the regression
Decomposition of total variation:

  SST = SSE + SSR

(total variation = explained part + unexplained part)
Goodness-of-fit measure (R-squared):

  R² = SSE/SST = 1 − SSR/SST

R-squared measures the fraction of the total variation that is explained by the regression
Goodness-of-fit: Stata output

. reg wage school
Source | SS df MS Number of obs = 9027
-------------+------------------------------ F( 1, 9025) = 801.64
Model | 38088.4173 1 38088.4173 Prob > F = 0.0000
Residual | 428804.273 9025 47.5129389 R-squared = 0.0816
-------------+------------------------------ Adj R-squared = 0.0815
Total | 466892.69 9026 51.7275305 Root MSE = 6.893
------------------------------------------------------------------------------
wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
school | .8112824 .0286538 28.31 0.000 .7551145 .8674503
_cons | 2.728642 .3934673 6.93 0.000 1.957357 3.499927
------------------------------------------------------------------------------

  R² = SSE/SST = 38088.4173/466892.69 = 0.0816
  R² = 1 − SSR/SST = 1 − 428804.273/466892.69 = 0.0816
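The R² arithmetic from the Stata output can be checked directly; in Stata's ANOVA table, "Model" is the explained sum of squares and "Residual" the residual sum of squares:

```python
# Sums of squares taken from the Stata output above
sse = 38088.4173   # "Model" SS (explained sum of squares)
ssr = 428804.273   # "Residual" SS
sst = 466892.69    # "Total" SS

r2_from_sse = sse / sst
r2_from_ssr = 1 - ssr / sst
print(round(r2_from_sse, 4), round(r2_from_ssr, 4))  # → 0.0816 0.0816
print(round(sse + ssr, 2) == round(sst, 2))          # → True (SST = SSE + SSR)
```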
CEO salary and return on equity: the regression explains only 1.3% of the total variation in salaries
Voting outcomes and campaign expenditures: the regression explains 85.6% of the total variation in election outcomes
Caution: a high R-squared does not necessarily mean that the regression has a causal interpretation!
Incorporating nonlinearities: semi-logarithmic form
Regression of log wages on years of education:

  log(wage) = β0 + β1 educ + u

where log(·) denotes the natural logarithm
This changes the interpretation of the regression coefficient: β1 gives the (approximate) percentage change of the wage if years of education are increased by one year
Fitted regression: the wage increases by 8.3% for every additional year of education (= return to education)
For example: the growth rate of the wage is 8.3% per year of education
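The "percentage change" reading of a semi-log coefficient is an approximation: a coefficient of 0.083 implies an exact growth factor of exp(0.083) per additional year. A quick check in Python, using the 8.3% figure quoted above:

```python
import math

b1 = 0.083  # semi-log coefficient on years of education (from the slide above)

approx_pct = 100 * b1                   # approximate % change in wage per year
exact_pct = 100 * (math.exp(b1) - 1.0)  # exact % change implied by the log model
print(round(approx_pct, 1), round(exact_pct, 1))  # → 8.3 8.7
```

For small coefficients the two readings nearly coincide; the gap widens as the coefficient grows.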
Incorporating nonlinearities: log-logarithmic form
CEO salary and firm sales:

  log(salary) = β0 + β1 log(sales) + u

(natural logarithm of CEO salary and of his/her firm's sales)
This changes the interpretation of the regression coefficient: β1 gives the percentage change of salary if sales increase by 1%
Logarithmic changes are always percentage changes
CEO salary and firm sales: fitted regression
For example: +1% sales → +0.257% salary
The log-log form postulates a constant elasticity model, whereas the semi-log form assumes a semi-elasticity model
Expected values and variances of the OLS estimators
The estimated regression coefficients are random variables because they are calculated from a random sample
The data are random and depend on the particular sample that has been drawn
The question is what the estimators estimate on average, and how large their variability in repeated samples is
Standard assumptions for the linear regression model
Assumption SLR.1 (Linear in parameters): in the population, the relationship between y and x is linear:

  y = β0 + β1 x + u

Assumption SLR.2 (Random sampling): the data {(xi, yi) : i = 1, ..., n} are a random sample drawn from the population
Each data point therefore follows the population equation:

  yi = β0 + β1 xi + ui
Discussion of random sampling: wage and education
The population consists, for example, of all workers of country A
In the population, a linear relationship between wages (or log wages) and years of education holds
Draw a worker completely at random from the population
The wage and the years of education of the worker drawn are random, because one does not know beforehand which worker is drawn
Throw the worker back into the population and repeat the random draw n times
The wages and years of education of the sampled workers are used to estimate the linear relationship between wages and education
The values drawn for the i-th worker: (xi, yi)
The implied deviation from the population relationship for the i-th worker:

  ui = yi − β0 − β1 xi
Assumptions for the linear regression model (cont.)
Assumption SLR.3 (Sample variation in the explanatory variable): the values of the explanatory variable are not all the same (otherwise it would be impossible to study how different values of the explanatory variable lead to different values of the dependent variable)
Assumption SLR.4 (Zero conditional mean): the value of the explanatory variable must contain no information about the mean of the unobserved factors:

  E(u|x) = 0
Theorem 2.1 (Unbiasedness of OLS): under assumptions SLR.1 - SLR.4,

  E(β̂0) = β0   and   E(β̂1) = β1

Interpretation of unbiasedness:
The estimated coefficients may be smaller or larger, depending on the sample that is the result of a random draw
However, on average, they will be equal to the values that characterize the true relationship between y and x in the population
"On average" means: if drawing the random sample and doing the estimation were repeated many times
In a given sample, estimates may differ considerably from the true values
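Unbiasedness can be illustrated by simulation: repeatedly draw samples from a known population model and average the estimates across replications. A sketch with made-up population parameters β0 = 1, β1 = 2:

```python
import numpy as np

rng = np.random.default_rng(42)
beta0, beta1, n, reps = 1.0, 2.0, 100, 5000  # made-up population parameters

estimates = np.empty((reps, 2))
for r in range(reps):
    x = rng.normal(size=n)
    u = rng.normal(size=n)  # zero conditional mean holds by construction
    y = beta0 + beta1 * x + u
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    estimates[r] = (y.mean() - b1 * x.mean(), b1)

# Averaged over many repeated samples, the estimates center on (1.0, 2.0)
print(estimates.mean(axis=0).round(2))
```

Any single replication can be well off the true values; only the average across replications settles on them.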
Variances of the OLS estimators
Depending on the sample, the estimates will be nearer to or farther away from the true population values
How far can we expect our estimates to be from the true population values on average (= sampling variability)?
Sampling variability is measured by the estimators' variances
Assumption SLR.5 (Homoskedasticity): the value of the explanatory variable must contain no information about the variability of the unobserved factors:

  Var(u|x) = σ²
Graphical illustration of homoskedasticity
[Figure: the variability of the unobserved influences does not depend on the value of the explanatory variable]
An example of heteroskedasticity: wage and education
[Figure: the variance of the unobserved determinants of wages increases with the level of education]
Theorem 2.2 (Variances of the OLS estimators)
Under assumptions SLR.1 - SLR.5:

  Var(β̂1) = σ² / ∑_{i=1}^n (xi − x̄)²

  Var(β̂0) = σ² (n⁻¹ ∑_{i=1}^n xi²) / ∑_{i=1}^n (xi − x̄)²

Conclusion: the sampling variability of the estimated regression coefficients is larger the larger the variability of the unobserved factors, and smaller the larger the variation in the explanatory variable
Estimating the error variance

  Var(u|x) = Var(u) = σ²

The variance of u does not depend on x, i.e. it equals the unconditional variance
One could estimate the variance of the errors by calculating the variance of the residuals in the sample; unfortunately, this estimate would be biased
An unbiased estimate of the error variance can be obtained by subtracting the number of estimated regression coefficients from the number of observations in the denominator:

  σ̂² = SSR / (n − 2) = (n − 2)⁻¹ ∑_{i=1}^n ûi²
Theorem 2.3 (Unbiasedness of the error variance): under assumptions SLR.1 - SLR.5,

  E(σ̂²) = σ²

Calculation of standard errors for regression coefficients: plug in σ̂² for the unknown σ², e.g.

  se(β̂1) = σ̂ / √( ∑_{i=1}^n (xi − x̄)² )

The estimated standard deviations of the regression coefficients are called "standard errors". They measure how precisely the regression coefficients are estimated.
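A sketch of these two quantities on illustrative data (simulated with a true error variance of 1, so σ̂² should land near 1):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)  # true error variance is 1

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
u_hat = y - b0 - b1 * x

sigma2_hat = np.sum(u_hat ** 2) / (n - 2)                  # SSR / (n - 2)
se_b1 = np.sqrt(sigma2_hat / np.sum((x - x.mean()) ** 2))  # standard error of slope
print(round(float(sigma2_hat), 2), round(float(se_b1), 3))
```

Note the division by n − 2 rather than n: two regression coefficients were estimated from the sample.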
• Regression on a binary explanatory variable
• Suppose that x is either equal to 0 or 1
• This regression allows the mean value of y to differ depending on the state of x: E(y|x = 1) − E(y|x = 0) = β1
• Note that the statistical properties of OLS are no different when x is binary
• Counterfactual outcomes, causality and policy analysis
• In policy analysis, define a treatment effect for unit i as:

  τi = yi(1) − yi(0)

• Note that we will never actually observe this, since we observe either yi(1) or yi(0) for a given i, but never both.
• Let the average treatment effect be defined as:

  τATE = E[yi(1) − yi(0)]
• Counterfactual outcomes, causality and policy analysis (contd.)
• Let xi be a binary policy variable. The observed outcome can be written as:

  yi = yi(0) + [yi(1) − yi(0)] xi = yi(0) + τi xi

• Therefore, regressing y on x will give us an estimate of the (constant) treatment effect.
• As long as we have random assignment, OLS will yield an unbiased estimator for the treatment effect τ.
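Under random assignment, the OLS slope from regressing y on the binary x equals the difference in group means and estimates the treatment effect. A simulation sketch (the true effect τ = 1.5 and the outcome distribution are made up):

```python
import numpy as np

rng = np.random.default_rng(7)
n, tau = 10_000, 1.5  # made-up constant treatment effect

x = rng.integers(0, 2, size=n).astype(float)  # random assignment to treatment
y0 = rng.normal(loc=2.0, size=n)              # untreated potential outcomes
y = y0 + tau * x                              # observed outcomes

# OLS slope on a binary regressor equals the difference in group means
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
diff_means = y[x == 1].mean() - y[x == 0].mean()

print(np.isclose(b1, diff_means))  # → True
print(round(float(b1), 1))         # close to the true effect tau = 1.5
```

The equality of the slope and the mean difference is an algebraic identity; random assignment is what makes the mean difference an unbiased estimate of τ.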
• Random assignment
• Subjects are randomly assigned to treatment and control groups such that there are no systematic differences between the two groups other than the treatment.
• In practice, randomized control trials (RCTs) are expensive to implement and may raise ethical issues.
• Though RCTs are often not feasible in economics, it is useful to think about the kind of experiment you would run if random assignment were a possibility. This helps in identifying the potential impediments to random assignment (which we could conceivably control for in a multivariate regression).
• Example: The effects of a job training program on earnings
• Real earnings are regressed on a binary variable indicating participation in a job training program.
• Those who participated in the training program have earnings $1,790 higher than those who did not participate.
• This represents a 39.3% increase over the $4,550 average earnings of those who did not participate.
Multiple Regression
Analysis: Estimation
Chapter 3
Wooldridge: Introductory Econometrics:
A Modern Approach, 5e
Definition of the multiple linear regression model
"Explains variable y in terms of variables x1, x2, ..., xk":

  y = β0 + β1 x1 + β2 x2 + ... + βk xk + u

β0: intercept; β1, ..., βk: slope parameters
y: dependent variable, explained variable, response variable, ...
x1, ..., xk: independent variables, explanatory variables, regressors, ...
u: error term, disturbance, unobservables, ...
Motivation for multiple regression
Incorporate more explanatory factors into the model
Explicitly hold fixed other factors that otherwise would be in u
Allow for more flexible functional forms
Example: wage equation

  wage = β0 + β1 educ + β2 exper + u

(hourly wage, years of education, labor market experience; u: all other factors)
Now β1 measures the effect of education explicitly holding experience fixed
Example: Average test scores and per-student spending

  avgscore = β0 + β1 expend + β2 avginc + u

(average standardized test score of a school, per-student spending at this school, average family income of students at this school; u: other factors)
Per-student spending is likely to be correlated with average family income at a given high school because of school financing
Omitting average family income from the regression would lead to a biased estimate of the effect of spending on average test scores
In a simple regression model, the effect of per-student spending would partly include the effect of family income on test scores
Example: Family income and family consumption

  cons = β0 + β1 inc + β2 inc² + u

(family consumption, family income, family income squared; u: other factors)
The model has two explanatory variables: income and income squared
Consumption is explained as a quadratic function of income
One has to be very careful when interpreting the coefficients: by how much does consumption increase if income is increased by one unit? The answer depends on how much income is already there, since

  Δcons/Δinc ≈ β1 + 2 β2 inc
Example: CEO salary, sales and CEO tenure

  log(salary) = β0 + β1 log(sales) + β2 ceoten + β3 ceoten² + u

(log of CEO salary, log sales, quadratic function of CEO tenure with the firm)
The model assumes a constant elasticity relationship between CEO salary and the sales of his or her firm
The model assumes a quadratic relationship between CEO salary and his or her tenure with the firm
Meaning of "linear" regression: the model has to be linear in the parameters (not in the variables)
OLS estimation of the multiple regression model
Given a random sample {(xi1, ..., xik, yi) : i = 1, ..., n}, minimize the sum of squared residuals:

  min ∑_{i=1}^n (yi − β̂0 − β̂1 xi1 − ... − β̂k xik)²

The minimization will be carried out by computer
Multiple Regression
Analysis: Estimation
Interpretation of the multiple regression model
By how much does the dependent variable change if the j-th
independent variable is increased by one unit, holding all
other independent variables and the error term constant
The multiple linear regression model manages to hold the values
of other explanatory variables fixed even if, in reality, they are
correlated with the explanatory variable under consideration
"Ceteris paribus"-interpretation
It has still to be assumed that unobserved factors do not change if
the explanatory variables are changed
Example: Determinants of college GPA
colGPA^ = 1.29 + .453·hsGPA + .0094·ACT
(colGPA = grade point average at college, hsGPA = high school grade point average, ACT = achievement test score)
Interpretation
Holding ACT fixed, another point on high school grade point average
is associated with another .453 points college grade point average
Or: If we compare two students with the same ACT, but the hsGPA of
student A is one point higher, we predict student A to have a colGPA
that is .453 higher than that of student B
Holding high school grade point average fixed, another 10 points on
ACT are associated with less than one point on college GPA
"Partialling out" interpretation of multiple regression
One can show that the estimated coefficient of an explanatory
variable in a multiple regression can be obtained in two steps:
1) Regress the explanatory variable x_j on all other explanatory variables
2) Regress y on the residuals from this regression
Why does this procedure work?
The residuals from the first regression are the part of the explanatory
variable that is uncorrelated with the other explanatory variables
The slope coefficient of the second regression therefore represents
the isolated effect of the explanatory variable on the dep. variable
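This two-step equivalence (the Frisch-Waugh-Lovell result) can be verified numerically on synthetic data:

```python
import numpy as np

# "Partialling out" check: the coefficient on x1 from the multiple
# regression equals the slope of y on the residuals of x1. Synthetic data.
rng = np.random.default_rng(1)
n = 500
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)          # x1 is correlated with x2
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

# Full multiple regression: coefficient on x1
X = np.column_stack([np.ones(n), x1, x2])
b_multiple = np.linalg.lstsq(X, y, rcond=None)[0][1]

# Step 1: regress x1 on the other regressors, keep the residuals
Z = np.column_stack([np.ones(n), x2])
r1 = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]

# Step 2: simple-regression slope of y on those residuals
b_two_step = (r1 @ y) / (r1 @ r1)
print(b_multiple, b_two_step)
```

The two numbers agree to machine precision, even though x1 and x2 are correlated.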
Properties of OLS on any sample of data
Fitted values and residuals
Fitted or predicted values: ŷ_i = β̂0 + β̂1·x_i1 + … + β̂k·x_ik
Residuals: û_i = y_i − ŷ_i
Algebraic properties of OLS regression
1) Deviations from the regression line sum up to zero: Σ_i û_i = 0
2) Correlations between deviations and regressors are zero: Σ_i x_ij·û_i = 0 for all j
3) The sample averages of y and of the regressors lie on the regression line
Goodness-of-Fit
Decomposition of total variation: SST = SSE + SSR
(total sum of squares = explained sum of squares + residual sum of squares)
R-squared: R² = SSE/SST = 1 − SSR/SST
Notice that R-squared can only increase if another explanatory
variable is added to the regression
Alternative expression for R-squared: R-squared is equal to the squared
correlation coefficient between the actual and the predicted value of
the dependent variable, R² = [Corr(y, ŷ)]²
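Both expressions for R-squared can be compared numerically; the sketch below uses synthetic data:

```python
import numpy as np

# R-squared two ways: 1 - SSR/SST versus the squared correlation
# between actual and fitted values of the dependent variable.
rng = np.random.default_rng(2)
n = 400
x = rng.normal(size=n)
y = 0.5 + 1.5 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
y_hat = X @ np.linalg.lstsq(X, y, rcond=None)[0]

ssr = np.sum((y - y_hat) ** 2)              # residual sum of squares
sst = np.sum((y - y.mean()) ** 2)           # total sum of squares
r2_decomposition = 1 - ssr / sst
r2_correlation = np.corrcoef(y, y_hat)[0, 1] ** 2
print(r2_decomposition, r2_correlation)
```

The two computations coincide, which is the "alternative expression" stated above.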
Example: Explaining arrest records
narr86^ = .712 − .150·pcnv − .034·ptime86 − .104·qemp86
(narr86 = number of times arrested in 1986, pcnv = proportion of prior arrests
that led to conviction, a proxy for the likelihood of conviction,
ptime86 = months in prison in 1986, qemp86 = quarters employed in 1986)
Interpretation:
Proportion prior arrests +0.5 ⇒ −.075, i.e. 7.5 fewer arrests per 100 men
Months in prison +12 ⇒ −.034(12) = −0.408 arrests for a given man
Quarters employed +1 ⇒ −.104, i.e. 10.4 fewer arrests per 100 men
Example: Explaining arrest records (cont.)
An additional explanatory variable is added:
Average sentence in prior convictions
R-squared increases only slightly
Interpretation:
Average prior sentence increases number of arrests (?)
Limited additional explanatory power as R-squared increases by little
General remark on R-squared
Even if R-squared is small (as in the given example), regression may
still provide good estimates of ceteris paribus effects
Standard assumptions for the multiple regression model
Assumption MLR.1 (Linear in parameters)
y = β0 + β1·x1 + … + βk·xk + u
In the population, the relationship between y and the explanatory variables is linear
Assumption MLR.2 (Random sampling)
{(x_i1, …, x_ik, y_i) : i = 1, …, n}
The data is a random sample drawn from the population
Each data point therefore follows the population equation: y_i = β0 + β1·x_i1 + … + βk·x_ik + u_i
Standard assumptions for the multiple regression model (cont.)
Assumption MLR.3 (No perfect collinearity)
"In the sample (and therefore in the population), none
of the independent variables is constant and there are
no exact relationships among the independent variables"
Remarks on MLR.3
The assumption only rules out perfect collinearity/correlation between explanatory variables; imperfect correlation is allowed
If an explanatory variable is a perfect linear combination of other
explanatory variables it is superfluous and may be eliminated
Constant variables are also ruled out (collinear with intercept)
Example for perfect collinearity: small sample
In a small sample, avginc may accidentally be an exact multiple of expend; it will not
be possible to disentangle their separate effects because there is exact covariation
Example for perfect collinearity: relationships between regressors
Either shareA or shareB will have to be dropped from the regression because there
is an exact linear relationship between them: shareA + shareB = 1
Standard assumptions for the multiple regression model (cont.)
Assumption MLR.4 (Zero conditional mean)
E(u | x1, …, xk) = 0
The values of the explanatory variables must contain no information
about the mean of the unobserved factors
In a multiple regression model, the zero conditional mean assumption
is much more likely to hold because fewer things end up in the error
Example: Average test scores
If avginc was not included in the regression, it would end up in the error term;
it would then be hard to defend that expend is uncorrelated with the error
Discussion of the zero conditional mean assumption
Explanatory variables that are correlated with the error term are
called endogenous; endogeneity is a violation of assumption MLR.4
Explanatory variables that are uncorrelated with the error term are
called exogenous; MLR.4 holds if all explanat. var. are exogenous
Exogeneity is the key assumption for a causal interpretation of the
regression, and for unbiasedness of the OLS estimators
Theorem 3.1 (Unbiasedness of OLS): Under assumptions MLR.1 – MLR.4, E(β̂j) = βj for j = 0, 1, …, k
Unbiasedness is an average property in repeated samples; in a given
sample, the estimates may still be far away from the true values
Including irrelevant variables in a regression model
No problem for unbiasedness, because the population coefficient of an irrelevant variable is zero
However, including irrelevant variables may increase the sampling variance.
Omitting relevant variables: the simple case
True model (contains x1 and x2): y = β0 + β1·x1 + β2·x2 + u
Estimated model (x2 is omitted): ỹ = β̃0 + β̃1·x1
Omitted variable bias
If x1 and x2 are correlated, assume a linear regression relationship between them: x2 = δ0 + δ1·x1 + v
Substituting into the true model gives
y = (β0 + β2·δ0) + (β1 + β2·δ1)·x1 + (β2·v + u)
If y is only regressed on x1, the estimated intercept targets (β0 + β2·δ0),
the estimated slope on x1 targets (β1 + β2·δ1), and (β2·v + u) acts as the error term
Conclusion: All estimated coefficients will be biased
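This bias can be made visible in a simulation; all parameter values below are illustrative:

```python
import numpy as np

# Omitted-variable-bias simulation: with x2 = delta1*x1 + v omitted, the
# short-regression slope centers on beta1 + beta2*delta1 rather than beta1.
rng = np.random.default_rng(3)
n = 200_000
beta1, beta2, delta1 = 1.0, 0.5, 0.8

x1 = rng.normal(size=n)
x2 = delta1 * x1 + rng.normal(size=n)       # x2 is correlated with x1
y = 2.0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)

# Short regression that omits x2
X_short = np.column_stack([np.ones(n), x1])
b_short = np.linalg.lstsq(X_short, y, rcond=None)[0][1]
print(b_short)  # close to beta1 + beta2*delta1 = 1.4, not to beta1 = 1.0
```

With a large sample the estimate settles near the biased value β1 + β2·δ1, not near the true β1, so more data does not cure the problem.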
Example: Omitting ability in a wage equation
β2 (the effect of ability on wages) and δ1 (the relation between education and ability) will both be positive
The return to education will be overestimated because the bias term β2·δ1 is positive. It will look
as if people with many years of education earn very high wages, but this is partly
due to the fact that people with more education are also more able on average.
When is there no omitted variable bias?
If the omitted variable is irrelevant (β2 = 0) or uncorrelated with the included variable (δ1 = 0)
Omitted variable bias: more general cases
True model (contains x1, x2 and x3): y = β0 + β1·x1 + β2·x2 + β3·x3 + u
Estimated model (x3 is omitted): ỹ = β̃0 + β̃1·x1 + β̃2·x2
No general statements possible about direction of bias
Analysis as in simple case if one regressor uncorrelated with others
Example: Omitting ability in a wage equation
If exper is approximately uncorrelated with educ and abil, then the direction
of the omitted variable bias can be as analyzed in the simple two variable case.
Standard assumptions for the multiple regression model (cont.)
Assumption MLR.5 (Homoscedasticity)
Var(u | x1, …, xk) = σ²
The values of the explanatory variables must contain no information
about the variance of the unobserved factors
Example: Wage equation
Var(u | educ, exper, tenure) = σ²
This assumption may also be hard to justify in many cases
Short hand notation: Var(u | x) = σ², where all explanatory variables
are collected in a random vector x = (x1, …, xk)
Theorem 3.2 (Sampling variances of OLS slope estimators)
Under assumptions MLR.1 – MLR.5:
Var(β̂j) = σ² / [SSTj·(1 − Rj²)],  j = 1, …, k
where σ² is the variance of the error term, SSTj = Σ_i (x_ij − x̄j)² is the
total sample variation in explanatory variable xj, and Rj² is the R-squared
from a regression of xj on all other independent variables (including a constant)
Components of OLS Variances:
1) The error variance
A high error variance increases the sampling variance because there is
more "noise" in the equation
A large error variance necessarily makes estimates imprecise
The error variance does not decrease with sample size
2) The total sample variation in the explanatory variable
More sample variation leads to more precise estimates
Total sample variation automatically increases with the sample size
Increasing the sample size is thus a way to get more precise estimates
3) Linear relationships among the independent variables
Regress xj on all other independent variables (including a constant)
The R-squared of this regression (Rj²) will be the higher, the better
xj can be linearly explained by the other independent variables
The sampling variance of β̂j will be the higher, the better the explanatory
variable can be linearly explained by the other independent variables
The problem of almost linearly dependent explanatory variables is
called multicollinearity (i.e. Rj² close to 1 for some j)
An example for multicollinearity
The average standardized test score of a school is regressed on expenditures
for teachers, expenditures for instructional materials, and other expenditures.
The different expenditure categories will be strongly correlated because if a school has a lot
of resources it will spend a lot on everything.
It will be hard to estimate the differential effects of different expenditure categories because
all expenditures are either high or low. For precise estimates of the differential effects, one
would need information about situations where expenditure categories change differentially.
As a consequence, sampling variance of the estimated effects will be large.
Discussion of the multicollinearity problem
In the above example, it would probably be better to lump all expenditure categories together because effects cannot be disentangled
In other cases, dropping some independent variables may reduce
multicollinearity (but this may lead to omitted variable bias)
Only the sampling variance of the variables involved in multicollinearity
will be inflated; the estimates of other effects may be very precise
Note that multicollinearity is not a violation of MLR.3 in the strict sense
Multicollinearity may be detected through "variance inflation factors": VIFj = 1/(1 − Rj²)
As an (arbitrary) rule of thumb, the variance
inflation factor should not be larger than 10
Variances in misspecified models
The choice of whether to include a particular variable in a regression
can be made by analyzing the tradeoff between bias and variance
True population model: y = β0 + β1·x1 + β2·x2 + u
Estimated model 1: ŷ = β̂0 + β̂1·x1 + β̂2·x2
Estimated model 2: ỹ = β̃0 + β̃1·x1
It might be the case that the likely omitted variable bias in the
misspecified model 2 is overcompensated by a smaller variance
Variances in misspecified models (cont.)
Conditional on x1 and x2, the variance in model 2 is always smaller than that in model 1:
Var(β̃1) = σ²/SST1 ≤ σ²/[SST1·(1 − R1²)] = Var(β̂1)
Case 1 (β2 = 0): no bias and a smaller variance. Conclusion: Do not include irrelevant regressors
Case 2 (β2 ≠ 0): Trade off bias and variance; Caution: the bias will not vanish even in large samples
Estimating the error variance
σ̂² = SSR / (n − k − 1)
An unbiased estimate of the error variance is obtained by dividing the sum of
squared residuals by the degrees of freedom, i.e. the number of observations
minus the number of estimated parameters.
The n estimated squared residuals in the sum are not completely independent but related
through the k+1 equations that define the first order conditions of the minimization problem.
Theorem 3.3 (Unbiased estimator of the error variance): Under assumptions MLR.1 – MLR.5, E(σ̂²) = σ²
Estimation of the sampling variances of the OLS estimators
The true sampling variation of the estimated β̂j: sd(β̂j) = σ / [SSTj·(1 − Rj²)]^(1/2)
Plugging in σ̂ for the unknown σ gives the estimated sampling variation,
the standard error: se(β̂j) = σ̂ / [SSTj·(1 − Rj²)]^(1/2)
Note that these formulas are only valid under assumptions
MLR.1-MLR.5 (in particular, there has to be homoscedasticity)
Efficiency of OLS: The Gauss-Markov Theorem
Under assumptions MLR.1 - MLR.5, OLS is unbiased
However, under these assumptions there may be many other
estimators that are unbiased
Which one is the unbiased estimator with the smallest variance?
In order to answer this question one usually limits oneself to linear
estimators, i.e. estimators linear in the dependent variable:
β̃j = Σ_i w_ij·y_i
where the weights w_ij may be arbitrary functions of the sample values
of all the explanatory variables; the OLS estimator
can be shown to be of this form
Theorem 3.4 (Gauss-Markov Theorem)
Under assumptions MLR.1 – MLR.5, the OLS estimators are the best
linear unbiased estimators (BLUEs) of the regression coefficients, i.e.
Var(β̂j) ≤ Var(β̃j) for all j and for all linear unbiased estimators β̃j.
OLS is only the best estimator if MLR.1 – MLR.5 hold; if there is
heteroscedasticity, for example, there are better estimators.
Multiple Regression
Analysis: Inference
Chapter 4
Wooldridge: Introductory Econometrics:
A Modern Approach, 5e
Wooldridge (2013), Chapter 4. 1
Statistical inference in the regression model
Hypothesis tests about population parameters
Construction of confidence intervals
Sampling distributions of the OLS estimators
The OLS estimators are random variables
We already know their expected values and their variances
However, for hypothesis tests we need to know their distribution
In order to derive their distribution we need additional assumptions
Assumption about distribution of errors: normal distribution
Assumption MLR.6 (Normality of error terms)
u_i ~ N(0, σ²), independently of x_i1, …, x_ik
It is assumed that the unobserved factors are normally distributed
around the population regression function.
The form and the variance of the distribution do not depend on
any of the explanatory variables.
It follows that: y | x1, …, xk ~ N(β0 + β1·x1 + … + βk·xk, σ²)
Discussion of the normality assumption
The error term is the sum of "many" different unobserved factors
Sums of independent factors are normally distributed (CLT)
Problems:
• How many different factors? Number large enough?
• Possibly very heterogeneous distributions of individual factors
• How independent are the different factors?
The normality of the error term is an empirical question
At least the error distribution should be "close" to normal
In many cases, normality is questionable or impossible by definition
Discussion of the normality assumption (cont.)
Examples where normality cannot hold:
• Wages (nonnegative; also: minimum wage)
• Number of arrests (takes on a small number of integer values)
• Unemployment (indicator variable, takes on only 1 or 0)
In some cases, normality can be achieved through transformations
of the dependent variable (e.g. use log(wage) instead of wage)
Under normality, OLS is the best unbiased estimator, even among nonlinear estimators
Important: For the purposes of statistical inference, the assumption
of normality can be replaced by a large sample size
Terminology
MLR.1 – MLR.5: "Gauss-Markov assumptions"; MLR.1 – MLR.6: "Classical linear model (CLM) assumptions"
Theorem 4.1 (Normal sampling distributions)
Under assumptions MLR.1 – MLR.6:
β̂j ~ N(βj, Var(β̂j)): the estimators are normally distributed around
the true parameters with the variance that was derived earlier
(β̂j − βj)/sd(β̂j) ~ N(0, 1): the standardized estimators follow a standard normal distribution
Testing hypotheses about a single population parameter
Theorem 4.2 (t-distribution for standardized estimators)
Under assumptions MLR.1 – MLR.6:
(β̂j − βj)/se(β̂j) ~ t(n−k−1)
If the standardization is done using the estimated
standard deviation (= standard error), the normal
distribution is replaced by a t-distribution
Note: The t-distribution is close to the standard normal distribution if n−k−1 is large.
Null hypothesis (for more general hypotheses, see below): H0: βj = 0
The population parameter is equal to zero, i.e. after
controlling for the other independent variables, there
is no effect of xj on y
t-statistic (or t-ratio): t = β̂j / se(β̂j)
One should be very careful about statements like this:
„The farther the estimated coefficient is away from zero, the
less likely it is that the null hypothesis holds true.“ Further
question: what does "far" away from zero mean?
This depends on the variability of the estimated coefficient, i.e. its
standard deviation. The t-statistic measures how many estimated
standard deviations the estimated coefficient is away from zero.
Distribution of the t-statistic if the null hypothesis is true: t ~ t(n−k−1)
Goal: Define a rejection rule so that, if H0 is true, it is rejected
only with a small probability (= significance level, e.g. 5%)
Testing against one-sided alternatives (greater than zero)
Test H0: βj = 0 against H1: βj > 0.
Reject the null hypothesis in favour of the
alternative hypothesis if the estimated coefficient
is „too large“ (i.e. larger than a critical value).
Construct the critical value so that, if the
null hypothesis is true, it is rejected in,
for example, 5% of the cases.
Example with 28 degrees of freedom: the 5% critical value is 1.701
⇒ Reject if t-statistic greater than 1.701
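Critical values like 1.701 can be looked up from the t-distribution; a short sketch, assuming SciPy is available:

```python
from scipy.stats import t

# One-sided critical values of the t-distribution.
# With 28 degrees of freedom, the 5% one-sided critical value is about 1.701.
df = 28
c_05 = t.ppf(0.95, df)          # P(T <= c) = 0.95, so P(T > c) = 0.05
print(round(c_05, 3))

# With many degrees of freedom, the critical value approaches the
# standard-normal value 1.645:
print(round(t.ppf(0.95, 500), 3))
```

This also illustrates the note above that the t-distribution is close to the standard normal when the degrees of freedom are large.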
Example: Wage equation
Test whether, after controlling for education and tenure, higher work
experience leads to higher hourly wages
[Estimated wage equation with standard errors in parentheses]
Test H0: β_exper = 0 against H1: β_exper > 0.
One would either expect a positive effect of experience on hourly wage or no effect at all.
Example: Wage equation (cont.)
t-statistic: with the large number of degrees of freedom,
the standard normal approximation applies
Critical values for the 5% and the 1% significance level (these
are conventional significance levels): 1.645 and 2.326
The null hypothesis is rejected because the t-statistic exceeds
the critical value.
The usual (problematic) parlance:
"The effect of experience on hourly wage is statistically greater than zero at the 5%
(and even at the 1%) significance level."
Testing against one-sided alternatives (less than zero)
Test H0: βj = 0 against H1: βj < 0.
Reject the null hypothesis in favour of the
alternative hypothesis if the estimated coefficient
is „too small“ (i.e. smaller than a critical value).
Construct the critical value so that, if the
null hypothesis is true, it is rejected in,
for example, 5% of the cases.
Example with 18 degrees of freedom: the 5% critical value is −1.734
⇒ Reject if t-statistic less than −1.734
Example: Student performance and school size
Test whether smaller school size leads to better student performance
math10 = β0 + β1·totcomp + β2·staff + β3·enroll + u
(math10 = percentage of students passing maths test, totcomp = average annual teacher
compensation, staff = staff per one thousand students, enroll = school enrollment = school size)
Test H0: β_enroll = 0 against H1: β_enroll < 0.
Do larger schools hamper student performance or is there no such effect?
Example: Student performance and school size (cont.)
t-statistic: with the large number of degrees of freedom,
the standard normal approximation applies
Critical values for the 5% and the 15% significance level
(two examples): −1.645 and −1.04
The null hypothesis is not rejected because the t-statistic is
not smaller than the critical value.
One cannot reject the hypothesis that there is no effect of school size on student performance
(for a significance level of 15% and of course 5%).
Testing against two-sided alternatives
Test H0: βj = 0 against H1: βj ≠ 0.
Reject the null hypothesis in favour of the
alternative hypothesis if the absolute value
of the estimated coefficient is too large.
Construct the critical value so that, if the
null hypothesis is true, it is rejected in,
for example, 5% of the cases.
Example with 25 degrees of freedom: the 5% critical values are ±2.06
⇒ Reject if the t-statistic is less than −2.06 or greater than 2.06, i.e. if |t| > 2.06
Example: Determinants of college GPA (skipped = lectures missed per week)
For critical values, use standard normal distribution
Once again the usual parlance:
„The effects of hsGPA and skipped are
significantly different from zero at the
1% significance level. The effect of ACT
is not significantly different from zero,
not even at the 10% significance level.“
"Statistically significant“ variables in a regression
If a regression coefficient is different from zero in a two-sided test, the
corresponding variable is often said to be "statistically significant“
If the number of degrees of freedom is large enough so that the normal
approximation applies, the following rules of thumb apply:
|t| > 1.645 ⇒ "statistically significant at 10 % level"
|t| > 1.96 ⇒ "statistically significant at 5 % level"
|t| > 2.576 ⇒ "statistically significant at 1 % level"
Economic and statistical significance
It is important to discuss the magnitude of the coefficient to get an
idea of its economic or practical importance
The fact that a coefficient is statistically significant does not necessarily mean it is economically or practically significant!
If a variable is statistically and economically important but has the
"wrong“ sign, the regression model might be misspecified
The Rhetoric of Significance Tests (in Economics and other
disciplines)
The chosen level of a test depends on the problem at hand
Given the problem at hand, the researcher / analyst has to choose
acceptable Type 1 and type 2 errors
The level is chosen before the data is analyzed / the experiment is
conducted
The usual significance levels (10%, 5%, 1%) are mere conventions
The Rhetoric of Significance Tests (cont.)
Statements like „statistically significant at 10 % level“ are problematic;
the level of test is a property of the problem at hand and not a sample
characteristic
Unfortunately, these issues are rarely discussed (and Wooldridge
(2013) is no exception)
Further reading:
• Ioannidis JPA (2005). Why Most Published Research Findings Are False. PLoS Med 2(8): e124.
doi:10.1371/journal.pmed.0020124
• Donald N. McCloskey (1985). The Loss Function Has Been Mislaid: The Rhetoric of Significance
Tests. The American Economic Review, Vol. 75, No. 2 (Papers & Proceedings)
Testing more general hypotheses about a regression coefficient
Null hypothesis: H0: βj = aj, where aj is the hypothesized value of the coefficient
t-statistic: t = (β̂j − aj) / se(β̂j)
The test works exactly as before, except that the hypothesized
value is subtracted from the estimate when forming the statistic
Example: Campus crime and enrollment
An interesting hypothesis is whether crime increases by one percent
if enrollment is increased by one percent
The estimate is different from one, but what about the precision of this estimate?
The hypothesis is rejected at the 5% level.
Computing p-values for t-tests
If the significance level is made smaller and smaller, there will be a
point where the null hypothesis cannot be rejected anymore
The reason is that, by lowering the significance level, one increasingly
guards against the error of rejecting a correct H0
The smallest significance level at which the null hypothesis is still
rejected is called the p-value of the hypothesis test
Once again be careful about statements like these:
„A small p-value is evidence against the null hypothesis because one
would reject the null hypothesis even at small significance levels“
„A large p-value is evidence in favor of the null hypothesis“
How the p-value is computed (here: two-sided test)
The p-value is the probability of obtaining a test statistic at least as "extreme"
(pointing towards rejection) as the one that was actually observed.
In the two-sided case, the p-value is thus the probability that the
t-distributed variable takes on a larger absolute value than the realized
value of the test statistic, e.g. p-value = P(|T| > |t|).
Hence, a null hypothesis is rejected if and only if the corresponding
p-value is smaller than the significance level.
(The accompanying figure marks the critical values for a 5% significance level and the
realized value of the test statistic, which in this example would not lie in the rejection region.)
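The tail-probability computation described above can be sketched directly, assuming SciPy is available:

```python
from scipy.stats import t

# Two-sided p-value: probability that a t-distributed variable exceeds the
# observed statistic in absolute value.
def two_sided_p(t_stat, df):
    return 2 * t.sf(abs(t_stat), df)   # sf(x) = P(T > x), the upper tail

# A statistic at the 5% two-sided critical value (about 2.06 with 25 df)
# gives a p-value of about .05; larger statistics give smaller p-values.
print(two_sided_p(2.06, 25))
print(two_sided_p(3.00, 25))
```

Comparing the p-value with the chosen significance level then reproduces the rejection rule stated above.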
Confidence intervals
Simple manipulation of the result in Theorem 4.2 implies that
P(β̂j − c·se(β̂j) ≤ βj ≤ β̂j + c·se(β̂j)) = 1 − α
where c is the critical value of the two-sided test, β̂j − c·se(β̂j) and β̂j + c·se(β̂j)
are the lower and upper bounds of the confidence interval, and 1 − α is the confidence level
Interpretation of the confidence interval
The bounds of the interval are random
In repeated samples, the interval that is constructed in the above way
will cover the population regression coefficient in 95% of the cases
Confidence intervals for typical confidence levels
Use the rules of thumb c = 1.645 (90%), c = 1.96 (95%), c = 2.576 (99%)
when the degrees of freedom are large
Relationship between confidence intervals and hypotheses tests:
reject H0: βj = aj in favor of H1: βj ≠ aj if and only if aj lies outside the confidence interval
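This duality can be sketched with invented numbers for the estimate and its standard error, assuming SciPy is available:

```python
from scipy.stats import t

# Confidence interval sketch: beta_hat +/- c * se(beta_hat).
# The estimate, standard error, and df below are illustrative, not real data.
beta_hat, se, df = 0.25, 0.10, 120
c = t.ppf(0.975, df)                       # two-sided 5% critical value, ~1.98
ci = (beta_hat - c * se, beta_hat + c * se)
print(ci)

# Duality with the t-test: H0: beta_j = a is rejected at the 5% level
# exactly when a lies outside the 95% confidence interval.
a = 0.0
reject = abs((beta_hat - a) / se) > c
outside = not (ci[0] <= a <= ci[1])
print(reject, outside)
```

Here the t-test decision and the interval check agree, as the relationship above requires.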
Example: Model of firms‘ R&D expenditures
Spending on R&D is regressed on annual sales and on profits as a percentage of sales.
The effect of sales on R&D is relatively precisely estimated, as its confidence interval
is narrow; moreover, „the effect is significantly different from zero“ because zero is
outside the interval. The effect of the profit margin is imprecisely estimated, as its
interval is very wide; „it is not statistically significant“ because zero lies in the interval.
Testing hypotheses about a linear combination of parameters
Example: Return to education at 2 year vs. at 4 year colleges
log(wage) = β0 + β1·jc + β2·univ + β3·exper + u
(jc = years of education at 2-year colleges, univ = years of education at 4-year colleges)
Test H0: β1 = β2 against H1: β1 < β2.
A possible test statistic would be: t = (β̂1 − β̂2) / se(β̂1 − β̂2)
The difference between the estimates is normalized by the estimated
standard deviation of the difference. The null hypothesis would have
to be rejected if the statistic is "too negative" to believe that the true
difference between the parameters is equal to zero.
Impossible to compute with standard regression output because
se(β̂1 − β̂2) = [Var(β̂1) + Var(β̂2) − 2·Cov(β̂1, β̂2)]^(1/2)
and the covariance of the estimates is usually not available in regression output
Alternative method
Define θ1 = β1 − β2 and test H0: θ1 = 0 against H1: θ1 < 0.
Substituting β1 = θ1 + β2 into the original regression yields a regression
with a new regressor, the total years of college (= sum of the two education variables)
[Estimation results with total years of college as a regressor]
The hypothesis is rejected at the 10% level but not at the 5% level
This method always works for single linear hypotheses
Testing multiple linear restrictions: The F-test
Testing exclusion restrictions
log(salary) = β0 + β1·years + β2·gamesyr + β3·bavg + β4·hrunsyr + β5·rbisyr + u
(salary = salary of major league baseball player, years = years in the league,
gamesyr = average number of games per year, bavg = batting average,
hrunsyr = home runs per year, rbisyr = runs batted in per year)
Test H0: β3 = 0, β4 = 0, β5 = 0 against H1: at least one of them is nonzero
Test whether the performance measures have no effect / can be excluded from the regression.
Estimation of the unrestricted model
None of these variables is statistically significant when tested individually
Idea: How would the model fit be if these variables were dropped from the regression?
Estimation of the restricted model
The sum of squared residuals necessarily increases, but is the increase „large enough"?
Test statistic:
F = [(SSR_r − SSR_ur)/q] / [SSR_ur/(n − k − 1)]
where q is the number of restrictions. The relative increase of the sum of
squared residuals when going from H1 to H0 follows an F-distribution with
(q, n − k − 1) degrees of freedom (if the null hypothesis H0 is correct)
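The restricted-versus-unrestricted comparison can be sketched on synthetic data in which the excluded regressors are truly irrelevant:

```python
import numpy as np

# F-test sketch: compare the sum of squared residuals of restricted and
# unrestricted models. The last two regressors are irrelevant by design.
rng = np.random.default_rng(5)
n, k, q = 500, 4, 2
x = rng.normal(size=(n, k))
y = 1.0 + 2.0 * x[:, 0] + 1.0 * x[:, 1] + rng.normal(size=n)

def ssr(X, y):
    # Sum of squared residuals of an OLS fit
    fitted = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return np.sum((y - fitted) ** 2)

X_ur = np.column_stack([np.ones(n), x])         # unrestricted: all regressors
X_r = np.column_stack([np.ones(n), x[:, :2]])   # restricted: drop the last two
ssr_ur, ssr_r = ssr(X_ur, y), ssr(X_r, y)

F = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))
print(F)  # to be compared with an F(q, n-k-1) critical value
```

Dropping regressors can only increase the SSR, so the F statistic is never negative; the test asks whether the increase is large relative to sampling noise.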
Rejection rule (Figure 4.7)
An F-distributed variable only takes on positive
values. This corresponds to the fact that the
sum of squared residuals can only increase if
one moves from H1 to H0.
Choose the critical value so that the null hypothesis
is rejected in, for example, 5% of the
cases, although it is true.
Test decision in example: with q = 3 restrictions to be tested and the
degrees of freedom of the unrestricted model in the denominator,
the null hypothesis is rejected
(even at very small significance levels).
Discussion
The three variables are "jointly significant"
They were not significant when tested individually
The likely reason is multicollinearity between them
Test of overall significance of a regression
The null hypothesis states that the explanatory variables are not useful
at all in explaining the dependent variable: H0: β1 = β2 = … = βk = 0
Restricted model (regression on constant): y = β0 + u
In this case the F statistic can be written in terms of the R-squared of the
unrestricted model: F = [R²/k] / [(1 − R²)/(n − k − 1)]
The test of overall significance / predictive power is reported in
most regression packages; the null hypothesis is usually rejected
Testing general linear restrictions with the F-test
Example: Test whether house price assessments are rational
log(price) = β0 + β1·log(assess) + β2·log(lotsize) + β3·log(sqrft) + β4·bdrms + u
(price = actual house price, assess = assessed housing value before the house was sold,
lotsize = size of lot in feet, sqrft = square footage, bdrms = number of bedrooms)
If house price assessments are rational, a 1% change in the assessment
should be associated with a 1% change in price: H0: β1 = 1
In addition, other known factors should not influence the price once the
assessed value has been controlled for: β2 = 0, β3 = 0, β4 = 0
Unrestricted regression: the full model above
Restricted regression: imposing the restrictions gives log(price) = β0 + log(assess) + u;
the restricted model is actually a regression of [y − x1], i.e. log(price) − log(assess), on a constant
Test statistic: the resulting F statistic is small, so H0 cannot be rejected
Regression output for the unrestricted regression
When tested individually, there is also no evidence against the rationality of house price assessments
The F-test works for general multiple linear hypotheses
For all tests and confidence intervals, validity of assumptions
MLR.1 – MLR.6 has been assumed. Tests may be invalid otherwise.
Multiple Regression
Analysis: OLS Asymptotics
Chapter 5
Wooldridge: Introductory Econometrics:
A Modern Approach, 5e
So far we focused on properties of OLS that hold for any sample
Properties of OLS that hold for any sample/sample size
Expected values/unbiasedness under MLR.1 – MLR.4
Variance formulas under MLR.1 – MLR.5
Gauss-Markov Theorem under MLR.1 – MLR.5
Exact sampling distributions/tests under MLR.1 – MLR.6
Properties of OLS that hold in large samples (without assuming normality of the error term!)
Consistency under MLR.1 – MLR.4
Asymptotic normality/tests under MLR.1 – MLR.5
Wooldridge (2013), Chapter 5. 2
Consistency
An estimator θ̂n is consistent for a population parameter θ if
P(|θ̂n − θ| < ε) → 1 as n → ∞, for arbitrary ε > 0.
Alternative notation: plim θ̂n = θ
(the estimate converges in probability to the true population value)
Interpretation:
Consistency means that the probability that the estimate is arbitrarily close to the true population value can be made arbitrarily high by increasing the sample size
Consistency is a minimum requirement for sensible estimators
Wooldridge (2013), Chapter 5. 3
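Consistency can be illustrated by simulation: the OLS slope estimate concentrates around the true value as n grows. A minimal sketch with made-up parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
beta_true = 2.0    # illustrative true slope

def ols_slope(n):
    # simple regression y = 1 + beta_true * x + u, estimated by OLS
    x = rng.normal(size=n)
    y = 1.0 + beta_true * x + rng.normal(size=n)
    xd = x - x.mean()
    return xd @ (y - y.mean()) / (xd @ xd)

err_small = abs(ols_slope(50) - beta_true)
err_large = abs(ols_slope(100_000) - beta_true)
print(err_small, err_large)   # the large-sample error is tiny
```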
Theorem 5.1 (Consistency of OLS)
Special case of the simple regression model: one can see that the slope estimate is consistent if the explanatory variable is exogenous, i.e. uncorrelated with the error term.
Assumption MLR.4': All explanatory variables must be uncorrelated with the error term. This assumption is weaker than the zero conditional mean assumption MLR.4.
Wooldridge (2013), Chapter 5. 4
For consistency of OLS, only the weaker MLR.4' is needed
Asymptotic analog of omitted variable bias
True model: y = β0 + β1x1 + β2x2 + u
Misspecified model: y regressed on x1 only (x2 omitted)
Asymptotic bias: plim(β̃1) = β1 + β2 · Cov(x1, x2)/Var(x1)
There is no omitted variable bias if the omitted variable is
irrelevant or uncorrelated with the included variable
Wooldridge (2013), Chapter 5. 5
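The asymptotic bias formula plim(β̃1) = β1 + β2 · Cov(x1, x2)/Var(x1) can be checked numerically; all parameter values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
beta1, beta2, delta = 1.0, 0.5, 0.8          # made-up population values

x1 = rng.normal(size=n)
x2 = delta * x1 + rng.normal(size=n)         # Cov(x1, x2)/Var(x1) = delta
y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)

# short regression that (wrongly) omits x2
xd = x1 - x1.mean()
slope_short = xd @ (y - y.mean()) / (xd @ xd)
print(slope_short)   # close to beta1 + beta2 * delta = 1.4
```

With half a million observations the short-regression slope sits almost exactly on the plim predicted by the formula, not on the true β1.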
Asymptotic normality and large sample inference
In practice, the normality assumption MLR.6 is often questionable
If MLR.6 does not hold, the results of t- or F-tests may be wrong
Fortunately, F- and t-tests still work if the sample size is large enough
Also, OLS estimates are normal in large samples even without MLR.6
Theorem 5.2 (Asymptotic normality of OLS)
Under assumptions MLR.1 – MLR.5:
In large samples, the standardized estimates are also normally distributed
Wooldridge (2013), Chapter 5. 6
Practical consequences
In large samples, the t-distribution is close to the N(0,1) distribution
As a consequence, t-tests are valid in large samples without MLR.6
The same is true for confidence intervals and F-tests
Important: MLR.1 – MLR.5 are still necessary, esp. homoscedasticity
Asymptotic analysis of the OLS sampling errors
Wooldridge (2013), Chapter 5. 7
Asymptotic analysis of the OLS sampling errors (cont.)
The sampling error and the standard error se(β̂j) both shrink at the rate 1/√n.
This is why large samples are better
Example: Standard errors in a birth weight equation
Use only the first half of observations
Wooldridge (2013), Chapter 5. 8
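The 1/√n rate can be seen directly: as in the birth weight example, using only the first half of the observations should inflate the slope's standard error by roughly √2 ≈ 1.41. A sketch on simulated (not birth weight) data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)   # illustrative data-generating process

def slope_se(x, y):
    # OLS of y on a constant and x; return the slope's standard error
    X = np.column_stack([np.ones(len(x)), x])
    _, resid, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = float(resid[0]) / (len(x) - 2)
    return float(np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1]))

ratio = slope_se(x[: n // 2], y[: n // 2]) / slope_se(x, y)
print(ratio)   # close to sqrt(2)
```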
Multiple Regression
Analysis: Further Issues
Chapter 6
Wooldridge: Introductory Econometrics:
A Modern Approach, 5e
Models with quadratics and higher polynomials
Interaction terms
Adjusted R-squared
Wooldridge (2013), Chapter 6. 2
Using quadratic functional forms
Example: Wage equation with a concave experience profile (estimated coefficients: .298 on exper, −.0061 on exper²)
Marginal effect of experience: the first year of experience increases the wage by some .30$, the second year by .298 − 2(.0061)(1) = .29$ etc.
Wooldridge (2013), Chapter 6. 3
Wage maximum with respect to work experience
Does this mean the return to experience
becomes negative after 24.4 years?
Not necessarily. It depends on how many
observations in the sample lie right of the
turnaround point.
In the given example, these are about 28% of the observations. There may be a specification problem (e.g. omitted variables).
Wooldridge (2013), Chapter 6. 4
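With the slide's estimates (.298 on exper, −.0061 on exper²), the turnaround point x* = β̂1 / (2|β̂2|) of the quadratic profile is easy to verify:

```python
# coefficients taken from the wage-equation example above
b1, b2 = 0.298, -0.0061
turnaround = -b1 / (2 * b2)   # exper at which the marginal effect is zero
print(turnaround)             # about 24.4 years of experience
```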
Example: Effects of pollution on housing prices (variables: nitrogen oxide in air, distance from employment centers, student/teacher ratio)
Does this mean that, at a low number of rooms,
more rooms are associated with lower prices?
Wooldridge (2013), Chapter 6. 5
Calculation of the turnaround point
Turnaround point:
This area can be ignored as
it concerns only 1% of the
observations.
Increase rooms from 5 to 6:
Increase rooms from 6 to 7:
Wooldridge (2013), Chapter 6. 6
Other possibilities: 1) elasticities; 2) higher polynomials
Wooldridge (2013), Chapter 6. 7
Models with interaction terms
Interaction term
The effect of the number
of bedrooms depends on
the level of square footage
Interaction effects complicate interpretation of parameters
Effect of number of bedrooms, but for a square footage of zero
Wooldridge (2013), Chapter 6. 8
Reparametrization of interaction effects (centering at the population means; these may be replaced by sample means)
Effect of x2 if all variables take on their mean values
Advantages of reparametrization
Easy interpretation of all parameters
Standard errors for partial effects at the mean values available
If necessary, interaction may be centered at other interesting values
Wooldridge (2013), Chapter 6. 9
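The reparametrization can be sketched on simulated data: after centering the interaction at the sample means, the coefficient on x2 estimates the partial effect of x2 at the mean of x1. The model and all coefficients below are made up:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000
x1 = rng.normal(loc=3.0, size=n)
x2 = rng.normal(loc=1.0, size=n)
y = 1.0 + 0.5 * x1 + 2.0 * x2 + 0.3 * x1 * x2 + rng.normal(size=n)

# interaction centered at the sample means (stand-ins for population means)
inter = (x1 - x1.mean()) * (x2 - x2.mean())
X = np.column_stack([np.ones(n), x1, x2, inter])
b = np.linalg.lstsq(X, y, rcond=None)[0]
print(b[2])   # effect of x2 at the mean of x1: about 2.0 + 0.3 * 3 = 2.9
```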
More on goodness-of-fit and selection of regressors
General remarks on R-squared
A high R-squared does not imply that there is a causal interpretation
A low R-squared does not preclude precise estimation of partial effects
Adjusted R-squared
What is the ordinary R-squared supposed to measure?
is an estimate for
Population R-squared
Wooldridge (2013), Chapter 6. 10
Adjusted R-squared (cont.): uses the correct degrees of freedom of numerator and denominator
A better estimate taking into account degrees of freedom would be
The adjusted R-squared imposes a penalty for adding new regressors
The adjusted R-squared increases if, and only if, the t-statistic of a
newly added regressor is greater than one in absolute value
Relationship between R-squared and adjusted R-squared
The adjusted R-squared may even become negative
Wooldridge (2013), Chapter 6. 11
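The adjusted R-squared formula, adj. R² = 1 − (1 − R²)(n − 1)/(n − k − 1), and the fact that it can become negative, in a short sketch (the example values for R², n and k are illustrative):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared: penalizes the number of regressors k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(adjusted_r2(0.30, 50, 10))       # clearly below the ordinary 0.30
print(adjusted_r2(0.02, 50, 10) < 0)   # can even be negative
```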
Using adjusted R-squared to choose between nonnested models
Models are nonnested if neither model is a special case of the other
A comparison between the R-squared of both models would be unfair
to the first model because the first model contains fewer parameters
In the given example, even after adjusting for the difference in
degrees of freedom, the quadratic model is preferred
Wooldridge (2013), Chapter 6. 12
Comparing models with different dependent variables
R-squared or adjusted R-squared must not be used to compare models
which differ in their definition of the dependent variable
Example: CEO compensation and firm performance
There is much less variation in log(salary) that needs to be explained than in salary
Wooldridge (2013), Chapter 6. 13
Controlling for too many factors in regression analysis
In some cases, certain variables should not be held fixed
In a regression of traffic fatalities on state beer taxes (and other
factors) one should not directly control for beer consumption
In a regression of family health expenditures on pesticide usage
among farmers one should not control for doctor visits
Different regressions may serve different purposes
In a regression of house prices on house characteristics, one would
only include price assessments if the purpose of the regression is to
study their validity; otherwise one would not include them
Wooldridge (2013), Chapter 6. 14
Adding regressors to reduce the error variance
Adding regressors may exacerbate multicollinearity problems
On the other hand, adding regressors reduces the error variance
Variables that are uncorrelated with other regressors should be added
because they reduce error variance without increasing multicollinearity
However, such uncorrelated variables may be hard to find
Example: Individual beer consumption and beer prices
Including individual characteristics in a regression of beer consumption
on beer prices leads to more precise estimates of the price elasticity
Wooldridge (2013), Chapter 6. 15
Multiple Regression Analysis
with Qualitative Information
Chapter 7
Wooldridge: Introductory Econometrics:
A Modern Approach, 5e
Wooldridge (2013), Chapter 7. 1
Multiple Regression Analysis:
Qualitative Information
Qualitative Information
Examples: gender, race, industry, region, rating grade, …
A way to incorporate qualitative information is to use dummy variables
They may appear as the dependent or as independent variables
A single dummy independent variable
The coefficient on the dummy = the wage gain/loss if the person is a woman rather than a man (holding other things fixed)
Dummy variable: = 1 if the person is a woman, = 0 if the person is a man
Graphical Illustration
Alternative interpretation of the coefficient: the difference in mean wage between men and women with the same level of education.
Intercept shift
This model cannot be estimated (perfect collinearity)
Dummy variable trap
When using dummy variables, one category always has to be omitted:
The base category is men
The base category is women
Alternatively, one could omit the intercept. Disadvantages:
1) More difficult to test for differences between the parameters
2) The R-squared formula is only valid if the regression contains an intercept
Estimated wage equation with intercept shift
Holding education, experience,
and tenure fixed, women earn
1.81$ less per hour than men
Does that mean that women are discriminated against?
Not necessarily. Being female may be correlated with other productivity characteristics that have not been controlled for.
Comparing means of subpopulations described by dummies
Not holding other factors constant, women
earn 2.51$ per hour less than men, i.e. the
difference between the mean wage of men
and that of women is 2.51$.
Discussion
t-ratios / tests are computed in the same way
The wage difference between men and women is larger if no other things are controlled for; i.e. part of the difference is due to differences in education, experience and tenure between men and women
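The equivalence between the dummy coefficient and the difference in subpopulation means can be verified directly. The simulation below uses the slide's 2.51$ gap as the true value; the wage level and noise are made up:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000
female = rng.integers(0, 2, size=n)
# true gap of 2.51$ per hour as on the slide; base wage is illustrative
wage = 7.10 - 2.51 * female + rng.normal(size=n)

X = np.column_stack([np.ones(n), female.astype(float)])
b = np.linalg.lstsq(X, wage, rcond=None)[0]
diff_means = wage[female == 1].mean() - wage[female == 0].mean()
print(b[1], diff_means)   # algebraically identical
```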
Further example: Effects of training grants on hours of training
Variables: hours of training per employee; dummy indicating whether the firm received a training grant
This is an example of program evaluation
Treatment group (= grant receivers) vs. control group (= no grant)
Is the effect of treatment on the outcome of interest causal?
Using dummy explanatory variables in equations for log(y)
Dummy indicating whether the house is of colonial style
As the dummy for colonial style changes from 0 to 1, the house price increases by about 5.4 percent
Using dummy variables for multiple categories
1) Define membership in each category by a dummy variable
2) Leave out one category (which becomes the base category)
Holding other things fixed, married
women earn 19.8% less than single
men (= the base category)
Incorporating ordinal information using dummy variables
Example: City credit ratings and municipal bond interest rates
Municipal bond rate Credit rating from 0-4 (0=worst, 4=best)
This specification would probably not be appropriate as the credit rating only contains
ordinal information. A better way to incorporate this information is to define dummies:
Dummies indicating whether the particular rating applies, e.g. CR1=1 if CR=1 and CR1=0
otherwise. All effects are measured in comparison to the worst rating (= base category).
Interactions involving dummy variables: allowing for different slopes
Model with interaction term: wage = β0 + δ0·female + β1·educ + δ1·(female·educ) + u
Intercept for men: β0, slope for men: β1
Intercept for women: β0 + δ0, slope for women: β1 + δ1
Interesting hypotheses
The return to education is the same for men and women (interaction coefficient zero)
The whole wage equation is the same for men and women (dummy and interaction coefficients both zero)
Graphical illustration
Interacting both the intercept and
the slope with the female dummy
enables one to model completely
independent wage equations for
men and women
Estimated wage equation with interaction term
No evidence against the hypothesis that the return to education is the same for men and women.
Does this mean that there is no significant evidence of lower pay for women at the same levels of educ, exper, and tenure? No: this is only the effect for educ = 0. To answer the question one has to recenter the interaction term, e.g. around educ = 12.5 (= average education).
Testing for differences in regression functions across groups
Unrestricted model (contains full set of interactions)
Variables: college grade point average, standardized aptitude test score, high school rank percentile, total hours spent in college courses
Restricted model (same regression for both groups)
Null hypothesis: all interaction effects are zero, i.e. the same regression coefficients apply to men and women
Estimation of the unrestricted model
Tested individually,
the hypothesis that
the interaction effects
are zero cannot be
rejected
Joint test with F-statistic: the null hypothesis is rejected
Alternative way to compute F-statistic in the given case
Run separate regressions for men and for women; the unrestricted
SSR is given by the sum of the SSR of these two regressions
Run regression for the restricted model and store SSR
If the test is computed in this way it is called the Chow-Test
Important: The test assumes a constant error variance across groups
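A sketch of the Chow test on simulated data, computing the unrestricted SSR as the sum of the two separate group regressions. Group structure and coefficients are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 400
g = rng.integers(0, 2, size=n)            # group dummy
x = rng.normal(size=n)
# group 1 has a different intercept and slope by construction
y = 1 + 0.5 * x + g * (1 + 0.5 * x) + rng.normal(size=n)

def ssr(x, y):
    # SSR from an OLS regression of y on a constant and x
    X = np.column_stack([np.ones(len(x)), x])
    return float(np.linalg.lstsq(X, y, rcond=None)[1][0])

ssr_ur = ssr(x[g == 0], y[g == 0]) + ssr(x[g == 1], y[g == 1])
ssr_r = ssr(x, y)                         # pooled (restricted) regression

k = 2                                     # parameters per group
F = ((ssr_r - ssr_ur) / k) / (ssr_ur / (n - 2 * k))
print(F)   # large: the two groups clearly have different coefficients
```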
A binary dependent variable: the linear probability model
Linear regression when the dependent variable is binary
If the dependent variable only
takes on the values 1 and 0
Linear probability
model (LPM)
In the linear probability model, the coefficients
describe the effect of the explanatory variables
on the probability that y=1
Example: Labor force participation of married women
Variables: = 1 if in labor force, = 0 otherwise; non-wife income (in thousand dollars per year)
If the number of kids under six years increases by one, the probability that the woman works falls by 26.2 percentage points
Large standard error (but wait …)
Example: Female labor participation of married women (cont.)
Graph for nwifeinc=50, exper=5,
age=30, kidslt6=1, kidsge6=0
The maximum level of education in the sample is educ=17. For the given case, this leads to a predicted probability of being in the labor force of about 50%.
Negative predicted probability but
no problem because no woman in
the sample has educ < 5.
Disadvantages of the linear probability model
Predicted probabilities may be larger than one or smaller than zero
Marginal probability effects sometimes logically impossible
The linear probability model is necessarily heteroskedastic
Variance of a Bernoulli variable: Var(y|x) = p(x)·[1 − p(x)]
Heteroskedasticity-consistent standard errors need to be computed
Advantages of the linear probability model
Easy estimation and interpretation
Estimated effects and predictions often reasonably good in practice
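A minimal linear probability model on simulated data, illustrating that OLS fitted values are interpreted as probabilities and can leave [0, 1] at extreme regressor values. The data-generating process is made up:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2_000
educ = rng.uniform(5, 17, size=n)
p_true = 0.04 * educ - 0.10                 # made-up linear probability
inlf = (rng.uniform(size=n) < p_true).astype(float)

# OLS on the 0/1 outcome = linear probability model
X = np.column_stack([np.ones(n), educ])
b = np.linalg.lstsq(X, inlf, rcond=None)[0]

pred_at_30 = np.array([1.0, 30.0]) @ b      # prediction at an extreme value
print(b, pred_at_30)                        # prediction exceeds 1
```

The slope recovers the made-up marginal probability effect, but extrapolating to educ = 30 yields a "probability" above one, one of the model's disadvantages listed above.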
More on policy analysis and program evaluation
Example: Effect of job training grants on worker productivity
Variables: percentage of defective items; = 1 if firm received training grant, = 0 otherwise
No apparent effect of
grant on productivity
Treatment group: grant receivers, Control group: firms that received no grant
Grants were given on a first-come, first-served basis. This is not the same as giving them out
randomly. It might be the case that firms with less productive workers saw an opportunity to
improve productivity and applied first.
Self-selection into treatment as a source for endogeneity
In the given and in related examples, the treatment status is probably
related to other characteristics that also influence the outcome
The reason is that subjects self-select into treatment
depending on their individual characteristics and prospects
Experimental evaluation
In experiments, assignment to treatment is random
In this case, causal effects can be inferred using a simple regression
The dummy indicating whether or not there was
treatment is unrelated to other factors affecting
the outcome.
Further example of an endogenous dummy regressor
Are nonwhite customers discriminated against?
Variables: dummy indicating whether the loan was approved; race dummy; credit rating
It is important to control for other characteristics that may be
important for loan approval (e.g. profession, unemployment)
Omitting important characteristics that are correlated with the nonwhite dummy will produce spurious evidence for discrimination