2021/10/10
The Simple Regression Model
y = b0 + b1x + u
Some Terminology
◼ In the simple linear regression model, where
y = b0 + b1x + u, we typically refer to y as the
❑ Dependent Variable, or
❑ Explained Variable, or
❑ Response Variable, or
❑ Predicted Variable, or
❑ Regressand
Some Terminology, cont.
◼ In the simple linear regression of y on x, we
typically refer to x as the
❑ Independent Variable, or
❑ Explanatory Variable, or
❑ Control Variable, or
❑ Predictor Variable, or
❑ Regressor
A Simple Assumption
◼ The average value of u, the error term, in the
population is 0. That is,
◼ E(u) = 0
◼ This is not a restrictive assumption, since we
can always use the intercept parameter b0 to normalize E(u) to 0
Zero Conditional Mean
◼ We need to make a crucial assumption
about how u and x are related
◼ We want it to be the case that E(u) does not
depend on the value of x. That is,
◼ E(u|x) = E(u) = 0, which implies
◼ E(y|x) = b0 + b1x, where b1 is the slope parameter
Zero Conditional Mean
◼ The zero conditional mean assumption E(u|x)
= 0 breaks y into two components:
◼ The piece b0 + b1x is called the systematic
part of y
◼ u is called the unsystematic part, or the part
of y not explained by x
E(y|x) as a linear function of x, where for any x
the distribution of y is centered about E(y|x)
[Figure: conditional densities f(y) at x1 and x2, each centered on the line E(y|x) = b0 + b1x]
Example: Returns to Education
◼ A model of human capital investment implies getting
more education should lead to higher earnings
◼ In the simplest case, this implies an equation like
earnings = b0 + b1 education + u
◼ Here E(u|x) = E(u) requires, for example, that average ability is the same across education levels: E(abil|educ = 9) = E(abil|educ = 16)
Ordinary Least Squares
◼ Basic idea of regression is to estimate the
population parameters from a sample
◼ Let {(xi,yi): i=1, …,n} denote a random sample
of size n from the population
◼ For each observation in this sample, it will be
the case that
◼ yi = b0 + b1xi + ui
Population regression line, sample data points
and the associated error terms
[Figure: sample points (x1, y1), ..., (x4, y4) scattered around the population regression line E(y|x) = b0 + b1x, with errors u1, ..., u4 shown as vertical deviations]
Deriving OLS Estimates
◼ To derive the OLS estimates we need to
realize that our main assumption of E(u|x) =
E(u) = 0 also implies that
◼ Cov(x,u) = E(xu) = 0
◼ Why? Remember from basic probability that
Cov(X,Y) = E(XY) – E(X)E(Y)
Deriving OLS continued
◼ We can write our 2 restrictions just in terms of
x, y, b0, and b1, since u = y – b0 – b1x
◼ E(y – b0 – b1x) = 0
◼ E[x(y – b0 – b1x)] = 0
◼ These are called moment restrictions
Deriving OLS using M.O.M.
◼ The method of moments approach to
estimation implies imposing the population
moment restrictions on the sample moments
◼ What does this mean? Recall that for E(X),
the mean of a population distribution, a
sample estimator of E(X) is simply the
arithmetic mean of the sample
More Derivation of OLS
◼ We want to choose values of the parameters that
will ensure that the sample versions of our moment
restrictions are true
◼ The sample versions are as follows:
$$n^{-1}\sum_{i=1}^{n}\left(y_i - \hat{b}_0 - \hat{b}_1 x_i\right) = 0$$
$$n^{-1}\sum_{i=1}^{n} x_i\left(y_i - \hat{b}_0 - \hat{b}_1 x_i\right) = 0$$
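As a concrete illustration, here is a minimal Python sketch (hypothetical data, assuming NumPy) that imposes the two sample moment conditions directly; rearranged, they form a 2x2 linear system in the two estimates:

```python
import numpy as np

# Hypothetical sample; any data with variation in x works
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Sample moment conditions, rearranged:
#   n*b0      + sum(x)*b1    = sum(y)
#   sum(x)*b0 + sum(x**2)*b1 = sum(x*y)
A = np.array([[n, x.sum()],
              [x.sum(), (x ** 2).sum()]])
c = np.array([y.sum(), (x * y).sum()])
b0_hat, b1_hat = np.linalg.solve(A, c)
print(b0_hat, b1_hat)
```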
More Derivation of OLS
◼ Given the definition of a sample mean, and
properties of summation, we can rewrite the first
condition as follows
$$\bar{y} = \hat{b}_0 + \hat{b}_1\bar{x}, \quad\text{or}\quad \hat{b}_0 = \bar{y} - \hat{b}_1\bar{x}$$
More Derivation of OLS
$$\sum_{i=1}^{n} x_i\left(y_i - (\bar{y} - \hat{b}_1\bar{x}) - \hat{b}_1 x_i\right) = 0$$
$$\sum_{i=1}^{n} x_i(y_i - \bar{y}) = \hat{b}_1 \sum_{i=1}^{n} x_i(x_i - \bar{x})$$
$$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \hat{b}_1 \sum_{i=1}^{n} (x_i - \bar{x})^2$$
So the OLS estimated slope is
$$\hat{b}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
provided that $\sum_{i=1}^{n}(x_i - \bar{x})^2 > 0$
Summary of OLS slope estimate
◼ The slope estimate is the sample covariance
between x and y divided by the sample
variance of x
◼ If x and y are positively correlated, the slope
will be positive
◼ If x and y are negatively correlated, the slope
will be negative
◼ We only need x to vary in our sample (see the numerical sketch below)
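A minimal numerical sketch (same hypothetical data as above, assuming NumPy) of the covariance-over-variance formula for the slope and the implied intercept:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Slope: sample covariance of x and y divided by the sample variance of x
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# Intercept: the fitted line passes through the sample means
b0_hat = y.mean() - b1_hat * x.mean()

print(b1_hat, b0_hat)
```

The same numbers come out of np.polyfit(x, y, 1), which also fits by least squares.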
More OLS
◼ Intuitively, OLS is fitting a line through the
sample points such that the sum of squared
residuals is as small as possible, hence the
term least squares
◼ The residual, û, is an estimate of the error
term, u, and is the difference between the
fitted line (sample regression function) and
the sample point
Sample regression line, sample data points
and the associated estimated error terms
[Figure: sample points (x1, y1), ..., (x4, y4) scattered around the fitted line ŷ = b̂0 + b̂1x, with residuals û1, ..., û4 shown as vertical deviations]
Alternate approach to derivation
◼ Given the intuitive idea of fitting a line, we can set
up a formal minimization problem
◼ That is, we want to choose our parameters such that
we minimize the following:
$$\sum_{i=1}^{n} \hat{u}_i^{\,2} = \sum_{i=1}^{n}\left(y_i - \hat{b}_0 - \hat{b}_1 x_i\right)^2$$
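A minimal sketch (assuming SciPy and the same hypothetical data) that minimizes this sum of squared residuals numerically; the minimizer reproduces the closed-form OLS estimates:

```python
import numpy as np
from scipy.optimize import minimize

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def ssr(params):
    """Sum of squared residuals for candidate (b0, b1)."""
    b0, b1 = params
    return np.sum((y - b0 - b1 * x) ** 2)

result = minimize(ssr, x0=[0.0, 0.0])
print(result.x)  # approximately the closed-form (b0_hat, b1_hat)
```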
Alternate approach, continued
◼ Using calculus to solve the minimization problem
for the two parameters yields the following first
order conditions, which are the same as those we
obtained before, multiplied by n
$$\sum_{i=1}^{n}\left(y_i - \hat{b}_0 - \hat{b}_1 x_i\right) = 0$$
$$\sum_{i=1}^{n} x_i\left(y_i - \hat{b}_0 - \hat{b}_1 x_i\right) = 0$$
Examples
◼ Example 1: CEO Salary and Return on Equity
◼ Example 2: Wage and Education
Algebraic Properties of OLS
◼ The sum of the OLS residuals is zero
◼ Thus, the sample average of the OLS
residuals is zero as well
◼ The sample covariance between the
regressors and the OLS residuals is zero
◼ The OLS regression line always goes
through the point of sample means, (x̄, ȳ)
Algebraic Properties (precise)
$$\sum_{i=1}^{n}\hat{u}_i = 0 \quad\text{and thus}\quad n^{-1}\sum_{i=1}^{n}\hat{u}_i = 0$$
$$\sum_{i=1}^{n} x_i\hat{u}_i = 0$$
$$\bar{y} = \hat{b}_0 + \hat{b}_1\bar{x}$$
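A minimal check (same hypothetical data, assuming NumPy) that the OLS residuals satisfy these properties up to floating-point error:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()
u_hat = y - b0_hat - b1_hat * x  # OLS residuals

print(np.isclose(u_hat.sum(), 0.0))                      # residuals sum to zero
print(np.isclose(np.sum(x * u_hat), 0.0))                # zero sample covariance with x
print(np.isclose(y.mean(), b0_hat + b1_hat * x.mean()))  # line passes through the sample means
```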
More terminology
We can think of each observation as being made
up of an explained part and an unexplained part,
yi = ŷi + ûi. We then define the following:
$\sum (y_i - \bar{y})^2$ is the total sum of squares (SST)
$\sum (\hat{y}_i - \bar{y})^2$ is the explained sum of squares (SSE)
$\sum \hat{u}_i^{\,2}$ is the residual sum of squares (SSR)
Then SST = SSE + SSR
Proof that SST = SSE + SSR
$$\sum (y_i - \bar{y})^2 = \sum \left[(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})\right]^2 = \sum \left[\hat{u}_i + (\hat{y}_i - \bar{y})\right]^2$$
$$= \sum \hat{u}_i^{\,2} + 2\sum \hat{u}_i(\hat{y}_i - \bar{y}) + \sum(\hat{y}_i - \bar{y})^2 = \text{SSR} + 2\sum \hat{u}_i(\hat{y}_i - \bar{y}) + \text{SSE}$$
and we know that $\sum \hat{u}_i(\hat{y}_i - \bar{y}) = 0$
Goodness-of-Fit
◼ How do we think about how well our sample
regression line fits our sample data?
◼ We can compute the fraction of the total sum of
squares (SST) that is explained by the model;
call this the R-squared of the regression
◼ R² = SSE/SST = 1 – SSR/SST
◼ R² is equal to the square of the sample
correlation coefficient between yi and ŷi
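A minimal sketch (same hypothetical data, assuming NumPy) that computes SST, SSE, SSR, and R², and checks the equivalences above:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()
y_hat = b0_hat + b1_hat * x
u_hat = y - y_hat

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
sse = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
ssr = np.sum(u_hat ** 2)               # residual sum of squares

r2 = sse / sst
print(np.isclose(sst, sse + ssr))                        # SST = SSE + SSR
print(np.isclose(r2, 1.0 - ssr / sst))                   # equivalent definition
print(np.isclose(r2, np.corrcoef(y, y_hat)[0, 1] ** 2))  # squared correlation of y and y_hat
```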
Goodness-of-Fit
◼ In the social sciences, low R² values in regression
equations are not uncommon, especially for
cross-sectional analysis
◼ A seemingly low R² does not necessarily
mean that an OLS regression equation is
useless
◼ An Example: CEO Salary and Return on
Equity
Units of Measurement and Functional
Form
The equation
$$\widehat{salary} = 963.191 + 18.501\,roe$$
where salary is measured in thousands of dollars. The above equation can be written as
$$\widehat{salardol} = 963{,}191 + 18{,}501\,roe$$
where salardol is salary in dollars.
Units of Measurement and Functional
Form (cont)
The above equation can also be written as
$$\widehat{salary} = 963.191 + 1850.1\,roedec$$
where roedec is the decimal equivalent of roe.
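A minimal sketch (hypothetical numbers, assuming NumPy) of how rescaling the variables rescales the coefficients: measuring salary in dollars multiplies both coefficients by 1,000, while measuring roe as a decimal multiplies only the slope by 100.

```python
import numpy as np

def ols(x, y):
    """Return (intercept, slope) from a simple OLS fit."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b1 * x.mean(), b1

# Hypothetical data: roe in percent, salary in thousands of dollars
roe = np.array([10.0, 15.0, 20.0, 25.0, 30.0])
salary = np.array([1100.0, 1250.0, 1300.0, 1500.0, 1550.0])

print(ols(roe, salary))          # baseline (b0, b1)
print(ols(roe, salary * 1000))   # salary in dollars: both coefficients scaled by 1,000
print(ols(roe / 100, salary))    # roe as a decimal: slope scaled by 100, intercept unchanged
```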
Units of Measurement and Functional
Form (cont)
◼ Example 1: Log Wage and Education
◼ Example 2: Log CEO Salary and Log Firm Sales
Units of Measurement and Functional
Form (cont)
Model        Dependent Variable   Independent Variable   Interpretation of b1
Level-level  y                    x                       Δy = b1·Δx
Level-log    y                    log(x)                  Δy = (b1/100)·%Δx
Log-level    log(y)               x                       %Δy = (100·b1)·Δx
Log-log      log(y)               log(x)                  %Δy = b1·%Δx
Unbiasedness of OLS
◼ Assumption SLR.1: the population model is linear in
parameters as y = b0 + b1x + u
◼ Assumption SLR.2: we have a random sample of
size n, {(xi, yi): i=1, 2, …, n}, from the population
model. Thus we can write the sample model as
yi = b0 + b1xi + ui
◼ Assumption SLR.3: there is variation in the xi
◼ Assumption SLR.4: E(u|x) = 0
Unbiasedness of OLS (cont)
◼ In order to think about unbiasedness, we need to
rewrite our estimator in terms of the population
parameter
◼ Start with a simple rewrite of the formula as
$$\hat{b}_1 = \frac{\sum (x_i - \bar{x})\,y_i}{s_x^2}, \quad\text{where}\quad s_x^2 \equiv \sum (x_i - \bar{x})^2$$
Unbiasedness of OLS (cont)
$$\sum (x_i - \bar{x})\,y_i = \sum (x_i - \bar{x})(b_0 + b_1 x_i + u_i)$$
$$= \sum (x_i - \bar{x})\,b_0 + \sum (x_i - \bar{x})\,b_1 x_i + \sum (x_i - \bar{x})\,u_i$$
$$= b_0\sum (x_i - \bar{x}) + b_1\sum (x_i - \bar{x})\,x_i + \sum (x_i - \bar{x})\,u_i$$
Unbiasedness of OLS (cont)
$$\sum (x_i - \bar{x}) = 0, \qquad \sum (x_i - \bar{x})\,x_i = \sum (x_i - \bar{x})^2$$
so the numerator can be rewritten as
$$b_1 s_x^2 + \sum (x_i - \bar{x})\,u_i, \quad\text{and thus}\quad \hat{b}_1 = b_1 + \frac{\sum (x_i - \bar{x})\,u_i}{s_x^2}$$
Unbiasedness of OLS (cont)
let $d_i = (x_i - \bar{x})$, so that
$$\hat{b}_1 = b_1 + \frac{1}{s_x^2}\sum d_i u_i, \quad\text{then}$$
$$E\!\left(\hat{b}_1\right) = b_1 + \frac{1}{s_x^2}\sum d_i\,E(u_i) = b_1$$
Unbiasedness Summary
◼ The OLS estimates of b1 and b0 are
unbiased
◼ Proof of unbiasedness depends on our 4
assumptions – if any assumption fails, then
OLS is not necessarily unbiased
◼ Remember unbiasedness is a description of
the estimator – in a given sample we may be
“near” or “far” from the true parameter (see the
simulation sketch below)
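A minimal Monte Carlo sketch (assuming NumPy; the true parameters and data-generating process are made up for illustration) of what unbiasedness means: across many random samples, the average of b̂1 is close to the true b1.

```python
import numpy as np

rng = np.random.default_rng(0)
b0_true, b1_true = 1.0, 0.5        # hypothetical population parameters
n, n_reps = 100, 5000

slopes = np.empty(n_reps)
for r in range(n_reps):
    x = rng.normal(5.0, 2.0, size=n)   # SLR.2/SLR.3: random sample with variation in x
    u = rng.normal(0.0, 1.0, size=n)   # SLR.4: E(u|x) = 0
    y = b0_true + b1_true * x + u      # SLR.1: linear population model
    slopes[r] = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

print(slopes.mean())  # close to 0.5: the sampling distribution is centered at b1
```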
Variance of the OLS Estimators
◼ Now we know that the sampling distribution of our
estimate is centered around the true parameter
◼ Want to think about how spread out this distribution
is
◼ Much easier to think about this variance under an
additional assumption, so
◼ Assumption SLR.5: Var(u|x) = s² (Homoscedasticity)
Variance of OLS (cont)
◼ Var(u|x) = E(u²|x) − [E(u|x)]²
◼ E(u|x) = 0, so s² = E(u²|x) = E(u²) = Var(u)
◼ Thus s² is also the unconditional variance,
called the error variance
◼ s, the square root of the error variance, is
called the standard deviation of the error
◼ Can say: E(y|x) = b0 + b1x and Var(y|x) = s²
Homoscedastic Case
[Figure: conditional densities f(y|x) at x1 and x2 with identical spread around the line E(y|x) = b0 + b1x]
Heteroscedastic Case
[Figure: conditional densities f(y|x) at x1, x2, and x3 with differing spread around the line E(y|x) = b0 + b1x]
Variance of OLS (cont)
$$\operatorname{Var}\!\left(\hat{b}_1\right) = \operatorname{Var}\!\left(b_1 + \frac{1}{s_x^2}\sum d_i u_i\right) = \left(\frac{1}{s_x^2}\right)^{2} \operatorname{Var}\!\left(\sum d_i u_i\right)$$
$$= \left(\frac{1}{s_x^2}\right)^{2} \sum d_i^{\,2}\operatorname{Var}(u_i) = \left(\frac{1}{s_x^2}\right)^{2} \sum d_i^{\,2} s^2 = s^2\left(\frac{1}{s_x^2}\right)^{2} \sum d_i^{\,2}$$
$$= s^2\left(\frac{1}{s_x^2}\right)^{2} s_x^2 = \frac{s^2}{s_x^2} = \operatorname{Var}\!\left(\hat{b}_1\right)$$
Variance of OLS Summary
◼ The larger the error variance, s², the larger
the variance of the slope estimate
◼ The larger the variability in the xi, the smaller
the variance of the slope estimate
◼ As a result, a larger sample size should
decrease the variance of the slope estimate
(see the simulation sketch below)
◼ One problem: the error variance s² is unknown
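A minimal simulation sketch (assuming NumPy; all numbers hypothetical) of these comparative statics: the empirical variance of b̂1 grows with the error variance and shrinks as the sample size, and hence the total variation in x, grows.

```python
import numpy as np

rng = np.random.default_rng(1)

def var_b1_hat(n, sigma_u, n_reps=5000, b0=1.0, b1=0.5):
    """Empirical variance of the OLS slope over repeated samples."""
    slopes = np.empty(n_reps)
    for r in range(n_reps):
        x = rng.normal(5.0, 2.0, size=n)
        y = b0 + b1 * x + rng.normal(0.0, sigma_u, size=n)
        slopes[r] = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return slopes.var()

print(var_b1_hat(n=50, sigma_u=1.0))    # baseline
print(var_b1_hat(n=50, sigma_u=2.0))    # larger error variance -> larger Var(b1_hat)
print(var_b1_hat(n=200, sigma_u=1.0))   # larger n -> more total variation in x -> smaller Var(b1_hat)
```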
Estimating the Error Variance
◼ We don’t know what the error variance, s², is,
because we don’t observe the errors, ui
◼ What we observe are the residuals, ûi
◼ We can use the residuals to form an estimate
of the error variance
Error Variance Estimate (cont)
$$\hat{u}_i = y_i - \hat{b}_0 - \hat{b}_1 x_i = (b_0 + b_1 x_i + u_i) - \hat{b}_0 - \hat{b}_1 x_i = u_i - \left(\hat{b}_0 - b_0\right) - \left(\hat{b}_1 - b_1\right)x_i$$
Then, an unbiased estimator of $s^2$ is
$$\hat{s}^2 = \frac{1}{n-2}\sum \hat{u}_i^{\,2} = SSR/(n - 2)$$
Error Variance Estimate (cont)
$$\hat{s} = \sqrt{\hat{s}^2} = \text{the standard error of the regression}$$
recall that $\operatorname{sd}\!\left(\hat{b}_1\right) = s\big/\sqrt{s_x^2}$
if we substitute $\hat{s}$ for $s$ then we have the standard error of $\hat{b}_1$:
$$\operatorname{se}\!\left(\hat{b}_1\right) = \hat{s}\Big/\left(\sum (x_i - \bar{x})^2\right)^{1/2}$$
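A minimal sketch (same hypothetical data as earlier, assuming NumPy) of the error-variance estimate and the standard error of the slope:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()
u_hat = y - b0_hat - b1_hat * x

sigma2_hat = np.sum(u_hat ** 2) / (n - 2)                 # unbiased estimate of the error variance
sigma_hat = np.sqrt(sigma2_hat)                           # standard error of the regression
se_b1 = sigma_hat / np.sqrt(np.sum((x - x.mean()) ** 2))  # standard error of the slope estimate

print(sigma2_hat, sigma_hat, se_b1)
```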