Simple Linear Regression
Definition
OLS estimates
Goodness of fit
Units of measurement and functional form
Expected values and variances of the OLS estimators
What is Simple Linear Regression?
X explains Y: \(Y = \beta_0 + \beta_1 X + u\), where u is the error term (also called the disturbance, or the unobservable)
Y: dependent variable, explained variable, response variable, predicted variable, regressand
X: independent variable, explanatory variable, control variable, predictor variable, regressor
X has a linear effect on Y
\(Y = \beta_0 + \beta_1 X + u\) implies \(\Delta Y = \beta_1 \Delta X\), but only if \(\Delta u = 0\), i.e. all other factors are held fixed
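Example 1: \(\text{yield} = \beta_0 + \beta_1\,\text{fertiliser} + u\), so \(\Delta\text{yield} = \beta_1\,\Delta\text{fertiliser}\). \(\beta_1\) measures the effect of a 1-unit change in fertiliser on yield, holding all other factors fixed, i.e. when \(\Delta u = 0\).
Example 2: \(\text{wage} = \beta_0 + \beta_1\,\text{educ} + u\), so \(\Delta\text{wage} = \beta_1\,\Delta\text{educ}\). \(\beta_1\) measures the effect of a 1-year change in education on the hourly wage, holding all other factors fixed, i.e. when \(\Delta u = 0\).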
Two assumptions about u
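Easy: \(E(u) = 0\): the average value of u in the population is zero.
As long as the model contains an intercept, we can always normalise the unobservables (e.g. land quality, ability) so that their mean is zero, because any nonzero mean is absorbed into \(\beta_0\).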
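Harder: \(E(u \mid x) = E(u) = 0\) (zero conditional mean): the average value of u does not depend on the value of X.
In the wage example, does \(E(\text{ability} \mid \text{educ} = 8) = E(\text{ability} \mid \text{educ} = 16) = E(\text{ability}) = 0\)? Probably not: average ability is probably higher among people with more education, so we would need to include a measure of ability in the model.
NB: if \(E(u \mid x) = 0\) holds, then \(\text{cov}(x, u) = 0\), i.e. \(E(xu) = 0\).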
The OLS estimates
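\(Y = \beta_0 + \beta_1 x + u\); rearranging gives \(u = y - \beta_0 - \beta_1 x\).
If \(E(u) = 0\) and \(E(xu) = 0\), then \(E(y - \beta_0 - \beta_1 x) = 0\) and \(E[x(y - \beta_0 - \beta_1 x)] = 0\).
Choose the estimates \(\hat\beta_0\) and \(\hat\beta_1\) to solve the sample versions of these 2 equations:
\(\frac{1}{n}\sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0\) and \(\frac{1}{n}\sum_{i=1}^n x_i(y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0\). The solution is: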
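\(\hat\beta_1 = \dfrac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2}\)
\(\hat\beta_0 = \bar y - \hat\beta_1 \bar x\)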
proof not required
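The two OLS formulas can be sketched directly in Python using only the standard library. This is a minimal illustration, not a production estimator: the function name `ols_simple` and the data are hypothetical, chosen so the points lie exactly on \(y = 1 + 2x\) and the result is easy to check by hand.

```python
# Minimal sketch of the simple-regression OLS formulas (standard library only).
# ols_simple and the data below are hypothetical, for illustration.

def ols_simple(x, y):
    """Return (beta0_hat, beta1_hat) for the regression of y on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of cross-deviations (x_i - x_bar)(y_i - y_bar)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: sum of squared deviations of x; needs sample variation in x
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("no sample variation in x: slope is undefined")
    beta1_hat = sxy / sxx
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat

# Hypothetical data lying exactly on y = 1 + 2x
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
b0, b1 = ols_simple(x, y)
print(b0, b1)  # 1.0 2.0
```

The explicit check on `sxx` mirrors the requirement that x must vary in the sample; without it the slope formula divides by zero.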
Fitted Values and Residuals
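Fitted value: \(\hat y_i = \hat\beta_0 + \hat\beta_1 x_i\)
Residual: \(\hat u_i = y_i - \hat y_i\)
[Figure: observed values \(y_i\) and fitted values \(\hat y_i\)]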
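A short sketch of fitted values and residuals on hypothetical data. It also checks two algebraic properties that follow from how the OLS estimates were chosen: the residuals sum to zero and are uncorrelated with x in the sample (the sample versions of \(E(u) = 0\) and \(E(xu) = 0\)).

```python
# Sketch: fitted values and residuals for a simple regression.
# The data are hypothetical.

x = [1.0, 2.0, 4.0, 5.0, 8.0]
y = [3.0, 2.0, 6.0, 7.0, 9.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
beta1_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
beta0_hat = y_bar - beta1_hat * x_bar

# Fitted value: y_hat_i = beta0_hat + beta1_hat * x_i
y_hat = [beta0_hat + beta1_hat * xi for xi in x]
# Residual: u_hat_i = y_i - y_hat_i
u_hat = [yi - yhi for yi, yhi in zip(y, y_hat)]

# Both sums are zero up to floating-point rounding
print(abs(sum(u_hat)) < 1e-9)                                # True
print(abs(sum(xi * ui for xi, ui in zip(x, u_hat))) < 1e-9)  # True
```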
Example 1: are CEO annual salaries related to firm performance?
y = annual salary in $000s; x = roe, the average return on equity (net income as a % of equity).
\(\beta_1\) measures the change in annual salary (in $000s) when return on equity increases by 1 percentage point: is \(\beta_1 > 0\)?
\(\text{salary} = \beta_0 + \beta_1\,\text{roe} + u\)
\(\widehat{\text{salary}} = 963 + 18.501\,\text{roe}\)
If roe = 0, the predicted salary is $963,000.
If roe rises by 1 percentage point, \(\Delta\text{roe} = 1\), so \(\Delta\widehat{\text{salary}} = 18.501\), i.e. predicted salary rises by $18,501.
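The interpretation can be checked with a tiny sketch that plugs values into the fitted CEO salary line (intercept 963, slope 18.501, salary in $000s, roe in percentage points). The helper name `predicted_salary` is hypothetical.

```python
# Sketch: evaluating the fitted CEO salary equation reported above.
# salary_hat = 963 + 18.501 * roe  (salary in $000s, roe in percentage points)

BETA0_HAT = 963.0    # intercept, $000s
BETA1_HAT = 18.501   # slope, $000s per percentage point of roe

def predicted_salary(roe):
    """Hypothetical helper: predicted annual salary in $000s."""
    return BETA0_HAT + BETA1_HAT * roe

print(predicted_salary(0))  # 963.0 -> $963,000
# Change in predicted salary when roe rises by 1 percentage point
delta = predicted_salary(11) - predicted_salary(10)
print(round(delta * 1000))  # 18501 -> $18,501
```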
Example 2: do wages reflect education?
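y = hourly wage in $; x = educ, years of schooling
\(\text{wage} = \beta_0 + \beta_1\,\text{educ} + u\)
\(\beta_1\) measures the change in hourly wage (in $) when years of education increase by 1 year: is \(\beta_1 > 0\)?
\(\widehat{\text{wage}} = -0.90 + 0.54\,\text{educ}\)
If educ = 0, the predicted hourly wage is -$0.90, which has no meaningful economic interpretation.
If educ rises by 1 year, \(\Delta\text{educ} = 1\), so \(\Delta\widehat{\text{wage}} = 0.54\), i.e. the predicted wage rises by $0.54 an hour.
If educ rises by 4 years, \(\Delta\text{educ} = 4\), so \(\Delta\widehat{\text{wage}} = 4 \times 0.54 = 2.16\), i.e. the predicted wage rises by $2.16 an hour.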
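In the level-level model the predicted change depends only on the change in educ, not on where you start. A small sketch using the fitted wage equation (\(-0.90 + 0.54\,\text{educ}\)); the helper `predicted_wage` is hypothetical.

```python
# Sketch: constant marginal effect in the level-level wage equation
# wage_hat = -0.90 + 0.54 * educ  (wage in $ per hour)

def predicted_wage(educ):
    """Hypothetical helper: predicted hourly wage from the fitted line."""
    return -0.90 + 0.54 * educ

# Four extra years of schooling, starting from different levels
for start in (8, 12, 16):
    change = predicted_wage(start + 4) - predicted_wage(start)
    print(start, round(change, 2))  # the change is 2.16 in every case
```

Every extra year adds the same $0.54 here; the semi-log model discussed later relaxes this.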
Example 3: is it worth spending on electoral campaigns?
y = voteA, the percentage of the vote received by candidate A; x = shareA, the percentage of total campaign expenditure accounted for by candidate A.
\(\beta_1\) measures the change in A's vote share (in percentage points) when A's share of campaign spending increases by 1 percentage point: is \(\beta_1 > 0\)?
\(\text{voteA} = \beta_0 + \beta_1\,\text{shareA} + u\)
\(\widehat{\text{voteA}} = 26.81 + 0.464\,\text{shareA}\)
If shareA = 0, the predicted share of the vote for A is 26.81%.
If shareA rises by 1 percentage point, \(\Delta\text{shareA} = 1\), so \(\Delta\widehat{\text{voteA}} = 0.464\), i.e. A's predicted vote share rises by 0.464 percentage points.
Goodness of fit
\(SST = \sum_{i=1}^n (y_i - \bar y)^2\): total sample variation in y (total sum of squares)
\(SSE = \sum_{i=1}^n (\hat y_i - \bar y)^2\): variation of the fitted values (explained sum of squares)
\(SSR = \sum_{i=1}^n (y_i - \hat y_i)^2 = \sum_{i=1}^n \hat u_i^2\): variation of the residuals (residual sum of squares)
\(R^2\): coefficient of determination
\(SST = SSE + SSR\); dividing by SST gives \(1 = \frac{SSE}{SST} + \frac{SSR}{SST}\), so
\(R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}\)
\(R^2 = 0.0132\): x explains only 1.3% of the sample variation in y.
\(R^2 = 0.856\): x explains 85.6% of the sample variation in y.
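A sketch of the goodness-of-fit decomposition on hypothetical data: it computes SST, SSE and SSR as defined above, confirms \(SST = SSE + SSR\), and checks that the two expressions for \(R^2\) agree.

```python
# Sketch: SST = SSE + SSR and R^2 for a simple regression.
# The data are hypothetical.

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 9.9, 12.3]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
sse = sum((yhi - y_bar) ** 2 for yhi in y_hat)           # explained variation
ssr = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))  # residual variation

r2 = sse / sst
print(abs(sst - (sse + ssr)) < 1e-9)      # True: SST = SSE + SSR
print(abs(r2 - (1 - ssr / sst)) < 1e-12)  # True: both formulas agree
print(0 <= r2 <= 1)                       # True
```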
Changing units
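E.g. changing units from thousands of dollars to hundreds of dollars (or from dollars to another unit): the OLS estimates change accordingly.
If the dependent variable is multiplied (or divided) by a constant c, then both \(\hat\beta_0\) and \(\hat\beta_1\) are multiplied (or divided) by c.
If the independent variable is multiplied (or divided) by a constant c, then \(\hat\beta_1\) is divided (or multiplied) by c, while \(\hat\beta_0\) is unchanged.
In both cases \(R^2\) is unaffected.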
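A sketch that re-estimates a simple regression after rescaling each variable by \(c = 10\) (e.g. $000s to $00s). The data and the helper `ols` are hypothetical; the point is only how the estimates move.

```python
# Sketch: effect of rescaling variables on the OLS estimates.
# Hypothetical data: y could be salary in $000s, x some regressor.

def ols(x, y):
    """Hypothetical helper: (intercept, slope) of the regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    return y_bar - b1 * x_bar, b1

x = [10.0, 12.0, 15.0, 20.0, 25.0]
y = [900.0, 1100.0, 1150.0, 1400.0, 1500.0]
c = 10.0

b0, b1 = ols(x, y)
# Dependent variable multiplied by c: both estimates are multiplied by c
b0_y, b1_y = ols(x, [c * v for v in y])
# Independent variable multiplied by c: slope divided by c, intercept unchanged
b0_x, b1_x = ols([c * v for v in x], y)

print(abs(b0_y - c * b0) < 1e-6, abs(b1_y - c * b1) < 1e-6)  # True True
print(abs(b0_x - b0) < 1e-6, abs(b1_x - b1 / c) < 1e-6)      # True True
```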
Changing functional form
Nonlinearities: semi-log
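\(\log(\text{wage}) = \beta_0 + \beta_1\,\text{educ} + u\), so \(\%\Delta\text{wage} \approx (100\beta_1)\,\Delta\text{educ}\)
In general, \(\Delta\log y = \beta_1 \Delta x\), so \(100\,\Delta\log y = (100\beta_1)\,\Delta x\); since \(100\,\Delta\log y \approx \%\Delta y\), this gives \(\%\Delta y \approx (100\beta_1)\,\Delta x\).
\(100\beta_1\) is the (approximate) percentage increase in wage for one additional year of education.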
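\(\widehat{\log(\text{wage})} = 0.584 + 0.083\,\text{educ}\)
The predicted wage increases by about 8.3 percent for every additional year of education. Because the percentage effect is constant, the dollar increase grows as the wage rises, so the semi-log model allows for increasing (dollar) returns to education.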
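A sketch using the semi-log estimates (0.584, 0.083). It compares the approximation \(100\beta_1\) with the exact percentage change implied by the fitted log equation, \(100(e^{\beta_1} - 1)\), and shows that a constant percentage effect means a larger dollar gain at higher education. The helper `rough_wage` is hypothetical and simply exponentiates the fitted log wage, ignoring the error term u.

```python
import math

# Sketch using the semi-log estimates reported above:
# log(wage)_hat = 0.584 + 0.083 * educ
B0_HAT, B1_HAT = 0.584, 0.083

# 100 * beta1 approximates the % change in wage per extra year of education;
# the exact % change implied by the fitted log equation is 100 * (exp(beta1) - 1).
approx_pct = 100 * B1_HAT
exact_pct = 100 * (math.exp(B1_HAT) - 1)
print(round(approx_pct, 2), round(exact_pct, 2))  # 8.3 8.65

def rough_wage(educ):
    """Hypothetical helper: exp of the fitted log wage (ignores the error term u)."""
    return math.exp(B0_HAT + B1_HAT * educ)

# Same % effect, but a larger dollar gain from one more year at higher educ
gain_8 = rough_wage(9) - rough_wage(8)
gain_16 = rough_wage(17) - rough_wage(16)
print(gain_16 > gain_8)  # True
```

The gap between 8.3% and about 8.65% is small for small \(\beta_1\) and grows as \(\beta_1\) gets larger.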
Nonlinearities: double-log or constant elasticity model
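\(\log(\text{salary}) = \beta_0 + \beta_1 \log(\text{sales}) + u\), so \(\%\Delta\text{salary} \approx \beta_1\,\%\Delta\text{sales}\): \(\beta_1\) is the (approximate) percentage increase in salary for a 1% increase in sales, i.e. the elasticity of salary with respect to sales.
\(\widehat{\log(\text{salary})} = 4.822 + 0.257\log(\text{sales})\)
Predicted salary increases by about 0.257 percent when sales rise by 1%. The double-log model implies a constant elasticity.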
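A sketch of the constant-elasticity property with the estimates (4.822, 0.257): a 1% rise in sales changes predicted salary by the same percentage at any sales level. The sales levels are hypothetical, and the helper `rough_salary` simply exponentiates the fitted log salary, ignoring the error term u.

```python
import math

# Sketch using the double-log estimates reported above:
# log(salary)_hat = 4.822 + 0.257 * log(sales)
B0_HAT, B1_HAT = 4.822, 0.257

def rough_salary(sales):
    """Hypothetical helper: exp of the fitted log salary (ignores the error term u)."""
    return math.exp(B0_HAT + B1_HAT * math.log(sales))

pcts = []
for sales in (100.0, 1000.0, 10000.0):  # hypothetical sales levels
    pct = 100 * (rough_salary(1.01 * sales) / rough_salary(sales) - 1)
    pcts.append(pct)
    print(sales, round(pct, 3))  # about 0.256 at every sales level
```

The exact figure, \(100(1.01^{0.257} - 1) \approx 0.256\), is close to the approximation \(\beta_1 = 0.257\).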
Interpretation of coefficients
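Model          Dep. var.   Indep. var.   Interpretation of \(\beta_1\)
level-level    y           x             \(\Delta y = \beta_1 \Delta x\)
level-log      y           log(x)        \(\Delta y \approx (\beta_1/100)\,\%\Delta x\)
log-level      log(y)      x             \(\%\Delta y \approx (100\beta_1)\,\Delta x\)
log-log        log(y)      log(x)        \(\%\Delta y \approx \beta_1\,\%\Delta x\)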
Expected values and variances of OLS estimators
Unbiased
The expected value of the sampling distribution of the OLS estimators equals the true population parameters: \(E(\hat\beta_0) = \beta_0\) and \(E(\hat\beta_1) = \beta_1\)
Unbiasedness relies on the model being linear in its parameters, on a random sample, and on sample variation in X (with no variation in X, \(\hat\beta_1\) cannot even be computed). Most importantly, it requires \(E(u \mid X) = 0\): bias arises, for example, when omitted variables in u are correlated with X.
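A small Monte Carlo sketch of unbiasedness, with all parameter values hypothetical. The true model is \(y = 1 + 0.5x + u\). In case A, u is independent of x, so \(E(u \mid x) = 0\) and the average slope estimate across samples should sit near 0.5. In case B, u contains an omitted "ability"-style component equal to \(0.8x\), so \(E(u \mid x) \neq 0\) and the slope estimates centre near \(0.5 + 0.8 = 1.3\) instead.

```python
import random

# Monte Carlo sketch; all parameter values are hypothetical.
# True model: y = 1 + 0.5*x + u

def slope(x, y):
    """OLS slope of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx

def simulate(correlated, reps=1000, n=100, seed=0):
    """Average slope estimate over many random samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        if correlated:
            # omitted variable in u that rises with x, so E(u|x) = 0.8*x
            u = [0.8 * xi + rng.gauss(0, 1) for xi in x]
        else:
            # u independent of x, so E(u|x) = 0
            u = [rng.gauss(0, 1) for _ in range(n)]
        y = [1 + 0.5 * xi + ui for xi, ui in zip(x, u)]
        estimates.append(slope(x, y))
    return sum(estimates) / reps

mean_a = simulate(correlated=False)
mean_b = simulate(correlated=True)
print(round(mean_a, 2))  # close to 0.5: unbiased
print(round(mean_b, 2))  # close to 1.3: biased upwards
```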