Session 3: Lecture Outline
• Problem of Estimation:
  – Ordinary Least Squares Method
  – Method of Moments Estimation Procedure
  – Maximum Likelihood Estimation Procedure
• Classical Linear Regression Model: Assumptions
• Precision of the Estimators: The Standard Errors of the Least Squares Estimators
• Gauss-Markov Theorem
• Coefficient of Determination
Ref.: Gujarati, Basic Econometrics, Ch. 3
Simple Linear Regression Model
Finds a linear relationship between:
- one independent variable X and
- one dependent variable Y
First prepare a scatter plot to verify that the data show a linear trend.
Use alternative approaches if the data are not linear.
Simple Linear Regression Model: Estimation
Model: $Y_i = \beta_1 + \beta_2 X_i + u_i$
where
Y = dependent variable
X = independent variable
$\beta_1$ = intercept/constant term
$\beta_2$ = slope coefficient
$u_i$ = error term (random disturbance)
Methods to Estimate the SRF (Estimators)
1. Least Squares Method (Ordinary Least Squares, OLS)
2. Method of Moments Estimation
3. Maximum Likelihood Estimation
Simple Linear Regression Model: Estimation
Y_i  X_i
70 80
65 100
90 120
95 140
110 160
115 180
120 200
140 220
155 240
150 260
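To make the later calculations reproducible, here is a minimal Python sketch (numpy and matplotlib are my assumed tools; the lecture does not prescribe any software) that loads this table and draws the scatter plot recommended above to check for a linear trend.

```python
import numpy as np
import matplotlib.pyplot as plt

# Sample data from the table above
Y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)

# Scatter plot: inspect visually for a roughly linear trend before fitting
plt.scatter(X, Y)
plt.xlabel("X")
plt.ylabel("Y")
plt.title("Scatter plot of Y against X")
plt.show()
```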
Simple Linear Regression Model: Estimation
Model:
$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i$ --- SRF
$\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$ --- fitted line
$\hat{u}_i = Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i$, or $\hat{u}_i = Y_i - \hat{Y}_i$

Y_i    X_i     û_i      Ŷ_i
 70     80     4.82     65.18
 65    100   -10.36     75.36
 90    120     4.45     85.55
 95    140    -0.73     95.73
110    160     4.09    105.91
115    180    -1.09    116.09
120    200    -6.27    126.27
140    220     3.55    136.45
155    240     8.36    146.64
150    260    -6.82    156.82
Simple Linear Regression Model: Estimation
I. Using the Method of Ordinary Least Squares (OLS)
We estimate the intercept and slope by minimizing the vertical distance between the
data points and the estimated sample regression function, i.e., by minimizing
the sum of squared residuals:
$\hat{u}_i = Y_i - \hat{Y}_i = Y_i - (\hat{\beta}_1 + \hat{\beta}_2 X_i)$
$\min_{\hat{\beta}_1, \hat{\beta}_2} \sum_{i=1}^{n} \hat{u}_i^2 = \min_{\hat{\beta}_1, \hat{\beta}_2} \sum_{i=1}^{n} (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i)^2 \equiv \min_{\hat{\beta}_1, \hat{\beta}_2} S(\hat{\beta}_1, \hat{\beta}_2)$
We can obtain $\hat{\beta}_1, \hat{\beta}_2$ by taking the derivatives of $S(\hat{\beta}_1, \hat{\beta}_2)$ with respect to
$\hat{\beta}_1$ and $\hat{\beta}_2$ (first-order conditions) and setting them equal to zero.
First- and Second-Order Conditions
First-order conditions:
1) $\frac{\partial S(\hat{\beta}_1, \hat{\beta}_2)}{\partial \hat{\beta}_1} = -2\sum_{i=1}^{n} (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i) = 0$
2) $\frac{\partial S(\hat{\beta}_1, \hat{\beta}_2)}{\partial \hat{\beta}_2} = -2\sum_{i=1}^{n} (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i) X_i = 0$
Second-order condition: mostly satisfied, since $S$ is a convex function of $\hat{\beta}_1, \hat{\beta}_2$, so the stationary point is a minimum.
Two Normal Equations of OLS
$\sum Y_i = n\hat{\beta}_1 + \hat{\beta}_2 \sum X_i$
$\sum Y_i X_i = \hat{\beta}_1 \sum X_i + \hat{\beta}_2 \sum X_i^2$
Solving these two equations simultaneously gives the OLS estimates.
Estimation of Slope and Intercept
Further simplifying these two normal equations together:
1) Slope: $\hat{\beta}_2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\sum x_i y_i}{\sum x_i^2}$,
   where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$ are deviations from the sample means $\bar{X}$ and $\bar{Y}$.
2) Intercept: $\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}$
Estimated Regression Model: $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$
For the sample data above, this gives $\hat{Y}_i = 24.455 + 0.509 X_i$ (see the sketch below).
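A minimal sketch of these two formulas in Python (numpy assumed; data from the table above). It reproduces the fitted line quoted in this session, $\hat{Y}_i = 24.455 + 0.509 X_i$, and the residual table shown earlier.

```python
import numpy as np

Y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)

# Deviations from the sample means
x = X - X.mean()
y = Y - Y.mean()

# OLS slope and intercept from the normal equations
beta2_hat = (x * y).sum() / (x ** 2).sum()    # slope
beta1_hat = Y.mean() - beta2_hat * X.mean()   # intercept

Y_fit = beta1_hat + beta2_hat * X             # fitted values
u_hat = Y - Y_fit                             # residuals (match the table above)

print(f"beta1_hat = {beta1_hat:.3f}, beta2_hat = {beta2_hat:.3f}")
# -> beta1_hat = 24.455, beta2_hat = 0.509
```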
Simple Linear Regression Model: Estimation
II. Deriving OLS Using the Method of Moments (MoM)
• Another way of establishing the OLS formulas is the Method of Moments
approach, developed by Pearson (1894).
• The basic idea of this method is to equate certain sample characteristics, such as
the mean, to the corresponding population expected values.
• Method of moments estimation is justified by the law of large numbers.
The Method of Moments (MM) and GMM
1. Unconditional moment condition, e.g. $E(X - \mu) = 0$, which defines the mean $\mu$.
2. Conditional moment condition, e.g. $E(u \mid X) = 0$, used in the regression context below.
Simple Linear Regression Model: Estimation (MoM)
• To derive the OLS estimates, note that our main
assumption $E(u \mid x) = E(u) = 0$ also implies $Cov(x, u) = E(xu) = 0$.
• We can write these two restrictions just in terms of x, y, $\beta_1$ and $\beta_2$, since
$u = y - \beta_1 - \beta_2 x$:
• $E(y - \beta_1 - \beta_2 x) = 0$
• $E[x(y - \beta_1 - \beta_2 x)] = 0$
• These are called moment restrictions.
Simple Linear Regression Model: Estimation (MoM)
• We want to choose values of the parameters that make the sample versions
of our moment restrictions hold exactly.
• The sample versions are:
$\frac{1}{n}\sum_{i=1}^{n} (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i) = 0$
$\frac{1}{n}\sum_{i=1}^{n} X_i (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i) = 0$
Given the definition of a sample mean, and properties of summation, we
can rewrite the first condition as
$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}$   => OLS estimated intercept
Simple Linear Regression Model: Estimation (MoM)
Substituting this intercept into the second sample condition and solving gives
$\hat{\beta}_2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$   => OLS estimated slope
(see the sketch below)
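The two sample moment conditions are a linear system in $\hat{\beta}_1$ and $\hat{\beta}_2$; a minimal sketch (Python/numpy, same data as before) solves them directly and recovers exactly the OLS estimates, illustrating that MoM and OLS coincide here.

```python
import numpy as np

Y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
n = len(Y)

# Sample moment conditions:
#   (1/n) * sum(Y - b1 - b2*X)     = 0
#   (1/n) * sum(X*(Y - b1 - b2*X)) = 0
# Rearranged, these are exactly the two OLS normal equations:
#   n*b1      + sum(X)*b2    = sum(Y)
#   sum(X)*b1 + sum(X**2)*b2 = sum(X*Y)
A = np.array([[n,       X.sum()],
              [X.sum(), (X ** 2).sum()]])
b = np.array([Y.sum(), (X * Y).sum()])

beta1_hat, beta2_hat = np.linalg.solve(A, b)
print(beta1_hat, beta2_hat)   # same values as the OLS formulas: ~24.455, ~0.509
```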
Statistical Properties of OLS Estimators
Estimated regression model: $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$
1. The OLS estimators are expressed solely in terms of the observable quantities.
2. They are point estimators.
3. Once the OLS estimates are obtained, the sample regression line can be
easily drawn. This regression line has the following properties:
   i) It passes through the sample means of Y and X, the point $(\bar{X}, \bar{Y})$.
   ii) The mean of the fitted values $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$ equals the mean of the actual $Y_i$.
   iii) The mean value of the residuals is zero: $E(\hat{u}_i) = 0$.
   iv) $Cov(\hat{\beta}_1, \hat{\beta}_2) = -\bar{X} \, var(\hat{\beta}_2)$.
   v) The residuals are uncorrelated with $X_i$: $\sum \hat{u}_i X_i = 0$.
[Figure: the sample regression line plotted through the point $(\bar{X}, \bar{Y})$.]
Some Theorems
In deviation form, the SRF can be written as $\hat{y}_i = \hat{\beta}_2 x_i$.
In the two-variable model, $R^2 = r^2$: the coefficient of determination equals the
squared sample correlation coefficient between X and Y.
Assumptions of CLRM
1: The model is linear in the parameters.
$Y_i = \beta_1 + \beta_2 X_i + u_i$
2: The X values are fixed in repeated sampling; X is nonstochastic.
3: Zero mean of the disturbance u.
• Given the value of X, the mean, or expected value, of the
disturbance term $u_i$ is zero:
$E(u_i \mid X_i) = 0$
Assumptions of CLRM
4: Homoscedasticity, or equal variance of $u_i$
• Given the value of X, the variance of the disturbance term $u_i$ is the
same for all observations:
$var(u_i \mid X_i) = E[u_i - E(u_i \mid X_i)]^2$
$= E(u_i^2 \mid X_i)$   (uses Assumption 3)
$= \sigma^2$
Assumptions of CLRM
5: No autocorrelation between the disturbances
• Given any two X values, $X_i$ and $X_j$ ($i \neq j$), the
correlation between the corresponding $u_i$ and $u_j$ is zero:
$cov(u_i, u_j) = E\{[u_i - E(u_i \mid X_i)][u_j - E(u_j \mid X_j)]\}$
$= E(u_i \mid X_i)\,E(u_j \mid X_j)$   (uses Assumption 3)
$= 0$
Autocorrelation: Residual Plot
[Figure: two scatter plots of $\hat{u}_t$ against $\hat{u}_{t-1}$. Left panel: positive
autocorrelation (points cluster in the first and third quadrants). Right panel: negative
autocorrelation (points cluster in the second and fourth quadrants). A simulation
sketch follows below.]
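A small simulation sketch of this picture (Python with numpy/matplotlib; the AR(1) process and its coefficient are illustrative assumptions of mine, not part of the lecture): positively autocorrelated disturbances plotted against their own lag cluster in the first and third quadrants.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
T, rho = 200, 0.8            # rho > 0 -> positive autocorrelation

# Simulate AR(1) disturbances: u_t = rho * u_{t-1} + e_t
u = np.zeros(T)
for t in range(1, T):
    u[t] = rho * u[t - 1] + rng.normal()

# Plot u_t against u_{t-1}: positive rho puts most points in quadrants I and III
plt.scatter(u[:-1], u[1:])
plt.xlabel(r"$\hat{u}_{t-1}$")
plt.ylabel(r"$\hat{u}_t$")
plt.title("Positively autocorrelated disturbances")
plt.show()
```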
Assumptions of CLRM
6: Zero covariance between $X_i$ and $u_i$, i.e., $E(X_i u_i) = 0$
$cov(u_i, X_i) = E\{[u_i - E(u_i \mid X_i)][X_i - E(X_i)]\}$
$= E[u_i (X_i - E(X_i))]$   (uses Assumption 3)
$= E(u_i X_i) - E(u_i) E(X_i)$   (since $E(X_i)$ is nonstochastic)
$= E(u_i X_i)$   (since $E(u_i) = 0$)
$= 0$   (by assumption)
Assumptions of CLRM
7: The number of observations (n) must be greater
than the number of parameters to be estimated (k).
(Failures here are related to "micronumerosity," i.e., too small a sample.)
8: Variability in X values
• Technically, var(X) must be a finite positive number.
9: The regression model is correctly specified.
There is no specification error or bias in the model
used for empirical analysis.
10: There is no perfect multicollinearity: there are
no perfect linear relationships among the
explanatory variables.
Simple Linear Regression Model: Classical Assumptions
● Assumptions for the Classical Linear Regression Model:
1. The regression model is linear in the parameters
2. X values are fixed in repeated sampling
3. Zero mean value of disturbance ui
4. Homoscedasticity or equal variance of ui
5. No autocorrelation between the disturbances
6. Zero covariance between u and X
7. The number of observations n must be greater than the number of parameters to be
estimated
8. Variability in X values
9. The regression model is correctly specified
10. There is no perfect multicollinearity.
Precision or S.E. of Least Squares Estimators
Population regression results (for this illustration):
β1 = 17.00
β2 = 0.60
σ = 11.32
σ² = 128.42
Sample regression function (SRF) estimated from the data above:
Estimated Model: Ŷ = 24.455 + 0.509X
Precision or S.E. of Least Squares Estimators
● The standard errors for the OLS estimates can be obtained as follows:
$var(\hat{\beta}_1) = 41.0881$,  $se(\hat{\beta}_1) = 6.41$
$var(\hat{\beta}_2) = 0.0013$,  $se(\hat{\beta}_2) = 0.04$
$cov(\hat{\beta}_1, \hat{\beta}_2) = -\bar{X}\,var(\hat{\beta}_2) = -170 \times 0.0013 \approx -0.22$
The more variation in X, the smaller the variance of $\hat{\beta}_2$, and the more precise the estimate of the slope.
Variance of $\hat{\beta}_2$
$var(\hat{\beta}_2) = \frac{\sigma^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$
[Figure: scatter of Y against X over a narrow range of X. The variation of X is relatively
small, so the slope estimate is very imprecise.]
Variance of $\hat{\beta}_2$
$var(\hat{\beta}_2) = \frac{\sigma^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$
[Figure: scatter of Y against X over a wide range of X. The variation of X is much
bigger, so the slope estimate is much more precise.]
Precision or S.E. of Least Squares Estimators
However, since we do not know the variance of the error term (the population
variance $\sigma^2$), we can estimate it using the sample variance of the residuals:
$\hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n - k}$
where $n - k$ is the number of degrees of freedom (here $k = 2$ parameters).
N.B. The sample variance of the residuals is an unbiased estimator of the
population variance of the error term. A numerical sketch follows below.
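A minimal numerical sketch (Python/numpy, same data as before) of $\hat{\sigma}^2$ and the resulting standard errors; the printed values agree with the ones quoted earlier up to rounding.

```python
import numpy as np

Y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
n, k = len(Y), 2

x = X - X.mean()
beta2_hat = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
beta1_hat = Y.mean() - beta2_hat * X.mean()
u_hat = Y - (beta1_hat + beta2_hat * X)          # residuals

# Estimated error variance with n - k degrees of freedom
sigma2_hat = (u_hat ** 2).sum() / (n - k)

# Variances, covariance, and standard errors of the OLS estimators
var_beta2 = sigma2_hat / (x ** 2).sum()
var_beta1 = sigma2_hat * (X ** 2).sum() / (n * (x ** 2).sum())
cov_b1_b2 = -X.mean() * var_beta2                # cov(b1_hat, b2_hat) = -Xbar * var(b2_hat)

print(np.sqrt(var_beta1), np.sqrt(var_beta2), cov_b1_b2)   # ~6.41, ~0.036, ~-0.22
```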
Precision or S.E. of Least Squares Estimators
• Three important elements determine the precision of the estimates
(see the simulation sketch below):
1. The magnitude of the "noise" (the error variance)
2. The variance of X
3. The number of observations
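A small Monte Carlo sketch of these three elements (Python/numpy; the true parameters, noise levels, and sample sizes are illustrative choices of mine). Each change moves the empirical standard deviation of $\hat{\beta}_2$ in the stated direction.

```python
import numpy as np

rng = np.random.default_rng(1)
beta1, beta2 = 17.0, 0.6                     # true parameters (illustrative)

def sd_of_slope(n, x_spread, sigma, reps=2000):
    """Empirical std. dev. of the OLS slope across simulated samples."""
    X = np.linspace(0, x_spread, n)
    slopes = np.empty(reps)
    for r in range(reps):
        Y = beta1 + beta2 * X + rng.normal(0, sigma, n)
        x = X - X.mean()
        slopes[r] = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
    return slopes.std()

print("baseline           :", sd_of_slope(n=20, x_spread=100, sigma=10))
print("more noise (sigma) :", sd_of_slope(n=20, x_spread=100, sigma=20))  # 1. larger noise -> less precise
print("more X variation   :", sd_of_slope(n=20, x_spread=200, sigma=10))  # 2. more variation -> more precise
print("more observations  :", sd_of_slope(n=80, x_spread=100, sigma=10))  # 3. larger n -> more precise
```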
Precision of the Estimates: Illustrations
[Figure: the fitted line $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$ with the intercept, slope,
and a residual $Y_i - \hat{Y}_i$ labelled.]
1. Variance of $u_i$: [Figure: the noise around the true relationship can be large, or not.]
2. Variation in X: [Figures: the true relationship sampled over a narrow versus a wide
range of X; what matters is the variance of X in relation to the variance of u, and
the narrow-range sample gives a far less precise slope.]
3. Number of observations: [Figure: the true relationship sampled with few versus many
observations; more observations give a more precise estimate.]
Precision or S.E. of Least Squares Estimators
Covariance between the slope and intercept terms:
$cov(\hat{\beta}_1, \hat{\beta}_2) = -\bar{X} \, var(\hat{\beta}_2)$
Since $var(\hat{\beta}_2)$ is always positive, the sign of the covariance depends on the
sign of $\bar{X}$: if $\bar{X}$ is positive, the covariance will be negative.
Thus, if the slope coefficient is overestimated (too steep), the intercept
will be underestimated (too small).
Properties of OLS Estimators
(Gauss-Markov Theorem)
Under the assumptions of the CLRM, the least squares
estimators are the Best Linear Unbiased Estimators (BLUE):
– Best: the OLS estimator has the smallest variance
(smallest margin of error, most precise estimate) among linear unbiased estimators.
– Linear: the OLS estimator is a linear function of $Y_i$.
– Unbiased: the expected value of the OLS estimator is the
same as the true population value:
$E(\hat{\beta}_i) = \beta_i$
Gauss-Markov Theorem
(1) Linear means linear in the dependent variable.
$\hat{\beta}_2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (X_i - \bar{X})^2}$
may be rewritten as
$\hat{\beta}_2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X}) Y_i}{\sum_{i=1}^{N} (X_i - \bar{X})^2} = \sum_{i=1}^{N} w_i Y_i$, where $w_i = \frac{X_i - \bar{X}}{\sum_{i=1}^{N} (X_i - \bar{X})^2}$
– $\hat{\beta}_2$ is now a linear function of $Y_i$ (see the check below).
– $\hat{\beta}_1$ can similarly be written as a linear function of $Y_i$.
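A quick numerical check of the linearity claim (Python/numpy, same data as before): the weights $w_i$ reproduce $\hat{\beta}_2$ exactly and satisfy $\sum w_i = 0$ and $\sum w_i X_i = 1$.

```python
import numpy as np

Y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)

x = X - X.mean()
w = x / (x ** 2).sum()                 # w_i = (X_i - Xbar) / sum((X_i - Xbar)^2)

beta2_as_weighted_sum = (w * Y).sum()  # b2_hat = sum(w_i * Y_i): linear in Y
beta2_formula = (x * (Y - Y.mean())).sum() / (x ** 2).sum()

print(beta2_as_weighted_sum, beta2_formula)   # both ~0.509
print(w.sum(), (w * X).sum())                 # weights satisfy sum(w)=0, sum(w*X)=1
```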
Gauss-Markov Theorem
(2) Unbiased
– The expected value of the estimator is the true
underlying parameter:
$E(\hat{\beta}_2) = \beta_2$
(3) Efficiency (or minimum variance)
$Var(\hat{\beta}_2^{OLS}) \le Var(\tilde{\beta}_2)$
– Of all the linear, unbiased estimators $\tilde{\beta}_2$, OLS
has the smallest variance.
Gauss-Markov Theorem
(4) Consistency: an estimator is called consistent if it converges
stochastically to the true parameter value, with probability
approaching one, as the sample size increases indefinitely. This implies
$\operatorname{plim}_{n \to \infty} \hat{\beta}_2 = \beta_2$, where n is the sample size, i.e.,
$\Pr\{|\hat{\beta}_2 - \beta_2| < \varepsilon\} \to 1$ as n goes to infinity, for any small $\varepsilon > 0$.
A sufficient condition is
(1) $E(\hat{\beta}_2) \to \beta_2$ and
(2) $Var(\hat{\beta}_2) \to 0$
as n goes to infinity.
Similarly, this generalises to $\hat{\beta}_1$.
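A minimal simulation sketch of consistency (Python/numpy; the data-generating values are illustrative assumptions): as n grows, the OLS slope estimate concentrates around the true $\beta_2$.

```python
import numpy as np

rng = np.random.default_rng(2)
beta1, beta2 = 17.0, 0.6                        # true parameters (illustrative)

for n in (10, 100, 1000, 10000):
    X = rng.uniform(0, 100, n)                  # regressor with fixed, positive variance
    Y = beta1 + beta2 * X + rng.normal(0, 10, n)
    x = X - X.mean()
    b2 = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
    print(f"n = {n:5d}: beta2_hat = {b2:.4f}")  # approaches 0.6 as n grows
```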
The Overall Goodness of Fit: R²
This measure helps determine the goodness of fit, or how well the sample
regression line fits the data.
TSS = $\sum (Y_i - \bar{Y})^2$  (total variability of the dependent variable Y about its mean)
ESS = $\sum (\hat{Y}_i - \bar{Y})^2$  (variability in Y explained by the sample regression)
RSS = $\sum (Y_i - \hat{Y}_i)^2$  (variability in Y left unexplained by the regression)
[Figure: scatter of Y against X with the fitted regression line; this line gives the
minimum RSS among all possible straight lines.]
The Overall Goodness of Fit: r² or R²
Decomposition of the variance of $Y_i$:
$Y_i = \hat{Y}_i + \hat{u}_i$, or, in deviation form, $y_i = \hat{y}_i + \hat{u}_i$
Squaring this equation and summing over the sample, we obtain
$\sum y_i^2 = \sum \hat{y}_i^2 + \sum \hat{u}_i^2 + 2\sum \hat{y}_i \hat{u}_i = \sum \hat{y}_i^2 + \sum \hat{u}_i^2$
(the cross-product term vanishes because the residuals are uncorrelated with the fitted values), i.e.,
TSS = ESS + RSS
The Overall Goodness of Fit: r² or R²
Then, dividing TSS = ESS + RSS through by TSS:
$1 = \frac{ESS}{TSS} + \frac{RSS}{TSS} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} + \frac{\sum \hat{u}_i^2}{\sum (Y_i - \bar{Y})^2}$
We define r² as
$r^2 = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} = \frac{ESS}{TSS}$
or
$r^2 = 1 - \frac{\sum \hat{u}_i^2}{\sum (Y_i - \bar{Y})^2} = 1 - \frac{RSS}{TSS}$
The coefficient of determination measures the proportion or percentage
of the total variation in Y explained by the regression model:
TSS = ESS + RSS
$\sum y_i^2 = r^2 \sum y_i^2 + (1 - r^2) \sum y_i^2$
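A minimal sketch (Python/numpy, same data as before) that verifies the decomposition TSS = ESS + RSS, computes r² both ways, and confirms the earlier claim that in the two-variable model r² equals the squared sample correlation between X and Y.

```python
import numpy as np

Y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)

x = X - X.mean()
beta2_hat = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
beta1_hat = Y.mean() - beta2_hat * X.mean()
Y_fit = beta1_hat + beta2_hat * X

TSS = ((Y - Y.mean()) ** 2).sum()      # total sum of squares
ESS = ((Y_fit - Y.mean()) ** 2).sum()  # explained sum of squares
RSS = ((Y - Y_fit) ** 2).sum()         # residual sum of squares

print(TSS, ESS + RSS)                  # decomposition: TSS = ESS + RSS
print(ESS / TSS, 1 - RSS / TSS)        # two equivalent r^2 formulas (~0.96)
print(np.corrcoef(X, Y)[0, 1] ** 2)    # equals the squared correlation of X and Y
```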
Problems with r² or R²
1. Spurious regression.
2. High correlation of $X_t$ with another variable $Z_t$.
3. Correlation does not necessarily imply causality.
4. Time-series equations usually generate higher R² values than cross-section equations.
5. A low R² does not mean a wrong choice of $X_t$.
6. R² values from equations with different functional forms of $Y_t$ are not comparable.
7. R², computed as 1 − RSS/TSS, can be negative if the model is a bad fit, i.e., if
RSS > TSS (for example, in a regression forced through the origin; see the sketch below).
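To make point 7 concrete, a small sketch (Python/numpy; the simulated data are an illustrative assumption of mine): forcing the regression through the origin when the data need a large intercept makes RSS exceed TSS, so 1 − RSS/TSS turns negative.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(0, 10, 50)
Y = 100 - 2 * X + rng.normal(0, 1, 50)   # large intercept, negative slope

# Regression through the origin: Y = b*X + u (no intercept term)
b = (X * Y).sum() / (X ** 2).sum()
RSS = ((Y - b * X) ** 2).sum()
TSS = ((Y - Y.mean()) ** 2).sum()

print(1 - RSS / TSS)                     # negative: the no-intercept fit is worse
                                         # than simply predicting the mean of Y
```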
Thanks