Chapter 2
In this chapter we emphasize linear regression models and the application of the ordinary least squares (OLS) method to obtain estimates of the parameters of the true economic relationships. That is, in this chapter we will learn how to derive formulae for the parameter estimates using the method of OLS. Finally, the chapter discusses different ways of developing econometric models based on economic theories.
It is evident from the above table that there are seven fixed values of Y, with corresponding C values against each fixed Y value. As the table shows, for a fixed value of monthly income there can be different values of monthly consumption expenditure. Succinctly, families with the same monthly income can have different consumption expenditures. Hence, when we take repeated samples, we can generate different samples of the same size with the same data for monthly income but different values for consumption expenditure; this is the essence of fixed values for the explanatory variables in repeated sampling*.
Here, it is worth noting that these average values of C are conditional expected values, say E(C | Yᵢ), since they are obtained on the basis of the fixed values of the conditioning variable (Y). It is important to distinguish these conditional expected values from the unconditional expected value, E(C), which is calculated simply by summing all the values of C and dividing the result by the total number of families. The latter mean is so called because in arriving at it we have disregarded the incomes (Y) of the families. In general, the conditional and unconditional mean values are different.

* Repeated sampling is only hypothetical; in practice we take only one sample and base our regression on this observed sample.

Although regression analysis deals with the dependence of one variable on other variables, it does not necessarily imply causation. The determination of the direction of causation should come from outside of statistics; for example, from economic theory. In other words, statistical relationships by themselves cannot logically imply causation. To ascribe causality, one must appeal to a priori or theoretical considerations.
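The distinction between conditional and unconditional means above can be illustrated with a short Python sketch; the income (Y) and consumption (C) figures below are hypothetical, not the table's actual values:

```python
# Hypothetical income (Y) and consumption (C) data in the spirit of the table
# discussed above: several C values observed at each fixed Y value.
data = {
    80:  [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
}

# Conditional means E(C | Y): average C within each fixed income level.
conditional_means = {y: sum(cs) / len(cs) for y, cs in data.items()}

# Unconditional mean E(C): average over all families, disregarding income.
all_c = [c for cs in data.values() for c in cs]
unconditional_mean = sum(all_c) / len(all_c)

print(conditional_means)
print(unconditional_mean)
```

In general the two kinds of mean differ, as the output shows for most income levels.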
E(Y | Xᵢ) = f(Xᵢ) .................................................. (2.1)

Where f(Xᵢ) denotes some function of the explanatory variable X. Equation 2.1 states that the expected value of the distribution of Y (given Xᵢ) is functionally related to Xᵢ. In other words, it tells how the average value of Y varies with the values of X.

An important question that should be addressed at this juncture concerns the form of the function f(Xᵢ): since in real situations we may not have the entire population at our disposal, the functional form of the PRF is an empirical question, although in specific cases theory may have something to say about it. For example, if we assume (perhaps from theory) that Y and X are linearly related, then, as a first approximation, the PRF E(Y | Xᵢ) may be represented as a linear function of Xᵢ as given below:
E(Y | Xᵢ) = β₀ + β₁Xᵢ ................................... (2.2)

Where β₀ and β₁ are unknown but fixed parameters known as the regression coefficients.
Therefore, the stochastic specification of the PRF is given as:

E(Y | Xᵢ) = Yᵢ − Uᵢ ......................................................(2.3)

⇒ Yᵢ = E(Y | Xᵢ) + Uᵢ ……………………………(2.3a)

⇒ Yᵢ = β₀ + β₁Xᵢ + Uᵢ .................................(2.3b)
Thus, in regression analysis we are interested in estimating the PRF, that is, estimating the values of the unknowns β₀ and β₁ on the basis of observations on Y and X. However, the challenge is to obtain data on all possible values of Y and X, as in most practical situations what we have are sample values of Y associated with fixed X's. Therefore, the usual practice is to estimate the PRF on the basis of the sample information. Nonetheless, the difficulty is that for a fixed value of X, we can have different samples on the values of Y. For example, from the population of Y values for fixed values of X, we can have the following two samples, which are only two of the many possible samples.
Sample 1          Sample 2
Y     X           Y     X
70    80          55    80
65    100         88    100
90    120         90    120
95    240         80    240
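For illustration, the two samples above can each be fitted by OLS in Python; the closed-form expressions inside `ols` are the standard least-squares formulas derived later in this chapter:

```python
# OLS fits to the two samples above: same fixed X values, different Y draws,
# hence different sample regression lines.
def ols(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    b1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # slope
    b0 = (sy - b1 * sx) / n                          # intercept
    return b0, b1

x = [80, 100, 120, 240]
y1 = [70, 65, 90, 95]   # sample 1
y2 = [55, 88, 90, 80]   # sample 2

print(ols(x, y1))  # line fitted to sample 1
print(ols(x, y2))  # line fitted to sample 2 differs
```

The two fitted lines differ even though the X values are identical, which is exactly the sampling-fluctuation point made in the text.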
Now the question is how to estimate the PRF from the observed sample data. The PRF can be estimated on the basis of sample information, though not exactly, since sampling always involves sampling fluctuation. The sample counterpart of the PRF may be written as:

Ŷᵢ = β̂₀ + β̂₁Xᵢ ……………………………….. 2.4

Where Ŷᵢ is read as "Y-hat" and is an estimator of E(Y | Xᵢ); β̂₀ and β̂₁ are sample estimators of β₀ and β₁ respectively. Equation 2.4 is called the sample regression function (SRF). Thus,

Yᵢ = β̂₀ + β̂₁Xᵢ + Ûᵢ ……………………………….. 2.5
[Figure 2.2: The sample regression line SRF (Ŷᵢ = β̂₀ + β̂₁Xᵢ) and the population regression line PRF (E(Y | Xᵢ)), showing the residual Ûᵢ and the disturbance Uᵢ at a given Xᵢ.]
Note that the SRF given in figure 2.2 is only one of several possible SRFs. So how can we choose the one that best approximates the PRF? In other words, how can we obtain the best estimators of the parameters β₀ and β₁ based on sample information?
2.3 The Ordinary Least Squares (OLS) Method
The OLS method is the most extensively used method of estimation in regression
analysis. Under certain assumptions, the least squares method has some
attractive statistical properties.
To illustrate the ordinary least squares (OLS) method, think of the theory of
supply in economics. In its simplest form, the theory postulates that there is a
positive relationship between quantity supplied of a commodity (Y) and its price
(X), other things remaining constant.
From Equation 2.3a, we know that the PRF of this relationship is given as:

Yᵢ = E(Y | Xᵢ) + Uᵢ
Assuming linearity, this can be rewritten as
⇒ Yᵢ = β₀ + β₁Xᵢ + Uᵢ ………………………… (2.6)
Where Uᵢ is a stochastic term that captures the various factors which affect the dependent variable (Yᵢ) but cannot explicitly be taken into account by an investigator.
Dear students, why do you think an investigator is not in a position to take into account all the factors that affect the dependent variable?
------------------------------------------------------------------------------------------------------------
Some of the reasons for not taking all the factors that affect the dependent
variable into account are discussed as follows:
i) Omission of variables from the function
In the real world, economic variables may be influenced by a very large number of other variables. However, the researcher may not include all of them explicitly in his/her model, which may be attributed to the following reasons:
Thus, in most cases only a few of the most important variables would explicitly be included in the model, and the effect of the others on the dependent variable is taken into account by Uᵢ.
Furthermore, s/he may use linearity to represent the relationship between the dependent and explanatory variables even though the relationship should have been studied using non-linear models. In either of these cases the researcher ends up with a misspecified model, and this is one of the reasons why Uᵢ is introduced in econometric models.
Therefore, in order to take into account the above sources of error, we introduce a random variable into econometric models, usually denoted by U and called the error term, random disturbance term, or stochastic term. U is so called because it is supposed to disturb the exact linear relationship presumed to exist between Y and X.

Yᵢ = β₀ + β₁Xᵢ + Uᵢ ……………………………………………….2.7

Where Uᵢ represents all variables, other than the price of the commodity, that affect the quantity supplied.
2.4 Assumptions Underlying the Ordinary Least Squares (OLS) Method
The major objectives of regression analysis include estimation of, and inferences about, the population parameters β₀ and β₁ based on sample observations. For example, we would like to know how close the estimates β̂₀ and β̂₁ are to their population counterparts β₀ and β₁, or how close Ŷᵢ is to the true E(Y | Xᵢ). Hence, beyond specifying the functional form of the model, we have to make certain assumptions about the manner in which the Yᵢ's are generated.
Unless we are specific about how Xᵢ and Uᵢ are generated, there is no way we can make any statistical inference about Yᵢ or about the estimates β̂₀ and β̂₁. Therefore, the assumptions made about the Xᵢ variable(s) and the error term are critical to a valid interpretation of the regression estimates.
The Gaussian or classical linear regression model is based on the following ten
assumptions.
Assumption 1. Linear regression model
The regression model is linear in the parameters, as shown below:

Yᵢ = β₀ + β₁Xᵢ + Uᵢ

However, this assumption does not exclude models that are non-linear in the variables, such as Yᵢ = β₀ + β₁Xᵢ + β₂Xᵢ² + β₃Xᵢ³ + … + Uᵢ
Assumption 2: Uᵢ is a random real variable with zero mean value.
This means that the values Uᵢ may assume in any particular period may be positive, negative, or zero, but on average they are zero. Symbolically,

E(Uᵢ | Xᵢ) = 0 …………….………….2.8

In a nutshell, this assumption implies that the factors not explicitly included in the model, and therefore subsumed in Uᵢ, do not systematically affect the mean value of Y; i.e., the positive Uᵢ values cancel out the negative Uᵢ values, so that

E(Y | Xᵢ) = β₀ + β₁Xᵢ …………………………………………....2.9
It is clear that there is a gap between the individual values of Yᵢ and the average value of Yᵢ associated with a fixed value of X (see figure 2.1). This gap is captured by the disturbance term Uᵢ.
Assumption 4: Homoscedasticity of U i
The variance of Uᵢ about its mean is constant at all values of X. In other words, for all values of X, the Uᵢ values will show the same dispersion around their mean. Symbolically,

Var(Uᵢ | Xᵢ) = E[(Uᵢ | Xᵢ) − E(Uᵢ | Xᵢ)]²
             = E(Uᵢ² | Xᵢ), since E(Uᵢ | Xᵢ) = 0

⇒ Var(Uᵢ | Xᵢ) = σ² ……………………………….2.10
This means that the values of Uᵢ associated with one value of X are independent of its values associated with other values of X; that is, the covariance of any Uᵢ with any other Uⱼ is equal to zero. In other words, the value that the disturbance term U assumes in any one period does not depend on its value in other periods. In short, given any two X values, Xᵢ and Xⱼ (where i ≠ j), the correlation between Uᵢ and Uⱼ is zero. Symbolically,

Cov(Uᵢ, Uⱼ | Xᵢ, Xⱼ) = E{[Uᵢ − E(Uᵢ)] | Xᵢ}{[Uⱼ − E(Uⱼ)] | Xⱼ}
                     = E(Uᵢ | Xᵢ) E(Uⱼ | Xⱼ) = 0, since E(Uᵢ | Xᵢ) = E(Uⱼ | Xⱼ) = 0
This means that the disturbance Uᵢ and the explanatory variable X are uncorrelated: the values of U and X do not tend to vary together; i.e., their covariance is zero. Symbolically,

Cov(Uᵢ, Xᵢ) = E[Uᵢ − E(Uᵢ)][Xᵢ − E(Xᵢ)]
            = E[Uᵢ(Xᵢ − E(Xᵢ))], since E(Uᵢ) = 0
            = E[UᵢXᵢ] − E[Uᵢ E(Xᵢ)]

Since E(Xᵢ) is non-stochastic, E[Uᵢ E(Xᵢ)] = E[Uᵢ] E(Xᵢ). Thus,

Cov(Uᵢ, Xᵢ) = E(UᵢXᵢ) − E(Uᵢ)E(Xᵢ)
⇒ Cov(Uᵢ, Xᵢ) = 0 …………………………………….…………2.12
Activity
Dear readers, what do you think is the difference between assumptions 2 and 9?
------------------------------------------------------------------------------------------------------------
2. Variance: Var(Yᵢ) = Var(Uᵢ) = σᵤ²

Proof:
1. By definition, the expected value of Y is equal to its mean value. Therefore, the mean of Y is given as

E(Yᵢ) = E(β₀ + β₁Xᵢ + Uᵢ), since Yᵢ = β₀ + β₁Xᵢ + Uᵢ
      = E(β₀ + β₁Xᵢ) + E(Uᵢ)

We know that β₀ and β₁ are parameters and hence constant, and by Assumption 2, E(Uᵢ | Xᵢ) = 0. Therefore,

E(Yᵢ) = E(β₀ + β₁Xᵢ) + 0
      = β₀ + β₁Xᵢ

2. By definition,

Var(Yᵢ) = E(Yᵢ − E(Yᵢ))²

Substituting Yᵢ = β₀ + β₁Xᵢ + Uᵢ and E(Yᵢ) = β₀ + β₁Xᵢ,

Var(Yᵢ) = E(β₀ + β₁Xᵢ + Uᵢ − β₀ − β₁Xᵢ)²
        = E[Uᵢ²]
        = σᵤ²
Therefore, we can conclude that the variance of Y is the same as the variance of
the stochastic term.
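This result can be checked numerically. The following Python sketch (with illustrative values for β₀, β₁, σ, and a fixed X) simulates draws of U and confirms that Y's mean is β₀ + β₁X and its variance is σ²:

```python
# Monte Carlo check: with X fixed, Y = b0 + b1*X + U inherits its variance
# entirely from U. The parameter values are illustrative only.
import random

random.seed(0)
b0, b1, sigma, x = 2.0, 0.5, 3.0, 10.0

u = [random.gauss(0, sigma) for _ in range(200_000)]
y = [b0 + b1 * x + ui for ui in u]

mean_y = sum(y) / len(y)
var_y = sum((yi - mean_y) ** 2 for yi in y) / len(y)

print(mean_y)   # close to b0 + b1*x = 7.0
print(var_y)    # close to sigma**2 = 9.0
```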
This relationship holds for the population values of Y and X, so that we could
obtain numerical values of β 0 and β1 only if we could have all the conceivably
possible values of Y, X and U, which form the population values of the variables.
In Equation 2.5 we noted that the SRF is given as:

Yᵢ = β̂₀ + β̂₁Xᵢ + Ûᵢ

Therefore,

Ûᵢ = Yᵢ − β̂₀ − β̂₁Xᵢ = Yᵢ − Ŷᵢ ……………………2.13
The question now is how to determine the SRF. We are mainly interested in determining the SRF in such a way that the line is as close as possible to the actual Y values. It is intuitively obvious that the smaller the deviations from the line, the better the fit of the line to the actual observations on Y; i.e., we might choose the SRF in such a manner that the sum of the residuals ΣÛᵢ = Σ(Yᵢ − Ŷᵢ) is as small as possible. This approach, however, is not appropriate, no matter how intuitively appealing it may be. The reason is that the minimization of ΣÛᵢ gives equal weight to every deviation, no matter how large or small; i.e., it attaches equal importance to all Ûᵢ's regardless of how close or how widely scattered the individual observations are about the SRF. Consequently, the algebraic sum of the Ûᵢ can be small (even zero) although the individual Ûᵢ are widely scattered about the SRF. This means that minimizing ΣÛᵢ does not necessarily imply that the individual deviations Ûᵢ are minimized.

To avoid this problem, we adopt the least squares criterion. This criterion requires the regression line to be drawn (its parameters to be chosen) in such a way as to minimize the sum of the squares of the deviations of the observations from it; i.e., it should minimize ΣÛᵢ². Squaring gives more weight to residuals with wider dispersion than to those with closer dispersion around the line.
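The weakness of the ΣÛᵢ criterion can be seen in a short Python sketch: any line forced through the point of means (x̄, ȳ) has residuals that sum to zero, even when it fits terribly, while the sum of squared residuals separates good lines from bad ones. The data are hypothetical:

```python
# For any slope, a line through the point of means (xbar, ybar) has residuals
# summing to zero; only the squared-residual criterion tells the lines apart.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
xbar, ybar = sum(x) / len(x), sum(y) / len(y)

def residuals(slope):
    intercept = ybar - slope * xbar          # force the line through (xbar, ybar)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

for slope in (0.0, 0.6, 5.0):
    r = residuals(slope)
    # sum of residuals is ~0 for every slope; sum of squares is not
    print(slope, round(sum(r), 10), round(sum(e * e for e in r), 3))
```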
Thus,

ΣÛᵢ² = Σ(Yᵢ − Ŷᵢ)² = Σ(Yᵢ − β̂₀ − β̂₁Xᵢ)² ……………………..2.14

ΣÛᵢ² = f(β̂₀, β̂₁) ………………………………………..2.15

This implies that, for any given set of data, choosing different values for β̂₀ and β̂₁ will give different Û's and hence different values of ΣÛᵢ². In other words, by assigning different values to β̂₀ and β̂₁ we obtain different regression lines (SRFs) for the same sample.
For example, if β̂₀ = 1.5 and β̂₁ = 1.3, then the SRF can be given as
SRF₁: Ŷᵢ = 1.5 + 1.3Xᵢ
If, on the other hand, β̂₀ = 3 and β̂₁ = 1, then the SRF can be given as
SRF₂: Ŷᵢ = 3 + Xᵢ
Dear readers, which of these two lines do you think will give the best fit to the observed data? Alternatively, which set of β̂'s should be chosen?
------------------------------------------------------------------------------------------------------------
According to the least squares criterion, the one that produces the minimum value of ΣÛᵢ² must be chosen. This means that, to choose the best set of β̂'s, we could assign many more values to the β̂'s and see what happens to ΣÛᵢ². However, in practice we may not have sufficient time and patience to conduct this trial-and-error process. Therefore, we need a short cut. Fortunately, the method of least squares provides such a short cut: the principle of least squares chooses β̂₀ and β̂₁ in such a way that, for a given sample, ΣÛᵢ² is as small as possible.
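A sketch of the trial-and-error idea in Python, scoring the two candidate SRFs above on a hypothetical data set by their sum of squared residuals:

```python
# Score candidate coefficient pairs by their sum of squared residuals (SSR);
# the least squares criterion picks the pair with the smallest SSR.
x = [1, 2, 3, 4]
y = [3.0, 4.5, 5.5, 7.0]   # hypothetical observations

def ssr(b0, b1):
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

print(ssr(1.5, 1.3))  # SRF1
print(ssr(3.0, 1.0))  # SRF2
```

On this particular data set SRF₁ yields the smaller SSR, so the criterion would prefer it; the closed-form OLS formulas derived next avoid having to search over candidates at all.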
Σᵢ₌₁ⁿ Ûᵢ² = Σᵢ₌₁ⁿ (Yᵢ − β̂₀ − β̂₁Xᵢ)² …………………. ……….2.14*

The first-order conditions for a minimum require the partial derivatives with respect to β̂₀ and β̂₁ to be zero:

i)  ∂ΣÛᵢ²/∂β̂₀ = 0 …………………….2.16
ii) ∂ΣÛᵢ²/∂β̂₁ = 0 …………………….2.17
1) ∂ΣÛᵢ²/∂β̂₀ = ∂Σ(Yᵢ − β̂₀ − β̂₁Xᵢ)²/∂β̂₀ = 0
⇒ 2Σ(Yᵢ − β̂₀ − β̂₁Xᵢ)(−1) = 0
⇒ −2Σ(Yᵢ − β̂₀ − β̂₁Xᵢ) = 0
⇒ Σ(Yᵢ − β̂₀ − β̂₁Xᵢ) = 0
⇒ ΣYᵢ − nβ̂₀ − β̂₁ΣXᵢ = 0 …………….2.18
⇒ ΣYᵢ = nβ̂₀ + β̂₁ΣXᵢ …………….2.19
2) ∂Σ(Yᵢ − β̂₀ − β̂₁Xᵢ)²/∂β̂₁ = 0
⇒ 2Σ(Yᵢ − β̂₀ − β̂₁Xᵢ)(−Xᵢ) = 0
⇒ −2Σ(YᵢXᵢ − β̂₀Xᵢ − β̂₁Xᵢ²) = 0
⇒ ΣYᵢXᵢ − β̂₀ΣXᵢ − β̂₁ΣXᵢ² = 0
⇒ ΣYᵢXᵢ = β̂₀ΣXᵢ + β̂₁ΣXᵢ² ................................. ………….2.20
Note that equations 2.19 and 2.20 are called the normal equations of OLS. Then, to develop formulae to compute numerical values for β̂₀ and β̂₁, we solve the normal equations simultaneously using Cramer's rule. Let A be the vector of constants and B the coefficient matrix:

A = | ΣYᵢ   |
    | ΣYᵢXᵢ |

B = | n    ΣXᵢ  |
    | ΣXᵢ  ΣXᵢ² |

C = | ΣYᵢ    ΣXᵢ  |
    | ΣYᵢXᵢ  ΣXᵢ² |

D = | n    ΣYᵢ   |
    | ΣXᵢ  ΣYᵢXᵢ |

Then,

β̂₀ = determinant of C / determinant of B
   = [(ΣYᵢ)(ΣXᵢ²) − (ΣXᵢ)(ΣYᵢXᵢ)] / [nΣXᵢ² − (ΣXᵢ)²] ………………………………………2.21

And,

β̂₁ = determinant of D / determinant of B
   = [nΣYᵢXᵢ − (ΣXᵢ)(ΣYᵢ)] / [nΣXᵢ² − (ΣXᵢ)²] ………………………….……………………2.22
The estimators β̂₀ and β̂₁ obtained from this process are called the least squares estimators, since they are derived via the least squares principle.

In passing, note that equations 2.21 and 2.22 can be expressed in deviation form (where xᵢ = Xᵢ − X̄ and yᵢ = Yᵢ − Ȳ) as:

β̂₀ = Ȳ − β̂₁X̄ ………………………………………………2.23

And β̂₁ = Σxᵢyᵢ / Σxᵢ² ………………………………………………2.24
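Equations 2.21, 2.22, and 2.24 can be verified in Python on the sample-1 data from the table above; the raw-sum route and the deviation-form route must agree:

```python
# OLS coefficients for sample 1 via the Cramer's-rule formulas (2.21, 2.22)
# and via the deviation form (2.24); both must give the same slope.
x = [80, 100, 120, 240]
y = [70, 65, 90, 95]
n = len(x)

sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sxx = sum(a * a for a in x)

b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)       # eq. 2.22
b0 = (sy * sxx - sx * sxy) / (n * sxx - sx ** 2)     # eq. 2.21

# Deviation form, eq. 2.24, with x and y measured from their means
xbar, ybar = sx / n, sy / n
b1_dev = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
          / sum((a - xbar) ** 2 for a in x))

print(b0, b1, b1_dev)
```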
2.7 Estimation of a Function Whose Intercept is Zero
In some cases economic theory postulates relationships which have a zero constant intercept. For example, linear production functions of manufactured products should normally have a zero intercept, since output is zero when the factor inputs are zero. In this case we minimize

ΣÛᵢ² = Σ(Yᵢ − β̂₀ − β̂₁Xᵢ)²

subject to β̂₀ = 0. The corresponding Lagrangian function is

L = Σ(Yᵢ − β̂₀ − β̂₁Xᵢ)² − λβ̂₀ ………………………………………2.25
The values of β̂₀ and β̂₁ that minimize equation 2.25 can be obtained by taking the partial derivatives of equation 2.25, as follows:

1) ∂L/∂β̂₀ = 2Σ(Y − β̂₀ − β̂₁X)(−1) − λ = 0
⇒ −2Σ(Y − β̂₀ − β̂₁X) − λ = 0 ………………………….……2.26
2) ∂L/∂β̂₁ = 2Σ(Y − β̂₀ − β̂₁X)(−X) = 0
⇒ −2Σ(Y − β̂₀ − β̂₁X)(X) = 0 ……………………………..….2.27

3) ∂L/∂λ = −β̂₀ = 0
⇒ β̂₀ = 0 ………………………………………………….…………..2.28

Substituting β̂₀ = 0 into equation 2.27,

⇒ 2Σ(Y − β̂₁X)(X) = 0
⇒ ΣYX − β̂₁ΣX² = 0
⇒ ΣYX = β̂₁ΣX²
⇒ β̂₁ = ΣYX / ΣX² ……………………………………………………2.28*
Note that the difference between equations 2.24 and 2.28* is that the former is in
deviation form while the latter involves actual values.
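A Python sketch of equation 2.28* with hypothetical production data (input X, output Y), where a zero-intercept line is appropriate:

```python
# Regression through the origin (eq. 2.28*): the slope uses actual values,
# not deviations. Hypothetical data where zero input implies zero output.
x = [1, 2, 3, 4, 5]                 # factor input
y = [2.1, 3.9, 6.2, 7.8, 10.1]      # output

b1_origin = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)  # ΣYX / ΣX²
print(b1_origin)
```

Since the data were chosen to lie near the line Y = 2X, the estimated slope comes out close to 2.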
2.8.1 The Log-linear Model
To illustrate this model, assume that you are given the so-called exponential regression model:

Yᵢ = β₀ Xᵢ^β₁ e^(Uᵢ) ………………………………2.29

Taking natural logarithms of both sides gives

ln Yᵢ = ln β₀ + β₁ ln Xᵢ + Uᵢ ………………………………2.30

Recall the basic rules of logarithms:
i) ln(AB) = ln A + ln B
ii) ln(A/B) = ln A − ln B
iii) ln(Aᵏ) = k ln A

Letting α = ln β₀, equation 2.30 can be written as

ln Yᵢ = α + β₁ ln Xᵢ + uᵢ ………………………….…………….2.31

From equation 2.31, it is clear that the model is linear in the parameters α and β₁ and linear in the logarithms of the variables Y and X. Hence, it can be estimated by OLS regression, which is suitable only for linear models. It is this linearity that makes the model be called the log-log, double-log, or log-linear model.
This model is very popular in applied work, mainly because the slope coefficient (e.g. β₁ in equation 2.31) measures the elasticity of Y with respect to X, i.e., the percentage change in Y for a given (small) percentage change in X.

Note: The log-linear model assumes that the elasticity coefficient between Y and X remains constant throughout. In other words, the elasticity is the same no matter at which value of X we measure it. For this reason the model is also called the constant elasticity model.
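The constant-elasticity property can be demonstrated in Python: with data generated from Y = β₀X^β₁ (illustrative values β₀ = 2, β₁ = 1.5, no disturbance for clarity), OLS on the logged variables recovers the elasticity β₁:

```python
# Log-log model: regressing ln Y on ln X recovers the constant elasticity b1.
import math

b0_true, b1_true = 2.0, 1.5
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [b0_true * x ** b1_true for x in xs]   # exact exponential relation

lx = [math.log(x) for x in xs]
ly = [math.log(y) for y in ys]

n = len(lx)
sx, sy = sum(lx), sum(ly)
sxy = sum(a * b for a, b in zip(lx, ly))
sxx = sum(a * a for a in lx)
b1_hat = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
alpha_hat = (sy - b1_hat * sx) / n          # alpha = ln(b0)

print(b1_hat)                # elasticity of Y with respect to X
print(math.exp(alpha_hat))   # recovers b0
```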
Activity
Let Y represent the quantity supplied of a commodity and X represent the price of the commodity. Then, if you use equation 2.29 to model the relationship between Y and X, how would you interpret the coefficient of X?
------------------------------------------------------------------------------------------------------------
To see how these models are developed, let Yₜ denote real expenditure on services at time t and Y₀ the initial value of that expenditure. As you may recall, the compound growth rate formula is given as

Yₜ = Y₀(1 + r)ᵗ ………………………………………..…….…..2.32

Taking the natural logarithm of both sides,

ln Yₜ = ln Y₀ + t ln(1 + r) ………………………………………2.33

Letting β₀ = ln Y₀ and β₁ = ln(1 + r), this becomes

ln Yₜ = β₀ + β₁t ………………………………………………….2.34

To make equation 2.34 stochastic and develop an econometric model, we add the disturbance term. Then it becomes

ln Yₜ = β₀ + β₁t + Uₜ ………………………………………..2.35
As can be seen from the above equation, the model is linear in the parameters β₀ and β₁. The only difference in this model is that the regressand is ln Y and the regressor is time, t. Models like equation 2.35 are called semi-log models, since only one variable appears in logarithmic form.

Semi-log models are called log-lin models if the regressand is in logarithmic form. In log-lin models, the slope coefficient measures the constant proportional or relative change in Y for a given absolute change in the value of the regressor. That means, in equation 2.35,

β₁ = (relative change in Y) / (absolute change in t) ……………………2.36

If we multiply the relative change in Y by 100, equation 2.36 gives the percentage change, or growth rate, in Y for an absolute change in the explanatory variable; i.e., 100 times β₁ gives the growth rate in Y, and it is sometimes called the semi-elasticity of Y with respect to the explanatory variable.
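A Python sketch of the log-lin growth model (the initial value and growth rate are illustrative): the slope of ln Y on t equals ln(1 + r), so 100β₁ approximates the growth rate while 100(e^β₁ − 1) recovers it exactly:

```python
# Log-lin growth model: for a series growing at a constant rate r, the OLS
# slope of ln(Y) on t is ln(1 + r). Values of y0 and r are illustrative.
import math

y0, r = 100.0, 0.05
t = list(range(10))
y = [y0 * (1 + r) ** ti for ti in t]      # compound growth, eq. 2.32
ly = [math.log(v) for v in y]

n = len(t)
st, sy = sum(t), sum(ly)
sty = sum(a * b for a, b in zip(t, ly))
stt = sum(a * a for a in t)
b1 = (n * sty - st * sy) / (n * stt - st ** 2)

growth_pct = 100 * (math.exp(b1) - 1)     # exact recovery of r in percent
print(100 * b1)                           # the usual approximation, 100*b1
print(growth_pct)
```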
Yᵢ = β₀ + β₁ ln Xᵢ + Uᵢ ……………………………………………….2.37

These types of models are known as lin-log models. In this case, β₁ is given as:

β₁ = (change in Y) / (change in ln X)
   = (change in Y) / (relative change in X)
   = ΔY / (ΔX / X)

Equivalently,

ΔY = β₁(ΔX / X) ………………….…………………. ….2.38

This equation states that the absolute change in Y (i.e., ΔY) is equal to the slope times the relative change in X. If the term on the right-hand side of equation 2.38 is multiplied by 100, the equation gives the absolute change in Y for a percentage change in X.
Thus, it is noteworthy that when equation 2.38 is estimated by OLS, the value of
the estimated slope coefficient must be multiplied by 0.01; otherwise your
interpretation will be misleading.
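The 0.01 scaling can be illustrated in Python with data generated from a lin-log relation (illustrative β₀ = 10, β₁ = 4):

```python
# Lin-log model: Y = b0 + b1*ln(X). The estimated slope times 0.01 is the
# absolute change in Y for a 1% change in X. Parameter values are illustrative.
import math

b0_true, b1_true = 10.0, 4.0
xs = [10.0, 20.0, 40.0, 80.0]
ys = [b0_true + b1_true * math.log(x) for x in xs]   # exact lin-log relation

lx = [math.log(x) for x in xs]
n = len(lx)
sx, sy = sum(lx), sum(ys)
sxy = sum(a * b for a, b in zip(lx, ys))
sxx = sum(a * a for a in lx)
b1_hat = (n * sxy - sx * sy) / (n * sxx - sx ** 2)

print(b1_hat)           # estimated slope coefficient
print(0.01 * b1_hat)    # absolute change in Y per 1% change in X
```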
Generally, while the choice of a particular functional form may depend on the underlying theory, it is good practice to use a model that enables us to find both the rate of change of the dependent variable with respect to the explanatory variable and the elasticity of the regressand with respect to the explanatory variables.
Exercises
1. Why do we need regression analysis?
2. What is the difference between regression and correlation analysis?
3. What is the difference between the population and sample regression
functions?
4. What is the role of the stochastic error term U i in regression analysis?
5. Consider the following two models:
Model I: Yᵢ = β₀ + β₁Xᵢ + Uᵢ
Model II: Yᵢ = α₀ + α₁(Xᵢ − X̄) + Uᵢ
X = 520    ΣXᵢ² = 3100    ΣYᵢ² = 539,500