
CHAPTER 2

SIMPLE LINEAR REGRESSION MODEL


Overview
Dear learners, in Chapter 1 we learnt the definition and scope of econometrics
and how it differs from other branches of science. Furthermore, we discussed
the stages involved in the methodology of econometric research.

In this chapter we will give emphasis to linear regression models and the
application of the ordinary least squares (OLS) method to obtain estimates of
the parameters of the true economic relationships. That is, in this chapter we
will learn how to derive formulas for the estimates of the parameters using the
method of OLS. Finally, the chapter will emphasize different ways of
developing econometric models based on economic theories.

At the end of this chapter students will be able to:


• Understand regression analysis and how to differentiate it from correlation
  analysis.
• Know population regression functions and sample regression functions.
• Know the Ordinary Least Squares (OLS) method of estimation and apply it
  to estimate the parameters of economic functions.
• Differentiate among different functional forms of econometric models, such
  as log-linear models and semi-log models, and use them in
  appropriate situations.
2.1 The Meaning of Regression Analysis
Regression analysis is concerned with the study of the dependence of one
variable (the dependent variable) on one or more other variables (the explanatory
variable(s)) (Gujarati, 2004). The objective of regression analysis is to
estimate and/or predict the (population) mean value of the dependent variable in
terms of the known or fixed (in repeated sampling) values of the explanatory
variables.

To illustrate the concept of regression analysis, suppose that a researcher
collected data on monthly income (Y) and consumption expenditure (C) of 40
families from a hypothetical community. As shown in Table 2.1, the data
collected from these 40 families are divided into seven income groups, and the
monthly expenditures of each family in the seven groups are listed in the table.

Table 2.1: Monthly Income and Consumption Expenditure of 40 Families

Monthly Income (Y)          800   1000   1200   1400   1800   2300   3400
Monthly Consumption         500    850    790    900   1020   1100   1500
Expenditure (C)             550    700    800    930   1050   1200   1750
                            650    800    940   1030   1400   1250   2500
                            700    650    980   1000   1500   1800   2600
                              -    880   1000   1100   1550   2000   3000
                              -    900   1100   1300   1700   2010   3100

It is evident from the above table that there are seven fixed values of Y, with the
corresponding C values listed against each fixed Y value. As the table shows, for
a fixed value of monthly income there can be different values of monthly
consumption expenditure; succinctly, families having the same monthly income
may have different consumption expenditures. Hence, when we take repeated
samples, we can generate different samples of the same size with the same data
for monthly income but different values for consumption expenditure; this is the
essence of fixed values for the explanatory variables in repeated sampling*.

Here, it is worth noting that these average values of C are conditional on the
fixed values of Y, i.e., E(C|Yi). This implies that regression analysis studies the
dependence of the monthly expenditure of the families on their monthly income
and is used to predict the average monthly expenditure associated with different
levels of monthly income.

In a nutshell, regression analysis deals with statistical dependence among
variables, not with functional or deterministic dependence among variables.
In statistical relationships we essentially deal with random (stochastic) variables,
i.e., variables that have probability distributions.

A stochastic relationship is a relationship wherein, for a particular value of the
independent variable, there is a probability distribution of the values of the
dependent variable. In such a case, for any given value of the independent
variable (Y in the above example), the dependent variable (C) assumes some
specific value only with some probability. In contrast, a deterministic relationship
is a relationship wherein, for each value of the independent variable, there is one
and only one corresponding value of the dependent variable.

In econometrics we exclusively deal with stochastic relationships. The model
that describes the relationship between only two variables is called a simple
linear regression model. The term linear regression implies that the regression is
linear in the parameters; it may or may not be linear in the explanatory variables.
Although regression analysis deals with the dependence of one variable on other
variables, it does not necessarily imply causation. The determination of the
direction of causation should come from outside statistics, for example from
economic theory. In other words, statistical relationships by themselves cannot
logically imply causation; to ascribe causality, one must appeal to a priori or
theoretical considerations.

* Repeated sampling is only hypothetical; in practice we take only one sample and base our regression on
this observed sample.

In addition, regression analysis is closely related to correlation analysis, but
conceptually there is an important difference between the two. The primary
objective of correlation analysis is to measure the strength or degree of linear
association between two variables. In regression analysis, however (as already
noted), we try to predict the average value of the dependent variable on the basis
of the fixed values of the explanatory variables. We call these average (mean)
values conditional expected values, say E(C|Yi), since they are obtained on the
basis of the fixed values of the conditioning variable (Y). It is important to
distinguish these conditional expected values from the unconditional expected
value, E(C), which is calculated simply by summing all the values of C and
dividing the result by the total number of families. The latter mean is so called
because in arriving at it we have disregarded the incomes (Y) of the families. In
general, the conditional and unconditional mean values are different.

2.2 Population and Sample Regression Functions


As noted above, each conditional mean value of any dependent variable (say Y)
is a function of an explanatory variable (say X), where Xi is a given value of X.
Symbolically,

E(Y|Xi) = f(Xi) ..................................................(2.1)
Where f(Xi) denotes some function of the explanatory variable X. Equation 2.1
is known as the conditional expectation function (CEF), population regression
function (PRF) or population regression (PR), which merely implies that the
expected value of the distribution of Y (given Xi) is functionally related to Xi. In
other words, it tells how the average value of Y varies with the values of X.

An important question that should be addressed at this juncture concerns the
form of the function f(Xi), since in real situations we may not have the entire
population available for examining f(Xi). The functional form of the PRF is
therefore an empirical question, although in specific cases theory may have
something to say about it. For example, if we assume (perhaps from theory)
that Y and X are linearly related, then as a first approximation the PRF E(Y|Xi)
may be represented as a linear function of Xi, as given below:

E(Y|Xi) = β0 + β1Xi ...................................(2.2)

Where β0 and β1 are unknown but fixed parameters, known as the
regression coefficients.
Therefore, the stochastic specification of the PRF is given as:

E(Y|Xi) = Yi − Ui ......................................................(2.3)

⇒ Yi = E(Y|Xi) + Ui ……………………………(2.3a)

⇒ Yi = β0 + β1Xi + Ui .................................(2.3b)

Thus, in regression analysis we are interested in estimating the PRF, that is,
estimating the values of the unknowns β0 and β1 on the basis of observations on
Y and X. However, it is a challenge to obtain data on all possible values of Y and
X; in most practical situations what we have are sample values of Y
associated with fixed X's. Therefore, the usual practice is to estimate the PRF on
the basis of the sample information. Nonetheless, the difficulty is that for a fixed
value of X, we can have different samples of values of Y. For example, from
the population of Y values for fixed values of X, we can have the following two
samples, which are only two of the many possible samples.

Sample 1              Sample 2
 Y      X              Y      X
 70     80             55     80
 65    100             88    100
 90    120             90    120
 95    240             80    240

Now the question is how to estimate the PRF from the observed sample data.
The PRF can be estimated on the basis of sample information, though not
accurately, since sampling always involves sampling fluctuation.

The regression functions based on sample information are called sample
regression functions (SRF); for instance, from the above two samples we can
obtain two regression functions (SRF1 and SRF2) to represent the sample
regression line.

In equation 2.2 we noted that E(Y|Xi) = β0 + β1Xi. Hence, the sample
estimator of this relationship is given as

Ŷi = β̂0 + β̂1Xi …………………………………..(2.4)

Where Ŷi (read "Y-hat") is an estimator of E(Y|Xi), and β̂0 and β̂1 are
sample estimators of β0 and β1, respectively. Equation 2.4 is called the sample
regression function, and its stochastic form is given as:

Yi = Ŷi + Ûi

Thus,

Yi = β̂0 + β̂1Xi + Ûi ………………………………..(2.5)

Where Ûi is the estimate of Ui and denotes the sample residual term.

The primary objective of regression analysis is to estimate the PRF, given as
Yi = β0 + β1Xi + Ui, on the basis of the SRF: Yi = β̂0 + β̂1Xi + Ûi.

Graphically, equation 2.5 can be depicted as in Figure 2.1, which plots the SRF,
Ŷi = β̂0 + β̂1Xi, together with the PRF, E(Y|Xi) = β0 + β1Xi. For a given Xi, the
vertical distance of an observation Yi from the PRF is Ui, while its distance from
the SRF is Ûi.

Figure 2.1: Sample and Population Regression Lines

Note that the SRF shown in Figure 2.1 is only one of several possible SRFs. So
how can we choose the one that best approximates the PRF? In other words, how
can we obtain the best estimators of the parameters β0 and β1 based on sample
information?

To address this question, econometricians have developed different techniques,
one of which is the Ordinary Least Squares (OLS) method.
2.3 The Ordinary Least Squares (OLS) Method
The OLS method is the most extensively used method of estimation in regression
analysis. Under certain assumptions, the least squares method has some
attractive statistical properties.

To illustrate the ordinary least squares (OLS) method, think of the theory of
supply in economics. In its simplest form, the theory postulates that there is a
positive relationship between quantity supplied of a commodity (Y) and its price
(X), other things remaining constant.

From Equation 2.3a, we know that the PRF of this relationship is given as:

Yi = E(Y|Xi) + Ui

Assuming linearity, this can be rewritten as

⇒ Yi = β0 + β1Xi + Ui …………………………(2.6)

Where Ui is a stochastic term that accounts for the various factors affecting the
dependent variable (Yi) which cannot explicitly be taken into account by an
investigator.

Dear students, why do you think an investigator is not in a position to take into
account all the factors that affect the dependent variable? ------------------------------
------------------------------------------------------------------------------------------------------------

Some of the reasons for not taking all the factors that affect the dependent
variable into account are discussed as follows:
i) Omission of variables from the function
In the real world, economic variables may be influenced by a very large number of
other variables. However, the researcher may not include all of them explicitly in
his/her model, which may be attributed to the following reasons:

a) Some of the variables may be unknown to the researcher him/herself.
b) Even if all variables are known to the investigator, the available data
   most often are not adequate to measure all the variables that influence the
   dependent variable.
c) Some of the variables, though known to be relevant, may not
   be measurable statistically (e.g., tastes, religion, gender, etc.).
d) Some variables may each, individually, have an insignificant influence on
   the dependent variable.
e) Some variables, such as epidemics, earthquakes, war, etc., are random,
   which may make them unpredictable.

Thus, in most cases only a few of the most important variables are explicitly
included in the model, and the effect of the others on the dependent variable is
taken into account by Ui.

ii) Intrinsic randomness in human behavior
Even if the researcher succeeds in including all the relevant variables in the
model, there would remain some "intrinsic" randomness in the dependent variable
that cannot be explained no matter how hard the researcher tries, which may be
due to the erratic behavior of human beings.

iii) Misspecification of the model
Although economic phenomena are often much more complex than a single
equation can reveal, researchers sometimes use single-equation models.
Furthermore, a researcher may use linearity to represent the relationship between
the dependent and explanatory variables even though the relationship should have
been studied using non-linear models. In either of these cases the researcher ends
up with a misspecified model, and this is one of the reasons why Ui is introduced
in econometric models.

iv) Aggregation errors
We often use aggregate data, in which we add up magnitudes referring to
individuals whose behaviors are dissimilar. Hence, in the process of aggregation,
attributes expressing individual peculiarities are lost.

Therefore, in order to take the above sources of error into account, we introduce
a random variable into econometric models, usually denoted by U and called the
error term, random disturbance term or stochastic term. U is so called
because it disturbs the exact linear relationship assumed to exist between
Y and X.

Having studied the relevance of Ui to economic relationships, the economic
theory of supply in its simplest form can be modeled as:

Yi = β0 + β1Xi + Ui ……………………………………………….(2.7)

Where Ui represents all variables other than the price of the commodity that
affect the quantity supplied (Y). However, the relationship represented in
Equation 2.7 is not directly observable, and hence we have to estimate it on the
basis of sample information. To estimate β0 and β1 we would have to collect
data on Y, X and U. Nonetheless, we cannot get data on U, as it is stochastic
and can never be observed. Therefore, in order to estimate the parameters and
make inferences, we have to make some plausible assumptions about the
shape and distribution of U.

2.4 Assumptions Underlying the Least Squares (OLS) Method
The major objectives of regression analysis include estimation of, and inference
about, the population parameters β0 and β1 based on sample observations. For
example, we would like to know how close the estimates β̂0 and β̂1 are to
the parameters β0 and β1, respectively; in other words, we want to know how
close Ŷi is to the true E(Y|Xi). Hence, beyond specifying the functional form of
the model, we have to make certain assumptions about the manner in which
the Yi's are generated.

From Equation 2.7, we have noted that Yi depends on both Xi and Ui.
Therefore, unless we are specific about how Xi and Ui are generated, there is
no way we can make any statistical inference about Yi or about the estimates
β̂0 and β̂1. Hence, the assumptions made about the Xi variable(s) and the
error term are very critical for valid interpretation of the regression
estimates.

The Gaussian, or classical, linear regression model is based on the following ten
assumptions.
Assumption 1: Linear regression model
The regression model is linear in the parameters, as shown below:

Yi = β0 + β1Xi + Ui

However, this assumption does not exclude models that are non-linear in the
variables, such as Yi = β0 + β1Xi + β2Xi² + β3Xi³ + ... + Ui
Assumption 2: Ui is a random real variable with zero mean value

This means that the value Ui assumes in any particular period or instance
depends on chance; it may be positive, negative or zero. Furthermore, given
the value of X, the mean value of the random disturbance term Ui is zero.
Technically, this means that the conditional mean value of Ui is zero.
Symbolically,

E(Ui|Xi) = 0 …………….………….(2.8)

In a nutshell, this assumption implies that the factors not explicitly included in the
model, and therefore subsumed in Ui, do not systematically affect the mean
value of Y; i.e., the positive Ui values cancel out the negative Ui values so that
their average effect on Y is zero. Given Yi = β0 + β1Xi + Ui, this assumption
leads to the fact that:

E(Y|Xi) = β0 + β1Xi …………………………………………....(2.9)

Assumption 3: The disturbance term Ui has a normal distribution

This assumption is an extension of Assumption 2. It states that the values of
Ui (for each Xi) have a normal distribution, which is bell-shaped and
symmetrical about the zero mean of Ui.

There is typically a gap between the individual values of Yi and the average
value of Yi associated with a fixed value of X (see Figure 2.1). This gap is
represented by Ui, which can be positive or negative. Furthermore, the values
of Ui associated with a given value of X are symmetrically distributed around
their mean value of zero, following a normal distribution.

Assumption 4: Homoscedasticity of Ui

The variance of Ui about its mean is constant at all values of X. In other words,
for all values of X, the Ui values show the same dispersion around their mean;
given the value of X, the variance of Ui is the same (constant) for all
observations.

Symbolically,

Var(Ui|Xi) = E[(Ui|Xi) − E(Ui|Xi)]²
           = E(Ui²|Xi), since E(Ui|Xi) = 0

⇒ Var(Ui|Xi) = σ² ……………………………….(2.10)

Note that the variance in equation 2.10 is a constant.

Note: Assumption 4 implies that the values of Y corresponding to various values
of X have constant variance.

Assumption 5: No autocorrelation between the values of the disturbance
term Ui

This means that the values of Ui associated with one value of X are independent
of the values associated with other values of X; that is, the covariance of any
Ui with any other Uj is zero. In other words, the value that the disturbance
term U assumes in any one period does not depend on its value in other periods.
In short, given any two X values, Xi and Xj (where i ≠ j), the correlation
between any two Ui and Uj (i ≠ j) is zero.

Symbolically,

Cov(Ui, Uj | Xi, Xj) = E{[Ui − E(Ui)] | Xi} {[Uj − E(Uj)] | Xj}
                     = E(Ui|Xi) E(Uj|Xj), since E(Ui|Xi) = E(Uj|Xj) = 0

⇒ Cov(Ui, Uj | Xi, Xj) = 0 .......................…………….…..………..(2.11)

Assumption 6: The values of X are fixed in repeated samples

This means that in taking a large number of samples on Y and X, the X values
are the same in all samples, but the Y values do differ from sample to sample;
i.e., X is assumed to be non-stochastic.

For example, as we discussed in section 2.1 above, when we collect data on
family income (Y) and consumption expenditure (C) from a certain community,
keeping the value of Y fixed, say at ET Birr 1000, we may perhaps draw at
random a family with a monthly (or weekly) consumption expenditure (C) of, say,
ET Birr 600. Still keeping Y at ET Birr 1000, we may draw at random another
family with a C value of ET Birr 800, and so on.

Assumption 7: U i is independent of the explanatory variable (X).

This means that the disturbance U i and the explanatory variable X are

uncorrelated. The values of U and X do not tend to vary together; i.e., their
covariance is zero.

Symbolically,

Cov(Ui, Xi) = E{[Ui − E(Ui)][Xi − E(Xi)]}
            = E{Ui[Xi − E(Xi)]}, since E(Ui) = 0
            = E(UiXi − UiE(Xi)) = E(UiXi) − E(UiE(Xi))

Since E(Xi) is non-stochastic, E(UiE(Xi)) = E(Ui)E(Xi). Thus,

Cov(Ui, Xi) = E(UiXi) − E(Ui)E(Xi)
            = E(UiXi), since E(Ui) = 0

⇒ Cov(Ui, Xi) = 0 …………………………………….…………(2.12)

Equation 2.12 is the implication of Assumption 7.

Assumption 8: No perfect multicollinearity among explanatory variables


The explanatory variables are not perfectly correlated with each other. In other
words, there is no perfect linear relationship among the explanatory variables.
This assumption however, does not exclude non-linear relationships among the
explanatory variables.

Assumption 9: Variability in X values

The X values in a given sample must not all be the same. Technically,
Var(X) must be a finite positive number. This means that X assumes different
values in a given sample, though it assumes fixed values in hypothetical
repeated samples.

Assumption 9 is very critical: without it, it would be impossible to estimate the
parameters and regression analysis would fail. For example, if there is little
variation in family income, we will not be able to explain much of the variation
in the consumption expenditure of the families.

Activity
Dear readers, what do you think is the difference between Assumptions 2
and 9?--------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------

Assumption 10: The regression model is correctly specified


This means that the mathematical form of the model is correctly specified and all
important explanatory variables are included in it. In other words, there is no
specification bias or error in the model used in the empirical analysis.
Unfortunately, in practice one rarely specifies the correct model. Hence, an
econometrician has to use some judgment in choosing the model: in determining
the number of variables entering the model, the assumptions about the
distributions of the variables and the functional form of the model, s/he has to
rely on a priori or theoretical grounds.

2.5 The Distribution of the Dependent Variable, Y


So far we have specified the distributions of the explanatory variables and the
stochastic term. In this section, we determine the distribution of the
dependent variable. Based on the assumptions we have discussed so far about the
distributions of X and U, we can establish that Y is normally distributed with:

1. Mean: E(Yi) = β0 + β1Xi, and
2. Variance: Var(Yi) = Var(Ui) = σu²

Proof:

1. By definition, the expected value of Y is equal to its mean value. Therefore, the
mean of Y is given as

E(Yi) = E(β0 + β1Xi + Ui), since Yi = β0 + β1Xi + Ui
      = E(β0 + β1Xi) + E(Ui)

We know that β0 and β1 are parameters and hence constant. Furthermore, by
Assumption 6 the values of X are a set of fixed numbers, and by
Assumption 2, E(Ui|Xi) = 0. Therefore,

E(Yi) = E(β0 + β1Xi) + 0
      = β0 + β1Xi,

since Assumption 6 implies that E(β0 + β1Xi) = β0 + β1Xi.

2. The variance of Y is given as

Var(Yi) = E[Yi − E(Yi)]²

Substituting Yi = β0 + β1Xi + Ui and E(Yi) = β0 + β1Xi,

Var(Yi) = E(β0 + β1Xi + Ui − β0 − β1Xi)²
        = E(Ui²)
        = σu²

since by Assumption 4 the homoscedastic variance of U is given as
Var(Ui) = E(Ui²). Therefore, we can conclude that the variance of Y is the same
as the variance of the stochastic term.

3. The shape of the distribution of Y is normal

The distribution of Y is determined by the distribution of U. This is due to
the fact that β0 and β1 are constants and hence do not affect the distribution
of the dependent variable. Furthermore, by Assumption 6, the values of X are a
set of fixed numbers and therefore do not affect the distribution of Y. Thus, the
distribution of Y is normal, following the normality of the distribution of U.
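The following small Python simulation sketch illustrates this result numerically; the parameter values (β0 = 2, β1 = 0.5, σ = 3) and the fixed X value are assumptions chosen purely for the illustration.

```python
# A simulation sketch: with assumed values β0=2, β1=0.5, σ=3 and one fixed X,
# repeated draws of U produce Y values whose mean is β0 + β1·X and whose
# variance equals the variance of the disturbance term.
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma = 2.0, 0.5, 3.0   # hypothetical population parameters
x_fixed = 10.0                        # one fixed value of X (Assumption 6)

u = rng.normal(0.0, sigma, size=100_000)   # Assumptions 2-4: N(0, σ²)
y = beta0 + beta1 * x_fixed + u

print("mean of Y:    ", y.mean())   # close to β0 + β1·X = 7.0
print("variance of Y:", y.var())    # close to σ² = 9.0
```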

2.6 The Least Squares Criterion and the OLS Estimates


Now assume that we have completed the work involved in the first four stages of
the econometric methodology discussed in chapter one; namely we have
specified the econometric model, stated its assumptions and collected the
required data. Then the next step is the estimation of the model.

Recall the two-variable PRF given in Equation 2.6:

Yi = β0 + β1Xi + Ui

This relationship holds for the population values of Y and X, so that we could
obtain numerical values of β0 and β1 only if we had all the conceivably
possible values of Y, X and U, which form the population of the variables.
Nonetheless, this is impossible in practice. Therefore, we have to obtain a
sample of observed values of Y and X, specify the distribution of U, and try to
get satisfactory estimates of the true parameters of the relationship. This is done
by fitting a regression line (SRF) through the observations of the sample, which
is considered an approximation to the true line.

In Equation 2.5 we noted that the SRF is given as:

Yi = β̂0 + β̂1Xi + Ûi

Therefore,

Ûi = Yi − β̂0 − β̂1Xi = Yi − Ŷi ……………………(2.13)

The question now is how to determine the SRF. We are mainly
interested in determining the SRF in such a way that the line is as close as
possible to the actual Y values. It is intuitively obvious that the smaller the
deviations from the line, the better the fit of the line to the actual observations on
Y. One might therefore think of choosing the SRF in such a manner that the sum
of the residuals, ΣÛi = Σ(Yi − Ŷi), is as small as possible. This approach,
however, is not appropriate, no matter how intuitively appealing it may be. The
reason is that the minimization of ΣÛi gives equal weight to all deviations, no
matter how large or small they may be; i.e., it attaches equal importance to all
Ûi's no matter how close or how widely scattered the individual observations are
from the SRF. Consequently, the algebraic sum of the Ûi can be small (even
zero) although the individual Ûi are widely scattered about the SRF. This means
that minimizing ΣÛi does not necessarily imply that the individual deviations
(Ûi's) are minimized.

To avoid this problem, we adopt the least squares criterion. This criterion requires
the regression line to be drawn (its parameters to be chosen) in such a way as to
minimize the sum of the squares of the deviations of the observations from it; i.e.,
it minimizes ΣÛi², the sum of the squared residuals. Hence, this approach gives
more weight to residuals with wider dispersion around the line than to those with
closer dispersion.

From Equation 2.13, we know that Ûi = Yi − Ŷi.

Thus, ΣÛi² = Σ(Yi − Ŷi)² = Σ(Yi − β̂0 − β̂1Xi)² ……………………..(2.14)

It is clear from Equation 2.14 that

ΣÛi² = f(β̂0, β̂1) ………………………………………..(2.15)

This implies that for any given set of data, choosing different values for
β̂0 and β̂1 will give different Û's and hence different values of ΣÛi². In other
words, by assigning different values to β̂0 and β̂1 we obtain different
regression lines (SRFs) for the same sample.

For example, if β̂0 = 1.5 and β̂1 = 1.3, then the SRF can be given as

SRF1: Ŷi = 1.5 + 1.3Xi

If, on the other hand, β̂0 = 3 and β̂1 = 1, then the SRF can be given as

SRF2: Ŷi = 3 + Xi

Dear readers, which of these two lines do you think will give the best fit to the
observed data? Alternatively, which set of β̂'s should be chosen? ---------------------
------------------------------------------------------------------------------------------------------------

According to the least squares criterion, the line that produces the minimum
value of ΣÛi² must be chosen. This means that, to choose the best set of β̂'s, we
could assign many more values to the β̂'s and see what happens to ΣÛi².

However, in practice we may not have sufficient time and patience to conduct
such a trial-and-error process. Therefore, we need to look for a short cut.
Fortunately, the method of least squares provides one: the principle of least
squares chooses β̂0 and β̂1 in such a way that, for a given sample, ΣÛi² is as
small as possible.

The mechanism for accomplishing this is straightforward using differential
calculus. Now recall from Equation 2.14 that for n pairs of observations on Y and
X,

ΣÛi² = Σ(Yi − β̂0 − β̂1Xi)², with the sums running over i = 1, …, n ………….(2.14*)

According to the principle of least squares, we have to minimize Equation 2.14*
with respect to β̂0 and β̂1. At the minimum of ΣÛi², the first-order
conditions (FOC) must be satisfied, which are given as:

FOC: i)  ∂(ΣÛi²)/∂β̂0 = 0 …………………(2.16)

     ii) ∂(ΣÛi²)/∂β̂1 = 0 …………………….(2.17)

Applying the function-of-a-function (chain) rule of differentiation to Equations
2.16 and 2.17 yields:

1) ∂(ΣÛi²)/∂β̂0 = ∂[Σ(Yi − β̂0 − β̂1Xi)²]/∂β̂0 = 0
   ⇒ 2Σ(Yi − β̂0 − β̂1Xi)·(−1) = 0
   ⇒ −2Σ(Yi − β̂0 − β̂1Xi) = 0
   ⇒ Σ(Yi − β̂0 − β̂1Xi) = 0
   ⇒ ΣYi − nβ̂0 − β̂1ΣXi = 0 …………….(2.18)

Rearranging the terms in Equation 2.18, we obtain:

⇒ ΣYi = nβ̂0 + β̂1ΣXi …………………………………….(2.19)

2) Performing the differentiation in Equation 2.17 yields the following result:

   ∂(ΣÛi²)/∂β̂1 = ∂[Σ(Yi − β̂0 − β̂1Xi)²]/∂β̂1 = 0
   ⇒ 2Σ(Yi − β̂0 − β̂1Xi)·(−Xi) = 0
   ⇒ −2Σ(YiXi − β̂0Xi − β̂1Xi²) = 0
   ⇒ ΣYiXi − β̂0ΣXi − β̂1ΣXi² = 0
   ⇒ ΣYiXi = β̂0ΣXi + β̂1ΣXi² .................................(2.20)

Note that Equations 2.19 and 2.20 are called the normal equations of OLS.
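As a brief numerical aside, here is a Python sketch (on hypothetical data) that solves the two normal equations directly as a 2×2 linear system, before we derive the closed-form solution below.

```python
# Solving the normal equations 2.19 and 2.20 directly as a 2x2 linear system,
# on an invented sample.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.2, 6.0, 7.8, 9.1])
n = len(x)

# ΣYi   = n·β̂0   + β̂1·ΣXi
# ΣYiXi = β̂0·ΣXi + β̂1·ΣXi²
B = np.array([[n, x.sum()],
              [x.sum(), (x ** 2).sum()]])
a = np.array([y.sum(), (y * x).sum()])

b0, b1 = np.linalg.solve(B, a)
print(b0, b1)   # 1.36 and 1.56 on this sample
```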

Then, to develop formulas for computing the numerical values of β̂0 and β̂1, we
solve these normal equations simultaneously using Cramer's rule. Here B is the
coefficient matrix of the normal equations, A is the column of their right-hand
sides, and C and D are obtained from B by replacing its first and second columns,
respectively, with A:

    A = | ΣYi   |        B = | n    ΣXi  |
        | ΣYiXi |            | ΣXi  ΣXi² |

    C = | ΣYi    ΣXi  |   D = | n    ΣYi   |
        | ΣYiXi  ΣXi² |       | ΣXi  ΣYiXi |

Then,

β̂0 = determinant of C / determinant of B
   = (ΣYi·ΣXi² − ΣXi·ΣYiXi) / (n·ΣXi² − (ΣXi)²) ………………………………(2.21)

And,

β̂1 = determinant of D / determinant of B
   = (n·ΣYiXi − ΣXi·ΣYi) / (n·ΣXi² − (ΣXi)²) ………………………….…………(2.22)

The estimators β̂0 and β̂1 obtained from this process are called the least
squares estimators, since they are derived via the least squares principle.

In passing, note that Equations 2.21 and 2.22 can be expressed in deviation form,
where xi = Xi − X̄ and yi = Yi − Ȳ denote deviations from the sample means, as:

β̂0 = Ȳ − β̂1X̄ ……………………………….…….……(2.23)

And β̂1 = Σxiyi / Σxi² ………………………………………………(2.24)

2.7 Estimation of a Function Whose Intercept is Zero
In some cases economic theory postulates relationships which have a zero
intercept. For example, linear production functions of manufactured
products should normally have a zero intercept, since output is zero when the
factor inputs are zero.

In this event, we would estimate the function

Yi = β0 + β1Xi + Ui, imposing the restriction β0 = 0.

That is, we want to fit the line Y = β0 + β1X + U subject to β0 = 0. This is a
restricted minimization problem. Thus, we minimize:

ΣÛi² = Σ(Yi − β̂0 − β̂1Xi)²

subject to β̂0 = 0.

To solve this problem, we form a composite function, called the Lagrange
function, as follows:

L = Σ(Y − β̂0 − β̂1X)² − λβ̂0 ………………………………………(2.25)

Where λ is called the Lagrange multiplier.

The values of β̂0 and β̂1 that minimize equation 2.25 can be obtained by taking
its partial derivatives, which are given as follows:

1) ∂L/∂β̂0 = 2Σ(Y − β̂0 − β̂1X)·(−1) − λ = 0
   ⇒ −2Σ(Y − β̂0 − β̂1X) − λ = 0 ………………………….……(2.26)

2) ∂L/∂β̂1 = 2Σ(Y − β̂0 − β̂1X)·(−X) = 0
   ⇒ −2Σ(Y − β̂0 − β̂1X)(X) = 0 ……………………………..….(2.27)

3) ∂L/∂λ = −β̂0 = 0
   ⇒ β̂0 = 0 ………………………………………………….…………..(2.28)

Substituting equation 2.28 into equation 2.27, we get

⇒ Σ(Y − β̂1X)(X) = 0
⇒ ΣYX − β̂1ΣX² = 0
⇒ ΣYX = β̂1ΣX²
⇒ β̂1 = ΣYX / ΣX² ……………………………………………………(2.28*)

Note that the difference between equations 2.24 and 2.28* is that the former is in
deviation form while the latter involves actual values.
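Below is a minimal Python sketch of the zero-intercept estimator in equation 2.28*, using hypothetical input and output data.

```python
# The zero-intercept (regression through the origin) estimator of 2.28*:
# β̂1 = ΣYX / ΣX², computed from actual values, not deviations.
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])     # hypothetical factor inputs
y = np.array([5.1, 9.8, 15.2, 19.9])   # hypothetical outputs

b1 = (y * x).sum() / (x ** 2).sum()
print("slope through the origin:", b1)   # about 2.50 here
```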

2.8 Functional Forms of Regression Models


So far we have primarily dealt with models that are linear in the parameters and
in the variables. In this section we will consider some commonly used regression
models that may be non-linear in the variables but are linear in the parameters, or
can be made so by suitable transformations of the variables.

2.8.1 The Log-Linear Model
To illustrate this model, assume that you are given the so-called exponential
regression model:

Yi = β0·Xi^β1·e^Ui …………………….…………………..(2.29)

Where e ≈ 2.718 is the base of natural logarithms.

Then, taking the natural logarithm of equation 2.29 yields

ln Yi = ln β0 + β1 ln Xi + Ui ………………………………(2.30)

Note: Equation 2.30 follows from the following properties of logarithms:

i)   ln(AB) = ln A + ln B
ii)  ln(A/B) = ln A − ln B
iii) ln(A^k) = k·ln A

Then, letting ln β0 = α, equation 2.30 can be rewritten as:

ln Yi = α + β1 ln Xi + Ui ………………………….…………….(2.31)

From equation 2.31, it is clear that the model is linear in the parameters α and β1
and linear in the logarithms of the variables Y and X. Hence, it can be estimated
by OLS regression, which is suitable only for linear models. It is this linearity
that gives the model its name: the log-log, double-log or log-linear model.

This model is very popular in applied work, mainly because the slope coefficient
(e.g., β1 in equation 2.31) measures the elasticity of Y with respect to X, i.e., the
percentage change in Y for a given (small) percentage change in X.

Note: The log-linear model assumes that the elasticity coefficient between Y and
X remains constant throughout; in other words, the elasticity is the same no
matter at which value of X we measure it. For this reason the model is also
called the constant elasticity model.
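As a concrete illustration, here is a Python sketch (with hypothetical price and supply data) that estimates the double-log model 2.31 by applying the deviation-form OLS formula to the logged variables; the estimated slope is the constant elasticity.

```python
# Estimating the double-log model 2.31 by OLS on logged data; the slope is
# the constant elasticity of Y with respect to X. Data are hypothetical.
import numpy as np

price = np.array([2.0, 3.0, 4.0, 5.0, 6.0])        # X
supply = np.array([10.0, 16.0, 22.0, 29.0, 35.0])  # Y

lx, ly = np.log(price), np.log(supply)
b1 = ((lx - lx.mean()) * (ly - ly.mean())).sum() / ((lx - lx.mean()) ** 2).sum()
alpha = ly.mean() - b1 * lx.mean()

print("elasticity (β̂1):", b1)             # % change in Y per 1% change in X
print("β̂0 = exp(α̂):   ", np.exp(alpha))  # intercept recovered from α = ln β0
```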

Activity
Let Y represent the quantity supplied of a commodity and X the price of the
commodity. If you use equation 2.29 to model the relationship between Y and X,
how would you interpret the coefficient of X? -----------------------------------------------
------------------------------------------------------------------------------------------------------------

2.8.2 Semi-log Models


i) Log-Lin Models
Sometimes we may be interested in finding the rate of growth of certain
economic variables. In such cases we use log-lin models. For example, if we
want to find the growth rate of personal consumption expenditure on services,
this model applies.

To see how these models are developed, let Yt denote real expenditure on
services at time t and Y0 the initial value of that expenditure. As you may recall,
the formula for the compound growth rate is

Yt = Y0(1 + r)^t ………………………………………..…….…..(2.32)

Where r is the compound rate of growth of Y.

Taking the natural logarithm of equation 2.32 gives:

ln Yt = ln Y0 + t·ln(1 + r) …………………………………….(2.33)

If we let ln Y0 = β0 and ln(1 + r) = β1, we have

ln Yt = β0 + β1t ………………………………………………….(2.34)

To make equation 2.34 stochastic and turn it into an econometric model, we add
the disturbance term. Then it becomes

ln Yt = β0 + β1t + Ut ………………………………………..(2.35)

As can be seen from the above equation, the model is linear in the
parameters β0 and β1. The only difference in this model is that the regressand is
ln Y and the regressor is time, t. Models like equation 2.35 are called semi-log
models, since only one variable appears in logarithmic form.

Semi-log models are called log-lin models if the regressand is in logarithmic
form. In log-lin models, the slope coefficient measures the constant proportional
or relative change in Y for a given absolute change in the value of the regressor.
That is, in equation 2.35,

β1 = (relative change in regressand) / (absolute change in regressor) ……………(2.36)

If we multiply the relative change in Y by 100, equation 2.36 gives the
percentage change, or growth rate, in Y for an absolute change in the
regressor; i.e., 100 times β1 gives the growth rate in Y, sometimes called the
semi-elasticity of Y with respect to the explanatory variable.
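As an illustration, the following Python sketch simulates a series with a true compound growth rate of 5 percent per period (an assumption for this example), fits the log-lin model 2.35, and recovers r from β̂1 = ln(1 + r).

```python
# Fitting the log-lin growth model 2.35 to a series simulated with a true
# compound growth rate of 5% per period (an assumption for this example).
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(1.0, 9.0)                              # time periods 1..8
y = 100.0 * (1.05 ** t) * np.exp(rng.normal(0, 0.01, t.size))

ly = np.log(y)
b1 = ((t - t.mean()) * (ly - ly.mean())).sum() / ((t - t.mean()) ** 2).sum()

print("100·β̂1 (instantaneous rate):", 100 * b1, "% per period")
print("compound rate r = e^β̂1 - 1: ", 100 * (np.exp(b1) - 1), "%")
```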

ii) The Lin-Log Model

In this case we are interested in finding the absolute change in Y for a
percentage change in the explanatory variable, X. This model can be written as

Yi = β0 + β1 ln Xi + Ui ……………………………………………….(2.37)

Models of this type are known as lin-log models. In this case, β1 is given as:

β1 = (change in Y) / (change in ln X)
   = (change in Y) / (relative change in X)
   = ΔY / (ΔX/X)
Equivalently,

ΔY = β1·(ΔX/X) ………………….…………………. ….(2.38)

This equation states that the absolute change in Y (i.e., ΔY) is equal to the slope
times the relative change in X. Thus, if ΔX/X changes by 0.01 units (1 percent),
the absolute change in Y is 0.01·β1. For example, if we find β1 = 500, then the
absolute change in Y for a 1 percent change in X is (0.01)(500) = 5.

Thus, it is noteworthy that when equation 2.37 is estimated by OLS, the
estimated slope coefficient must be multiplied by 0.01 to obtain the change in Y
per 1 percent change in X; otherwise your interpretation will be misleading.
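A short Python sketch (hypothetical data) fits the lin-log model 2.37 and applies the 0.01 scaling described above.

```python
# Fitting the lin-log model 2.37 and applying the 0.01 scaling: the absolute
# change in Y for a 1% change in X is 0.01·β̂1. Data are hypothetical.
import numpy as np

x = np.array([100.0, 200.0, 400.0, 800.0, 1600.0])
y = np.array([50.0, 120.0, 190.0, 260.0, 330.0])

lx = np.log(x)
b1 = ((lx - lx.mean()) * (y - y.mean())).sum() / ((lx - lx.mean()) ** 2).sum()

print("β̂1:", b1)
print("ΔY for a 1% change in X:", 0.01 * b1)
```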

Generally, while the choice of a particular functional form may depend on the
underlying theory, it is good practice to use a model that enables us to find both
the rate of change of the dependent variable with respect to the explanatory
variable and the elasticity of the regressand with respect to the explanatory
variables.

Exercises
1. Why do we need regression analysis?
2. What is the difference between regression and correlation analysis?
3. What is the difference between the population and sample regression
functions?
4. What is the role of the stochastic error term Ui in regression analysis?

5. Given the following two models:

   Model I:  Yi = β0 + β1Xi + Ui
   Model II: Yi = α0 + α1(Xi − X̄) + Ui

   i.   Find the estimators of β0 and α0, and their variances. Explain
        the differences between these estimators, if any.
   ii.  Find the estimators of β1 and α1, and their variances.
        Explain the differences, if any.
   iii. What is the advantage, if any, of Model II over Model I?

6. The following results were obtained from a sample of 11 observations on a
   dependent variable (Y) and an explanatory variable (X):

   X = 520,   Y = 220,   ΣXiYi = 1290,   ΣXi² = 3100,   ΣYi² = 539,500

   Based on the given information:
   a. Estimate the coefficients of the regression line of Y on X.
   b. Interpret the coefficients of your model.

7. Suppose that in question 6 above, on rechecking the data, it was found that
   two pairs of observations were erroneously recorded as:

   Recorded          Correct
   Y     X           Y     X
   90    120         80    110
   140   220         150   210

   i.  Find the OLS estimates of the coefficients of the regression line of Y on X.
   ii. Explain the effect of the data-recording error on the estimates of the
       coefficients of the regression model in question 6.
