BHU CoBE DEPARTMENT OF ECONOMICS
Chapter Three
THE CLASSICAL REGRESSION ANALYSIS
[The Multiple Linear Regression Model]
Introduction
In simple regression we study the relationship between a dependent variable and a single
explanatory (independent) variable. But it is rarely the case that economic relationships
involve just two variables. Rather, a dependent variable Y can depend on a whole series
of explanatory variables or regressors. For instance, in demand studies we study the
relationship between the quantity demanded of a good and the price of the good, the price of
substitute goods, and the consumer’s income. The model we assume is:
$Y_i = \beta_0 + \beta_1 P_1 + \beta_2 P_2 + \beta_3 X_i + u_i$ -------------------- (3.1)
where $Y_i$ is the quantity demanded, $P_1$ is the price of the good, $P_2$ is the price of substitute goods, $X_i$
is the consumer’s income, the $\beta$'s are unknown parameters and $u_i$ is the disturbance term.
Equation (3.1) is a multiple regression with three explanatory variables. In general, for k
explanatory variables we can write the model as follows:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \ldots + \beta_k X_{ki} + u_i$ ------- (3.2)
where the $X_{ki}$ are the explanatory variables, $Y_i$ is the dependent variable, the
$\beta_j$ ($j = 0, 1, 2, \ldots, k$) are unknown parameters and $u_i$ is the disturbance term. The
disturbance term is of similar nature to that in simple regression, reflecting:
- the basic random nature of human responses
- errors of aggregation
- errors of measurement
- errors in specification of the mathematical form of the model, and any other
(minor) factors, other than the $X$'s, that might influence Y.
In this chapter we first discuss the assumptions of the multiple regression model and then
proceed with the analysis of the case of two explanatory variables.
Assumptions of Multiple Regression Model
In order to specify our multiple linear regression model and proceed with our analysis, some
assumptions are required. These assumptions are the same as in the single-explanatory-variable
model developed earlier, except for the additional assumption of no perfect multicollinearity.
The assumptions are:
1. Randomness of the error term: The variable u is a real random variable.
2. Zero mean of the error term: $E(u_i) = 0$
3. Homoscedasticity: The variance of each $u_i$ is the same for all the $x_i$ values,
i.e. $E(u_i^2) = \sigma_u^2$ (a constant).
4. Normality of u: The values of each $u_i$ are normally distributed,
i.e. $u_i \sim N(0, \sigma^2)$.
5. No autocorrelation or serial correlation: The values of $u_i$ (corresponding to $X_i$) are
independent of the values of any other $u_j$ (corresponding to $X_j$) for $i \neq j$,
i.e. $E(u_i u_j) = 0$ for $i \neq j$.
6. Independence of $u_i$ and $X_i$: Every disturbance term $u_i$ is independent of the
explanatory variables, i.e. $E(u_i X_{1i}) = E(u_i X_{2i}) = 0$.
This condition is automatically fulfilled if we assume that the values of the X’s are
a set of fixed numbers in all (hypothetical) samples.
7. No perfect multicollinearity: The explanatory variables are not perfectly linearly
correlated.
We cannot list all the assumptions exhaustively, but the above are some of the basic
assumptions that enable us to proceed with our analysis.
A Model with Two Explanatory Variables
In order to understand the nature of the multiple regression model easily, we start our
analysis with the case of two explanatory variables, and then extend this to the case of k
explanatory variables.
3.3.1 Estimation of parameters of two-explanatory variables model
The model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + u_i$ ……………………………………(3.3)
is a multiple regression with two explanatory variables. The expected value of the above
model is called the population regression equation, i.e.
$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$, since $E(u_i) = 0$. …………………................(3.4)
where the $\beta_j$ are the population parameters. $\beta_0$ is referred to as the intercept and $\beta_1$ and $\beta_2$
are sometimes known as the regression slopes. Note that $\beta_2$, for
example, measures the effect on $E(Y)$ of a unit change in $X_2$ when $X_1$ is held constant.
Since the population regression equation is unknown to any investigator, it has to be
estimated from sample data. Let us suppose that sample data have been used to
estimate the population regression equation. We leave the method of estimation
unspecified for the present and merely assume that equation (3.4) has been estimated by the
sample regression equation, which we write as:
$\hat{Y} = \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2$ ……………………………………………….(3.5)
where the $\hat\beta_j$ are estimates of the $\beta_j$ and $\hat{Y}$ is known as the predicted value of Y.
Now it is time to state how (3.3) is estimated. Given sample observations on $Y$, $X_1$ and $X_2$,
we estimate (3.3) using the method of ordinary least squares (OLS).
$Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i} + e_i$ ……………………………………….(3.6)
is the sample relation between $Y$, $X_1$ and $X_2$.
$e_i = Y_i - \hat{Y}_i = Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}$ …………………………………..(3.7)
To obtain expressions for the least squares estimators, we partially differentiate $\sum e_i^2$ with
respect to $\hat\beta_0$, $\hat\beta_1$ and $\hat\beta_2$ and set the partial derivatives equal to zero.

$\dfrac{\partial [\sum e_i^2]}{\partial \hat\beta_0} = -2\sum (Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}) = 0$ ………………………. (3.8)

$\dfrac{\partial [\sum e_i^2]}{\partial \hat\beta_1} = -2\sum X_{1i}(Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}) = 0$ ……………………. (3.9)

$\dfrac{\partial [\sum e_i^2]}{\partial \hat\beta_2} = -2\sum X_{2i}(Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}) = 0$ ……………………..(3.10)
Summing from 1 to n, the multiple regression equation produces three normal
equations:
$\sum Y_i = n\hat\beta_0 + \hat\beta_1 \sum X_{1i} + \hat\beta_2 \sum X_{2i}$ …………………………………….(3.11)
$\sum X_{1i} Y_i = \hat\beta_0 \sum X_{1i} + \hat\beta_1 \sum X_{1i}^2 + \hat\beta_2 \sum X_{1i} X_{2i}$ …………………………(3.12)
$\sum X_{2i} Y_i = \hat\beta_0 \sum X_{2i} + \hat\beta_1 \sum X_{1i} X_{2i} + \hat\beta_2 \sum X_{2i}^2$ ………………………...(3.13)
From (3.11) we obtain $\hat\beta_0$:
$\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}_1 - \hat\beta_2 \bar{X}_2$ ------------------------------------------------- (3.14)
Substituting (3.14) into (3.12), we get:
$\sum X_{1i} Y_i = (\bar{Y} - \hat\beta_1 \bar{X}_1 - \hat\beta_2 \bar{X}_2)\sum X_{1i} + \hat\beta_1 \sum X_{1i}^2 + \hat\beta_2 \sum X_{1i} X_{2i}$
$\Rightarrow \sum X_{1i} Y_i - \bar{Y}\sum X_{1i} = \hat\beta_1 (\sum X_{1i}^2 - \bar{X}_1 \sum X_{1i}) + \hat\beta_2 (\sum X_{1i} X_{2i} - \bar{X}_2 \sum X_{1i})$
$\Rightarrow \sum X_{1i} Y_i - n\bar{Y}\bar{X}_1 = \hat\beta_1 (\sum X_{1i}^2 - n\bar{X}_1^2) + \hat\beta_2 (\sum X_{1i} X_{2i} - n\bar{X}_1 \bar{X}_2)$ ------- (3.15)
We know that
$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_i Y_i - n\bar{X}\bar{Y} = \sum x_i y_i$
$\sum (X_i - \bar{X})^2 = \sum X_i^2 - n\bar{X}^2 = \sum x_i^2$
Substituting these identities into (3.15), the normal equation (3.12) can be
written in deviation form as follows:
$\sum x_1 y = \hat\beta_1 \sum x_1^2 + \hat\beta_2 \sum x_1 x_2$ …………………………………………(3.16)
Using the same procedure, if we substitute (3.14) into (3.13), we get:
$\sum x_2 y = \hat\beta_1 \sum x_1 x_2 + \hat\beta_2 \sum x_2^2$ ………………………………………..(3.17)
Let us bring (3.16) and (3.17) together:
$\sum x_1 y = \hat\beta_1 \sum x_1^2 + \hat\beta_2 \sum x_1 x_2$ ……………………………………….(3.18)
$\sum x_2 y = \hat\beta_1 \sum x_1 x_2 + \hat\beta_2 \sum x_2^2$ ……………………………………….(3.19)
$\hat\beta_1$ and $\hat\beta_2$ can easily be solved for using matrix algebra.
We can rewrite the above two equations in matrix form as follows:
$\begin{bmatrix} \sum x_1^2 & \sum x_1 x_2 \\ \sum x_1 x_2 & \sum x_2^2 \end{bmatrix} \begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \end{bmatrix} = \begin{bmatrix} \sum x_1 y \\ \sum x_2 y \end{bmatrix}$ ………….(3.20)
If we use Cramer's rule to solve the above system we obtain:

$\hat\beta_1 = \dfrac{\sum x_1 y \cdot \sum x_2^2 - \sum x_1 x_2 \cdot \sum x_2 y}{\sum x_1^2 \cdot \sum x_2^2 - (\sum x_1 x_2)^2}$ …………………………..…………….. (3.21)

$\hat\beta_2 = \dfrac{\sum x_2 y \cdot \sum x_1^2 - \sum x_1 x_2 \cdot \sum x_1 y}{\sum x_1^2 \cdot \sum x_2^2 - (\sum x_1 x_2)^2}$ ………………….……………………… (3.22)
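To make these formulas concrete, the following minimal Python/NumPy sketch computes $\hat\beta_1$ and $\hat\beta_2$ from (3.21) and (3.22) and then $\hat\beta_0$ from (3.14). The small data set is made up purely for illustration and is not part of the notes.

```python
# A minimal sketch of equations (3.14), (3.21) and (3.22) in NumPy; the data
# below are made up purely for illustration.
import numpy as np

Y  = np.array([10., 12., 15., 14., 18., 20., 22., 21.])
X1 = np.array([ 2.,  3.,  4.,  4.,  5.,  6.,  7.,  7.])
X2 = np.array([ 1.,  1.,  2.,  3.,  3.,  4.,  4.,  5.])

# Deviations from the sample means (the lower-case x's and y of the notes)
y, x1, x2 = Y - Y.mean(), X1 - X1.mean(), X2 - X2.mean()

# Building blocks of the deviation-form normal equations (3.18)-(3.19)
Sx1y, Sx2y = (x1 * y).sum(), (x2 * y).sum()
Sx1x1, Sx2x2, Sx1x2 = (x1 * x1).sum(), (x2 * x2).sum(), (x1 * x2).sum()

den = Sx1x1 * Sx2x2 - Sx1x2 ** 2                  # common denominator in (3.21)-(3.22)
b1 = (Sx1y * Sx2x2 - Sx1x2 * Sx2y) / den          # equation (3.21)
b2 = (Sx2y * Sx1x1 - Sx1x2 * Sx1y) / den          # equation (3.22)
b0 = Y.mean() - b1 * X1.mean() - b2 * X2.mean()   # equation (3.14)

print(b0, b1, b2)
```

Solving the 2x2 system (3.20) directly (for example with np.linalg.solve) gives the same $\hat\beta_1$ and $\hat\beta_2$.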
3.3.2 The coefficient of determination ($R^2$): two explanatory variables case
In the simple regression model, we introduced $R^2$ as a measure of the proportion of
variation in the dependent variable that is explained by variation in the explanatory
variable. In the multiple regression model the same measure is relevant, and the same
formulas are valid, but now we talk of the proportion of variation in the dependent
variable explained by all the explanatory variables included in the model. The coefficient of
determination is:
$R^2 = \dfrac{ESS}{TSS} = 1 - \dfrac{RSS}{TSS} = 1 - \dfrac{\sum e_i^2}{\sum y_i^2}$ ------------------------------------- (3.25)
In the present model of two explanatory variables:
$\sum e_i^2 = \sum (y_i - \hat\beta_1 x_{1i} - \hat\beta_2 x_{2i})^2$
$\qquad = \sum e_i (y_i - \hat\beta_1 x_{1i} - \hat\beta_2 x_{2i})$
$\qquad = \sum e_i y_i - \hat\beta_1 \sum x_{1i} e_i - \hat\beta_2 \sum e_i x_{2i}$
$\qquad = \sum e_i y_i$ , since $\sum e_i x_{1i} = \sum e_i x_{2i} = 0$
$\qquad = \sum y_i (y_i - \hat\beta_1 x_{1i} - \hat\beta_2 x_{2i})$
i.e. $\sum e_i^2 = \sum y_i^2 - \hat\beta_1 \sum x_{1i} y_i - \hat\beta_2 \sum x_{2i} y_i$
$\Rightarrow \underbrace{\sum y_i^2}_{\text{total variation}} = \underbrace{\hat\beta_1 \sum x_{1i} y_i + \hat\beta_2 \sum x_{2i} y_i}_{\text{explained variation}} + \underbrace{\sum e_i^2}_{\text{unexplained (residual) variation}}$ ----------------- (3.26)
$\therefore R^2 = \dfrac{ESS}{TSS} = \dfrac{\hat\beta_1 \sum x_{1i} y_i + \hat\beta_2 \sum x_{2i} y_i}{\sum y_i^2}$ ----------------------------------(3.27)
As in simple regression, $R^2$ is also viewed as a measure of the prediction ability of the
model over the sample period, or as a measure of how well the estimated regression fits
the data. The value of $R^2$ is also equal to the squared sample correlation coefficient
between $Y_t$ and $\hat{Y}_t$. Since the sample correlation coefficient measures the linear association
between two variables, a high $R^2$ means there is a close association between the
values of $Y_t$ and the values predicted by the model, $\hat{Y}_t$; in this case, the model is said to
“fit” the data well. If $R^2$ is low, there is little association between the values of $Y_t$ and the
values predicted by the model, and the model does not fit the data well.
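As a quick numerical illustration of (3.25), the following self-contained Python/NumPy sketch fits the two-regressor model by least squares and computes $R^2$ from the residuals; the data are hypothetical.

```python
# A minimal sketch of computing R^2 as in equation (3.25); the data are made up
# for illustration only.
import numpy as np

Y  = np.array([10., 12., 15., 14., 18., 20., 22., 21.])
X1 = np.array([ 2.,  3.,  4.,  4.,  5.,  6.,  7.,  7.])
X2 = np.array([ 1.,  1.,  2.,  3.,  3.,  4.,  4.,  5.])

X = np.column_stack([np.ones_like(X1), X1, X2])    # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)       # OLS estimates (b0, b1, b2)

e   = Y - X @ beta                                 # residuals
RSS = (e ** 2).sum()                               # residual sum of squares
TSS = ((Y - Y.mean()) ** 2).sum()                  # total sum of squares
R2  = 1 - RSS / TSS                                # equation (3.25)
print(R2)
```

Computing ESS/TSS as in (3.27) from the deviation cross-products gives the same value, up to rounding.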
3.3.3 Adjusted Coefficient of Determination ($\bar{R}^2$)
One difficulty with $R^2$ is that it can be made large by adding more and more variables,
even if the variables added have no economic justification. Algebraically, it is the fact
that as variables are added the sum of squared errors (RSS) goes down (it can remain
unchanged, but this is rare) and thus $R^2$ goes up. If the model contains $n-1$ variables then
$R^2 = 1$. Manipulating the model just to obtain a high $R^2$ is not wise. An alternative
measure of goodness of fit, called the adjusted $R^2$ and often symbolized as $\bar{R}^2$, is usually
reported by regression programs. It is computed as:
$\bar{R}^2 = 1 - \dfrac{\sum e_i^2/(n-k)}{\sum y^2/(n-1)} = 1 - (1 - R^2)\dfrac{n-1}{n-k}$ --------------------------------(3.28)
This measure does not always go up when a variable is added, because the degrees-of-freedom
term $n-k$ appears in the denominator. As the number of variables k increases, RSS goes
down, but so does $n-k$. The effect on $\bar{R}^2$ depends on the amount by which RSS falls
relative to the loss in degrees of freedom.
While solving one problem, this corrected measure of goodness of fit unfortunately
introduces another one: it loses its interpretation. $\bar{R}^2$ is no longer the percentage of
variation explained. This modified $\bar{R}^2$ is sometimes used, and misused, as a device for
selecting the appropriate set of explanatory variables.
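As a small sketch, the second form of (3.28) translates directly into a helper function; the $R^2$, n and k values used below are hypothetical.

```python
# Adjusted R-squared from equation (3.28): 1 - (1 - R^2)(n - 1)/(n - k),
# where n is the sample size and k the number of estimated parameters
# (intercept included).
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    return 1 - (1 - r2) * (n - 1) / (n - k)

# e.g. a regression with R^2 = 0.90, n = 30 observations and k = 3 parameters
print(adjusted_r_squared(0.90, n=30, k=3))   # ~0.893
```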
Hypothesis Testing in Multiple Regression Model
In multiple regression models we will undertake two tests of significance. One is the test of
significance of the individual parameters of the model. This test of significance is the same
as the tests discussed in the simple regression model. The second is the test of the overall
significance of the model.
Tests of individual significance
If we invoke the assumption that $u_i \sim N(0, \sigma^2)$, then we can use either the t-test or the
standard error test to test a hypothesis about any individual partial regression coefficient.
To illustrate, consider the following example.
Let $Y = \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2 + e_i$ ………………………………… (3.51)
A. $H_0: \beta_1 = 0$
 $H_1: \beta_1 \neq 0$
B. $H_0: \beta_2 = 0$
 $H_1: \beta_2 \neq 0$
The null hypothesis in (A) states that, holding $X_2$ constant, $X_1$ has no (linear) influence on
Y. Similarly, hypothesis (B) states that, holding $X_1$ constant, $X_2$ has no influence on the
dependent variable $Y_i$. To test these null hypotheses we will use the following tests:
i. Standard error test: Under this and the following testing method we test only
for $\hat\beta_1$; the test for $\hat\beta_2$ is done in the same way.

$SE(\hat\beta_1) = \sqrt{\operatorname{var}(\hat\beta_1)} = \sqrt{\dfrac{\hat\sigma^2 \sum x_{2i}^2}{\sum x_{1i}^2 \sum x_{2i}^2 - (\sum x_1 x_2)^2}}$ ; where $\hat\sigma^2 = \dfrac{\sum e_i^2}{n-3}$

If $SE(\hat\beta_1) > \tfrac{1}{2}\hat\beta_1$, we accept the null hypothesis; that is, we conclude that
the estimate $\hat\beta_1$ is not statistically significant.

If $SE(\hat\beta_1) < \tfrac{1}{2}\hat\beta_1$, we reject the null hypothesis; that is, we conclude that
the estimate $\hat\beta_1$ is statistically significant.
Note: The smaller the standard errors, the stronger the evidence that the estimates are
statistically reliable.
ii. The student's t-test: We compute the t-ratio for each $\hat\beta_i$:

$t^* = \dfrac{\hat\beta_i - \beta_i}{SE(\hat\beta_i)} \sim t_{n-k}$ , where n is the number of observations and k is the number of
parameters. If we have 3 parameters, the degrees of freedom will be $n-3$. So:

$t^* = \dfrac{\hat\beta_2 - \beta_2}{SE(\hat\beta_2)}$ ; with $n-3$ degrees of freedom.

Under our null hypothesis $\beta_2 = 0$, $t^*$ becomes:

$t^* = \dfrac{\hat\beta_2}{SE(\hat\beta_2)}$

If $|t^*| < t$ (tabulated), we accept the null hypothesis, i.e. we conclude that $\hat\beta_2$
is not significant and hence the regressor does not appear to contribute to the
explanation of the variations in Y.
If $|t^*| > t$ (tabulated), we reject the null hypothesis and accept the alternative
one; $\hat\beta_2$ is statistically significant. Thus, the greater the value of $t^*$, the stronger
the evidence that $\hat\beta_i$ is statistically significant.
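To tie the pieces together, the sketch below (hypothetical data again) estimates the two-regressor model, computes $\hat\sigma^2$ and $SE(\hat\beta_2)$ (using the variance formula analogous to the one shown above for $\hat\beta_1$, with $\sum x_{1i}^2$ in the numerator), and carries out the t-test of $H_0: \beta_2 = 0$ at the 5% level.

```python
# A minimal sketch of the individual t-test for beta2; the data are made up
# for illustration and SciPy supplies the tabulated critical value.
import numpy as np
from scipy import stats

Y  = np.array([10., 12., 15., 14., 18., 20., 22., 21.])
X1 = np.array([ 2.,  3.,  4.,  4.,  5.,  6.,  7.,  7.])
X2 = np.array([ 1.,  1.,  2.,  3.,  3.,  4.,  4.,  5.])
n, k = len(Y), 3                                   # k = 3 estimated parameters

X = np.column_stack([np.ones(n), X1, X2])          # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)       # OLS estimates (b0, b1, b2)
e = Y - X @ beta
sigma2 = (e ** 2).sum() / (n - k)                  # sigma-hat^2 = sum(e_i^2)/(n - 3)

x1, x2 = X1 - X1.mean(), X2 - X2.mean()            # deviations from sample means
den = (x1 ** 2).sum() * (x2 ** 2).sum() - ((x1 * x2).sum()) ** 2
se_b2 = np.sqrt(sigma2 * (x1 ** 2).sum() / den)    # SE(beta2-hat)

t_star = beta[2] / se_b2                           # t* under H0: beta2 = 0
t_crit = stats.t.ppf(0.975, df=n - k)              # two-tailed 5% critical value
print(t_star, t_crit, abs(t_star) > t_crit)        # True => reject H0
```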
Test of Overall Significance
Throughout the previous section we were concerned with testing the significance of the
estimated partial regression coefficients individually, i.e. under the separate hypothesis
that each of the true population partial regression coefficients was zero.
In this section we extend this idea to a joint test of the relevance of all the included
explanatory variables. Now consider the following:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_k X_k + u_i$
$H_0: \beta_1 = \beta_2 = \beta_3 = \ldots = \beta_k = 0$
$H_1:$ at least one of the $\beta_k$ is non-zero
This null hypothesis is a joint hypothesis that $\beta_1, \beta_2, \ldots, \beta_k$ are jointly or simultaneously
equal to zero. A test of such a hypothesis is called a test of the overall significance of the
observed or estimated regression line, that is, whether Y is linearly related to
$X_1, X_2, \ldots, X_k$.
Can the joint hypothesis be tested by testing the significance of the
$\hat\beta_i$'s individually, as above? The answer is no, and the reasoning is as follows.
In testing the individual significance of an observed partial regression coefficient, we
assumed implicitly that each test of significance was based on a different (i.e. independent)
sample. Thus, in testing the significance of $\hat\beta_2$ under the hypothesis that $\beta_2 = 0$, it was
assumed tacitly that the testing was based on a different sample from the one used in
testing the significance of $\hat\beta_3$ under the null hypothesis that $\beta_3 = 0$. But in testing the joint
hypothesis above, we would be violating this assumption underlying the test
procedure.
“…testing a series of single (individual) hypotheses is not equivalent to testing those
same hypotheses jointly. The intuitive reason for this is that in a joint test of several
hypotheses any single hypothesis is affected by the information in the other hypotheses.”1
The test procedure for any set of hypotheses can be based on a comparison of the sum of
squared errors from the original, unrestricted multiple regression model with the sum
of squared errors from a regression model in which the null hypothesis is assumed to
be true. When a null hypothesis is assumed to be true, we in effect place conditions, or
constraints, on the values that the parameters can take, and the sum of squared errors
increases. The idea of the test is that if these sums of squared errors are substantially
different, then the assumption that the joint null hypothesis is true has significantly
reduced the ability of the model to fit the data, and the data do not support the null
hypothesis.
If the null hypothesis is true, we expect the data to be compatible with the conditions
placed on the parameters. Thus, there would be little change in the sum of squared errors
when the null hypothesis is assumed to be true.
Let the restricted residual sum of squares (RRSS) be the sum of squared errors in the
model obtained by assuming that the null hypothesis is true, and let URSS be the sum of
squared errors of the original unrestricted model, i.e. the unrestricted residual sum of squares.
It is always true that RRSS − URSS ≥ 0.
Consider $Y = \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2 + \ldots + \hat\beta_k X_k + e_i$.
This model is called the unrestricted model. The test of the joint hypothesis is that:
$H_0: \beta_1 = \beta_2 = \beta_3 = \ldots = \beta_k = 0$
$H_1:$ at least one of the $\beta_k$ is different from zero.
We know that: $\hat{Y} = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i} + \ldots + \hat\beta_k X_{ki}$
$Y_i = \hat{Y}_i + e_i$
$e_i = Y_i - \hat{Y}_i$
$\sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2$
This sum of squared errors is called the unrestricted residual sum of squares (URSS). This is
the case when the null hypothesis is not true. If the null hypothesis is assumed to be true,
i.e. when all the slope coefficients are zero, the model becomes:
$Y = \hat\beta_0 + e_i$
$\hat\beta_0 = \dfrac{\sum Y_i}{n} = \bar{Y}$ (applying OLS)…………………………….(3.52)
$e_i = Y_i - \hat\beta_0$, but $\hat\beta_0 = \bar{Y}$
$e_i = Y_i - \bar{Y}$
$\sum e_i^2 = \sum (Y_i - \bar{Y})^2 = \sum y^2 = TSS$
The sum of squared errors when the null hypothesis is assumed to be true is called the
restricted residual sum of squares (RRSS), and this is equal to the total sum of squares
(TSS).
The ratio: $\dfrac{(RRSS - URSS)/(k-1)}{URSS/(n-k)} \sim F_{(k-1,\; n-k)}$ ……………… (3.53)
(it has an F-distribution with $k-1$ and $n-k$ degrees of freedom for the numerator and denominator respectively)
$RRSS = TSS$
$URSS = \sum e_i^2 = \sum y^2 - \hat\beta_1 \sum y x_1 - \hat\beta_2 \sum y x_2 - \ldots - \hat\beta_k \sum y x_k = RSS$
$F = \dfrac{(TSS - RSS)/(k-1)}{RSS/(n-k)}$
$F = \dfrac{ESS/(k-1)}{RSS/(n-k)}$ ………………………………………………. (3.54)
If we divide both the numerator and the denominator above by $\sum y^2 = TSS$, then:
$F = \dfrac{\dfrac{ESS}{TSS}\Big/(k-1)}{\dfrac{RSS}{TSS}\Big/(n-k)}$
$F = \dfrac{R^2/(k-1)}{(1-R^2)/(n-k)}$ …………………………………………..(3.55)
This implies that the computed value of F can be calculated either from ESS and TSS or from
$R^2$ and $1-R^2$. If the null hypothesis is not true, then the difference between RRSS and URSS
(TSS and RSS) becomes large, implying that the constraints placed on the model by the
null hypothesis have a large effect on the ability of the model to fit the data, and the value
of F tends to be large. Thus, we reject the null hypothesis if the F test statistic becomes too
large. This value is compared with the critical value of F, which leaves a probability of
$\alpha$ in the upper tail of the F-distribution with $k-1$ and $n-k$ degrees of freedom.
If the computed value of F is greater than the critical value of F(k-1, n-k), then the
parameters of the model are jointly significant and the dependent variable Y is linearly
related to the independent variables included in the model.
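As a final sketch, the F-ratio in (3.55) and its critical value can be computed directly in Python/SciPy; the $R^2$, n and k values below are hypothetical.

```python
# Overall F-test of equation (3.55): F* = (R^2/(k-1)) / ((1-R^2)/(n-k)).
# Reject H0 (all slope coefficients zero) when F* exceeds the critical value.
from scipy import stats

def overall_f_test(r2: float, n: int, k: int, alpha: float = 0.05):
    f_star = (r2 / (k - 1)) / ((1 - r2) / (n - k))
    f_crit = stats.f.ppf(1 - alpha, dfn=k - 1, dfd=n - k)
    return f_star, f_crit, f_star > f_crit

# e.g. R^2 = 0.90 with n = 30 observations and k = 3 parameters: F* = 121.5
print(overall_f_test(0.90, n=30, k=3))
```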