ECONOMETRICS I
[Econ 3061]
Madda Walabu University
College of Business and Economics
Department of Economics
Genemo Fitala (MSc.)
[email protected]
CHAPTER 4
VIOLATION OF CLASSICAL REGRESSION ASSUMPTIONS
Review of the Assumptions of the Classical Regression Model (OLS)
1. The regression model is linear, correctly specified, and has an additive error
term.
2. The error term has a zero population mean.
3. All explanatory variables are uncorrelated with the error term
4. Observations of the error term are uncorrelated with each other (no serial
correlation).
5. The error term has a constant variance (no heteroskedasticity).
6. No explanatory variable is a perfect linear function of any other explanatory
variables (no perfect multicollinearity).
7. The error term is normally distributed (not required for OLS to be BLUE, but needed for exact hypothesis testing).
4.1. Multicollinearity
Multicollinearity denotes the linear relationship among independent or
explanatory variables.
An implicit assumption that is made when using the OLS estimation method is
that the explanatory variables are not correlated with one another.
If there is no relationship between the explanatory variables, they would be said to be
orthogonal to one another.
If the explanatory variables were orthogonal to one another, adding or removing a
variable from a regression equation would not cause the values of the coefficients on
the other variables to change.
In any practical context, the correlation between explanatory variables will be non-
zero - a small degree of association between explanatory variables will almost always
occur but will not cause too much loss of precision.
Cont’d
However, a problem occurs when the explanatory variables are very highly correlated
with each other, and this problem is known as multicollinearity. It is possible to
distinguish between two classes of multicollinearity: perfect & near multicollinearity
1. Perfect Collinearity
Perfect multicollinearity occurs when there is an exact linear relationship between two or
more explanatory variables. In this case, it is not possible to estimate all of the coefficients.
It is usually observed only when the same independent variable is inadvertently used twice in a regression, or when one variable is an exact linear function of others.
For illustration, suppose that two variables were employed in a regression function
such that the value of one variable was always twice that of the other (e.g.
suppose X3 = 2*X2). If both X3 and X2 were used as explanatory variables in the
same regression, then the model parameters could not be estimated.
If explanatory variables are perfectly correlated, i.e. if the correlation coefficient is
one, the parameters become indeterminate and their standard errors are infinite.
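To see this numerically, here is a minimal sketch with simulated data and hypothetical variable names: when X3 = 2·X2 the design matrix loses a column of rank, so X'X cannot be inverted and unique OLS estimates do not exist.

```python
import numpy as np

rng = np.random.default_rng(0)
x2 = rng.normal(size=50)
x3 = 2 * x2                                   # exact linear relationship: X3 = 2*X2
X = np.column_stack([np.ones(50), x2, x3])    # constant, X2, X3

print(np.linalg.matrix_rank(X))               # 2, not 3: the columns are linearly dependent
print(np.linalg.cond(X.T @ X))                # astronomically large: X'X is (numerically) singular,
                                              # so unique OLS estimates of the coefficients do not exist
```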
Cont’d
2. Near Multicollinearity:
Near multicollinearity is much more likely to occur in practice, and would arise when
there was a non-negligible, but not perfect, relationship between two or more of the
explanatory variables.
If multicollinearity is less than perfect, the regression coefficients are determinate
but have large standard errors, which harms the precision with which the
coefficients are estimated. In this case, the OLS estimators are still BLUE, but their variances are large.
Note that a high correlation between the dependent variable and one of the
independent variables is not multicollinearity.
Testing for multicollinearity is surprisingly difficult, and hence all that is presented
here is a simple method to investigate the presence or otherwise of the most easily
detected forms of near multicollinearity.
Cont’d
This method simply involves looking at the matrix of correlations
between the individual variables.
Suppose that a regression equation has three explanatory variables (plus
a constant term), and that the pair-wise correlations between these
explanatory variables have been computed.
Clearly, if multicollinearity were suspected, the most likely culprit would be a
high correlation between X2 and X4.
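A quick way to inspect this in practice is to compute the correlation matrix of the regressors. A minimal sketch with simulated data and hypothetical variable names X2, X3, X4 (here X4 is deliberately built to track X2):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 100
x2 = rng.normal(size=n)
x4 = 0.95 * x2 + 0.1 * rng.normal(size=n)     # X4 made highly correlated with X2 for illustration
x3 = rng.normal(size=n)

X = pd.DataFrame({"X2": x2, "X3": x3, "X4": x4})
print(X.corr().round(2))                      # an off-diagonal entry above ~0.8 flags the culprit pair
```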
1. Potential sources of multicollinearity
1. The data collection method employed: sampling over a limited range of the
values taken by the regressors in the population.
2. Constraint on the model or in the population being sampled.
3. Model specification
Adding polynomial terms to a regression
Use of highly correlated (interdependent) independent variables
Use of many interaction terms in a model
4. Over-determined model: the model has more explanatory variables than
observations.
5. Use of many dummy independent variables
2. The Detection of Multicollinearity
1. High Correlation Coefficients: pairwise correlations among independent
variables might be high (in absolute value). The rule of thumb: if a pairwise
correlation exceeds 0.8, severe multicollinearity may be present.
2. High R² with low t-statistic values: it is possible for the individual regression
coefficients to be insignificant while the overall fit of the equation is high.
3. High Variance Inflation Factors (VIFs): the VIF of regressor Xj is 1/(1 − R²j), where R²j is the R-squared from regressing Xj on the other explanatory variables.
The larger the value of the VIF, the more troublesome or collinear the variable Xj.
If the VIF of a variable exceeds 10, which happens when R²j exceeds 0.90,
that variable is said to be highly collinear.
The closer the VIF of Xj is to 1, the greater the
evidence that Xj is not collinear with the other regressors.
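A minimal sketch of the VIF check using statsmodels' variance_inflation_factor on simulated data (variable names are hypothetical; X3 is built to be nearly collinear with X2 for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
n = 100
x2 = rng.normal(size=n)
x3 = 0.9 * x2 + 0.2 * rng.normal(size=n)      # X3 nearly collinear with X2
x4 = rng.normal(size=n)

Xc = sm.add_constant(pd.DataFrame({"X2": x2, "X3": x3, "X4": x4}))
vifs = pd.Series([variance_inflation_factor(Xc.values, i) for i in range(Xc.shape[1])],
                 index=Xc.columns)
print(vifs.round(2))                          # rule of thumb: VIF > 10 signals serious collinearity
```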
3. Remedies for Multicollinearity
No single solution exists that would eliminate multicollinearity. Certain
approaches may be useful:
1. Do Nothing: Live with what you have.
2. Drop a Redundant Variable
If a variable is redundant, it should have never been included in the model
in the first place.
So dropping it actually is just correcting for a specification error.
Use economic theory to guide your choice of which variable to drop.
Cont’d
3. Transform the Multicollinear Variables
Sometimes you can reduce multicollinearity by re-specifying the model, for
instance, create a combination of the multicollinear variables.
As an example, rather than including the variables GDP and population in
the model, include GDP/population (GDP per capita) instead (see the sketch after this list).
4. Increase the Sample Size
Increasing the sample size improves the precision of the estimators and
reduces the adverse effects of multicollinearity. Adding data, however, is often not feasible.
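A minimal sketch of the transformation remedy, with simulated data and hypothetical variable names: instead of entering GDP and population separately (nearly collinear in levels), the ratio GDP/population enters the regression.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 80
pop = rng.uniform(1, 100, n)                   # population (millions)
gdp = pop * rng.uniform(20, 40, n)             # GDP roughly proportional to population
y = 5 + 0.8 * (gdp / pop) + rng.normal(size=n)

# Rather than gdp and pop separately, use the combined variable GDP per capita
X = sm.add_constant(pd.DataFrame({"gdp_per_capita": gdp / pop}))
print(sm.OLS(y, X).fit().params)
```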
4.2. Heteroscedasticity
1. The Nature of Heteroscedasticity
If the probability distribution of Ui remains the same over all values of the explanatory
variables, this assumption is called homoscedasticity, i.e. Var(u_i) = σ²_u (constant variance).
In this case the variation of u_i around the explanatory variables remains constant.
But if the distribution of u_i around the explanatory variables is not constant, we say that the u_i's
are heteroscedastic (non-constant variance).
Var(u_i) = σ²_ui signifies that the individual variances may all be different.
Cont’d
The assumption of homoscedasticity states that the variation of each random
term (Ui) around its zero mean is constant and does not change as the explanatory
variables change; whether the sample size increases, decreases or stays the same does not
affect the variance of Ui, which is constant.
Var(u_i) = σ²_u ≠ f(X_i)
This says that the variation of the random term around its mean does not
depend upon the explanatory variable X_i. This is called homoscedasticity (constant variance).
But the dispersion of the random term around the regression line may not be constant, i.e.
the variance of the random term Ui may be a function of the explanatory variables:
Var(u_i) = σ²_ui = f(X_i), which signifies that the individual variances may all be different. This is
called heteroscedasticity (non-constant variance).
Cont’d
Diagrammatically, in the two-variable regression model homoscedasticity can, for convenience,
be shown as in Figure 4.1.
As Figure 4.1 shows, the conditional variance of Y_i
(which is equal to that of u_i), conditional upon the
given X_i, remains the same regardless of the
values taken by the variable X.
In contrast, consider Figure 4.2, which shows that
the conditional variance of Y_i increases as X
increases. Here, the variances of Y_i are not the
same. Hence, there is heteroscedasticity.
Symbolically, E(u_i²) = σ²_i, i.e. the variance of the random term now differs across observations.
Figure 4.1: Homoscedastic disturbances. Figure 4.2: Heteroscedastic disturbances.
Cont’d
To make the difference between homoscedasticity and heteroscedasticity clear,
assume that in the two-variable model Y_i = β1 + β2 X_i + u_i, Y represents savings and X represents
income.
Figures 4.1 and 4.2 show that as income
increases, savings on the average also increase.
But in Figure 4.1 the variance of savings
remains the same at all levels of income,
whereas in Figure 4.2 it increases with income.
It seems that in Figure 4.2 the higher-income
families on the average save more than the
lower-income families, but there is also more
variability in their savings.
2. Sources of heteroskedasticity
1. Following the error-learning models
As people learn, their errors of behaviour become
smaller over time, or the number of errors becomes
more consistent.
In this case, σ²_i is expected to decrease.
As an example, consider Figure 4.3, which relates the
number of typing errors made in a given time period
on a test to the hours put into typing practice.
As Figure 4.3 shows, as the number of hours of
typing practice increases, the average number of
typing errors as well as their variance decreases.
Figure 4.3: Illustration of heteroscedasticity.
Cont’d
2. As data collection techniques improve, σ²_ui is likely to decrease.
Example:
Banks that have sophisticated data processing equipment are likely to
commit fewer errors in the monthly or quarterly statements of their
customers than banks without such facilities.
3. Presence of outliers:
An outlier is an observation that is very different (either very
small or very large) in relation to the other observations in the sample.
3. Consequences of heteroskedasticity
If the assumption of homoscedasticity is violated, it will have the following
consequences
1. Heteroskedasticity increases the variances of the distributions of the OLS
coefficient estimators, thereby making the OLS estimators inefficient.
2. OLS estimators are inefficient: if the random term Ui is
heteroskedastic, the OLS estimates do not have the minimum variance in
the class of linear unbiased estimators. Therefore they are not efficient, in
either small or large samples.
So heteroskedasticity has a wide impact on hypothesis testing: the
conventional t and F statistics are no longer reliable for hypothesis testing.
Cont’d
The true Var(β̂) under heteroscedasticity will be greater than its variance under
homoscedasticity.
As a result, the conventional OLS formula underestimates the true standard error of β̂.
The t-value associated with it will therefore be overestimated, which might lead
to the conclusion that in a specific case at hand β̂ is statistically significant
(which in fact may not be true).
Moreover, if we proceed with our model under the false belief of homoscedasticity
of the error variance, our inference and prediction about the population
coefficients will be incorrect.
4. Detection of heteroskedasticity
There are informal and formal methods of detecting heteroskedasticity.
1. Informal
A. Nature of the problem: the nature of the problem under study (and pioneering empirical work on it) often suggests whether heteroskedasticity is likely.
B. Graphical Method
Plotting the squared residuals against the fitted values of the dependent variable gives a rough
indication of the existence of heteroskedasticity.
If there appears to be a systematic pattern in the graph, it may be an indication of
the existence of heteroskedasticity.
Cont’d
In Figure 4.4, the squared residuals û²_i are plotted against Ŷ_i, the
estimated Y_i from the regression line, the idea
being to find out whether the estimated mean
value of Y is systematically related to the squared
residuals.
In Figure 4.4a we see that there is no systematic
pattern between the two variables, suggesting
that perhaps no heteroscedasticity is present in
the data.
Figures 4.4b to 4.4e, however, exhibit definite
patterns. For instance, Figure 4.4c suggests a
linear relationship, whereas Figures 4.4d and 4.4e
indicate a quadratic relationship between û²_i and Ŷ_i.
Figure 4.4: Hypothetical patterns of estimated squared residuals.
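A minimal sketch of this graphical check, with simulated data whose error spread grows with X (an assumption made only for illustration): squared OLS residuals are plotted against the fitted values.

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
n = 200
x = rng.uniform(1, 10, n)
y = 2 + 0.5 * x + rng.normal(scale=0.3 * x, size=n)   # error variance rises with x

res = sm.OLS(y, sm.add_constant(x)).fit()
plt.scatter(res.fittedvalues, res.resid ** 2, s=10)
plt.xlabel("fitted values of Y")
plt.ylabel("squared residuals")
plt.show()        # a fanning-out pattern, as in Figures 4.4b-e, hints at heteroskedasticity
```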
Cont’d
2. Formal Methods
A. The Spearman Rank-Correlation Test
This is the simplest and an approximate test for detecting heteroskedasticity, and it
can be applied to either small or large samples.
A high rank correlation coefficient between the residuals (in absolute value) and the suspected
explanatory variable suggests the presence of heteroskedasticity.
If we have more than one explanatory variable, we may compute the rank correlation
coefficient between e_i and each of the explanatory variables separately.
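A minimal sketch of the rank-correlation check using scipy's spearmanr; the data are simulated, and the assumption that the error spread rises with X is made only for illustration.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import spearmanr

rng = np.random.default_rng(5)
n = 200
x = rng.uniform(1, 10, n)
y = 2 + 0.5 * x + rng.normal(scale=0.3 * x, size=n)   # error spread grows with x

res = sm.OLS(y, sm.add_constant(x)).fit()
rho, pval = spearmanr(np.abs(res.resid), x)           # rank correlation of |e_i| with the regressor
print(rho, pval)                                      # a high, significant rho suggests heteroskedasticity
```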
Cont’d
B. The Breusch-Pagan Test
This test is applicable for large samples; the number of observations (the sample size)
should be at least twice the number of explanatory variables.
If the number of explanatory variables is 3 (X1, X2, X3), then the sample size
must be at least 6.
If the computed test statistic is greater than the critical (table) value, we reject the null
hypothesis of homoscedasticity and accept the alternative that there
is heteroskedasticity.
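A minimal sketch using statsmodels' het_breuschpagan; the data are simulated, with heteroskedastic errors built in purely for illustration.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(6)
n = 200
X = sm.add_constant(rng.uniform(1, 10, size=(n, 3)))         # constant + 3 regressors
y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + rng.normal(scale=0.3 * X[:, 1], size=n)

res = sm.OLS(y, X).fit()
lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(res.resid, X)
print(lm_stat, lm_pval)          # a small p-value rejects the null of homoscedasticity
```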
Cont’d
C. White’s General Heteroskedasticity Test
It is an LM test, but it has the advantage that it does not require any prior
knowledge about the pattern of heteroskedasticity.
The assumption of normality is also not required here. For all these reasons, it
is considered one of the more powerful tests for heteroskedasticity.
Basic intuition → focuses on systematic patterns between the residual
variance, the explanatory variables, the squares of explanatory variables and
their cross-products.
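A minimal sketch using statsmodels' het_white, which builds the auxiliary regression with the regressors, their squares and cross-products internally; the heteroskedastic data-generating process is simulated for illustration only.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(7)
n = 200
X = sm.add_constant(rng.uniform(1, 10, size=(n, 2)))         # constant + 2 regressors
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(scale=0.2 * X[:, 1] ** 1.5, size=n)

res = sm.OLS(y, X).fit()
lm_stat, lm_pval, f_stat, f_pval = het_white(res.resid, X)
print(lm_stat, lm_pval)          # a small p-value is evidence of heteroskedasticity
```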
Cont’d
Limitations of White’s test
i. When we have a large number of explanatory variables, the number of terms in
the auxiliary regression model will be so high that we may not have adequate
degrees of freedom.
ii. It is basically a large sample test so that, when we work with a small sample, it
may fail to detect the presence of heteroskedasticity in data even when such a
problem is present.
Cont’d
D. Goldfeld-Quandt Test
This popular method is applicable if one assumes that the heteroscedastic
variance 𝛔𝟐𝒊 is positively related to one of the explanatory variables in the
regression model
It may be applied when one of the explanatory variables is suspected to be the
heteroskedasticity culprit.
The basic idea here is that if the variances of the disturbances are the same across
all observations (i.e., homoscedastic), then the variance of one part of the
sample should be the same as the variance of another part of the sample.
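A minimal sketch using statsmodels' het_goldfeldquandt: observations are ordered by the suspected regressor, a middle portion is dropped, and the residual variances of the two sub-samples are compared. The data are simulated purely for illustration.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_goldfeldquandt

rng = np.random.default_rng(8)
n = 200
X = sm.add_constant(rng.uniform(1, 10, size=(n, 2)))
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(scale=0.3 * X[:, 1], size=n)

# Sort by the suspected culprit (column 1), drop the middle 20%, compare sub-sample variances
f_stat, p_val, order = het_goldfeldquandt(y, X, idx=1, drop=0.2, alternative="increasing")
print(f_stat, p_val)             # a small p-value: the variance rises with the suspected regressor
```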
5. Remedies for heteroskedasticity
1. Log-transformation of the data ⟶ a log-transformation compresses the scales in
which the variables are measured, so it helps to reduce the intensity of the
heteroskedasticity problem (a minimal sketch follows this list). Obviously, this method cannot be followed where
some variables take on zero or negative values.
2. Using some suitable deflator if available to transform the data series ⟶ the
idea is to estimate the model by using the deflated variables so that more
efficient estimates of the parameters are obtained. But this process might lead
to a ‘spurious relationship’ between the variables when a common deflator is used
to deflate them.
3. When heteroskedasticity appears owing to presence of outliers, increasing sample
size might be helpful.
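A minimal sketch of the log-transformation remedy, on simulated strictly positive data; the multiplicative error structure is an assumption made only for illustration. Re-estimating in logs compresses the scale and often dampens the heteroskedasticity.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 200
x = rng.uniform(1, 10, n)
y = np.exp(1 + 0.5 * np.log(x) + rng.normal(scale=0.2, size=n))   # multiplicative error in levels

res_levels = sm.OLS(y, sm.add_constant(x)).fit()                  # residuals fan out in levels
res_logs = sm.OLS(np.log(y), sm.add_constant(np.log(x))).fit()    # spread roughly constant in logs
print(res_levels.resid.std(), res_logs.resid.std())
print(res_logs.params)
```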
4.3. Autocorrelation
In our discussion of simple and multiple regression models, one of the assumptions
of the classicalists is that Cov(u_i, u_j) = E(u_i u_j) = 0 for i ≠ j, which implies that successive
values of the disturbance term U are temporally independent, i.e. a disturbance
occurring at one point of observation is not related to any other disturbance.
This means that when observations are made over time, the effect of a disturbance
occurring in one period does not carry over into another period.
If the above assumption is not satisfied, that is, if the value of U in any
particular period is correlated with its own preceding value(s), we say there is
autocorrelation of the random variables.
Cont’d
Hence, autocorrelation is defined as the ‘correlation’ between members of a series of
observations ordered in time or space.
Autocorrelation is a special case of correlation, where the association is not
between elements of two or more variables but between successive values of one
variable, while correlation refers to the relationship between the values of two or
more different variables.
Autocorrelation is also sometimes called serial correlation, although some economists
distinguish between the two terms.
1. Reasons/sources of Autocorrelation
There are several reasons why autocorrelation arises. Some of these are
A. Cyclical fluctuations
Time series such as GNP, the price index, production, employment and
unemployment exhibit business cycles. Starting at the bottom of a recession,
when economic recovery starts, most of these series move upward. In this
upswing, the value of a series at one point in time is greater than its
previous value.
Thus, there is a momentum built into them, and it continues until something
happens (e.g. an increase in interest rates or taxes) to slow them down.
Therefore, in regressions involving time series data, successive observations
are likely to be interdependent.
Cont’d
B. Omitted explanatory variables
If an autocorrelated variable has been excluded from the set of
explanatory variables, its influence will be reflected in Ui.
If several autocorrelated explanatory variables are omitted, the
random variable U may not be autocorrelated, because the
autocorrelation patterns of the omitted variables may offset each other.
C. Mis-specification of the mathematical form of the model
If we use a mathematical form which differs from the correct form of the relation-
ship, the random variable may show serial correlation.
Example: if we choose a linear function while the correct form is non-
linear, the values of U will be correlated.
Cont’d
D. Mis-specification of the true random term U.
Many random factors like war, drought, weather conditions, strikes etc. exert
influences that are spread over more than one period of time.
Example: the effect of weather conditions in the agricultural sector will influence
the performance of all other economic variables at several points in the
future.
A strike in an organization affects its future production process, and the effect will
persist for several future periods.
In such cases the values of U become serially dependent, so that
autocorrelation occurs.
2. Effects of Autocorrelation on OLS Estimators
We have seen that the ordinary least squares technique is based on basic
assumptions.
Some of these assumptions concern the mean, variance and
covariance of the disturbance term.
Naturally, therefore, if these assumptions do not hold, on whatever
account, the estimators derived by the OLS procedure may not be efficient.
The following are the effects on the estimators if the OLS method is applied in the presence of
autocorrelation in the given data.
Cont’d
1. The OLS estimators are unbiased but not BLUE.
2. The OLS estimates are inefficient, and the estimated variance of β̂ in the
simple regression model will be biased downwards (i.e. underestimated) when
the random terms are autocorrelated.
3. Wrong testing procedure: if Var(β̂) is underestimated, SE(β̂) is also
underestimated, which makes the t-ratio large. This large t-ratio may make β̂
appear statistically significant when it is not.
4. The wrong testing procedure will lead to wrong predictions and inferences about the
characteristics of the population.
3. Detection of Autocorrelation
The following are detection methods of Autocorrelation:
1. Graphical methods
Since the residuals e_i = Y_i − Ŷ_i are estimates of the true
disturbances Ui, if the e_i are found to be correlated this suggests that the Ui are
autocorrelated with each other.
In order to test for autocorrelation, it is necessary to investigate
whether any relationship exists between the current value of u, u_t, and
any of its previous values, u_{t−1}, u_{t−2}, ...
Cont’d
Plot the residuals against their own lag:
This shows the relationship between the current residual u_t and
the immediately previous one, u_{t−1}.
Plot u_t horizontally and u_{t−1} vertically.
If most of the points fall in quadrants I and III,
the given data are autocorrelated and the
autocorrelation is positive (Fig. 4.5).
If most of the points fall in quadrants II and IV,
the autocorrelation is said to be negative (Fig. 4.6).
But if the points are scattered roughly equally in all the
quadrants, there is no autocorrelation in the
given data (Fig. 4.9).
Fig. 4.5: Plot of u_t against u_{t−1} showing positive autocorrelation. Fig. 4.6: Plot of u_t against u_{t−1} showing negative autocorrelation.
Cont’d
Plotting u_t over time:
If the series of residuals does not cross the time
axis very frequently, this indicates positive
autocorrelation (Fig. 4.7).
A negatively autocorrelated series of residuals
will cross the time axis more frequently than if
they were distributed randomly (Fig. 4.8).
If the time series plot of the residuals crosses
the x-axis neither too frequently nor too
rarely, we say there is no
autocorrelation in the given data (Fig. 4.10).
Figure 4.7: Plot of u_t over time, showing positive autocorrelation. Figure 4.8: Plot of u_t over time, showing negative autocorrelation.
Cont’d
Figure 4.9: Plot of u_t against u_{t−1}, showing no autocorrelation. Figure 4.10: Plot of u_t over time, showing no autocorrelation.
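A minimal sketch of the lag plot just described, on simulated data with an AR(1) disturbance (the data-generating process is an assumption made only for illustration; here u_{t−1} is on the horizontal axis and u_t on the vertical, so the quadrant interpretation is unchanged):

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(10)
n = 200
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + rng.normal(scale=0.5)   # positively autocorrelated disturbances
y = 1 + 0.5 * x + u

e = sm.OLS(y, sm.add_constant(x)).fit().resid
plt.scatter(e[:-1], e[1:], s=10)                    # residual at t-1 vs residual at t
plt.xlabel("residual at t-1")
plt.ylabel("residual at t")
plt.show()       # points concentrated in quadrants I and III indicate positive autocorrelation
```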
Cont’d
Autocorrelation may be positive or negative, but in most practical cases
autocorrelation is positive.
The main reason for this is that economic variables tend to move in the same
direction.
Example:
In periods of boom, employment, investment, output, GNP growth, consumption
etc. move upwards, and the random term Ui will follow the same
pattern.
Again, in periods of recession all the economic variables move down-
wards and the random term will follow the same pattern.
Cont’d
2. Formal testing method
This method is called formal because the test is based on the formal testing
procedures you have seen in your statistics course.
Of course, a first step in testing whether the residual series from an
estimated model are autocorrelated would be to plot the residuals as
above, looking for any patterns.
Graphical methods may be difficult to interpret in practice, however, and
hence a formal statistical test should also be applied.
Different econometricians and statisticians suggest different types of
testing methods.
Cont’d
The most frequently and widely used testing methods by researchers are:
i. The Durbin-Watson (DW) Test, Durbin and Watson (1951)
Durbin–Watson (DW) is a test for first order autocorrelation – i.e. it tests only for a
relationship between an error and its immediately previous value.
One way to motivate the test and to interpret the test statistic would be in the
context of a regression of the time t error on its previous value.
Under the null hypothesis, the errors at time t − 1 and t are independent of one
another, and if this null were rejected, it would be concluded that there was evidence
of a relationship between successive residuals.
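A minimal sketch using statsmodels' durbin_watson on simulated data with AR(1) errors (an assumption for illustration). As a rough guide, the statistic is near 2 when there is no first-order autocorrelation and well below 2 under positive autocorrelation.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(11)
n = 200
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + rng.normal(scale=0.5)   # AR(1) disturbances
y = 1 + 0.5 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(res.resid))                     # well below 2: positive first-order autocorrelation
```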
Cont’d
ii. The Breusch–Godfrey Test
Recall that DW is a test only of whether consecutive errors are related to one another. So, not
only can the DW test not be applied if a certain set of circumstances is not fulfilled, there will
also be many forms of residual autocorrelation that DW cannot detect.
For example, if Corr(u_t, u_{t−1}) = 0 but Corr(u_t, u_{t−2}) ≠ 0, DW as defined above will not find any
autocorrelation. One possible solution would be to replace u_{t−1} in the test with u_{t−2}.
However, pairwise examination of the correlations (u_t, u_{t−1}), (u_t, u_{t−2}), (u_t, u_{t−3}), ... would be tedious
in practice and is not coded in econometrics software packages.
Moreover, the critical values would have to be modified somewhat in each case. It is therefore
desirable to have a joint test for autocorrelation that allows examination of the
relationship between u_t and several of its lagged values at the same time.
The Breusch–Godfrey test is a more general test for autocorrelation.
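A minimal sketch using statsmodels' acorr_breusch_godfrey, which tests jointly for autocorrelation up to a chosen number of lags; the AR(2)-type error process below is simulated purely for illustration.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(12)
n = 200
x = rng.normal(size=n)
u = rng.normal(scale=0.5, size=n)
for t in range(2, n):
    u[t] += 0.5 * u[t - 1] - 0.3 * u[t - 2]          # AR(2)-type disturbances
y = 1 + 0.5 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()
lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=4)   # joint test up to 4 lags
print(lm_stat, lm_pval)                              # a small p-value indicates residual autocorrelation
```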
4. Remedies for Autocorrelation
The remedies to remove the effect of autocorrelation depend on the
source of the autocorrelation:
1. Include the omitted explanatory variables.
2. Apply the appropriate mathematical form of the model.
3. If tests prove that there is a true autocorrelation problem, we may
use other estimation techniques such as GLS (a minimal sketch follows).
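A minimal sketch of the GLS remedy using statsmodels' GLSAR, which assumes an AR(1) error and alternates between estimating rho and the regression coefficients (feasible GLS in the Cochrane–Orcutt spirit); the data-generating process is simulated for illustration only.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(13)
n = 200
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + rng.normal(scale=0.5)    # AR(1) disturbances
y = 1 + 0.5 * x + u

# GLSAR with an AR(1) error term; iterative_fit alternates between estimating rho and the betas
model = sm.GLSAR(y, sm.add_constant(x), rho=1)
res_gls = model.iterative_fit(maxiter=10)
print(res_gls.params, model.rho)                     # coefficient estimates and the estimated rho
```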
Genemo Fitala
[email protected]
END OF CHAPTER FOUR
THANK YOU!