EES 400 - Topic Three - Simple Regression
3.1 INTRODUCTION
To illustrate, consider the scatter graph below which shows the relationship between two
variables x and y and a line fitted among the scatter points.
[Figure: scatter plot of Y against X with a line fitted among the scatter points; points above the line correspond to positive errors (+u) and points below to negative errors (−u).]
From the scatter graph, we notice that despite variability between X and Y, there is a general
“tendency” for the variables to move together- i.e., as X increases, so does Y; as shown by
the line of best fit. The fitted line among the scatter points is actually the REGRESSION
LINE. Thus regression analysis aims at finding the line of best fit among variables.
The relationship Y = α + βX is called a DETERMINISTIC or functional relationship, since
it shows us that X and Y have an exact relationship. However, in real life, not many variables
have such an exact relationship. Indeed, not all points lie on the regression line as shown:
some points are above the line while others are below it; only a few lie exactly on the line.
A point to note is that any regression must be guided a priori by economic theory.
Thus, although the function Y = α + βX + u assumes causation, i.e. that X causes Y,
this causation must be informed by economic theory.
3.2 CORRELATION VERSUS REGRESSION
CORRELATION:
i) Assumes symmetry between X and Y; i.e., there is no distinction as to which variable is dependent (causality is not important).

REGRESSION:
i) Assumes asymmetry between X and Y; i.e., it distinguishes which variable is dependent and which is explanatory (causality is important).
Thus, correlation does not imply causality, but regression does.
In simple regression analysis, we study the effect of only one explanatory variable on the
dependent variable. In multiple regression analysis, we study the effect of more than one
explanatory variable on the dependent variable; for example, how X₁, X₂ and X₃ affect Y in:

Y = α + β₁X₁ + β₂X₂ + β₃X₃ + u.  Thus Y = f(X₁, X₂, …, Xₙ).
3.3 THE NATURE AND SOURCES OF DATA FOR ECONOMIC ANALYSIS
Economic analysis is mainly EMPIRICAL, i.e. we place a lot of emphasis on using data in
econometric analysis. There are basically three types of data we can use in economic analysis:

i. Time series data

A time series is a set of observations on the value that a variable takes at different times, e.g.
the GDP of Kenya from 1963 to 2013.
Time series data are the most widely used data in economic analysis. If the data used in
regression analysis are a time series, then the regression equation is expressed with subscript t
for time, as follows:

Y_t = α + βX_t + u_t
ii. Cross-section data

Cross-section data are data on one or more variables collected at the same point in time, e.g.
the GDP of Kenya, Uganda, and Tanzania for the year 2013.
If the data used in regression analysis are cross-section data, then the regression equation is
expressed with subscript i for the cross-sectional unit, as follows:

Y_i = α + βX_i + u_i
iii. Panel data
Panel data is data in which the SAME cross-sectional unit is surveyed over time. For
example, the GDP of Kenya, Uganda and Tanzania between the years 1963-2013 will
constitute panel data.
Thus, panel data combines the characteristics of both time series data and cross-sectional
data. For this reason, panel data is usually more powerful than time series or cross sectional
data as it helps to bring out the dynamics of individual behavior.
If the data used in regression analysis are panel data, then the regression equation is expressed
with subscript it, as follows:

Y_it = α + βX_it + u_it
Where:
( i ) stands for the cross-sectional unit, e.g. Kenya = 1, Uganda = 2 and Tanzania = 3; and
( t ) stands for the time series identifier, e.g. 1963, 1964, 1965 ... 2013.
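The difference between the three data types is easiest to see side by side. Below is a minimal sketch, assuming Python with the pandas library (not part of these notes); the GDP figures are made-up placeholders, not real data for the three countries:

```python
import pandas as pd

# Time series: ONE unit (Kenya), many periods.
time_series = pd.Series([87.0, 92.3, 98.1],
                        index=[2011, 2012, 2013], name="gdp_kenya")

# Cross-section: many units, ONE period (2013).
cross_section = pd.DataFrame({"country": ["Kenya", "Uganda", "Tanzania"],
                              "gdp_2013": [98.1, 54.2, 67.5]})

# Panel: the SAME units observed over several periods -> (i, t) index.
panel = pd.DataFrame({
    "country": ["Kenya"] * 3 + ["Uganda"] * 3 + ["Tanzania"] * 3,
    "year":    [2011, 2012, 2013] * 3,
    "gdp":     [87.0, 92.3, 98.1, 48.9, 51.6, 54.2, 60.1, 63.8, 67.5],
}).set_index(["country", "year"])   # MultiIndex (i, t) identifies each row

print(panel.loc["Kenya"])           # one unit's time series inside the panel
```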
3.4 TWO-VARIABLE REGRESSION ANALYSIS
Two-variable regression analysis aims at fitting the line of best fit between two variables, X
and Y. The population regression line is written as:

Y_i = β₀ + β₁X_i + u_i

Where:
- Y_i is the actual value of the dependent/endogenous variable (for the population);
- β₀ is the intercept term of the population regression line;
- β̂₀ is the intercept term of the sample regression line; the term β̂₀ is read as "beta-nought hat";
- β₁ is the slope, or partial derivative, of the population regression line, i.e. ∂Y_i/∂X_i = β₁;
- β̂₁ is the slope coefficient, or partial derivative, of the sample regression line; β̂₁ is an estimator or approximation of β₁;
- u_i is the error or disturbance term of the population regression.
POPULATION:  Y_i = β₀ + β₁X_i + u_i

SAMPLE:  Y_i = β̂₀ + β̂₁X_i + e_i

[Figure: two scatter plots, one for the population and one for the sample, each showing points scattered around a fitted line with positive (+u) and negative (−u) deviations.]
Thus, the error or disturbance term is the difference between each observation (Y_i) and the
regression line:

u_i = Y_i − (β₀ + β₁X_i), the population error/disturbance term;
e_i = Y_i − Ŷ_i, the sample error, i.e. the residual from the estimated regression line (Ŷ).

Recall that Ȳ = ΣY_i / n = E(Y); notice that the mean of Y (Ȳ, the expected value of Y) lies on
the regression line.

On the other hand, the residual is the difference between each observation and the estimated
(sample) regression line.
The major difference between the error (disturbance) term and the residual is that, whereas
the residual can be measured (computed from the sample), the error term cannot be measured,
since the true population regression line is unobserved.
In regression analysis, however, we prefer to use a sample rather than the population. This is
because, in real life or in practice, it is easier to observe a sample than to observe the whole
population.
Ordinary least squares (OLS) is the main technique used to estimate regression models.
The name OLS derives from the fact that OLS minimizes the sum of squared residuals. In so
doing, OLS finds the values of the model parameters (β̂₀ and β̂₁) which fit a line of best fit.

The ordinary least squares estimators (β̂₀ and β̂₁) are derived using the following eight steps:
STEP 1: Begin with the sample and population regression lines and obtain the residual as the
difference of the two:

e_i = Y_i − β̂₀ − β̂₁X_i

STEP 2: Square the residuals and sum them over all observations to obtain the residual sum
of squares:

Σe_i² = Σ(Y_i − β̂₀ − β̂₁X_i)²

STEP 3: Take the partial derivatives of Σe_i² with respect to β̂₀ and β̂₁:

∂Σe_i²/∂β̂₀ = 2Σ(Y_i − β̂₀ − β̂₁X_i)·(−1)
           = −2Σ(Y_i − β̂₀ − β̂₁X_i)
           = −2ΣY_i + 2Σβ̂₀ + 2β̂₁ΣX_i, but Σβ̂₀ = nβ̂₀, so
           = −2ΣY_i + 2nβ̂₀ + 2β̂₁ΣX_i ..............................(1)

∂Σe_i²/∂β̂₁ = 2Σ(Y_i − β̂₀ − β̂₁X_i)·(−X_i)
           = −2ΣX_i(Y_i − β̂₀ − β̂₁X_i)
           = −2ΣY_iX_i + 2β̂₀ΣX_i + 2β̂₁ΣX_i² ......................(2)
STEP 4: The first-order necessary condition for a maximum/minimum requires that each
partial derivative equal zero. Thus we equate equations (1) and (2) to zero.

From equation (1):

−2ΣY_i + 2nβ̂₀ + 2β̂₁ΣX_i = 0
2nβ̂₀ + 2β̂₁ΣX_i = 2ΣY_i
nβ̂₀ + β̂₁ΣX_i = ΣY_i ...............................................(3)

From equation (2):

−2ΣY_iX_i + 2β̂₀ΣX_i + 2β̂₁ΣX_i² = 0
2β̂₀ΣX_i + 2β̂₁ΣX_i² = 2ΣY_iX_i
β̂₀ΣX_i + β̂₁ΣX_i² = ΣY_iX_i .......................................(4)

The two resulting equations, (3) and (4), are the famous NORMAL EQUATIONS.
STEP 5: Check whether the normal equations maximize or minimize the residual sum of
squares Σe_i². To do so, we apply the second-order conditions to equations (1) and (2), as
follows:

∂²Σe_i²/∂β̂₀² = 2n > 0 (minimum)
∂²Σe_i²/∂β̂₁² = 2ΣX_i² > 0 (minimum)

Since the second-order conditions are positive definite, it means that β̂₀ and β̂₁ will indeed
MINIMIZE the residual sum of squares.
STEP 6: Express the two normal equations, (3) and (4), in matrix form as follows:

    | n     ΣX_i  | |β̂₀|   | ΣY_i    |
    | ΣX_i  ΣX_i² | |β̂₁| = | ΣY_iX_i |

STEP 7: Solve for the slope estimator by Cramer's rule:

    β̂₁ = (nΣY_iX_i − ΣX_iΣY_i) / (nΣX_i² − (ΣX_i)²)

STEP 8: Express the solution in deviation form, where x = X_i − X̄ and y = Y_i − Ȳ. This
simplifies to:

    β̂₁ = Σxy / Σx², and, from equation (3), β̂₀ = Ȳ − β̂₁X̄
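The normal equations of STEP 6 can also be solved numerically. Below is a minimal sketch, assuming Python with numpy; the sums are those of the ABC Company example that follows, and ΣXY = 6,340 is not stated in the notes but is implied by Σxy = 1,390 together with ΣX, ΣY and n:

```python
import numpy as np

n, sum_x, sum_x2 = 10, 550.0, 38_500.0   # n, ΣX, ΣX² (raw sums)
sum_y, sum_xy = 90.0, 6_340.0            # ΣY, ΣXY (ΣXY derived, see above)

A = np.array([[n,     sum_x],            # n·β0  + β1·ΣX  = ΣY
              [sum_x, sum_x2]])          # β0·ΣX + β1·ΣX² = ΣXY
b = np.array([sum_y, sum_xy])

beta0_hat, beta1_hat = np.linalg.solve(A, b)
print(beta0_hat, beta1_hat)              # -> -0.2667 and 0.1685
```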
Recall example 1 (on the sales and profit of ABCD Company Limited) provided under
Correlation Analysis. From that example we got the following summary values:

n = 10, ΣX = 550, ΣY = 90, ΣX² = 38,500, Σx² = 8,250, Σxy = 1,390 and Σy² = 244.
β̂₁ = Σxy / Σx² = 1,390 / 8,250 = 0.1685

or, equivalently, in terms of the raw sums:

β̂₁ = (nΣXY − ΣXΣY) / (nΣX² − (ΣX)²) = (63,400 − 49,500) / (385,000 − 302,500)
   = 13,900 / 82,500 = 0.1685
Similarly, for β̂₀ = Ȳ − β̂₁X̄:

Ȳ = ΣY / n = 90 / 10 = 9
X̄ = ΣX / n = 550 / 10 = 55
β̂₀ = 9 − (0.1685 × 55) = 9 − 9.2667 = −0.2667

(the product 9.2667 is computed with the unrounded slope 1,390/8,250).
Thus, the OLS regression equation is: Ŷ = −0.2667 + 0.1685X
From the OLS regression equation, we can also predict or forecast the value of Y for any
given value of X.
For example, given that X= 150, we can now predict Y as follows:
Ŷ = −0.2667 + 0.1685X
Ŷ = −0.2667 + 0.1685(150)
Ŷ ≈ 25
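The deviation-form formulas β̂₁ = Σxy/Σx² and β̂₀ = Ȳ − β̂₁X̄ are easy to code directly. A minimal sketch follows; since the original data table is not reproduced in these notes, the sales and profit arrays below are hypothetical, but with the original data the function would reproduce β̂₀ = −0.2667 and β̂₁ = 0.1685:

```python
def fit_ols(x, y):
    """Simple OLS fit in deviation form: returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # Σxy
    sxx = sum((xi - x_bar) ** 2 for xi in x)                        # Σx²
    b1 = sxy / sxx              # slope:     β̂1 = Σxy / Σx²
    b0 = y_bar - b1 * x_bar     # intercept: β̂0 = Ȳ − β̂1·X̄
    return b0, b1

sales  = [10, 30, 45, 55, 70, 90]          # hypothetical data
profit = [1.5, 4.8, 7.2, 9.0, 11.6, 14.9]  # hypothetical data
b0, b1 = fit_ols(sales, profit)
print(b0 + b1 * 150)                       # predicted profit at sales = 150
```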
Apart from prediction or forecasting, we can equally calculate the elasticity of profit with
respect to sales, using either point elasticity or arc elasticity, as follows.

For point elasticity, the elasticity of profit with respect to sales is:

e_p,s = (∂p/∂s) · (s/p)

Thus, since Profit = −0.2667 + 0.1685(sales), at the mean values of profit and sales we
obtain:

e_p,s = (∂p/∂s) · (S̄/P̄) = 0.1685 × (55/9) = 1.0297
Interpretation: a 1% increase in the value of sales will lead to a 1.0297% increase in profit,
ceteris paribus. Thus, profit is relatively elastic with respect to sales.
For arc elasticity, the elasticity of profit with respect to sales is:

e_p,s = (∂p/∂s) · [(S₁ + S₂)/(P₁ + P₂)]

For example, if S₁ = 40 and S₂ = 60, then:

At S₁ = 40: profit = −0.2667 + 0.1685(40) = 6.4733
At S₂ = 60: profit = −0.2667 + 0.1685(60) = 9.8433

Arc elasticity = 0.1685 × [(40 + 60)/(6.4733 + 9.8433)] = 1.0327
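Both elasticity formulas can be verified with a short script. A minimal sketch, using the fitted coefficients from above:

```python
B0, B1 = -0.2667, 0.1685        # fitted intercept and slope

def profit(sales):
    return B0 + B1 * sales

# Point elasticity at the mean values (mean sales = 55, mean profit = 9):
point_elasticity = B1 * 55 / 9
print(round(point_elasticity, 4))          # -> 1.0297

# Arc elasticity between S1 = 40 and S2 = 60:
s1, s2 = 40, 60
arc_elasticity = B1 * (s1 + s2) / (profit(s1) + profit(s2))
print(round(arc_elasticity, 4))            # -> 1.0327
```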
Finally, we can also obtain Ŷ (the estimated values of profit), the residuals (e_i) and the sum
of squared residuals (Σe_i², the RSS), as in the following table:

Time   X    Y    Ŷ = −0.2667 + 0.1685X   e_i = Y − Ŷ_i   e_i²
…      …    …    …                        …                …
9      90   14   14.8933                  −0.8933          0.8069
Actually, the expected value or mean of the residuals should be zero; in this case it is not
exactly zero due to rounding off.

The OLS regression model rests on the following assumptions:

i. The expected value or mean of the error term is zero:

E(e_i) = Σe_i / n = 0

This concept was illustrated in the table provided above.
ii. The variance of the error term is constant: this is the assumption of
HOMOSKEDASTICITY. If the variance is not constant, that is a case of
HETEROSKEDASTICITY, which is a violation of the OLS assumption of
homoskedasticity; the error term should therefore be homoskedastic. The problem of
heteroskedasticity is common in cross-sectional data.
iii. The error term is assumed to follow a normal distribution with a mean of zero and a
variance of σ²:

e_i ~ N(0, σ²)
iv. There is a linear relationship between the dependent variable and the independent
variables: Y = α + βX + e
Thus, the relationship between X and Y is linear in the OLS parameters α and β.
v. Assumption of no multicollinearity
vi. Assumption of zero correlation between the independent variable and the error term;
i.e., the error term and the independent variable should not be correlated:

Cov(X_i, e_i) = 0
vii. The error term in period (i) and the error term in period (j) should not be correlated.
Thus, there should be no autocorrelation, otherwise known as SERIAL CORRELATION:

Cov(e_i, e_j) = E[(e_i)(e_j)] = 0 for all i ≠ j

viii. There should be no outliers. An outlier is a value that is very large or very small in
relation to the rest of the observations.
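Some of these assumptions can be checked mechanically once residuals are available. A minimal sketch, assuming numpy and using made-up numbers, checks assumption (i) (zero mean) and assumption (vi) (no covariance with the regressor):

```python
import numpy as np

x = np.array([10.0, 25.0, 40.0, 55.0, 70.0, 90.0])   # hypothetical X values
e = np.array([0.3, -0.5, 0.2, 0.4, -0.6, 0.2])       # hypothetical residuals

print(e.mean())                 # assumption (i): should be (close to) 0
print(np.cov(x, e)[0, 1])       # assumption (vi): Cov(X, e) should be near 0
```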
The variance of the error term, Var(u_i) = σ², is given by:

σ̂² = Σu_i² / (n − 2)

N/B: n − 2 is called the degrees of freedom (df); we subtract 2 because the regression model
we obtained had 2 OLS estimators, α̂ and β̂.

Σu_i² is the sum of squared residuals (RSS), which we found earlier: Σe_i² = 9.806.

Thus:

σ̂² = Σu_i² / (n − 2) = 9.806 / (10 − 2) = 9.806 / 8 = 1.22575
The standard error of the regression model (se) is obtained by taking the square root:

se = σ̂ = √(Σu_i² / (n − 2)) = √1.22575 = 1.10714

N/B: The standard error of the regression model is actually the standard deviation of the Y
values about the estimated regression line. [Recall we are dealing with a sample drawn from
a target population; this brings into consideration the sampling distribution.]

The standard error of the slope coefficient, se(β̂), is given by:

se(β̂) = σ̂ / √(Σx²)
From our example, we notice that σ̂ = 1.10714 and Σx² = 8,250. Thus:

se(β̂) = 1.10714 / √8,250 = 1.10714 / 90.8295 = 0.01219
The standard error of the intercept, se(α̂), is given by:

se(α̂) = √(ΣX_i² / (nΣx²)) · σ̂

From our example, we notice that σ̂ = 1.10714, ΣX_i² = 38,500, n = 10 and Σx² = 8,250.
Thus:

se(α̂) = √(38,500 / (10 × 8,250)) × 1.10714 = 0.68313 × 1.10714 = 0.75632
The Student's t ratio is an important test statistic that we use in determining whether a
particular variable or parameter is significant or not. This process is referred to as
HYPOTHESIS TESTING:

t = OLS estimator / standard error of the OLS estimator

Therefore, the t-value for the slope coefficient is given as:
t_β̂ = β̂ / se(β̂) = 0.1685 / 0.01219 = 13.8228

This is the calculated t-statistic for β̂.

On the other hand, the t-value for the intercept parameter can be obtained in a similar way,
as follows:

t_α̂ = α̂ / se(α̂) = −0.2667 / 0.75632 = −0.35258
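The chain from RSS to variance, standard errors and t-ratios can be reproduced from the example's summary numbers. A minimal sketch:

```python
import math

rss        = 9.806          # Σe² (residual sum of squares)
n, k       = 10, 2
sum_x2_dev = 8_250.0        # Σx² (deviations from the mean)
sum_X2_raw = 38_500.0       # ΣX² (raw values)
b1, b0     = 0.1685, -0.2667

sigma2 = rss / (n - k)                            # σ̂²  = 1.22575
sigma  = math.sqrt(sigma2)                        # σ̂   = 1.10714
se_b1  = sigma / math.sqrt(sum_x2_dev)            # se(β̂) = 0.01219
se_b0  = math.sqrt(sum_X2_raw / (n * sum_x2_dev)) * sigma   # se(α̂) = 0.75632

print(b1 / se_b1, b0 / se_b0)   # t-ratios: about 13.82 and -0.35
```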
Hence, we can now present the complete regression model for ABC Company where we
regressed profit (Y) on sales (X) as follows:
Profit = −0.2667 + 0.1685(sales)
se        (0.75632)   (0.01219)
t-values  (−0.35258)  (13.8228)     R² = 0.9598
More formally, these results can be presented in a table of regression results as follows:

Profit      Coefficient   Std. Error   t-value
Sales       0.1685        0.01219      13.8228
Constant    −0.2667       0.75632      −0.35258

R² = 0.9598
Although the coefficient of determination or goodness of fit (R²) was found to be high
(R² = 0.9598), r² has the following limitations:

i. We cannot compare r² values computed from models which have different dependent
variables. Thus, any re-arrangement of the model will yield different values of r².

ii. The value of r² usually tends to increase as the number of independent variables in the
model increases. With this, r² loses its usefulness, since we cannot tell whether it is
measuring the goodness of fit or the number of independent variables.

iii. r² also cannot discriminate among models, i.e. it cannot tell us which particular model to
choose among 2 or more models.

Due to the above limitations of r², an alternative measure of goodness of fit, known as
adjusted r² (R̄²), has been developed to help overcome these limitations of the simple r².
The adjusted r² is modified or adjusted so as to accommodate the changes in degrees of
freedom that result from the addition or removal of independent variables in a regression
model.

The formula for adjusted R² is:

R̄² = 1 − [(n − 1)/(n − k)] (1 − R²)
R̄² = 1 − [(10 − 1)/(10 − 2)] (1 − 0.9598)
R̄² = 1 − (9/8)(0.0402)
R̄² = 1 − 0.0452
R̄² = 0.9548
Interpretation:
Holding all other factors constant, sales (X) explains or accounts for 95.48% of changes in
profit (Y), when adjusted for degrees of freedom.
Note that R̄² < R², always.
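The adjustment is a one-line function. A minimal sketch, reproducing the 0.9548 above:

```python
def adjusted_r2(r2, n, k):
    # Adjusted R-squared: 1 − [(n−1)/(n−k)]·(1−R²)
    return 1 - (n - 1) / (n - k) * (1 - r2)

print(round(adjusted_r2(0.9598, n=10, k=2), 4))   # -> 0.9548
```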
The Gauss-Markov Theorem states as follows: “in the class of all linear and unbiased
estimators, the OLS estimators are BLU – Best, Linear and Unbiased estimators”.
An OLS estimator such as β̂ is said to be BLUE, i.e. a best linear unbiased estimator, if it
has the following properties:
i. LINEAR

The model should be linear in the parameters, as in Y = α + βX + e_i. Moreover, the OLS
estimators are linear functions of the observations Y_i:

β̂ = Σw_iY_i = w₁Y₁ + w₂Y₂ + w₃Y₃ + …
α̂ = Σh_iY_i = h₁Y₁ + h₂Y₂ + h₃Y₃ + …

where w_i and h_i are simply weights.
ii. UNBIASEDNESS

The average or expected value of β̂, denoted E(β̂), is equal to its true value β. Thus
E(β̂) = β, or, equivalently, E(β̂) − β = 0. In such a case, we say β̂ is an unbiased estimator
of β. Similarly, E(α̂) = α, i.e., α̂ is an unbiased estimator of α.

To demonstrate that β̂₁ is an unbiased estimator, write it as a weighted sum β̂₁ = Σa_iY_i
and substitute Y_i = β₀ + β₁X_i + e_i:

β̂₁ = Σa_i(β₀ + β₁X_i + e_i) = β₀Σa_i + β₁Σa_iX_i + Σa_ie_i

Assumptions: E(e_i) = 0, Σa_i = 0, and Σa_iX_i = 1.

Thus, E(β̂₁) = 0 + β₁ + 0 = β₁. Thus β̂₁ is an unbiased estimator of β₁.
Similarly, to demonstrate that α̂ is unbiased:

- We start from the formula α̂ = Ȳ − β̂X̄, where Ȳ = ΣY_i / n and β̂ = Σw_iY_i, so that:

α̂ = ΣY_i / n − X̄Σw_iY_i = Σ(1/n − w_iX̄)Y_i

- Let h_i = 1/n − w_iX̄, so that α̂ = Σh_iY_i.

- Since Y_i = α + βX_i + e_i, we substitute Y_i into α̂ as follows:

α̂ = Σ(1/n − w_iX̄)(α + βX_i + e_i)
  = Σ(α/n − αw_iX̄) + Σ(βX_i/n − βw_iX̄X_i) + Σ(1/n − w_iX̄)e_i

- Taking expectations, and using the facts that Σw_i = 0 (so that Σαw_iX̄ = 0), Σw_iX_i = 1
(so that ΣβX_i/n − Σβw_iX̄X_i = βX̄ − βX̄ = 0), and E(e_i) = 0:

E(α̂) = α

Thus α̂ is an unbiased estimator of α.
iii. MINIMUM VARIANCE (EFFICIENCY)

The variance of the OLS estimator β̂ is:

Var(β̂) = [se(β̂)]² = σ² / Σx²

Next, we obtain the variance of another estimator (β*), obtained from some other
econometric method. It can be shown that:

Var(β̂) < Var(β*)

Thus the OLS estimator β̂ has minimum variance when compared to the variance of another
estimator β* obtained from another econometric method. Thus β̂ is an efficient estimator.
In summary, the Gauss-Markov Theorem states as follows: "Given the assumptions of the
classical linear regression model, the ordinary least squares (OLS) estimators, in the class of
unbiased linear estimators, have minimum variance, i.e. they are BLUE."
3.14 GOODNESS OF FIT
By goodness of fit, we mean: "How well does the sample regression line fit the data?" The
goodness of fit, otherwise known as the coefficient of determination, is denoted by r².

The value of r² ranges from 0 to 1, i.e. from no goodness of fit to a perfect goodness of fit.
Therefore: 0 ≤ r² ≤ 1.

The following steps illustrate the derivation of r²:
Step 1: Begin with an OLS regression model, Y_i = α̂ + β̂X_i + e_i, or, in
deviation-from-mean form:

Y_i − Ȳ = (Ŷ − Ȳ) + e_i

Step 2: Square both sides and sum over the sample:

Σ(Y_i − Ȳ)² = Σ(Ŷ − Ȳ)² + Σe_i²

where:
Σ(Y_i − Ȳ)² = Σy² = total sum of squares (TSS);
Σ(Ŷ − Ȳ)² = Σŷ² = explained sum of squares (ESS);
Σe_i² = residual sum of squares (RSS).

Therefore: TSS = ESS + RSS
Step 3: Divide through by TSS:

TSS/TSS = ESS/TSS + RSS/TSS, so 1 = ESS/TSS + RSS/TSS, or ESS/TSS = 1 − RSS/TSS

Now, the ratio ESS/TSS is called the goodness of fit (r²). Therefore:

r² = ESS/TSS = Σŷ² / Σy², or r² = 1 − RSS/TSS = 1 − Σe_i² / Σy²
From our example:

r² = 1 − Σe_i² / Σy² = 1 − 9.806/244 = 0.9598

Or:

r² = (Σxy)² / (Σx² · Σy²) = (1,390)² / (8,250 × 244) = 0.9598, or 95.98%
Or:

r² = β̂² (Σx² / Σy²) = 0.1685² × (8,250 / 244) = 0.9598, or 95.98%
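All three expressions can be checked against each other. A minimal sketch using the example's summary values (the small differences between the results reflect the rounding of β̂ to 0.1685):

```python
sum_x2, sum_y2, sum_xy = 8_250.0, 244.0, 1_390.0
rss, b1 = 9.806, 0.1685

r2_a = 1 - rss / sum_y2                  # 1 − RSS/TSS
r2_b = sum_xy**2 / (sum_x2 * sum_y2)     # (Σxy)² / (Σx²·Σy²)
r2_c = b1**2 * sum_x2 / sum_y2           # β̂²·Σx² / Σy²

print(r2_a, r2_b, r2_c)                  # all approximately 0.9598
```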
Confidence interval estimation aims at constructing an interval around the OLS estimators:

Pr[β̂ − t_(α/2)·se(β̂) ≤ β ≤ β̂ + t_(α/2)·se(β̂)] = 1 − α

Where:
- β̂ is the estimated OLS estimator for β;
- t_(α/2) is the critical t value for a two-tailed test at n − k degrees of freedom;
- se(β̂) is the standard error of the slope coefficient β̂.
The figure below illustrates the confidence interval for β̂:

[Figure: the sampling distribution of β̂, with the acceptance region of probability 1 − α lying
between β̂ − t_(α/2)·se(β̂) and β̂ + t_(α/2)·se(β̂), and rejection regions of probability α/2 in
each tail.]

In the diagram above, the shaded part is the rejection region, while the un-shaded part is the
acceptance region.
The following table shows the appropriate critical t values at various levels of
significance and at one-tail and two-tail tests:
For example, a 95% confidence interval for β under a two-tailed test is obtained as follows.
β̂ = 0.1685; n = 10; k = 2; n − k = 10 − 2 = 8 degrees of freedom; se(β̂) = 0.01219; and
1 − α = 0.95, hence α = 5% = 0.05.

Thus:

Pr[β̂ − t_(α/2, 8df)·se(β̂) ≤ β ≤ β̂ + t_(α/2, 8df)·se(β̂)] = 1 − α
Pr[β̂ − t_(0.025, 8df)·se(β̂) ≤ β ≤ β̂ + t_(0.025, 8df)·se(β̂)] = 95%

With t_(0.025, 8df) = 2.306, this gives:

0.1685 − 2.306(0.01219) ≤ β ≤ 0.1685 + 2.306(0.01219), i.e. 0.1404 ≤ β ≤ 0.1966
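The interval can be reproduced in code. A minimal sketch, assuming scipy for the critical t value:

```python
from scipy import stats

b1, se_b1, n, k, alpha = 0.1685, 0.01219, 10, 2, 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - k)    # t_(0.025, 8df) = 2.306

lower = b1 - t_crit * se_b1
upper = b1 + t_crit * se_b1
print(round(t_crit, 3), round(lower, 4), round(upper, 4))
# -> 2.306, 0.1404, 0.1966
```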
By hypothesis testing, we mean: “can our regression results be trusted?” or also, “Do our
regression estimates matter?”
The null hypothesis is the hypothesis of interest. It is usually denoted by Ho. For example, to
test whether the slope coefficient is significant, we state: Ho: β = 0.

The alternate hypothesis is the hypothesis that is tested against the hypothesis of interest, i.e.
the null hypothesis. The alternate hypothesis is denoted by H1 or HA. For example, the
alternate hypothesis to test whether the slope coefficient is significant is stated as follows:

- H1: β ≠ 0 for the case of a two-tailed test
- H1: β > 0 or H1: β < 0 for the case of a one-tailed test.
Point to note

The hypothesis Ho: β = 0 means:
- the slope coefficient is equal to zero, or
- the slope coefficient is not statistically significant, or
- X does not influence Y.

The hypothesis H1: β ≠ 0 means:
- the slope coefficient is different from zero,
- the slope coefficient is statistically significant,
- X does influence Y.
In hypothesis testing, there are 2 possible types of errors that can be committed, i.e. Type I
error and type II error.
Type I error occurs when we reject the null hypothesis, when in actual sense, it should
not have been rejected; i.e. “killing an innocent man”
Type II error occurs when we do not reject (accept) the null hypothesis when in actual
sense it should have been rejected, i.e. "letting a guilty man go scot-free".
The aim of hypothesis testing is to reduce the chances of committing both type I and type II
errors. This is the reason why in hypothesis testing, we specify the level of significance ( α
=1% or 5% or 10% ).
Earlier on, we constructed a 95% confidence interval for β and obtained the interval
0.1404 ≤ β ≤ 0.1966. We can use this interval to test two sets of hypotheses:
(i) Ho: β = 0 against H1: β ≠ 0, and (ii) Ho: β = 0.16 against H1: β ≠ 0.16.

For the first set of hypotheses, we notice that the value β = 0 does not lie within the
confidence interval, i.e. it lies in the REJECTION REGION. Thus, we reject the null
hypothesis, or accept the alternative hypothesis.

In conclusion, it means β̂ is not equal to zero, or, we could say, β̂ is statistically different
from zero.

For the second set of hypotheses, we notice that the value β = 0.16 actually lies within the
confidence interval, i.e. it lies in the ACCEPTANCE REGION. Thus, we accept (do not
reject) the null hypothesis.

In conclusion, it means that β̂ is statistically equal to 0.16, or that β̂ is not statistically
different from 0.16.
The test of significance (t-test) approach is the approach most commonly used for
hypothesis testing in econometrics. In this approach, which is similar in spirit to the
confidence interval approach, the null and alternate hypotheses are stated respectively as:

Ho: β = β*    HA: β ≠ β*

where β* is a hypothesized or guessed value of β.

The test statistic is:

t-calculated = (β̂ − β*) / se(β̂)

Where: se(β̂) is the standard error of the OLS parameter β̂.

If β̂ > β*, then t-calculated will be positive; if β̂ < β*, then t-calculated will be negative.
Irrespective of the sign of t-calculated, we always take its ABSOLUTE VALUE.
Having obtained t-calculated, we then proceed to obtain the critical value of the t statistic,
i.e. t-critical, from the t-tables:

t-critical = t_(α/2), n−k df, for a two-tailed test
t-critical = t_α, n−k df, for a one-tailed test

The decision rule for hypothesis testing using the test of significance approach states as
follows: "If t-calculated is greater than t-critical, reject the null hypothesis; but if
t-calculated is less than t-critical, do not reject (accept) the null hypothesis."
For example, we can now test the following hypotheses using the t-test approach:

i. Ho: β = 0 against HA: β ≠ 0
ii. Ho: β = 0.16 against HA: β ≠ 0.16

For the first set of hypotheses, β̂ = 0.1685 and se(β̂) = 0.01219; thus:

t-calculated = (0.1685 − 0) / 0.01219 = 13.8228

Then t-critical = t_(α/2), n−k, where α = 5%, α/2 = 2.5% = 0.025, n = 10, k = 2 and
n − k = 8 df. Thus, t-critical = t_(0.025, 8df) = 2.306.

Since t-calculated (13.8228) is greater than t-critical (2.306), according to our decision rule
we reject the null hypothesis and do not reject (accept) the alternative hypothesis.
In conclusion, we can therefore say that β̂ is not equal to zero, or, we could say, β̂ is
statistically different from zero.
For the second set of hypotheses, we can obtain t-calculated as follows:

t-calculated = (β̂ − β*) / se(β̂) = (0.1685 − 0.16) / 0.01219 = 0.6973

The value of t-critical remains the same: t-critical = 2.306. Upon comparing t-calculated
and t-critical, we notice that t-calculated < t-critical. Thus, following the decision rule, we
do not reject (accept) the null hypothesis. In conclusion, we can therefore say that β̂ is
statistically equal to 0.16.
POINT TO NOTE: The conclusions from the confidence interval approach actually
resemble the conclusions from the test of significance approach and this must always be so.
Indeed, the confidence interval approach is simply a mirror image of the test of significance
approach.
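The whole decision rule fits in a few lines. A minimal sketch, assuming scipy, testing both hypothesized values β* = 0 and β* = 0.16:

```python
from scipy import stats

b1, se_b1, n, k, alpha = 0.1685, 0.01219, 10, 2, 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - k)    # 2.306 (two-tailed, 8 df)

for beta_star in (0.0, 0.16):
    t_calc = abs((b1 - beta_star) / se_b1)       # always the absolute value
    decision = "reject Ho" if t_calc > t_crit else "do not reject Ho"
    print(beta_star, round(t_calc, 4), decision) # 13.8228 -> reject; 0.6973 -> do not
```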
The probability (P) value approach is also an ideal way of testing hypotheses. The P-value
states the smallest level of significance (α) at which the null hypothesis can be rejected.
The beauty of the P-value approach is that most computer software (Excel, SPSS, STATA,
Eviews, SHAZAM, RATS, etc.) automatically provides the P-value whenever you run a
regression.

For example, if the software reports a P-value of 0.07, the null hypothesis can be rejected at
any level of significance of 7% or higher. Thus, we can reject the null hypothesis at
α = 10%, but we cannot reject it at α = 1% or α = 5%:

                                           α = 1%   α = 5%   α = 10%
P = 0.074  (β̂ is significant at 7.4%)      No       No       Yes
P = 0.1025 (β̂ is significant at 10.25%)    No       No       No
In summary, the smaller the P-value, the more significant is β̂.
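Although the notes quote no P-value for the ABC Company example, one follows directly from the calculated t statistic. A minimal sketch, assuming scipy:

```python
from scipy import stats

t_calc, df = 13.8228, 8
p_value = 2 * stats.t.sf(t_calc, df)   # two-tailed P-value for the slope
print(p_value)                         # a tiny number -> significant at any usual α
```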
Analysis of variance (ANOVA) is a study of Total Sum of Squares, (TSS) and its
components, i.e., Explained Sum of Squares (ESS) and Sum of Squared residuals (RSS).
By dividing the sum of squares (SS) by their associated degrees of freedom (df), we get the
mean sum of squares (MSS). The Anova table therefore shows the sum of squares (SS),
degrees of freedom (df), mean sum of squares (MSS) and source of variation.
Source of variation        SS              df      MSS
Due to regression (ESS)    Σŷ² (= β̂²Σx²)   k − 1   ESS/df = β̂²Σx²/(k − 1) = MSS_reg
Due to residuals (RSS)     Σu²             n − k   RSS/df = Σu²/(n − k) = MSS_res

The F statistic is the ratio of the two mean sums of squares:

F = MSS_reg / MSS_res = [β̂²Σx²/(k − 1)] / [Σu²/(n − k)]
The F statistic follows the F distribution with (k-1) degrees of freedom on the numerator and
(n-k) degrees of freedom on the denominator. The F statistic is used to test for overall
significance of the model.
If F-calculated > F-critical, the model is statistically significant; if F-calculated <
F-critical, the model is not statistically significant.

EXAMPLE

Recall the example of the sales (X) and profit (Y) of ABC Company Limited for a period of
10 years. The following values were obtained:

Σx² = 8,250, Σy² = 244, β̂ = 0.1685, Σe_i² = 9.806, n = 10 and k = 2.

Calculated F = [β̂²Σx²/(k − 1)] / [Σe_i²/(n − k)] = [0.1685² × 8,250 / 1] / [9.806 / 8]
             = 234.24 / 1.22575 ≈ 191.1

Critical F = F_(k−1, n−k, α) = F_(1, 8, 5%) = 5.32

N.B: The F ratio is always a one-tailed test. We notice that calculated F is greater than
critical F, i.e. (Fcal > Fcrit); hence the model is statistically significant overall.
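The F test can be reproduced from the ANOVA quantities. A minimal sketch, assuming scipy for the critical value:

```python
from scipy import stats

sum_x2, b1 = 8_250.0, 0.1685
rss, n, k = 9.806, 10, 2

ess    = b1**2 * sum_x2                      # explained SS, about 234.24
f_calc = (ess / (k - 1)) / (rss / (n - k))   # about 191.1
f_crit = stats.f.ppf(0.95, dfn=k - 1, dfd=n - k)   # F(1, 8) at 5% = 5.32

print(round(f_calc, 1), round(f_crit, 2))    # Fcal > Fcrit -> model significant
```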