Selecting the best regression equation
Criteria for model selection:
There are several criteria for model selection such as
i. R², adjusted R² Criterion
ii. Akaike Information Criterion (AIC)
iii. Corrected Akaike Information Criterion (AICc)
iv. Schwarz Information Criterion or Bayes Information Criterion (BIC)
v. Mallows Cp Criterion
Criteria used for testing the validity/accuracy of the model:
i. Absolute Mean Error (AME)
ii. Root Mean Square Error (RMSE)
iii. Mean Absolute Percent Error (MAPE)
iv. Theil U Statistic
R², adjusted R² Criterion: We know that

R² = Explained SS / Total SS,   0 ≤ R² ≤ 1.

The closer R² is to 1, the better the regression fit: it measures how close the fitted y is to the observed y. Models with different dependent variables cannot be compared by R², and since R² never decreases when regressors are added, we compute the adjusted R², defined by

adjusted R² = 1 − (1 − R²)(n − 1)/(n − k)

where n is the sample size and k is the number of estimated parameters. It is clear that adjusted R² ≤ R². Adding a regressor does not necessarily increase the adjusted R²; it increases only when the absolute t-statistic of the added regressor exceeds unity. For comparisons of adjusted R², too, the regressand must be the same.
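A minimal NumPy sketch of these two quantities (function and variable names are illustrative; k counts all estimated parameters, including the intercept):

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - residual SS / total SS."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(y, y_hat, k):
    """Adjusted R² for n observations and k estimated parameters."""
    n = len(y)
    return 1.0 - (1.0 - r_squared(y, y_hat)) * (n - 1) / (n - k)
```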
Akaike Information Criterion (AIC): AIC is an important and widely used statistic, by which, for example, we can determine the order of an autoregressive (AR) model. Akaike (1973) developed this statistic, and it is therefore known as the Akaike Information Criterion (AIC). It is defined as

AIC = n ln(RSS/n) + 2p

where RSS = residual sum of squares, p = number of parameters in the model, and n = sample size. The model with the smaller AIC is preferred.
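A small NumPy sketch of this definition (names are illustrative):

```python
import numpy as np

def aic(y, y_hat, p):
    """AIC = n * ln(RSS / n) + 2p for a model with p parameters."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    return n * np.log(rss / n) + 2 * p
```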
Corrected Akaike Information Criterion (AICc): Sometimes the AIC does not select the most efficient model order; Shibata (1976) showed that the AIC criterion is also not consistent. Hurvich and Tsai (1989) therefore proposed a bias-corrected version of AIC, defined as

AICc = AIC + 2(p + 2)(p + 3)/(n − p − 3)

Thus AICc is the sum of AIC and an additional non-stochastic penalty term. The model that adequately describes the series has the minimum AICc.
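A corresponding sketch, computing AIC inline so the function stands on its own (names are illustrative):

```python
import numpy as np

def aic_c(y, y_hat, p):
    """AICc = AIC + 2(p + 2)(p + 3) / (n - p - 3)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    aic = n * np.log(rss / n) + 2 * p
    return aic + 2 * (p + 2) * (p + 3) / (n - p - 3)
```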
Bayes Information Criterion (BIC_sch): Several modifications of AIC have been suggested. One popular variant, the Bayes Information Criterion (BIC_sch), originally proposed by Schwarz (1978), is defined as

BIC_sch = n ln(RSS/n) + p ln(n)

A lower BIC value indicates a better model.
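The same pattern applies to BIC; a minimal sketch:

```python
import numpy as np

def bic(y, y_hat, p):
    """BIC = n * ln(RSS / n) + p * ln(n)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    return n * np.log(rss / n) + p * np.log(n)
```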
Mallows Cp Criterion: To judge the performance of an equation we should consider the mean square error of the predicted values rather than their variance. The standardized total mean square error of prediction for the observed data is measured by

J_p = (1/σ²) Σ_{i=1}^{n} MSE(ŷ_i)

To estimate J_p, Mallows (1973) uses the statistic

C_p = RSS/σ̂² + (2p − n)

where σ̂² is an estimate of σ². In choosing a model we look for a low C_p.
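A minimal sketch of the C_p statistic, assuming σ̂² is supplied separately (it is commonly taken as the residual mean square of the full model):

```python
def mallows_cp(rss_p, sigma2_hat, n, p):
    """C_p = RSS_p / sigma2_hat + (2p - n).

    rss_p      -- residual sum of squares of the p-parameter submodel
    sigma2_hat -- estimate of sigma^2 (e.g. MSE of the full model)
    n, p       -- sample size and number of parameters in the submodel
    """
    return rss_p / sigma2_hat + (2 * p - n)
```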
Absolute Mean Error (AME):
The mean of the absolute deviations between the predicted and observed values is called the absolute mean error and is defined as

AME = (1/n_0) Σ |Y_i − Ŷ_i|

where n_0 = number of periods being forecast, Y_i = observed value and Ŷ_i = predicted value.
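A one-function NumPy sketch (names illustrative):

```python
import numpy as np

def ame(y_obs, y_pred):
    """Absolute mean error over the n_0 forecast periods."""
    y_obs, y_pred = np.asarray(y_obs, dtype=float), np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_obs - y_pred))
```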
Root Mean Square Error (RMSE):
The square root of the average of the squared deviations of the predicted values from the observed values is known as the root mean square error. It is defined as

RMSE = √[ (1/n_0) Σ (Y_i − Ŷ_i)² ]
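And a corresponding sketch:

```python
import numpy as np

def rmse(y_obs, y_pred):
    """Root mean square error over the n_0 forecast periods."""
    y_obs, y_pred = np.asarray(y_obs, dtype=float), np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_obs - y_pred) ** 2))
```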
Mean Absolute Percent Error (MAPE):
The mean of the absolute deviations of the predicted values from the observed values, each divided by the corresponding observed value, gives the mean absolute relative error; multiplied by 100 for ease of comparison, it is called the mean absolute percent error and is defined as

MAPE = (1/n_0) Σ (|Y_i − Ŷ_i| / Y_i) × 100
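A sketch of this measure (observed values are assumed to be non-zero):

```python
import numpy as np

def mape(y_obs, y_pred):
    """Mean absolute percent error; y_obs must contain no zeros."""
    y_obs, y_pred = np.asarray(y_obs, dtype=float), np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_obs - y_pred) / np.abs(y_obs)) * 100.0
```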
Theil’s U Statistic: Theil’s U statistic is a relative accuracy measure that compares the
forecasted results with the results of forecasting with minimal historical data. The
formula for calculating Theil’s U statistic:
U = √[ Σ_{t=1}^{n−1} ((Ŷ_{t+1} − Y_{t+1}) / Y_t)²  /  Σ_{t=1}^{n−1} ((Y_{t+1} − Y_t) / Y_t)² ]

where Y_t is the actual value for time period t, n is the number of data points, and Ŷ_t is the forecasted value.
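A NumPy sketch of this ratio, assuming y_obs and y_pred are aligned so that y_pred[t] is the forecast of y_obs[t] (names illustrative):

```python
import numpy as np

def theil_u(y_obs, y_pred):
    """Theil's U: forecast errors relative to the naive no-change forecast."""
    y = np.asarray(y_obs, dtype=float)
    f = np.asarray(y_pred, dtype=float)
    num = np.sum(((f[1:] - y[1:]) / y[:-1]) ** 2)   # model forecast errors
    den = np.sum(((y[1:] - y[:-1]) / y[:-1]) ** 2)  # naive forecast errors
    return np.sqrt(num / den)
```

A value of U below 1 indicates that the forecasting method outperforms the naive no-change forecast.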
Stepwise regression:
Step-wise regression is one of several computer-based iterative variable-selection
procedures. In statistics, stepwise regression is a method of fitting regression models in
which the choice of predictor/explanatory variables is carried out by an automatic
procedure. In each step, a variable is considered for addition to or subtraction from the set
of explanatory variables based on some prespecified criterion. Usually, this takes the
form of a forward, backward, or combined sequence of F-tests or t-tests.
The main approaches for stepwise regression are:
Forward selection:
The forward selection procedure starts with an equation containing no
predictor/explanatory variables, only a constant term in the model. The first variable
included in the model is the one which has the highest R-Squared. At each step, select
the predictor variable that increases R-Squared the most. Stop adding variables when
none of the remaining variables are significant. Note that once a variable enters the
model, it cannot be deleted.
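A compact sketch of this procedure, assuming X is a pandas DataFrame of candidate predictors, y is the response, and statsmodels is available; the function name and the stopping rule at level alpha are illustrative:

```python
import statsmodels.api as sm

def forward_selection(X, y, alpha=0.05):
    """Add, at each step, the significant predictor that raises R² the most;
    stop when no remaining predictor is significant at level alpha."""
    selected, remaining = [], list(X.columns)
    while remaining:
        best = None
        for col in remaining:
            fit = sm.OLS(y, sm.add_constant(X[selected + [col]])).fit()
            if fit.pvalues[col] < alpha and (best is None or fit.rsquared > best[1]):
                best = (col, fit.rsquared)
        if best is None:          # nothing left that is significant
            break
        selected.append(best[0])  # once entered, a variable is never removed
        remaining.remove(best[0])
    return selected
```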
Backward elimination:
The backward elimination procedure starts with all predictor variables in the model. At each
step, the variable that is the least significant is removed. This process continues until
no non-significant variables remain. The user sets the significance level at which
variables can be removed from the model.
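A matching sketch under the same assumptions (pandas DataFrame X, statsmodels available, illustrative names):

```python
import statsmodels.api as sm

def backward_elimination(X, y, alpha=0.05):
    """Drop, at each step, the least significant predictor until every
    remaining predictor has a p-value below alpha."""
    selected = list(X.columns)
    while selected:
        fit = sm.OLS(y, sm.add_constant(X[selected])).fit()
        pvals = fit.pvalues.drop("const")
        worst = pvals.idxmax()      # least significant variable
        if pvals[worst] < alpha:    # everything left is significant
            break
        selected.remove(worst)
    return selected
```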
Stagewise regression:
Forward stagewise regression follows a very simple strategy for constructing a sequence
of sparse regression estimates: it starts with all coefficients equal to zero and iteratively
updates the coefficient of the variable that achieves the maximal absolute inner product
with the current residual.
Note: When the number of samples n is less than the number of parameters (the signal
dimension) p, we say the model is a sparse regression model.
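A minimal NumPy sketch of the incremental forward stagewise update, assuming the columns of X are standardized and y is centred; the step size and iteration count are illustrative:

```python
import numpy as np

def forward_stagewise(X, y, step=0.01, n_iter=1000):
    """Repeatedly nudge the coefficient of the predictor with the largest
    absolute inner product with the current residual."""
    X = np.asarray(X, dtype=float)
    beta = np.zeros(X.shape[1])
    residual = np.asarray(y, dtype=float).copy()
    for _ in range(n_iter):
        corr = X.T @ residual             # inner products with the residual
        j = int(np.argmax(np.abs(corr)))  # most correlated predictor
        delta = step * np.sign(corr[j])
        beta[j] += delta                  # small move in that coordinate
        residual -= delta * X[:, j]       # update the residual
    return beta
```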