A Gretl Guide for Preparing Econometrics II Course Project
1. Installation of Gretl
To download and install Gretl, go to http://gretl.sourceforge.net/ and download the installer (exe file) appropriate for your operating system.
It will take a few seconds for the download window to open after clicking. Locate and save the file. Once you
open the file, click Run and follow the on-screen instructions.
2. Preparing your data for the estimation
Before the estimation, we need to specify our model. Since we are planning to estimate an import demand
function for Turkey, we need to specify our model as follows:
lnIMP_t = β1 + β2 lnGDP_t + β3 lnREER_t + u_t
The variables above are defined in terms of natural logs so that the slope parameters can be interpreted as
elasticities. lnIMP_t stands for the natural log of real imports of goods and services. lnGDP_t and lnREER_t
represent the natural logs of real Gross Domestic Product and the real exchange rate, respectively. u_t is an error
term initially assumed to satisfy the assumptions of the CLRM. The data for GDP and imports are collected
from the World Bank's WDI online database available at https://databank.worldbank.org/source/world-
development-indicators, while the real exchange rate data are obtained from the OECD Main Economic
Indicators database (https://www.oecd-ilibrary.org/economics/data/main-economic-indicators/main-
economic-indicators-complete-database_data-00052-en). Our estimation sample covers the period from
1970 to 2018, a total of 49 observations.
After the introduction, you need to give brief information on the expected signs of the coefficients based
on the theoretical structure of the model. In our case, we expect a positive sign for the coefficient of
l_GDP (β2). Likewise, we expect a positive sign for the coefficient of the real exchange rate: an increase
in REER indicates an appreciation of the domestic currency against the currencies of the main trading
partners, which may lead to an increase in import demand.
2.1 How to import data into Gretl
There are several ways to import data into Gretl. The first is to use the main menu bar: click File ->
Open data -> User file and select the folder where the Excel data file is located.
The second, and most straightforward, option is to drag and drop the Excel file directly onto the blank Gretl
workfile.
When you drag the data file (gretl_data_2020_imp_func.xlsx) into Gretl, you should see the following screen.
Then select the first sheet, named gretl_data, which contains the organized version of the data. As you see below,
the variables are successfully imported into Gretl.
After getting the data into Gretl, we need to take the natural log of the variables by first selecting all the
variables and clicking Add -> Logs of selected variables.
After this, three new variables will appear in your workfile.
We can then estimate our regression model using OLS by clicking Model -> Ordinary Least Squares.
Specify the dependent variable and the independent variables in the following dialog box
and press OK to see the estimation results.
After getting the results, we can first interpret the signs and significance of the coefficients. In
our import demand function, we found significant coefficients for both the log of GDP and the log
of the real exchange rate. The coefficient of GDP is positive and significant as expected; since the estimated
parameter of 1.569 is greater than one, we can also say that Turkey's imports are highly elastic with respect to
income. The coefficient of the real exchange rate is also significant but has the opposite sign to the
theoretical expectation. Our model is very simple; it does not include the additional variables
needed to explain the underlying reasons for this finding, but we have to keep in mind that the Turkish
economy relies heavily on imports of intermediate goods to produce final goods and services.
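Since the model is in log-log form, the slope coefficients are elasticities. The interpretation of the GDP coefficient can be checked with a quick calculation (a sketch in Python; the 1.569 estimate is the one quoted above, and the 1% GDP growth figure is purely illustrative):

```python
import math

beta2 = 1.569          # estimated income elasticity of imports (from the text)
gdp_growth = 0.01      # an illustrative 1% increase in GDP

# Exact implied change in imports from the log-log specification:
# d ln(IMP) = beta2 * d ln(GDP)
import_growth = math.exp(beta2 * math.log(1 + gdp_growth)) - 1

# For small changes this is approximately beta2 * gdp_growth = 1.569%
print(round(100 * import_growth, 3))  # prints 1.573
```

The small gap between 1.573% and the 1.569% approximation is just the curvature of the exponential; for the marginal changes usually discussed, reading β2 directly as the percentage response is fine.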
3. Application of diagnostic tests
The diagnostic tests should be applied in your project in the following order.
3.1 Jarque-Bera Normality test
The first test, the Jarque-Bera test, is used to check the normality of the residuals obtained from the OLS
estimation. The test first computes the skewness and kurtosis measures of the OLS residuals and then uses the
following test statistic:
JB = n[S^2/6 + (K − 3)^2/24]
where n = sample size, S = skewness coefficient, and K = kurtosis coefficient. For a normally distributed
variable, S = 0 and K = 3. In that case the value of the JB statistic is expected to be 0. The JB statistic
follows the chi-square distribution with 2 df. The critical value for 5% level is 5.99.
Ho: Residuals are normally distributed
H1: Residuals are not normally distributed
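The JB formula can be sketched as a small function; the sample values below are illustrative, not taken from the Gretl output:

```python
def jarque_bera(n, skew, kurt):
    """Jarque-Bera statistic: JB = n * (S^2/6 + (K - 3)^2/24)."""
    return n * (skew ** 2 / 6 + (kurt - 3) ** 2 / 24)

CHI2_CRIT_5PCT = 5.99  # chi-square(2) critical value at the 5% level

# A perfectly normal sample (S = 0, K = 3) gives JB = 0
assert jarque_bera(49, 0.0, 3.0) == 0.0

# Illustrative mild non-normality: JB stays below 5.99, so H0 is not rejected
jb = jarque_bera(49, 0.4, 3.5)
print(round(jb, 3), jb > CHI2_CRIT_5PCT)  # prints 1.817 False
```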
To apply this test in Gretl, estimate the equation again and, in the regression output window,
click Tests -> Normality of residuals as shown below.
The output including the results of the test should then appear as follows:
Frequency distribution for uhat1, obs 1-49
number of bins = 7, mean = 4.71278e-015, sd = 0.173387
interval midpt frequency rel. cum.
< -0.48332 -0.55637 1 2.04% 2.04%
-0.48332 - -0.33723 -0.41027 0 0.00% 2.04%
-0.33723 - -0.19114 -0.26418 5 10.20% 12.24% ***
-0.19114 - -0.045042 -0.11809 14 28.57% 40.82% **********
-0.045042 - 0.10105 0.028004 13 26.53% 67.35% *********
0.10105 - 0.24714 0.17410 13 26.53% 93.88% *********
>= 0.24714 0.32019 3 6.12% 100.00% **
Test for null hypothesis of normal distribution:
Chi-square(2) = 4.522 with p-value 0.10423
To conclude, we compare the χ²(2) critical value at the chosen significance level, i.e. 5.99 at 5
percent, with the calculated test statistic reported above. Since χ²(2)crit = 5.99 > 4.522, we cannot reject
the null hypothesis and conclude that the residuals of our regression model are normally distributed.
3.2 Multicollinearity test
A multicollinearity problem arises when there is a high or even "perfect" (exact) linear relationship among
some or all of the explanatory variables of a regression model.
The measure we use to detect multicollinearity is the VIF (variance inflation factor), calculated
according to the following formula:
VIF_j = 1/(1 − R_j^2)
where R_j^2 is the coefficient of determination from the auxiliary regression of explanatory variable j on the
remaining explanatory variables. VIF values greater than 10 can be taken as evidence of multicollinearity.
The null and alternative hypotheses are specified as follows:
Ho: There is no multicollinearity
H1: There exists multicollinearity
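The VIF formula can be checked with a few lines of Python (the 0.104 value is back-derived from the reported VIF below, for illustration):

```python
def vif(r_squared_j):
    """Variance inflation factor for regressor j, given the R^2 of the
    auxiliary regression of X_j on the other explanatory variables."""
    return 1.0 / (1.0 - r_squared_j)

# No collinearity at all: R_j^2 = 0 gives the minimum possible VIF of 1
assert vif(0.0) == 1.0

# The conventional VIF > 10 threshold corresponds to R_j^2 > 0.9
assert abs(vif(0.9) - 10.0) < 1e-9

# A VIF of 1.116 (as reported below) implies R_j^2 of roughly 0.104
print(round(vif(0.104), 3))  # prints 1.116
```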
To check multicollinearity, estimate the equation by OLS as before and then go to
Analysis -> Collinearity in the regression output window. The results of the VIF computation for the import function
are presented below.
Variance Inflation Factors
Minimum possible value = 1.0
Values > 10.0 may indicate a collinearity problem
l_gdp 1.116
l_reer 1.116
As seen, the calculated VIFs are 1.116, indicating that there is no severe multicollinearity problem in our
import demand function.
3.3 Heteroscedasticity tests
3.3.1 Goldfeld-Quandt test
For the detection of heteroscedasticity, the Goldfeld-Quandt test is considered first. To this aim, we need to
install an add-on for this test. You need to be connected to the internet to download this package from the
server. Click File on the main menu and go to File -> Function packages -> On server.
In the server list, find the gqtest package and click install (the diskette icon).
You should see the following message; if you click Yes, the test will be available in the Tests menu.
As for the theoretical structure of the Goldfeld-Quandt test, it is based on the assumption that
heteroscedasticity (unequal variance) is a function of one of the independent variables; for example, we can
assume that the variance of the error term of the import equation varies with l_gdp. Therefore, all the variables
in the model are ordered according to increasing values of the log of GDP. Then we need to specify the
number of central observations to be excluded from the centre of the ordered data.
The λ test statistic, based on a comparison of the residual sums of squares of the two subsamples, is then calculated as
λ = (RSS2/df2) / (RSS1/df1)
The hypotheses of the test used to make the decision are given by:
Ho: There is no heteroscedasticity (residuals are homoscedastic)
H1: There exists heteroscedasticity (residuals are not homoscedastic)
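The λ ratio can be sketched as follows; the RSS and df values here are purely illustrative, not taken from the Gretl output below:

```python
def gq_lambda(rss1, df1, rss2, df2):
    """Goldfeld-Quandt statistic: ratio of the mean squared residuals of
    subsample 2 to those of subsample 1."""
    return (rss2 / df2) / (rss1 / df1)

# Illustrative numbers: two subsamples of 22 observations each, 3 estimated
# parameters, so 19 degrees of freedom per subsample
lam = gq_lambda(rss1=0.40, df1=19, rss2=0.95, df2=19)
print(round(lam, 3))  # equal df, so this is just 0.95 / 0.40 = 2.375
```

Under the null of homoscedasticity λ follows an F(df2, df1) distribution, so the computed value is compared with the tabulated F critical value.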
To apply this test to our import equation, go back to the main regression output and, in the Tests menu,
find the test by name as shown below and click on it.
After selecting the test, a new dialog box appears. We need to select the variable used for ordering the
data: click the (+) next to "variable to sort (by)" and select l_gdp. You also need to
specify the percentage of data trimmed from the centre using the option "fraction of middle obs. to omit
(scalar)"; here 0.1 (or, equivalently, 5/49) is chosen, indicating that roughly 10 percent of the data (5 observations
in our case) is omitted from the center.
The test output is presented below.
Goldfeld - Quandt heteroskedasticity test.
5 observations with the smallest values (s1^2)
and 5 with the largest values of 'l_gdp' were used (s2^2).
Test statistic: s2^2/s1^2 = F(2,2) = 0.400557
H1(s1^2 != s2^2): pvalue = 0.5720
H1(s1^2 < s2^2): pvalue = 0.7140
H1(s1^2 > s2^2): pvalue = 0.2860
The calculated F statistic of 0.400557 is lower than the critical value Fcrit = F(19, 19) = 2.168; therefore, one cannot reject the null
hypothesis of homoscedasticity.
3.3.2 White heteroscedasticity test
As explained in the class notes, the major drawback of the Goldfeld-Quandt test is the assumption that the variance
of the error term is a function of one particular independent (explanatory) variable. This assumption is very
restrictive, since it requires knowing which variable drives the variance of the error term from
observation to observation. To overcome this unrealistic assumption, White proposed a heteroscedasticity
test consisting of the following steps:
1. First, the following model is estimated by OLS and the residuals û_t are obtained:
lnIMP_t = β1 + β2 lnGDP_t + β3 lnREER_t + u_t
2. The following test equation is estimated, with û_t² as the dependent variable:
û_t² = α0 + α1 lnGDP_t + α2 lnREER_t + α3 (lnGDP_t)² + α4 (lnREER_t)² + α5 lnGDP_t × lnREER_t
and nR² is calculated from it.
3. The statistic nR² follows a χ²(5) distribution, where 5 is the number of explanatory variables in the test equation excluding
the intercept term.
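The nR² statistic of step 2 can be reproduced from the unadjusted R² of the test equation reported in the Gretl output below:

```python
n = 49                  # sample size
r_squared = 0.155454    # unadjusted R^2 of the White test equation (from the output)
chi2_crit_5pct = 11.07  # chi-square(5) critical value at the 5% level

white_stat = n * r_squared
print(round(white_stat, 3), white_stat > chi2_crit_5pct)  # prints 7.617 False
```

This matches the TR² = 7.617230 reported by Gretl up to the rounding of R², and since 7.62 < 11.07 the null of homoscedasticity is not rejected.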
To apply this test in Gretl, go to the Model 1 results and select Tests -> Heteroskedasticity -> White's
test.
The hypotheses of the test are the same as for the Goldfeld-Quandt test. The results presented below show that the null
hypothesis of homoscedasticity is not rejected by White's test.
White's test for heteroskedasticity
OLS, using observations 1970-2018 (T = 49)
Dependent variable: uhat^2
coefficient std. error t-ratio p-value
---------------------------------------------------------
const −18.2702 32.0864 −0.5694 0.5720
l_gdp 0.965733 1.82276 0.5298 0.5990
l_reer 2.19476 4.21381 0.5209 0.6051
sq_l_gdp −0.00653997 0.0273848 −0.2388 0.8124
X2_X3 −0.132122 0.108605 −1.217 0.2304
sq_l_reer 0.155827 0.206213 0.7557 0.4540
Unadjusted R-squared = 0.155454
Test statistic: TR^2 = 7.617230,
with p-value = P(Chi-square(5) > 7.617230) = 0.178631
3.3.3 Remedial Measures for Heteroscedasticity
In our example, both the Goldfeld-Quandt and White tests indicate no heteroscedasticity problem. However,
suppose that those tests had favored the presence of heteroscedasticity. In that case, we would need to correct
the standard errors of the regression model, either with a GLS-type (weighted least squares) estimator or, more
simply, with robust standard errors. To do the latter in Gretl, tick the "Robust standard errors" option (here
the HAC variant) at the estimation stage to obtain corrected standard errors.
The estimation output is presented below
Model 3: OLS, using observations 1970-2018 (T = 49)
Dependent variable: l_imp
HAC standard errors, bandwidth 2 (Bartlett kernel)
Coefficient Std. Error t-ratio p-value
const −18.5483 1.82180 −10.18 <0.0001 ***
l_gdp 1.56915 0.0605893 25.90 <0.0001 ***
l_reer −0.697194 0.169649 −4.110 0.0002 ***
Mean dependent var 20.77888 S.D. dependent var 1.017751
Sum squared resid 1.382902 S.E. of regression 0.173387
R-squared 0.972186 Adjusted R-squared 0.970976
F(2, 46) 345.2804 P-value(F) 1.98e-28
Log-likelihood 17.87910 Akaike criterion −29.75820
Schwarz criterion −24.08273 Hannan-Quinn −27.60493
rho 0.565412 Durbin-Watson 0.835825
As can be seen, the coefficient estimates are the same as with OLS, but the standard errors, and therefore the calculated t-
values, are different since the HAC methodology is used for the correction. In our case, however, since we do not
have heteroscedasticity, the significance of the parameters does not change.
3.4 Autocorrelation tests
3.4.1 Durbin-Watson d test
The Durbin-Watson test is one of the most widely used tests for autocorrelation and is based on the residuals of the
estimated regression model.
The result of the DW test is already reported at the bottom of the regression output shown below.
Model 4: OLS, using observations 1970-2018 (T = 49)
Dependent variable: l_imp
Coefficient Std. Error t-ratio p-value
const −18.5483 1.50439 −12.33 <0.0001 ***
l_gdp 1.56915 0.0435850 36.00 <0.0001 ***
l_reer −0.697194 0.136679 −5.101 <0.0001 ***
Mean dependent var 20.77888 S.D. dependent var 1.017751
Sum squared resid 1.382902 S.E. of regression 0.173387
R-squared 0.972186 Adjusted R-squared 0.970976
F(2, 46) 803.9142 P-value(F) 1.65e-36
Log-likelihood 17.87910 Akaike criterion −29.75820
Schwarz criterion −24.08273 Hannan-Quinn −27.60493
rho 0.565412 Durbin-Watson 0.835825
5% critical values for Durbin-Watson statistic, n = 49, k = 2
dL = 1.4564
dU = 1.6257
Ho: There is no first-order autocorrelation
H1: There exists first-order (positive or negative) autocorrelation
Since our d value (0.835825) is less than the lower bound dL = 1.4564, we conclude that we have positive first-
order autocorrelation.
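The full DW decision rule, including the inconclusive regions between the tabulated bounds, can be sketched as:

```python
def dw_decision(d, dL, dU):
    """Classify a Durbin-Watson d statistic using the tabulated bounds.
    Values near 0 signal positive, values near 4 negative autocorrelation."""
    if d < dL:
        return "reject H0: positive first-order autocorrelation"
    if d <= dU:
        return "inconclusive"
    if d < 4 - dU:
        return "do not reject H0: no first-order autocorrelation"
    if d <= 4 - dL:
        return "inconclusive"
    return "reject H0: negative first-order autocorrelation"

# Our estimate falls below dL, so positive autocorrelation is detected
print(dw_decision(0.835825, dL=1.4564, dU=1.6257))
```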
3.4.2 Breusch-Godfrey test
The DW test has been criticized for its restrictive assumptions: the residuals must be normally distributed, the regression
model must have an intercept term, the model must not include lagged values of the dependent variable
on the right-hand side, higher-order autocorrelation is not allowed, and so on. Breusch and Godfrey proposed an
alternative autocorrelation test that can detect higher-order autocorrelation. The test can be applied by clicking
Autocorrelation in the Tests menu, as shown below. The lag order of the test can be set higher than 1 to
allow for higher-order autocorrelation.
The results presented below provide evidence supporting the DW test, since the coefficient of the one-period-
lagged residual uhat_1 is significant at the one percent level. The (n − p)R² value of the test equation is also higher than
the critical value; therefore, the null hypothesis of no higher-order autocorrelation is rejected by the
Breusch-Godfrey test as well.
Breusch-Godfrey test for autocorrelation up to order 2
OLS, using observations 1970-2018 (T = 49)
Dependent variable: uhat
coefficient std. error t-ratio p-value
--------------------------------------------------------
const 0.319615 1.26733 0.2522 0.8021
l_gdp −0.00836721 0.0367227 −0.2278 0.8208
l_reer −0.0206661 0.114101 −0.1811 0.8571
uhat_1 0.687575 0.148984 4.615 3.40e-05 ***
uhat_2 −0.227542 0.152882 −1.488 0.1438
Unadjusted R-squared = 0.338068
Test statistic: LMF = 11.236027,
with p-value = P(F(2,44) > 11.236) = 0.000114
Alternative statistic: TR^2 = 16.565317,
with p-value = P(Chi-square(2) > 16.5653) = 0.000253
Ljung-Box Q' = 15.6238,
with p-value = P(Chi-square(2) > 15.6238) = 0.000405
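The TR² statistic in the output above can be reproduced directly from the unadjusted R² of the auxiliary regression:

```python
T = 49                 # sample size of the auxiliary regression
r_squared = 0.338068   # unadjusted R^2 from the Breusch-Godfrey output above
chi2_crit_5pct = 5.99  # chi-square(2) critical value at the 5% level

bg_stat = T * r_squared
print(round(bg_stat, 3), bg_stat > chi2_crit_5pct)  # prints 16.565 True
```

This matches the reported TR² = 16.565317 up to the rounding of R², and since 16.57 > 5.99 the null of no autocorrelation is clearly rejected.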
3.4.3 Remedial Measures for Autocorrelation
We can suggest the Cochrane-Orcutt iterative procedure to fix the problem of autocorrelation. However, if it is
not available in your release of Gretl, selecting HAC standard errors, as in the heteroscedasticity remedy, will
also address the autocorrelation problem, since HAC standard errors are robust to both heteroscedasticity and
autocorrelation.
3.5 Ramsey's RESET Test
The regression specification error test proposed by Ramsey can be applied by clicking Tests -> Ramsey's
RESET, as depicted in the screenshot below.
If you have an adequate number of observations, select the first option, "squares and cubes", and press OK.
The null and alternative hypotheses of the test are as follows:
Ho: There is no misspecification error in the model
H1: There exists misspecification error in the model
The test is based on an F statistic comparing the restricted (original) model with the unrestricted model augmented by powers of the fitted values:
F = [(RSS_R − RSS_U)/m] / [RSS_U/(n − k)]
where m is the number of added regressors (here the squares and cubes of the fitted values) and k is the number of parameters in the unrestricted model.
The test output below indicates that we have a misspecification error in our model, as the calculated F
(5.823795) is higher than the critical F value (3.20928). Therefore, we need to consider changing the specification of
our model by adding relevant variables, or by adding dummy variables to account for the impacts of crises or
structural changes in the trade regime.
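The RESET F statistic has the standard restricted-vs-unrestricted form; a sketch with purely illustrative RSS values (not taken from the output below):

```python
def reset_f(rss_restricted, rss_unrestricted, m, df_unrestricted):
    """RESET F statistic: F = [(RSS_R - RSS_U)/m] / [RSS_U/(n - k)],
    where m extra regressors (powers of the fitted values) are added."""
    return ((rss_restricted - rss_unrestricted) / m) / (
        rss_unrestricted / df_unrestricted
    )

# Illustrative: adding yhat^2 and yhat^3 (m = 2) with 44 residual df left
f = reset_f(rss_restricted=1.40, rss_unrestricted=1.10, m=2, df_unrestricted=44)
print(round(f, 2))  # (0.30/2) / (1.10/44) = 6.0
```

If the computed F exceeds the tabulated F(m, n − k) critical value, the added powers of the fitted values carry explanatory power, which is evidence of misspecification.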
Auxiliary regression for RESET specification test
OLS, using observations 1970-2018 (T = 49)
Dependent variable: l_imp
coefficient std. error t-ratio p-value
---------------------------------------------------------
const 69.2618 854.814 0.08103 0.9358
l_gdp −4.70645 52.5878 −0.08950 0.9291
l_reer 2.34125 23.3140 0.1004 0.9205
yhat^2 0.281701 1.60502 0.1755 0.8615
yhat^3 −0.00593128 0.0255921 −0.2318 0.8178
Warning: data matrix close to singularity!
Test statistic: F = 5.823795,
with p-value = P(F(2,44) > 5.8238) = 0.0057
4. The structure of the project
Your project should have the following structure:
1. Formulation of the Model
2. Data Sources and Description
3. Model Estimation and Hypothesis Testing
4. Interpretation of the Results
5. Conclusion
References
Appendices
A. The Data Set
B. Computer Outputs
In Section 1, formulate the model
Yt = β0 + β1X1t + β2X2t + … + βkXkt + εt,  for t = 1, 2, …, T
You should specify what the variables represent. For example, if you estimate a consumption function, then Yt and Xt
represent private consumption expenditure and disposable income, respectively. You should state your a priori expectations
about the signs of the coefficients. Your model should include at least two independent variables.
In Section 2, you can state the source of the data, e.g. International Financial Statistics or the Turkish Statistical Institute. You can also
describe the properties of the data by reporting the mean, standard deviation, and correlation of the variables with the dependent
variable, and by using graphs of the series. The number of observations should be at least 30.
In Section 3, you first have to estimate the model through OLS and examine the F and t statistics and the adjusted R². You have to conduct
the diagnostic tests. You may suggest some remedial measures if you find that some of the assumptions of the classical linear
regression model are violated.
In Section 4, you can interpret your estimation results. If there is a difference from the implication of the theory, try to
explain why.
In Section 5, you can make some concluding remarks about your findings in your study.
After the references, don't forget to include in the Appendices the data set and the output of the econometric program
you have used.