Econometrics Project

GDP of Pakistan data was taken from Kaggle; multiple linear regression is applied and its assumptions are then checked using Python.

Uploaded by

tahreemasif18
Econometrics Project

Submitted by: Menahil

Roll no. 39-20

BS Statistics Regular 2020-2024

Submitted to: Miss Wajeeha Batool

COLLEGE OF STATISTICAL SCIENCES,

PUNJAB UNIVERSITY LAHORE


1 Data Description:

The data set is an annual secondary time series of Pakistan's GDP from 2000 to 2021,
broken down into sectors such as Agriculture, Industry, Services, and other
components. These sectors play an essential role in the Pakistani economy. GDP is the
dependent variable, and its values are in billions of US dollars. The 'Per Capita'
variable is an annual growth in rupees relative to the U.S. dollar, and the values of
'Growth rate' are percentages. The sectors that contribute to the GDP of Pakistan are as follows:

Variables                                                    Units        Sources

Crops                                                        %            State Bank of Pakistan
Livestock                                                    %            State Bank of Pakistan
Forestry                                                     %            State Bank of Pakistan
Fishing                                                      %            State Bank of Pakistan
Total Agricultural sectors                                   %            State Bank of Pakistan
Mining and Quarrying                                         %            State Bank of Pakistan
Manufacturing                                                %            State Bank of Pakistan
Large Scale                                                  %            State Bank of Pakistan
Small Scale                                                  %            State Bank of Pakistan
Slaughtering                                                 %            State Bank of Pakistan
Electricity generation & distribution and Gas distribution   %            State Bank of Pakistan
Construction                                                 %            State Bank of Pakistan
total Industrial Sectors                                     %            State Bank of Pakistan
Wholesale & Retail trade                                     %            State Bank of Pakistan
Transport, Storage & Communication                           %            State Bank of Pakistan
Finance & Insurance                                          %            State Bank of Pakistan
Housing Services                                             %            State Bank of Pakistan
General Government Services                                  %            State Bank of Pakistan
Other Services                                               %            State Bank of Pakistan
total Services Sector                                        %            State Bank of Pakistan
Gross Domestic Product                                       $ (billion)  State Bank of Pakistan
Per Capita                                                   $            The World Bank
Growth rate                                                  %            The World Bank
Reference:

This data is sourced from Kaggle.

Hanzlanawaz, H. (n.d.). Contribution of various sectors to Pakistan's GDP. Retrieved from
https://www.kaggle.com/datasets/hanzlanawaz/contribution-of-various-sectors-to-pakistans-gdp

2 Check Normality

2.1 RESULTS OF SHAPIRO-WILK TEST:


Shapiro-Wilk Statistic: 0.3453448414802551
p-value: 1.299554386725585e-39
The data does not appear to be normally distributed.

2.2 RESULTS OF SHAPIRO-WILK TEST AFTER BOX-COX TRANSFORMATION:


Shapiro-Wilk Statistic: 0.9175485968589783
p-value: 0.06764359027147293
The data appears to be normally distributed.
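
The two results above can be reproduced in outline as follows. This is a minimal sketch, assuming SciPy is available; the sample is a synthetic right-skewed series standing in for the GDP data (not the project data), and the 0.05 threshold is the conventional choice rather than something stated in the report.

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed sample standing in for the GDP series (not the project data).
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=3.0, sigma=1.0, size=200)

# Shapiro-Wilk on the raw values: H0 is that the sample is normally distributed.
w_raw, p_raw = stats.shapiro(sample)

# Box-Cox requires strictly positive data; lambda is chosen by maximum likelihood.
transformed, lam = stats.boxcox(sample)
w_bc, p_bc = stats.shapiro(transformed)

alpha = 0.05  # assumed significance level
for label, w, p in [("raw", w_raw, p_raw), ("Box-Cox", w_bc, p_bc)]:
    verdict = "appears normal" if p > alpha else "does not appear normal"
    print(f"{label}: W={w:.4f}, p={p:.4g} -> {verdict}")
```

A Shapiro-Wilk statistic closer to 1 after the transformation, as in Section 2.2, is the expected pattern for skewed positive data.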

3 Results of fitting regression model on the data:

3.1 VALUES OF COEFFICIENTS

Variables                                                    Coefficients
Crops                                                        -8.25
Livestock                                                    -27.03
Forestry                                                     -57.14
Fishing                                                      8.46
Total Agricultural sectors                                   84.48
Mining and Quarrying                                         51.27
Manufacturing                                                -20.42
Large Scale                                                  67.09
Small Scale                                                  -80.03
Slaughtering                                                 181.98
Electricity generation & distribution and Gas distribution   -57.08
Construction                                                 -3.67
total Industrial Sectors                                     -0.98
Wholesale & Retail trade                                     29.28
Transport, Storage & Communication                           -7.64
Finance & Insurance                                          -21.60
Housing Services                                             1.41
General Government Services                                  -10.22
Other Services                                               0.73
total Services Sector                                        -28.04
Gross Domestic Product                                       25.71
Per Capita                                                   0.21
Growth rate                                                  -0.93
Interpretation:

The coefficients represent the change in the dependent variable for a one-unit change in the
independent variable, while holding all other independent variables constant.
Mean Squared Error   913.77
R-squared            0.85
Interpretation:

The MSE of 913.77 corresponds to a root mean squared error of about 30.2 in the units of the
dependent variable, so individual predictions can be off by a sizeable amount. The R-squared
value of 0.85 indicates that approximately 85% of the variation in the dependent variable can be
explained by the independent variables in the model.
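
As an illustration of how the coefficients, MSE and R-squared relate to one another, here is a minimal least-squares sketch using NumPy. The two regressors and the response are synthetic (hypothetical data, not the Kaggle file), and the report may have used a different library to fit its model.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = rng.normal(size=(n, 2))  # two hypothetical regressors
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

A = np.column_stack([np.ones(n), X])          # design matrix with intercept
beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # ordinary least squares
fitted = A @ beta
resid = y - fitted

mse = np.mean(resid ** 2)   # mean squared error (squared units of y)
rmse = np.sqrt(mse)         # same units as y, easier to interpret
r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

print("intercept and slopes:", np.round(beta, 2))
print(f"MSE={mse:.3f}  RMSE={rmse:.3f}  R-squared={r2:.3f}")
```

Each slope in `beta` is the expected change in `y` for a one-unit change in that regressor with the other held fixed, which is the interpretation given above.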
4 Residuals to check Heteroscedasticity

4.1 GRAPHICAL REPRESENTATION TO CHECK HETEROSCEDASTICITY

Interpretation:
From the provided plot, the residuals appear to be randomly scattered around the horizontal axis,
suggesting that there is no clear evidence of heteroscedasticity. The variance of the residuals
seems fairly constant across the different levels of fitted values.

4.2 RESULT OF BREUSCH-PAGAN TEST

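'LM Statistic': 22.0
'LM-Test p-value': 0.5202517804007958
Interpretation:
This p-value is well above the typical significance level of 0.05.
A high p-value means that we fail to reject the null hypothesis of
homoscedasticity (constant variance of residuals).
The test therefore gives no evidence of heteroscedasticity in this model. Note, however, that the
LM statistic equals the number of observations (22), which is the value n·R² takes when the
auxiliary regression fits perfectly; with about as many regressors as observations the test has
very little power, so this result should be read with caution.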
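
For reference, the Breusch-Pagan LM statistic can be computed by hand as n·R² from a regression of the squared residuals on the regressors. The sketch below uses NumPy and SciPy on synthetic data whose error spread grows with the regressor; the helper names `r_squared` and `breusch_pagan` are hypothetical, and this is the studentized n·R² form, which is assumed (not confirmed) to match what the report's software computed.

```python
import numpy as np
from scipy import stats

def r_squared(A, y):
    """R-squared of an OLS fit of y on the columns of A."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def breusch_pagan(A, resid):
    """A is the design matrix with the intercept in column 0."""
    n, p = A.shape
    lm = n * r_squared(A, resid ** 2)   # LM = n * R^2 of the auxiliary regression
    return lm, stats.chi2.sf(lm, p - 1)  # df = number of non-constant regressors

rng = np.random.default_rng(2)
n = 200
x = rng.uniform(1, 10, size=n)
A = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=x, size=n)  # error sd grows with x

beta, *_ = np.linalg.lstsq(A, y, rcond=None)
lm, p_value = breusch_pagan(A, y - A @ beta)
print(f"LM={lm:.2f}, p-value={p_value:.4g}")
```

Because R² cannot exceed 1, the statistic is bounded above by the sample size, and it reaches that bound when the auxiliary regression has as many parameters as observations.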
4.3 RESULTS OF WHITE'S TEST

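'Test Statistic': 22.0
'Test Statistic p-value': 0.39950988556124917
Interpretation:
White's test, with a test statistic of 22.0 and a p-value of 0.3995, gives no significant
evidence of heteroscedasticity in the regression model's residuals. At the 0.05 level we fail to
reject the null hypothesis of constant residual variance, which is consistent with the assumption
of homoscedasticity. The statistic again equals the sample size (22), as in the Breusch-Pagan
test, which points to an auxiliary regression that fits perfectly; the test then has little power
to detect heteroscedasticity.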
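
White's test differs from Breusch-Pagan in that the auxiliary regression also includes the squares and cross-products of the regressors. A minimal sketch under that definition, with synthetic data and a hypothetical helper `white_test`, might look like this:

```python
import numpy as np
from itertools import combinations_with_replacement
from scipy import stats

def white_test(X, resid):
    """X holds the regressors without the intercept column, shape (n, k)."""
    n, k = X.shape
    # Squares and pairwise cross-products of the regressors.
    extra = [X[:, i] * X[:, j] for i, j in combinations_with_replacement(range(k), 2)]
    Z = np.column_stack([np.ones(n), X] + extra)
    e2 = resid ** 2
    beta, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    u = e2 - Z @ beta
    r2 = 1.0 - u @ u / np.sum((e2 - e2.mean()) ** 2)
    stat = n * r2
    df = Z.shape[1] - 1  # assumes Z has full column rank
    return stat, stats.chi2.sf(stat, df)

rng = np.random.default_rng(6)
n = 300
X = np.column_stack([rng.uniform(1, 5, size=n), rng.normal(size=n)])
y = 1.0 + X[:, 0] + X[:, 1] + rng.normal(scale=X[:, 0], size=n)  # sd depends on first regressor

A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
stat, p_value = white_test(X, y - A @ beta)
print(f"White statistic={stat:.2f}, p-value={p_value:.4g}")
```

With k regressors the auxiliary regression has k + k(k+1)/2 non-constant terms, so it needs many more observations than regressors to be informative.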

5 Results of fitting MLR model

5.1 OLS REGRESSION RESULTS

OLS Regression Results


==============================================================================
Dep. Variable:                    GDP   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                    nan
Method:                 Least Squares   F-statistic:                       nan
Date:                Wed, 19 Jun 2024   Prob (F-statistic):                nan
Time:                        14:31:03   Log-Likelihood:                 457.67
No. Observations:                  22   AIC:                            -871.3
Df Residuals:                       0   BIC:                            -847.3
Df Model:                          21
Covariance Type:            nonrobust

1. Log-Likelihood: 457.67

The log-likelihood measures how well the model fits the data, with higher values indicating a
better fit. On its own the value of 457.67 has no absolute scale; it is mainly useful for
comparing models fitted to the same data.

2. AIC: -871.3

The Akaike Information Criterion (AIC) balances goodness of fit against the number of
parameters in the model. When comparing models fitted to the same data, a lower AIC value
indicates the preferred model.

3. BIC: -847.3

The Bayesian Information Criterion (BIC) is a similar measure whose penalty for extra parameters
depends on the sample size; again, a lower value indicates the preferred model.
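
The reported AIC and BIC follow directly from the log-likelihood via AIC = -2·logL + 2k and BIC = -2·logL + k·ln(n). The short check below takes k = 22 parameters (Df Model of 21 plus an intercept, which is an assumption about how the model was specified) and n = 22 observations from the summary above.

```python
import math

log_likelihood = 457.67  # from the OLS summary
n_obs = 22               # No. Observations
k_params = 22            # Df Model (21) + intercept; assumed parameter count

aic = -2 * log_likelihood + 2 * k_params
bic = -2 * log_likelihood + k_params * math.log(n_obs)

print(f"AIC = {aic:.1f}")  # AIC = -871.3
print(f"BIC = {bic:.1f}")  # BIC = -847.3
```

Both values agree with the summary, which supports the reading that the model estimates 22 parameters.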
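Interpretation

The model estimates 22 parameters (21 regressors plus an intercept) from 22 observations, which leaves
zero residual degrees of freedom. The R-squared of 1.000 and the high log-likelihood are therefore a
consequence of a saturated fit rather than evidence of a good model, and the adjusted R-squared and
F-statistic cannot be computed (nan). Taken together, the small sample size and the zero residual degrees
of freedom suggest that this multiple linear regression model may not be the most appropriate choice for
this dataset.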
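
The effect of having zero residual degrees of freedom can be illustrated with pure noise. In the sketch below, 21 random regressors plus an intercept are fitted to a random response of length 22; all data are synthetic and unrelated to the project, and the point is only that R-squared reaches 1 regardless of any real relationship.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 22
X = rng.normal(size=(n, n - 1))  # 21 hypothetical regressors, pure noise
y = rng.normal(size=n)           # response unrelated to X

A = np.column_stack([np.ones(n), X])  # 22 columns for 22 rows
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta

df_resid = n - np.linalg.matrix_rank(A)
r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

print("residual degrees of freedom:", df_resid)
print(f"R-squared: {r2:.6f}")
print(f"largest absolute residual: {np.abs(resid).max():.2e}")
```

A fit like this interpolates the sample exactly, so summary statistics computed from its residuals carry no information about predictive quality.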

5.2 CHECK ASSUMPTIONS OF MLR

5.2.1 Linearity

5.2.2 Normality of residuals


Shapiro-Wilk Statistic: 0.95
p-value: 0.36
The residuals appear to be normally distributed.

5.2.3 Heteroscedasticity
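Levene Statistic: 59.94
p-value: 0.00
The p-value is below 0.05, so the null hypothesis of equal variances is rejected: the residuals do
not have constant variance (heteroscedasticity). Note that this differs from the Breusch-Pagan and
White results in Section 4, which found no evidence of heteroscedasticity.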
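
Levene's test compares the variance of two or more groups, so the residuals have to be split into groups first. The report does not say how this was done; the sketch below assumes a split at the median fitted value, uses synthetic data, and relies on `scipy.stats.levene`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 200
x = rng.uniform(1, 10, size=n)
A = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=x, size=n)  # error sd grows with x

beta, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ beta
resid = y - fitted

# Assumed grouping: residuals below vs. above the median fitted value.
cut = np.median(fitted)
low = resid[fitted <= cut]
high = resid[fitted > cut]

stat, p_value = stats.levene(low, high)  # H0: both groups have equal variance
print(f"Levene statistic={stat:.2f}, p-value={p_value:.4g}")
```

The outcome depends on how the groups are formed, which is one reason a Levene result can disagree with regression-based tests such as Breusch-Pagan.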
5.2.4 Multicollinearity
Feature                                                      VIF
Crops                                                        3.115103e+09
Livestock                                                    1.325875e+08
Forestry                                                     2.608219e+08
Fishing                                                      7.858197e+05
Total Agricultural sectors                                   4.351817e+05
Mining and Quarrying                                         1.203756e+09
Manufacturing                                                2.189421e+07
Large Scale                                                  5.469523e+08
Small Scale                                                  1.031750e+09
Slaughtering                                                 2.039360e+07
Electricity generation & distribution and Gas distribution   1.220299e+07
Construction                                                 6.906859e+06
total Industrial Sectors                                     1.183032e+07
Wholesale & Retail trade                                     5.821126e+08
Transport, Storage & Communication                           3.408510e+07
Finance & Insurance                                          1.822213e+07
Housing Services                                             1.342569e+06
General Government Services                                  3.625816e+06
Other Services                                               3.870344e+06
total Services Sector                                        1.002342e+07
Per Capita                                                   1.286946e+09
Growth rate                                                  9.746339e+02
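
The variance inflation factor of a regressor is 1 / (1 - R²), where R² comes from regressing that regressor on all the others; values above roughly 10 are commonly read as a sign of problematic multicollinearity. The sketch below is a hand-rolled NumPy version with a hypothetical helper `vif` and synthetic columns, including a "total" column that is nearly the sum of two others, which mimics (as an assumption) the sector totals sitting alongside their components in this data set.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (no intercept column)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ beta
        r2 = 1.0 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(7)
n = 100
a = rng.normal(size=n)
b = rng.normal(size=n)
total = a + b + rng.normal(scale=0.01, size=n)  # almost an exact sum of a and b
c = rng.normal(size=n)                          # unrelated regressor

vifs = vif(np.column_stack([a, b, total, c]))
for name, value in zip(["a", "b", "total", "c"], vifs):
    print(f"{name:>5}: VIF = {value:.3e}")
```

The near-linear dependence inflates the VIFs of `a`, `b` and `total` by several orders of magnitude while leaving `c` close to 1, the same pattern as the very large values in the table above.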


5.2.5 Independence of errors

Interpretation:

The residuals plot shows a clear pattern, indicating potential autocorrelation and suggesting that
the errors are not independent. This violates the assumption of independence of errors in MLR as
the residuals should ideally be randomly scattered around the horizontal axis without any
discernible pattern.
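
A numerical complement to the residual plot is the Durbin-Watson statistic, which is close to 2 for uncorrelated errors and falls toward 0 under positive first-order autocorrelation. The report itself relies on the plot only; the sketch below is an added illustration on synthetic series, with a hypothetical helper `durbin_watson`.

```python
import numpy as np

def durbin_watson(resid):
    """Sum of squared successive differences divided by the sum of squares."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(5)
n = 500
white = rng.normal(size=n)  # independent errors

# AR(1) errors with coefficient 0.8: each value depends on the previous one.
ar1 = np.empty(n)
ar1[0] = white[0]
for t in range(1, n):
    ar1[t] = 0.8 * ar1[t - 1] + white[t]

dw_white = durbin_watson(white)
dw_ar1 = durbin_watson(ar1)
print(f"independent errors:    DW = {dw_white:.2f}")
print(f"autocorrelated errors: DW = {dw_ar1:.2f}")
```

For annual time series such as this one, a value well below 2 would support the conclusion drawn from the plot.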

6 Result of goodness of fit test:


A chi-square goodness-of-fit test is used:
Chi-square statistic: 9.818181818181817
P-value: 0.3654040928300495
Since the p-value exceeds 0.05, we fail to reject the null hypothesis: the observed distribution is not
significantly different from the expected distribution.

Model Fit: Because the observed data do not deviate significantly from the expected distribution, the
test gives no evidence against the model's fit to the observed data.
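
The mechanics of the chi-square goodness-of-fit test can be sketched with `scipy.stats.chisquare`. The bin counts below are hypothetical and are not the counts used in the report, which does not state how the observed and expected frequencies were formed. The chi-square approximation is usually considered reliable only when expected counts are not too small (a common rule of thumb is at least 5 per bin).

```python
from scipy import stats

observed = [18, 22, 20, 25, 15]  # hypothetical observed bin counts
expected = [20, 20, 20, 20, 20]  # counts expected under H0 (same total)

# Statistic is the sum over bins of (observed - expected)^2 / expected.
stat, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
df = len(observed) - 1

alpha = 0.05  # assumed significance level
decision = "fail to reject H0" if p_value > alpha else "reject H0"
print(f"chi-square={stat:.3f}, df={df}, p-value={p_value:.4f} -> {decision}")
```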
