Name: Hasnat Mubasher
Roll No: 0961-BH-BAF-20
Chapter - 03
------------------------
3.7
A) Summary statistics
. summarize salary years_senior years_comp
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
salary | 135 90.61852 62.5722 10 529.9
years_senior | 135 7.8 6.40988 0 37
years_comp | 135 22.99259 11.67437 2 45
B) Simple regressions
1) Regression of salary on years as senior officer
. regress salary years_senior
Source | SS df MS Number of obs = 135
-------------+---------------------------------- F(1, 133) = 4.13
Model | 15816.0586 1 15816.0586 Prob > F = 0.0440
Residual | 508831.465 133 3825.80049 R-squared = 0.0301
-------------+---------------------------------- Adj R-squared = 0.0229
Total | 524647.524 134 3915.28003 Root MSE = 61.853
------------------------------------------------------------------------------
salary | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
years_senior | 1.694911 .8336022 2.03 0.044 .0460779 3.343743
_cons | 77.39822 8.403364 9.21 0.000 60.77669 94.01974
------------------------------------------------------------------------------
Coefficient for years_senior: 1.694911
Each additional year as a senior officer is associated with a salary increase of about £1,695 (salary appears to be recorded in thousands of pounds, given a mean of 90.6).
Standard error: 0.8336022
t-value: 2.03
p-value: 0.044
Statistically significant at the 5% level, indicating a meaningful relationship between years as a senior officer and salary.
R-squared: 0.0301
The model explains only 3.01% of the variance in salary, so most of the variation is left unexplained.
Constant (_cons): 77.39822
The estimated salary when years_senior is zero is approximately £77,398.
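The reported t-value and 95% confidence interval can be reproduced by hand from the coefficient and standard error; a quick sketch (the numbers are copied from the Stata table above, the variable names are ours):

```python
# Reproduce the reported t-value and check the 95% CI is symmetric
# around the point estimate (numbers from the Stata output).
b, se = 1.694911, 0.8336022     # years_senior coefficient and std. err.
lo, hi = 0.0460779, 3.343743    # reported 95% confidence bounds

t_stat = b / se                 # should reproduce the reported t = 2.03
t_crit_lower = (b - lo) / se    # critical value implied by the lower bound
t_crit_upper = (hi - b) / se    # ... and by the upper bound (should match)
```

Both implied critical values come out near 1.978, the two-sided 5% critical value of a t-distribution with 133 degrees of freedom.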
2) Regression of salary on years with the company
. reg salary years_comp
Source | SS df MS Number of obs = 135
-------------+---------------------------------- F(1, 133) = 0.30
Model | 1173.8491 1 1173.8491 Prob > F = 0.5859
Residual | 523473.675 133 3935.89229 R-squared = 0.0022
-------------+---------------------------------- Adj R-squared = -0.0053
Total | 524647.524 134 3915.28003 Root MSE = 62.737
------------------------------------------------------------------------------
salary | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
years_comp | .2535246 .4642326 0.55 0.586 -.6647095 1.171759
_cons | 84.78933 11.9619 7.09 0.000 61.12915 108.4495
------------------------------------------------------------------------------
Coefficient for years_comp: 0.2535246
Each additional year with the company is associated with a salary increase of about £254, but the estimate is imprecise.
Standard error: 0.4642326
t-value: 0.55
p-value: 0.586
Not statistically significant (p > 0.05), so there is no evidence of a relationship between years with the company and salary.
R-squared: 0.0022
The model explains only 0.22% of the variance in salary.
Constant (_cons): 84.78933
The estimated salary when years_comp is zero is approximately £84,789.
____________________________________________________________________________________
Chapter: 04
-------------
4.1:
(A)
. reg bwght faminc cigs
Source | SS df MS Number of obs = 1,388
-------------+---------------------------------- F(2, 1385) = 21.27
Model | 17126.2088 2 8563.10442 Prob > F = 0.0000
Residual | 557485.511 1,385 402.516614 R-squared = 0.0298
-------------+---------------------------------- Adj R-squared = 0.0284
Total | 574611.72 1,387 414.283864 Root MSE = 20.063
------------------------------------------------------------------------------
bwght | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
faminc | .0927647 .0291879 3.18 0.002 .0355075 .1500219
cigs | -.4634075 .0915768 -5.06 0.000 -.6430518 -.2837633
_cons | 116.9741 1.048984 111.51 0.000 114.9164 119.0319
------------------------------------------------------------------------------
(B)
. reg bwght faminc
Source | SS df MS Number of obs = 1,388
-------------+---------------------------------- F(1, 1386) = 16.65
Model | 6819.0527 1 6819.0527 Prob > F = 0.0000
Residual | 567792.667 1,386 409.662819 R-squared = 0.0119
-------------+---------------------------------- Adj R-squared = 0.0112
Total | 574611.72 1,387 414.283864 Root MSE = 20.24
------------------------------------------------------------------------------
bwght | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
faminc | .1183234 .0290016 4.08 0.000 .0614317 .1752152
_cons | 115.265 1.001901 115.05 0.000 113.2996 117.2304
------------------------------------------------------------------------------
(C)
. reg bwght cigs
Source | SS df MS Number of obs = 1,388
-------------+---------------------------------- F(1, 1386) = 32.24
Model | 13060.4194 1 13060.4194 Prob > F = 0.0000
Residual | 561551.3 1,386 405.159668 R-squared = 0.0227
-------------+---------------------------------- Adj R-squared = 0.0220
Total | 574611.72 1,387 414.283864 Root MSE = 20.129
------------------------------------------------------------------------------
bwght | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
cigs | -.5137721 .0904909 -5.68 0.000 -.6912861 -.3362581
_cons | 119.7719 .5723407 209.27 0.000 118.6492 120.8946
------------------------------------------------------------------------------
(E)
Wald test
. test cigs = 2*faminc
( 1) - 2*faminc + cigs = 0
F( 1, 1385) = 42.35
Prob > F = 0.0000
Conclusion: Both family income and cigarette consumption significantly affect birth weight: family income has a positive effect and cigarette smoking a negative one. The Wald test rejects the restriction that the cigarette coefficient equals twice the income coefficient (F(1, 1385) = 42.35, p < 0.001), so the two effects are not in a simple 2:1 proportion.
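The fitted equation from part (A) can be used to illustrate the two effects side by side; a small sketch with illustrative values for family income and cigarettes (the values are ours, not from the sample):

```python
# Fitted model from part (A): bwght = 116.9741 + 0.0928*faminc - 0.4634*cigs
def predicted_bwght(faminc, cigs):
    """Predicted birth weight from the estimated equation in part (A)."""
    return 116.9741 + 0.0927647 * faminc - 0.4634075 * cigs

base = predicted_bwght(faminc=30, cigs=0)     # non-smoker, illustrative income
smoker = predicted_bwght(faminc=30, cigs=10)  # same income, 10 cigarettes/day
drop = base - smoker                          # predicted loss: 10 * 0.4634075
```

Holding income fixed, ten cigarettes a day lowers predicted birth weight by about 4.6 units, several times the gain from a large income difference.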
4.2
(A)
g lnwage=log(wage)
reg lnwage educ exper tenure
Source | SS df MS Number of obs = 900
-------------+---------------------------------- F(3, 896) = 52.15
Model | 23.6080086 3 7.86933619 Prob > F = 0.0000
Residual | 135.21098 896 .150905112 R-squared = 0.1486
-------------+---------------------------------- Adj R-squared = 0.1458
Total | 158.818989 899 .176661834 Root MSE = .38847
------------------------------------------------------------------------------
lnwage | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
educ | .0731166 .0066357 11.02 0.000 .0600933 .0861399
exper | .0153578 .0034253 4.48 0.000 .0086353 .0220804
tenure | .0129641 .0026307 4.93 0.000 .007801 .0181272
_cons | 5.528329 .1127946 49.01 0.000 5.306957 5.749702
------------------------------------------------------------------------------
(B)
Wald test
. test exper= educ
( 1) - educ + exper = 0
F( 1, 896) = 95.74
Prob > F = 0.0000
(C)
Redundancy (exclusion) test for exper
. test exper = 0
( 1) exper = 0
F( 1, 896) = 20.10
Prob > F = 0.0000
(D)
. reg lnwage educ exper
Source | SS df MS Number of obs = 900
-------------+---------------------------------- F(2, 897) = 64.41
Model | 19.9433397 2 9.97166984 Prob > F = 0.0000
Residual | 138.875649 897 .154822351 R-squared = 0.1256
-------------+---------------------------------- Adj R-squared = 0.1236
Total | 158.818989 899 .176661834 Root MSE = .39347
------------------------------------------------------------------------------
lnwage | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
educ | .075865 .0066975 11.33 0.000 .0627204 .0890095
exper | .0194704 .0033649 5.79 0.000 .0128664 .0260745
_cons | 5.537798 .1142326 48.48 0.000 5.313604 5.761993
------------------------------------------------------------------------------
Conclusion: Education, experience, and tenure all have positive, statistically significant effects on log wages. The Wald test rejects equality of the education and experience coefficients (F(1, 896) = 95.74, p < 0.001), and the exclusion test confirms that experience contributes significantly to explaining wage variation.
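Because the dependent variable is log(wage), the education coefficient is a semi-elasticity; the exact percentage effect is 100·(e^b − 1). A quick check using the coefficient from the output above:

```python
import math

b_educ = 0.0731166                        # coefficient on educ, log-wage model
approx_pct = 100 * b_educ                 # usual approximation: ~7.31% per year
exact_pct = 100 * (math.exp(b_educ) - 1)  # exact percentage effect: ~7.59%
```

So each extra year of education is associated with roughly a 7.3–7.6% higher wage, holding experience and tenure fixed.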
4.3
(A)
reg lnDI lnY lnR
Source | SS df MS Number of obs = 26
-------------+---------------------------------- F(2, 23) = 525.29
Model | 1.06594003 2 .532970017 Prob > F = 0.0000
Residual | .023336112 23 .001014614 R-squared = 0.9786
-------------+---------------------------------- Adj R-squared = 0.9767
Total | 1.08927615 25 .043571046 Root MSE = .03185
------------------------------------------------------------------------------
lnDI | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
lnY | 1.227364 .0439292 27.94 0.000 1.13649 1.318239
lnR | -.0299645 .0213055 -1.41 0.173 -.0740382 .0141092
_cons | 7.511585 .2194947 34.22 0.000 7.057526 7.965645
------------------------------------------------------------------------------
(B)
. test lnY=0
( 1) lnY = 0
F( 1, 23) = 780.62
Prob > F = 0.0000
(C)
. test lnY=1
( 1) lnY = 1
F( 1, 23) = 26.79
Prob > F = 0.0000
Conclusion: Income is a highly significant determinant of disposable income; because the model is in logs, the coefficient is an elasticity, and the test of lnY = 1 rejects unit elasticity in favour of a value above one (1.23). The interest rate has no statistically significant effect on disposable income in this model (p = 0.173).
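For a single restriction, the Wald F-statistics in (B) and (C) are just squared t-statistics, ((b − r)/se)²; they can be reproduced from the coefficient table above:

```python
b, se = 1.227364, 0.0439292    # lnY coefficient and std. err. from the output

F_zero = (b / se) ** 2         # test of lnY = 0: reported F = 780.62
F_one = ((b - 1) / se) ** 2    # test of lnY = 1: reported F = 26.79
```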
____________________________________________________________________________________
Chapter: 05
------------
5.1
. gen log_Imports=( Imports)
. gen log_GDP=( GDP)
. gen log_CPI=( CPI)
(Note: as written, these commands copy the raw series without applying a log transform; the intended syntax is gen log_Imports = ln(Imports). The output below is therefore a regression in levels despite the log_ prefixes.)
. reg log_Imports log_GDP log_CPI
Source | SS df MS Number of obs = 75
-------------+---------------------------------- F(2, 72) = 1622.72
Model | 9.7283e+09 2 4.8642e+09 Prob > F = 0.0000
Residual | 215821417 72 2997519.68 R-squared = 0.9783
-------------+---------------------------------- Adj R-squared = 0.9777
Total | 9.9441e+09 74 134380020 Root MSE = 1731.3
------------------------------------------------------------------------------
log_Imports | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
log_GDP | 816.443 50.00863 16.33 0.000 716.7526 916.1334
log_CPI | 82.20536 25.07781 3.28 0.002 32.21366 132.1971
_cons | -40850.65 1849.574 -22.09 0.000 -44537.71 -37163.59
------------------------------------------------------------------------------
. correlate log_Imports log_GDP log_CPI
(obs=75)
| log_Im~s log_GDP log_CPI
-------------+---------------------------
log_Imports | 1.0000
log_GDP | 0.9875 1.0000
log_CPI | 0.9476 0.9400 1.0000
. reg log_Imports log_GDP
Source | SS df MS Number of obs = 75
-------------+---------------------------------- F(1, 73) = 2853.74
Model | 9.6961e+09 1 9.6961e+09 Prob > F = 0.0000
Residual | 248030850 73 3397682.88 R-squared = 0.9751
-------------+---------------------------------- Adj R-squared = 0.9747
Total | 9.9441e+09 74 134380020 Root MSE = 1843.3
------------------------------------------------------------------------------
log_Imports | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
log_GDP | 970.5328 18.16784 53.42 0.000 934.3244 1006.741
_cons | -44353.42 1607.281 -27.60 0.000 -47556.72 -41150.11
------------------------------------------------------------------------------
. reg log_Imports log_CPI
Source | SS df MS Number of obs = 75
-------------+---------------------------------- F(1, 73) = 642.35
Model | 8.9293e+09 1 8.9293e+09 Prob > F = 0.0000
Residual | 1.0148e+09 73 13901085.7 R-squared = 0.8980
-------------+---------------------------------- Adj R-squared = 0.8966
Total | 9.9441e+09 74 134380020 Root MSE = 3728.4
------------------------------------------------------------------------------
log_Imports | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
log_CPI | 467.0531 18.42811 25.34 0.000 430.3259 503.7803
_cons | -16116.12 2284.724 -7.05 0.000 -20669.56 -11562.67
------------------------------------------------------------------------------
. reg log_GDP log_CPI
Source | SS df MS Number of obs = 75
-------------+---------------------------------- F(1, 73) = 553.94
Model | 9095.21732 1 9095.21732 Prob > F = 0.0000
Residual | 1198.594 73 16.4190959 R-squared = 0.8836
-------------+---------------------------------- Adj R-squared = 0.8820
Total | 10293.8113 74 139.105558 Root MSE = 4.052
------------------------------------------------------------------------------
log_GDP | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
log_CPI | .4713712 .0200277 23.54 0.000 .4314561 .5112864
_cons | 30.29548 2.483042 12.20 0.000 25.34679 35.24418
------------------------------------------------------------------------------
. vif
Variable | VIF 1/VIF
-------------+----------------------
log_CPI | 1.00 1.000000
-------------+----------------------
Mean VIF | 1.00
Conclusion: GDP and CPI are both individually significant predictors of imports, with GDP having the larger effect, and the high R-squared values show these models explain most of the variance in imports. The very high pairwise correlations (0.9400 between log_GDP and log_CPI), however, point to multicollinearity. The vif output above follows the single-regressor auxiliary regression, where VIF = 1.00 is trivially guaranteed; for the two-regressor import equation, the relevant VIF is 1/(1 − 0.8836) ≈ 8.6, so the regressors are in fact substantially collinear.
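For the two-regressor import equation, the variance inflation factor can be computed directly from the auxiliary regression of log_GDP on log_CPI (R² = 0.8836 in the output above); a one-line sketch:

```python
# VIF for the two-regressor import model, from the auxiliary R-squared
r2_aux = 0.8836          # R-squared of regressing log_GDP on log_CPI
vif = 1 / (1 - r2_aux)   # ~8.6, indicating appreciable collinearity
```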
5.2
. gen log_Imports=( Imports)
. gen log_GDP=( GDP)
. gen log_CPI=( CPI)
(Note: as in 5.1, these commands copy the raw series without applying a log transform; the intended syntax is gen log_Imports = ln(Imports), and the output below is in levels despite the log_ prefixes.)
. reg log_Imports log_GDP log_CPI
Source | SS df MS Number of obs = 23
-------------+---------------------------------- F(2, 20) = 735.20
Model | 3.9193e+10 2 1.9596e+10 Prob > F = 0.0000
Residual | 533089408 20 26654470.4 R-squared = 0.9866
-------------+---------------------------------- Adj R-squared = 0.9852
Total | 3.9726e+10 22 1.8057e+09 Root MSE = 5162.8
------------------------------------------------------------------------------
log_Imports | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
log_GDP | 3701.469 305.4213 12.12 0.000 3064.371 4338.566
log_CPI | -120.1663 146.7966 -0.82 0.423 -426.3788 186.0461
_cons | -153804.6 15994.58 -9.62 0.000 -187168.7 -120440.5
------------------------------------------------------------------------------
. correl log_Imports log_GDP log_CPI
(obs=23)
| log_Im~s log_GDP log_CPI
-------------+---------------------------
log_Imports | 1.0000
log_GDP | 0.9930 1.0000
log_CPI | 0.9424 0.9553 1.0000
. reg log_Imports log_GDP
Source | SS df MS Number of obs = 23
-------------+---------------------------------- F(1, 21) = 1493.19
Model | 3.9175e+10 1 3.9175e+10 Prob > F = 0.0000
Residual | 550950315 21 26235729.3 R-squared = 0.9861
-------------+---------------------------------- Adj R-squared = 0.9855
Total | 3.9726e+10 22 1.8057e+09 Root MSE = 5122.1
------------------------------------------------------------------------------
log_Imports | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
log_GDP | 3462.636 89.60858 38.64 0.000 3276.285 3648.987
_cons | -142196.9 7341.077 -19.37 0.000 -157463.5 -126930.3
------------------------------------------------------------------------------
. reg log_Imports log_CPI
Source | SS df MS Number of obs = 23
-------------+---------------------------------- F(1, 21) = 166.56
Model | 3.5278e+10 1 3.5278e+10 Prob > F = 0.0000
Residual | 4.4480e+09 21 211808596 R-squared = 0.8880
-------------+---------------------------------- Adj R-squared = 0.8827
Total | 3.9726e+10 22 1.8057e+09 Root MSE = 14554
------------------------------------------------------------------------------
log_Imports | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
log_CPI | 1579.323 122.3747 12.91 0.000 1324.831 1833.815
_cons | 36597.85 8455.932 4.33 0.000 19012.77 54182.92
------------------------------------------------------------------------------
. reg log_GDP log_CPI
Source | SS df MS Number of obs = 23
-------------+---------------------------------- F(1, 21) = 219.13
Model | 2981.59716 1 2981.59716 Prob > F = 0.0000
Residual | 285.740339 21 13.6066828 R-squared = 0.9125
-------------+---------------------------------- Adj R-squared = 0.9084
Total | 3267.3375 22 148.515341 Root MSE = 3.6887
------------------------------------------------------------------------------
log_GDP | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
log_CPI | .4591392 .0310167 14.80 0.000 .3946364 .523642
_cons | 51.43969 2.143215 24.00 0.000 46.98263 55.89675
------------------------------------------------------------------------------
. vif
Variable | VIF 1/VIF
-------------+----------------------
log_CPI | 1.00 1.000000
-------------+----------------------
Mean VIF | 1.00
The regression shows that `log_GDP` is a significant positive predictor of `log_Imports`, while `log_CPI` loses significance once both variables are included; the joint model explains 98.66% of the variance, indicating a strong fit. The simple regressions confirm that `log_GDP` and `log_CPI` each have strong individual relationships with `log_Imports`. The correlation of 0.9553 between the two regressors, however, indicates severe multicollinearity (the auxiliary R-squared of 0.9125 implies a VIF of about 11.4), which explains why `log_CPI` is insignificant in the joint model. The vif output above, produced after the single-regressor auxiliary regression, trivially reports 1.00 and should not be read as evidence against multicollinearity.
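With exactly two regressors, the VIF can also be read straight off the pairwise correlation between them, VIF = 1/(1 − r²). Using r = 0.9553 from the correlation matrix above:

```python
r = 0.9553               # correlation between log_GDP and log_CPI
vif = 1 / (1 - r ** 2)   # ~11.4, above the common rule-of-thumb threshold of 10
```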
5.3
. gen log_M4 =( M4 )
. gen log_Y =( Y )
. gen log_R1=( R1)
. gen log_R2 =( R2)
(Note: as in 5.1 and 5.2, these commands copy the raw series without applying a log transform; the intended syntax is gen log_M4 = ln(M4), and the output below is in levels despite the log_ prefixes.)
. reg log_M4 log_Y log_R1
Source | SS df MS Number of obs = 38
-------------+---------------------------------- F(2, 35) = 542.87
Model | 1.7859e+12 2 8.9296e+11 Prob > F = 0.0000
Residual | 5.7571e+10 35 1.6449e+09 R-squared = 0.9688
-------------+---------------------------------- Adj R-squared = 0.9670
Total | 1.8435e+12 37 4.9824e+10 Root MSE = 40557
------------------------------------------------------------------------------
log_M4 | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
log_Y | 2.301075 .0698359 32.95 0.000 2.1593 2.442849
log_R1 | -11359.09 1928.848 -5.89 0.000 -15274.86 -7443.32
_cons | -450405.6 26855.5 -16.77 0.000 -504925.2 -395886.1
------------------------------------------------------------------------------
. reg log_M4 log_Y log_R1 log_R2
Source | SS df MS Number of obs = 38
-------------+---------------------------------- F(3, 34) = 445.59
Model | 1.7978e+12 3 5.9926e+11 Prob > F = 0.0000
Residual | 4.5725e+10 34 1.3449e+09 R-squared = 0.9752
-------------+---------------------------------- Adj R-squared = 0.9730
Total | 1.8435e+12 37 4.9824e+10 Root MSE = 36672
------------------------------------------------------------------------------
log_M4 | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
log_Y | 2.267297 .0641643 35.34 0.000 2.1369 2.397695
log_R1 | -6553.293 2379.922 -2.75 0.009 -11389.88 -1716.709
log_R2 | -7556.556 2546.174 -2.97 0.005 -12731 -2382.108
_cons | -427080.8 25523.33 -16.73 0.000 -478950.5 -375211.2
------------------------------------------------------------------------------
. correl log_M4 log_Y log_R1 log_R2
(obs=38)
| log_M4 log_Y log_R1 log_R2
-------------+------------------------------------
log_M4 | 1.0000
log_Y | 0.9684 1.0000
log_R1 | 0.0075 0.1862 1.0000
log_R2 | -0.1838 -0.0055 0.6675 1.0000
. reg log_Y log_R1 log_R2
Source | SS df MS Number of obs = 38
-------------+---------------------------------- F(2, 35) = 1.22
Model | 2.2724e+10 2 1.1362e+10 Prob > F = 0.3082
Residual | 3.2666e+11 35 9.3331e+09 R-squared = 0.0650
-------------+---------------------------------- Adj R-squared = 0.0116
Total | 3.4938e+11 37 9.4427e+09 Root MSE = 96608
------------------------------------------------------------------------------
log_Y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
log_R1 | 9457.304 6062.315 1.56 0.128 -2849.849 21764.46
log_R2 | -7038.804 6601.138 -1.07 0.294 -20439.83 6362.219
_cons | 295270 45054.07 6.55 0.000 203805.3 386734.6
------------------------------------------------------------------------------
. reg log_R1 log_Y log_R2
Source | SS df MS Number of obs = 38
-------------+---------------------------------- F(2, 35) = 16.26
Model | 220.555828 2 110.277914 Prob > F = 0.0000
Residual | 237.439335 35 6.78398099 R-squared = 0.4816
-------------+---------------------------------- Adj R-squared = 0.4519
Total | 457.995163 37 12.3782476 Root MSE = 2.6046
------------------------------------------------------------------------------
log_R1 | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
log_Y | 6.87e-06 4.41e-06 1.56 0.128 -2.07e-06 .0000158
log_R2 | .7279343 .1325253 5.49 0.000 .4588937 .996975
_cons | 1.380364 1.797682 0.77 0.448 -2.269125 5.029853
------------------------------------------------------------------------------
. reg log_R2 log_Y log_R1
Source | SS df MS Number of obs = 38
-------------+---------------------------------- F(2, 35) = 15.09
Model | 178.83366 2 89.4168301 Prob > F = 0.0000
Residual | 207.444574 35 5.92698784 R-squared = 0.4630
-------------+---------------------------------- Adj R-squared = 0.4323
Total | 386.278234 37 10.4399523 Root MSE = 2.4345
------------------------------------------------------------------------------
log_R2 | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
log_Y | -4.47e-06 4.19e-06 -1.07 0.294 -.000013 4.04e-06
log_R1 | .6359773 .1157839 5.49 0.000 .4009235 .8710311
_cons | 3.086696 1.612068 1.91 0.064 -.1859746 6.359368
------------------------------------------------------------------------------
. vif
Variable | VIF 1/VIF
-------------+----------------------
log_R1 | 1.04 0.965332
log_Y | 1.04 0.965332
-------------+----------------------
Mean VIF | 1.04
The regression analysis shows that `log_Y` is a significant positive predictor of `log_M4`, while `log_R1` has a significant negative relationship. Adding `log_R2` slightly improves the fit, and both interest-rate variables remain significant negative predictors; the three-variable model explains 97.52% of the variance in `log_M4`, indicating a strong fit. The correlation matrix shows a moderate correlation of 0.6675 between `log_R1` and `log_R2`, and the auxiliary regressions (R-squared of 0.4816 and 0.4630) imply VIFs below 2, so multicollinearity is not a serious concern here.
If multicollinearity were to occur, we could address it by:
1. Removing highly correlated predictors.
2. Combining correlated variables into a single predictor.
3. Using techniques like Ridge Regression or Principal Component Analysis (PCA) to
mitigate multicollinearity.
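As an illustration of remedy 3, ridge regression adds a penalty λ to the normal equations, which shrinks and stabilises coefficients when regressors are nearly collinear. A minimal numpy sketch on synthetic collinear data (the data and λ are ours, not from the exercise):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.standard_normal(n)
x2 = x1 + 0.05 * rng.standard_normal(n)   # nearly collinear with x1
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.standard_normal(n)      # true coefficients are (1, 1)

lam = 5.0
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
# For lam > 0 the ridge solution always has a smaller norm than OLS;
# with collinear regressors it is also far less sensitive to the noise.
```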
____________________________________________________________________________________
Chapter 06
-----------
6.1
Step 1: Run Regression
Command: regress price sqrft
Source | SS df MS Number of obs = 88
-------------+---------------------------------- F(1, 86) = 140.79
Model | 5.6980e+11 1 5.6980e+11 Prob > F = 0.0000
Residual | 3.4805e+11 86 4.0471e+09 R-squared = 0.6208
-------------+---------------------------------- Adj R-squared = 0.6164
Total | 9.1785e+11 87 1.0550e+10 Root MSE = 63617
------------------------------------------------------------------------------
price | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
sqrft | 140.211 11.81664 11.87 0.000 116.7203 163.7017
_cons | 11204.14 24742.61 0.45 0.652 -37982.53 60390.82
------------------------------------------------------------------------------
Step 2: Check for heteroskedasticity
Using the White test:
Command: estat imtest, white
White's test
H0: Homoskedasticity
Ha: Unrestricted heteroskedasticity
chi2(2) = 16.14
Prob > chi2 = 0.0003
Cameron & Trivedi's decomposition of IM-test
--------------------------------------------------
Source | chi2 df p
---------------------+----------------------------
Heteroskedasticity | 16.14 2 0.0003
Skewness | 12.28 1 0.0005
Kurtosis | -503685.18 1 1.0000
---------------------+----------------------------
Total | -503656.76 4 1.0000
--------------------------------------------------
Since the p-value is below 0.05, the null hypothesis of homoskedasticity is rejected: heteroskedasticity is present.
Step 3: Perform GLS Estimation and Check for Heteroskedasticity
Case (a): Var(ui) = σ²·sqrft_i
Command: gen weight_a = 1/sqrft
Command: reg price weight_a
Command: estat imtest, white
(Note: as run, the second command regresses price on 1/sqrft as an explanatory variable rather than performing weighted least squares; the GLS estimation intended here would be regress price sqrft [aweight = 1/sqrft].)
Results:
Source | SS df MS Number of obs = 88
-------------+---------------------------------- F(1, 86) = 78.28
Model | 4.3736e+11 1 4.3736e+11 Prob > F = 0.0000
Residual | 4.8049e+11 86 5.5871e+09 R-squared = 0.4765
-------------+---------------------------------- Adj R-squared = 0.4704
Total | 9.1785e+11 87 1.0550e+10 Root MSE = 74747
------------------------------------------------------------------------------
price | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
weight_a | -5.58e+08 6.30e+07 -8.85 0.000 -6.83e+08 -4.32e+08
_cons | 589419.3 34377.05 17.15 0.000 521080 657758.7
------------------------------------------------------------------------------
-------------------------------------------------
Source | chi2 df p
---------------------+----------------------------
Heteroskedasticity | 17.34 2 0.0002
Skewness | 12.07 1 0.0005
Kurtosis |-6837088.39 1 1.0000
---------------------+----------------------------
Total |-6837058.98 4 1.0000
--------------------------------------------------
White's test
H0: Homoskedasticity
Ha: Unrestricted heteroskedasticity
chi2(2) = 17.34
Prob > chi2 = 0.0002
Since the p-value is below 0.05, the null hypothesis of homoskedasticity is rejected: heteroskedasticity is still present.
Case (b): Var(ui) = σ²·sqrft²_i
Command: gen weight_b = 1/(sqrft^2)
Command: reg price weight_b
Command: estat imtest, white
(Note: as in case (a), this regresses price on 1/sqrft² rather than performing weighted least squares; the intended GLS command is regress price sqrft [aweight = 1/sqrft^2].)
Result:
Source | SS df MS Number of obs = 88
-------------+---------------------------------- F(1, 86) = 52.67
Model | 3.4860e+11 1 3.4860e+11 Prob > F = 0.0000
Residual | 5.6925e+11 86 6.6192e+09 R-squared = 0.3798
-------------+---------------------------------- Adj R-squared = 0.3726
Total | 9.1785e+11 87 1.0550e+10 Root MSE = 81358
------------------------------------------------------------------------------
price | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
weight_b | -4.64e+11 6.40e+10 -7.26 0.000 -5.91e+11 -3.37e+11
_cons | 431651.4 20913.49 20.64 0.000 390076.7 473226
------------------------------------------------------------------------------
Cameron & Trivedi's decomposition of IM-test
--------------------------------------------------
Source | chi2 df p
---------------------+----------------------------
Heteroskedasticity | 2.79 1 0.0946
Skewness | 9.90 1 0.0017
Kurtosis | -1.97e+07 1 1.0000
---------------------+----------------------------
Total | -1.97e+07 3 1.0000
--------------------------------------------------
White's test
H0: Homoskedasticity
Ha: Unrestricted heteroskedasticity
chi2(1) = 2.79
Prob > chi2 = 0.0946
Since the p-value exceeds 0.05, the null hypothesis of homoskedasticity is not rejected: there is no evidence of remaining heteroskedasticity.
Conclusion:
The initial model shows significant heteroskedasticity (White's test p = 0.0003).
Case (a): the White test still rejects homoskedasticity, so this specification did not remove it.
Case (b): the White test no longer rejects (p = 0.0946), suggesting the heteroskedasticity has been addressed, though note that the commands as run use 1/sqrft² as a regressor rather than as a weight, so this is not a true GLS fit.
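For reference, the weighted least squares that case (a) intends (Var(u_i) = σ²·sqrft_i) is equivalent to OLS after dividing every variable by √sqrft. A numpy sketch on synthetic data (variable names and data are ours) demonstrating the equivalence:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
sqrft = rng.uniform(1000, 4000, n)
u = rng.standard_normal(n) * np.sqrt(sqrft)  # Var(u) proportional to sqrft
price = 20000 + 130 * sqrft + u

X = np.column_stack([np.ones(n), sqrft])

# WLS via weighted normal equations with weights w_i = 1/sqrft_i
W = np.diag(1 / sqrft)
b_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ price)

# Equivalent: OLS after dividing every variable by sqrt(sqrft)
s = np.sqrt(sqrft)
b_transformed = np.linalg.lstsq(X / s[:, None], price / s, rcond=None)[0]
# The two estimates are identical up to floating-point error.
```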
6.2
Step 1: Run Regression
Command: regress netprofitsales Noempl
Result:
Source | SS df MS Number of obs = 143
-------------+---------------------------------- F(1, 141) = 1.98
Model | .01466615 1 .01466615 Prob > F = 0.1621
Residual | 1.04691193 141 .007424907 R-squared = 0.0138
-------------+---------------------------------- Adj R-squared = 0.0068
Total | 1.06157808 142 .007475902 Root MSE = .08617
------------------------------------------------------------------------------
netprofits~s | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
Noempl | .000498 .0003544 1.41 0.162 -.0002025 .0011986
_cons | .0206526 .0139538 1.48 0.141 -.0069332 .0482383
------------------------------------------------------------------------------
[Text scatter plot of net profit/sales (range −0.211 to 0.279) against number of employees (3 to 140): the points show no clear pattern of spread changing with firm size.]
Step 2: White test
Command: estat imtest, white
Result:
White's test
H0: Homoskedasticity
Ha: Unrestricted heteroskedasticity
chi2(2) = 0.05
Prob > chi2 = 0.9753
Since the p-value exceeds 0.05, the null hypothesis of homoskedasticity is not rejected: there is no evidence of heteroskedasticity.
6.3
Step 1: Run Regression
reg Y X
Result:
Source | SS df MS Number of obs = 38
-------------+---------------------------------- F(1, 36) = 25408.97
Model | 2320612.02 1 2320612.02 Prob > F = 0.0000
Residual | 3287.8952 36 91.3304222 R-squared = 0.9986
-------------+---------------------------------- Adj R-squared = 0.9985
Total | 2323899.91 37 62808.1057 Root MSE = 9.5567
------------------------------------------------------------------------------
Y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
X | 1.059399 .0066461 159.40 0.000 1.04592 1.072878
_cons | -8.672959 1.845795 -4.70 0.000 -12.4164 -4.929513
------------------------------------------------------------------------------
Breusch–Pagan Test
Command: hettest
Result:
Breusch–Pagan/Cook–Weisberg test for heteroskedasticity
Assumption: Normal error terms
Variable: Fitted values of Y
H0: Constant variance
chi2(1) = 0.72
Prob > chi2 = 0.3970
Since the p-value exceeds 0.05, the null hypothesis of constant variance is not rejected: there is no evidence of heteroskedasticity.
White Test
Command: estat imtest, white
Result: White's test
H0: Homoskedasticity
Ha: Unrestricted heteroskedasticity
chi2(2) = 3.81
Prob > chi2 = 0.1487
Since the p-value exceeds 0.05, the null hypothesis of homoskedasticity is not rejected: the White test also finds no evidence of heteroskedasticity.
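Both tests above follow the same LM recipe: regress the squared OLS residuals on a set of explanatory terms and compare n·R² to a χ² distribution. A minimal sketch of the Breusch–Pagan version on synthetic data (the data are ours, not from the exercise):

```python
import numpy as np

def bp_lm_stat(x, y):
    """Breusch-Pagan LM statistic: n * R^2 from regressing the squared
    OLS residuals on the regressor (here 1 df, so compare to chi2(1))."""
    X = np.column_stack([np.ones_like(x), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    u2 = (y - X @ b) ** 2                       # squared OLS residuals
    g = np.linalg.lstsq(X, u2, rcond=None)[0]   # auxiliary regression
    r2 = 1 - np.sum((u2 - X @ g) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    return len(y) * r2

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, 500)
y_homo = 2 + x + rng.standard_normal(500)      # constant error variance
y_het = 2 + x + x * rng.standard_normal(500)   # variance grows with x

lm_homo = bp_lm_stat(x, y_homo)   # small: no evidence against H0
lm_het = bp_lm_stat(x, y_het)     # large: H0 rejected
```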
6.4
Step 1: Run Regression
Command: reg sleep totwrk educ age yngkid male
Result:
Source | SS df MS Number of obs = 706
-------------+---------------------------------- F(5, 700) = 19.38
Model | 16933101.4 5 3386620.28 Prob > F = 0.0000
Residual | 122306734 700 174723.906 R-squared = 0.1216
-------------+---------------------------------- Adj R-squared = 0.1153
Total | 139239836 705 197503.313 Root MSE = 418
------------------------------------------------------------------------------
sleep | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
totwrk | -.1656902 .0180061 -9.20 0.000 -.2010425 -.1303378
educ | -11.76532 5.87132 -2.00 0.045 -23.29283 -.2378133
age | 2.009938 1.520833 1.32 0.187 -.9760034 4.995879
yngkid | 4.784242 50.01991 0.10 0.924 -93.42278 102.9913
male | 87.54557 34.66501 2.53 0.012 19.48572 155.6054
_cons | 3640.234 114.332 31.84 0.000 3415.759 3864.709
------------------------------------------------------------------------------
Step 2: Breusch–Pagan test for heteroskedasticity
Command: hettest
Breusch–Pagan/Cook–Weisberg test for heteroskedasticity
Assumption: Normal error terms
Variable: Fitted values of sleep
H0: Constant variance
chi2(1) = 2.42
Prob > chi2 = 0.1199
Since the p-value exceeds 0.05, the null hypothesis of constant variance is not rejected: there is no evidence of heteroskedasticity.
Step 3: Check whether the estimated variance of u is higher for men than for women:
Command:
predict uhat, residuals
gen uhat_sq = uhat^2
reg uhat_sq male
Result:
Source | SS df MS Number of obs = 706
-------------+---------------------------------- F(1, 704) = 1.25
Model | 1.5848e+11 1 1.5848e+11 Prob > F = 0.2648
Residual | 8.9597e+13 704 1.2727e+11 R-squared = 0.0018
-------------+---------------------------------- Adj R-squared = 0.0003
Total | 8.9756e+13 705 1.2731e+11 Root MSE = 3.6e+05
------------------------------------------------------------------------------
uhat_sq | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
male | -30234.55 27093.98 -1.12 0.265 -83429.22 22960.13
_cons | 190369.1 20393.91 9.33 0.000 150328.9 230409.2
------------------------------------------------------------------------------
Since the p-value on the male coefficient (0.265) in the regression of uhat_sq on male is
insignificant, the variance of the residuals (u) does not differ significantly between men
and women; there is no evidence that Var(u) is higher for men.
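The logic of Step 3 relies on a dummy-variable regression fact: in the regression of uhat_sq on male, the slope equals the difference in mean squared residuals between men and women, and the intercept equals the female group mean. A Python sketch with made-up residuals:

```python
# Regressing squared residuals on a 0/1 dummy: the OLS slope is the
# difference in group means, the intercept is the mean of the
# omitted (female) group. Numbers below are hypothetical.
e2   = [4.0, 1.0, 9.0, 16.0, 25.0, 4.0]
male = [1,   1,   1,   0,    0,    0]

m1 = [e for e, m in zip(e2, male) if m == 1]  # men's squared residuals
m0 = [e for e, m in zip(e2, male) if m == 0]  # women's squared residuals

slope = sum(m1) / len(m1) - sum(m0) / len(m0)
intercept = sum(m0) / len(m0)
print(slope, intercept)
```

A negative slope, as in the assignment output (−30234.55), means the men's average squared residual is smaller, but only the t-test on the slope tells us whether that gap is statistically meaningful.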
6.5
Step 1: Run Regression
Command: regress price lotsize sqrft bdrms
Result:
Source | SS df MS Number of obs = 88
-------------+---------------------------------- F(3, 84) = 57.46
Model | 6.1713e+11 3 2.0571e+11 Prob > F = 0.0000
Residual | 3.0072e+11 84 3.5800e+09 R-squared = 0.6724
-------------+---------------------------------- Adj R-squared = 0.6607
Total | 9.1785e+11 87 1.0550e+10 Root MSE = 59833
------------------------------------------------------------------------------
price | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
lotsize | 2.067707 .6421258 3.22 0.002 .790769 3.344644
sqrft | 122.7782 13.23741 9.28 0.000 96.45415 149.1022
bdrms | 13852.52 9010.145 1.54 0.128 -4065.14 31770.18
_cons | -21770.31 29475.04 -0.74 0.462 -80384.66 36844.04
Step 2: Test for heteroskedasticity
Command: hettest
Breusch–Pagan/Cook–Weisberg test for heteroskedasticity
Assumption: Normal error terms
Variable: Fitted values of price
H0: Constant variance
chi2(1) = 20.55
Prob > chi2 = 0.0000
Since the p-value (0.0000) is below 0.05, we reject the null hypothesis of constant
variance: heteroskedasticity is present.
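When heteroskedasticity is detected, a standard remedy is heteroskedasticity-robust (White/HC0) standard errors, e.g. regress price lotsize sqrft bdrms, robust in Stata. For a simple regression the HC0 slope variance has a closed form, Σ(x−x̄)²e² / (Σ(x−x̄)²)²; a Python sketch with made-up data:

```python
# HC0 (White) robust variance of the slope in a simple regression,
# computed on hypothetical x, y data (not the house-price data).
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
a = my - b * mx
e = [yi - (a + b * xi) for xi, yi in zip(x, y)]  # OLS residuals

# HC0: weight each observation's leverage by its own squared residual.
var_hc0 = sum(((xi - mx) ** 2) * (ei ** 2) for xi, ei in zip(x, e)) / sxx ** 2
print(round(math.sqrt(var_hc0), 4))
```

The coefficients are unchanged; only the standard errors (and hence t and p values) are corrected for the non-constant error variance.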
____________________________________________________________________________________
Chapter : 07
7.1
Solution:
Step 1: Run Regression
reg I R Y
Source | SS df MS Number of obs = 30
-------------+---------------------------------- F(2, 27) = 59.98
Model | 1329.98704 2 664.993518 Prob > F = 0.0000
Residual | 299.335844 27 11.0865127 R-squared = 0.8163
-------------+---------------------------------- Adj R-squared = 0.8027
Total | 1629.32288 29 56.1835476 Root MSE = 3.3296
------------------------------------------------------------------------------
I | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
R | -.1841962 .1264157 -1.46 0.157 -.4435798 .0751874
Y | .7699114 .0717905 10.72 0.000 .6226094 .9172134
_cons | 6.224938 2.510894 2.48 0.020 1.073009 11.37687
Step 2: Generate time
gen time=_n
tsset time
Step 3: Run the Durbin–Watson test
dwstat
Durbin–Watson d-statistic( 3, 30) = .852153
The Durbin–Watson statistic is 0.85, well below 2, which indicates positive autocorrelation.
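The Durbin–Watson statistic itself is easy to compute by hand: d = Σ(e_t − e_{t−1})² / Σe_t², with values near 2 indicating no autocorrelation, near 0 positive autocorrelation, and near 4 negative autocorrelation. A Python sketch with made-up residuals:

```python
# Durbin-Watson statistic on hypothetical residuals. The series
# drifts smoothly, so successive differences are small and d
# comes out well below 2 (positive autocorrelation).
e = [0.5, 0.7, 0.6, 0.9, 0.8, 1.1]

num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
den = sum(et ** 2 for et in e)
dw = num / den
print(round(dw, 3))
```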
Step 4: Resolve the autocorrelation using the Cochrane–Orcutt procedure.
Command: prais I R Y, corc
Iteration 0: rho = 0.0000
Iteration 1: rho = 0.5677
Iteration 2: rho = 0.6138
Iteration 3: rho = 0.6146
Iteration 4: rho = 0.6146
Iteration 5: rho = 0.6146
Cochrane–Orcutt AR(1) regression with iterated estimates
Source | SS df MS Number of obs = 29
-------------+---------------------------------- F(2, 26) = 19.83
Model | 283.65568 2 141.82784 Prob > F = 0.0000
Residual | 185.963077 26 7.15242602 R-squared = 0.6040
-------------+---------------------------------- Adj R-squared = 0.5736
Total | 469.618757 28 16.7720985 Root MSE = 2.6744
------------------------------------------------------------------------------
I | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
R | -.2957754 .078671 -3.76 0.001 -.4574859 -.1340649
Y | .7848538 .144227 5.44 0.000 .488391 1.081317
_cons | 7.329872 3.658536 2.00 0.056 -.1903569 14.8501
-------------+----------------------------------------------------------------
rho | .6146382
------------------------------------------------------------------------------
Durbin–Watson statistic (original) = 0.852153
Durbin–Watson statistic (transformed) = 1.608128
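The Cochrane–Orcutt iterations that prais ..., corc reports work by estimating rho from the residuals and then quasi-differencing the data; this is also why the transformed regression uses 29 observations rather than 30, as one is lost to the lag. A single iteration sketched in Python with made-up numbers:

```python
# One Cochrane-Orcutt iteration on hypothetical data:
#   rho = sum(e_t * e_{t-1}) / sum(e_{t-1}^2)
# then quasi-difference: y*_t = y_t - rho*y_{t-1}, same for x.
e = [0.4, 0.5, 0.3, 0.6, 0.5]          # residuals from the OLS fit
y = [10.0, 11.0, 12.5, 13.0, 14.2]
x = [1.0, 1.5, 2.0, 2.4, 3.0]

rho = (sum(e[t] * e[t - 1] for t in range(1, len(e)))
       / sum(e[t - 1] ** 2 for t in range(1, len(e))))

y_star = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
x_star = [x[t] - rho * x[t - 1] for t in range(1, len(x))]
print(round(rho, 3), len(y_star))  # one observation is lost to the lag
```

In practice the procedure re-estimates the regression on the starred data, recomputes rho, and repeats until rho converges, which is what the "Iteration" lines in the prais output show.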
7.2
Step 1: Run Regression
reg Q P F R
Source | SS df MS Number of obs = 30
-------------+---------------------------------- F(3, 26) = 20.77
Model | 272.177333 3 90.7257777 Prob > F = 0.0000
Residual | 113.590667 26 4.36887181 R-squared = 0.7055
-------------+---------------------------------- Adj R-squared = 0.6716
Total | 385.768 29 13.3023448 Root MSE = 2.0902
------------------------------------------------------------------------------
Q | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
P | .3162285 .0876154 3.61 0.001 .1361326 .4963245
F | .0615832 .0809901 0.76 0.454 -.1048944 .2280608
R | .0581384 .0080687 7.21 0.000 .041553 .0747239
_cons | -11.59809 14.96435 -0.78 0.445 -42.35776 19.16157
------------------------------------------------------------------------------
Step 2: Generate time
gen time=_n
tsset time
Step 3: Run the Durbin–Watson test
dwstat
Durbin–Watson d-statistic( 4, 30) = 1.805563
Since the Durbin–Watson statistic (1.81) is close to 2, autocorrelation appears weak; we
confirm this with the Breusch–Godfrey test.
Step 4: Run the Breusch–Godfrey test
bgodfrey, lags(1)
Breusch–Godfrey LM test for autocorrelation
---------------------------------------------------------------------------
lags(p) | chi2 df Prob > chi2
-------------+-------------------------------------------------------------
1 | 0.044 1 0.8331
---------------------------------------------------------------------------
H0: no serial correlation
The p-value (0.8331) is insignificant, so we fail to reject the null hypothesis of no serial
correlation: there is no evidence of autocorrelation.
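The Breusch–Godfrey statistic is again an n·R² LM test, this time from regressing the residuals on their own lag (the full version also includes the original regressors in the auxiliary regression). A simplified Python sketch with made-up residuals:

```python
# Simplified Breusch-Godfrey with one lag on hypothetical residuals:
# regress e_t on e_{t-1} and compute LM = n * R-squared ~ chi2(1).
e = [0.5, -0.4, 0.6, -0.5, 0.4, -0.6, 0.5, -0.3]

y = e[1:]    # e_t
x = e[:-1]   # e_{t-1}
n = len(y)

mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
a = my - b * mx

ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
ss_tot = sum((yi - my) ** 2 for yi in y)
r2 = 1 - ss_res / ss_tot

lm = n * r2  # compare with the chi2(1) 5% critical value, 3.84
print(round(lm, 3))
```

The sign-alternating residuals above illustrate negative autocorrelation; in the assignment output the LM statistic (0.044) is tiny, consistent with no serial correlation.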
____________________________________________________________________________________
Chapter: 08
------------
8.1
Step 1
. summarize
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
iq | 901 101.0866 15.06789 50 145
wage | 901 1191.266 81.10248 1023 1615.6
Step 2
. reg wage iq
Source | SS df MS Number of obs = 901
-------------+---------------------------------- F(1, 899) = 93.38
Model | 557036.385 1 557036.385 Prob > F = 0.0000
Residual | 5362814.78 899 5965.31121 R-squared = 0.0941
-------------+---------------------------------- Adj R-squared = 0.0931
Total | 5919851.16 900 6577.6124 Root MSE = 77.235
------------------------------------------------------------------------------
wage | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
iq | 1.65108 .1708609 9.66 0.000 1.315747 1.986412
_cons | 1024.364 17.46236 58.66 0.000 990.0924 1058.636
------------------------------------------------------------------------------
Step 3
. display iq*10
740
Note: display iq*10 just multiplies the first observation's IQ by 10 (74 × 10 = 740). The
intended quantity is the effect of a 10-point IQ increase, which uses the stored
coefficient: _b[iq]*10 = 1.65108 × 10 ≈ 16.51.
Step 4
. g lniq= log( iq)
Step 5
. reg wage lniq
Source | SS df MS Number of obs = 901
-------------+---------------------------------- F(1, 899) = 89.15
Model | 534104.298 1 534104.298 Prob > F = 0.0000
Residual | 5385746.87 899 5990.81965 R-squared = 0.0902
-------------+---------------------------------- Adj R-squared = 0.0892
Total | 5919851.16 900 6577.6124 Root MSE = 77.4
------------------------------------------------------------------------------
wage | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
lniq | 154.5317 16.3662 9.44 0.000 122.4113 186.6521
_cons | 479.7907 75.39528 6.36 0.000 331.8195 627.762
------------------------------------------------------------------------------
. display lniq*10
43.040652
Note: this again multiplies the first observation's value (ln(74) × 10 ≈ 43.04). In this
level-log model the effect of a 10% increase in IQ is _b[lniq]*ln(1.10) = 154.5317 ×
0.0953 ≈ 14.73.
Comments
- Summary Statistics: the summary shows the mean, standard deviation, minimum, and
maximum of the variables "iq" and "wage".
- Simple Linear Regression: the regression of "wage" on "iq" reveals a significant
relationship (p < 0.05). Each 1-point increase in IQ is associated with an increase of
approximately 1.65 in wage. (This is a simple regression, so there are no other variables
to hold constant.)
- Transformation: the variable "iq" is transformed into its natural logarithm, "lniq".
- Regression with Transformed Variable: the regression of "wage" on the natural logarithm
of "iq" also shows a significant relationship (p < 0.05). The coefficient of 154.53 means a
1% increase in IQ is associated with an increase of about 1.55 in wage (154.53/100), so a
10% increase raises wage by roughly 14.7.
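The level-log arithmetic above can be checked directly. A minimal Python sketch using the estimated coefficient b = 154.5317 from the output, comparing the exact-change formula b·ln(1.10) with the rougher b/100-per-percent approximation:

```python
# Level-log interpretation: in wage = a + b*ln(iq) + u, a 10%
# increase in iq changes predicted wage by b*ln(1.10).
# b is taken from the regression output above.
import math

b = 154.5317
effect_10pct = b * math.log(1.10)  # exact change for a 10% increase
approx = b * 0.10                  # b/100 per 1%, times 10
print(round(effect_10pct, 2), round(approx, 2))  # prints 14.73 15.45
```

The approximation overstates the effect slightly because ln(1.10) ≈ 0.0953, not 0.10; the gap widens for larger percentage changes.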
____________________________________________________________________________________
Chapter - 21
Panel Data
Question (constructed by myself)
. xtset id time
Panel variable: id (strongly balanced)
Time variable: time, 1960 to 1999
Delta: 1 unit
. xtreg Y X E, fe
Fixed-effects (within) regression Number of obs = 320
Group variable: id Number of groups = 8
R-squared: Obs per group:
Within = 0.6479 min = 40
Between = 0.9878 avg = 40.0
Overall = 0.7397 max = 40
F(2,310) = 285.27
corr(u_i, Xb) = 0.4311 Prob > F = 0.0000
------------------------------------------------------------------------------
Y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
X | .4737093 .0218886 21.64 0.000 .4306403 .5167784
E | 1.845824 .157163 11.74 0.000 1.536583 2.155065
_cons | 52.81111 2.434349 21.69 0.000 48.02117 57.60104
-------------+----------------------------------------------------------------
sigma_u | .52193716
sigma_e | 2.6826443
rho | .03647321 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(7, 310) = 1.23 Prob > F = 0.2843
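The fixed-effects ("within") estimator that xtreg, fe implements demeans each variable by its group mean before running OLS, which wipes out the time-invariant individual effect u_i. A minimal Python sketch of the demeaning step (ids and values are made up, not the assignment data):

```python
# Within transformation for panel data: subtract each group's mean
# from its observations. Hypothetical (id, value) pairs below.
data = [
    ("A", 10.0), ("A", 12.0), ("A", 14.0),
    ("B", 20.0), ("B", 22.0), ("B", 24.0),
]

groups = {}
for gid, val in data:
    groups.setdefault(gid, []).append(val)
means = {g: sum(v) / len(v) for g, v in groups.items()}

demeaned = [(g, v - means[g]) for g, v in data]
print(demeaned)  # each group now has mean zero
```

After demeaning both Y and the regressors this way, plain OLS on the transformed data yields the within (fixed-effects) coefficients.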
. xtreg Y X E, re
Random-effects GLS regression Number of obs = 320
Group variable: id Number of groups = 8
R-squared: Obs per group:
Within = 0.6479 min = 40
Between = 0.9879 avg = 40.0
Overall = 0.7397 max = 40
Wald chi2(2) = 900.79
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
------------------------------------------------------------------------------
Y | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
X | .4966464 .0183199 27.11 0.000 .4607401 .5325528
E | 1.940393 .1538859 12.61 0.000 1.638783 2.242004
_cons | 50.27199 2.040134 24.64 0.000 46.2734 54.27058
-------------+----------------------------------------------------------------
sigma_u | 0
sigma_e | 2.6826443
rho | 0 (fraction of variance due to u_i)
------------------------------------------------------------------------------
. est store fe
. est store re
Note: both est store commands were issued after the RE regression, so "fe" and "re" hold
identical (RE) estimates; est store fe should be run immediately after xtreg Y X E, fe.
This is why the Hausman table below shows zero differences.
. hausman fe re
Note: the rank of the differenced variance matrix (0) does not equal the number of
coefficients being tested (2); be sure this is what you expect, or there may be problems
computing the test. Examine the output of your estimators for anything unexpected and
possibly consider scaling your variables so that the coefficients are on a similar scale.
---- Coefficients ----
| (b) (B) (b-B) sqrt(diag(V_b-V_B))
| fe re Difference Std. err.
-------------+----------------------------------------------------------------
X | .4966464 .4966464 0 0
E | 1.940393 1.940393 0 0
------------------------------------------------------------------------------
b = Consistent under H0 and Ha; obtained from xtreg.
B = Inconsistent under Ha, efficient under H0; obtained from xtreg.
Test of H0: Difference in coefficients not systematic
chi2(0) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 0.00
Prob > chi2 = .
(V_b-V_B is not positive definite)
Both FE and RE regressions indicate significant positive effects of X and E on Y. The
Hausman test is inconclusive here: because the stored "fe" and "re" results are identical,
the differenced variance matrix has rank 0 and the chi2 statistic cannot be computed.
Separately, the F test that all u_i = 0 (p = 0.2843) suggests the individual effects are not
jointly significant.
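For intuition, the Hausman statistic H = (b − B)′(V_b − V_B)⁻¹(b − B) reduces, when only one coefficient is tested, to a scalar ratio. A Python sketch with made-up FE and RE estimates (all numbers hypothetical):

```python
# Scalar Hausman statistic: (b_fe - b_re)^2 / (var_fe - var_re),
# compared against chi2(1). All values below are made up.
b_fe, v_fe = 0.47, 0.0005   # FE estimate and its variance
b_re, v_re = 0.50, 0.0003   # RE estimate and its variance (smaller: RE is efficient under H0)

h = (b_fe - b_re) ** 2 / (v_fe - v_re)
print(round(h, 2))  # compare with the chi2(1) 5% critical value, 3.84
```

Here H = 0.0009/0.0002 = 4.5 > 3.84, so in this toy example H0 would be rejected and the FE estimator preferred; in the assignment's degenerate case the variance difference is zero, so no such ratio can be formed.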