
Multiple Regression Analysis: Asymptotics
ECONOMETRICS (ECON 360)
BEN VAN KAMMEN, PHD
Introduction
There is not a lot of new material in this chapter, unless one wants to get into proofs of the Central Limit Theorem, probability limits, and convergence in distribution, which I prefer not to.
Instead, my emphasis is on explaining why some of the assumptions of the classical linear model (CLM) are not so restrictive, and why inference using the Chapter 4 methods is still possible under weaker assumptions about the distribution of the error term.
Outline
Consistency.
Asymptotic Normality and Large Sample Inference.
Asymptotic Efficiency of OLS.
Consistency
(Consistent Estimator) Defined: “An estimator that converges in probability to the population
parameter as the sample size grows without bound.”
This is stated formally in terms of the probability that the estimator falls more than some distance ε away from the parameter; for any ε > 0, that probability approaches zero as the sample size increases.
Convergence in probability states that:

\Pr\left(\left|\text{estimator}_n - \text{parameter}\right| > \varepsilon\right) \to 0 \ \text{as} \ n \to \infty.
Consistency (continued)
If one can collect an arbitrarily large number of observations, one ought to be able to obtain an estimate that gets closer and closer to the true parameter value.
If this is not the case, the estimator is inconsistent and not of much use.
Fortunately, under Assumptions MLR.1 through MLR.4, the OLS estimators (β̂_0 through β̂_k) are consistent estimators of their corresponding parameters.
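As an illustration (my own sketch, not from the text): a short Stata simulation that draws ever larger samples from a simple regression model with a true slope of 1 and shows the OLS slope settling near that value. The seed, variable names, and sample sizes are arbitrary choices.

set seed 360
foreach n of numlist 100 1000 10000 100000 {
    quietly {
        clear
        set obs `n'
        generate x = rnormal(12, 2)
        generate u = rnormal(0, 1)
        generate y = 2 + x + u        // true slope on x is 1
        regress y x
    }
    display "n = `n'    b1 = " %9.5f _b[x]
}

The printed slopes should wander less and less from 1 as n grows, which is the practical meaning of consistency.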
Consistency (continued)
One can show this fairly easily for the simple regression model, using the estimator and the
definition of the model:
y_i = \beta_0 + \beta_1 x_{i1} + u_i, \quad \text{and}

\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_{i1}-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_{i1}-\bar{x})^2} = \frac{\sum_{i=1}^{n}(x_{i1}-\bar{x})\,y_i}{\sum_{i=1}^{n}(x_{i1}-\bar{x})^2}.

So,

\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_{i1}-\bar{x})(\beta_0+\beta_1 x_{i1}+u_i)}{\sum_{i=1}^{n}(x_{i1}-\bar{x})^2} = \beta_1 + \frac{\sum_{i=1}^{n}(x_{i1}-\bar{x})\,u_i}{\sum_{i=1}^{n}(x_{i1}-\bar{x})^2}.
Consistency (continued)
This expression should be familiar from deriving the unbiasedness of OLS.
To show the consistency of β̂_1, make a small modification, dividing the numerator and denominator of the second term by the sample size:

\hat{\beta}_1 = \beta_1 + \frac{n^{-1}\sum_{i=1}^{n}(x_{i1}-\bar{x})\,u_i}{n^{-1}\sum_{i=1}^{n}(x_{i1}-\bar{x})^2}.

Taking the probability limit ("plim") of this, as n → ∞, you find that the numerator converges to the covariance of x_1 and u, and the denominator converges to the variance of x_1.
Consistency (concluded)
And the properties of probability limits state that the plim of a ratio of two estimators equals the
ratio of their plims:
\text{plim}\,\hat{\beta}_1 = \beta_1 + \frac{\text{plim}\left[n^{-1}\sum_{i=1}^{n}(x_{i1}-\bar{x})\,u_i\right]}{\text{plim}\left[n^{-1}\sum_{i=1}^{n}(x_{i1}-\bar{x})^2\right]} = \beta_1 + \frac{\text{Cov}(x_1,u)}{\text{Var}(x_1)}.

MLR.4 (SLR.4) states that x_1 and u are mean independent, which implies that their covariance is zero. So,

\text{plim}\,\hat{\beta}_1 = \beta_1,

and OLS is consistent as long as the error term is not correlated with the "x" variable(s).
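For reference, these are the standard rules for probability limits being invoked above (a summary I am adding, not a slide from the deck): for sequences a_n and b_n with well-defined plims and a continuous function g,

\text{plim}(a_n + b_n) = \text{plim}\,a_n + \text{plim}\,b_n, \qquad \text{plim}(a_n b_n) = (\text{plim}\,a_n)(\text{plim}\,b_n),

\text{plim}\left(\frac{a_n}{b_n}\right) = \frac{\text{plim}\,a_n}{\text{plim}\,b_n} \ \ (\text{plim}\,b_n \neq 0), \qquad \text{plim}\,g(a_n) = g(\text{plim}\,a_n).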
OLS is consistent under weaker assumptions
This is the weaker version of the fourth assumption, MLR.4′, which states:

E(u) = 0 \quad \text{and} \quad \text{Cov}(x_j, u) = 0 \ \ \forall\, j.

It is weaker because assuming merely that x_j and u are linearly uncorrelated does not rule out higher-order relationships between them.
◦ The latter can make OLS biased (but still consistent), so if unbiasedness and consistency are both desired, you still need (the stronger) Assumption MLR.4 (see the example below).
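Here is a concrete example of that gap (my own construction, not from the text). Let x be standard normal and build the error around a quadratic term in x:

u = x^2 - E(x^2) + e, \qquad e \ \text{independent of} \ x,\ E(e) = 0.

\text{Then } \text{Cov}(x, u) = E(x^3) - E(x)\,E(x^2) = 0, \quad \text{but} \quad E(u \mid x) = x^2 - E(x^2) \neq 0.

So MLR.4′ holds (OLS for β_1 is consistent) while MLR.4 fails (OLS is biased in finite samples).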
Mis-specified models are still inconsistent
The inconsistency can be shown in a manner very similar to the bias in the model with 2 explanatory variables.
If one estimates (β̃_1) a regression that excludes x_2, such that:

y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + v_i \quad \text{and} \quad \tilde{\beta}_1 = \frac{\sum_{i=1}^{n}(x_{i1}-\bar{x})\,y_i}{\sum_{i=1}^{n}(x_{i1}-\bar{x})^2},

\Leftrightarrow \tilde{\beta}_1 = \frac{\sum_{i=1}^{n}(x_{i1}-\bar{x})(\beta_0+\beta_1 x_{i1}+\beta_2 x_{i2}+v_i)}{\sum_{i=1}^{n}(x_{i1}-\bar{x})^2} = \beta_1 + \beta_2\,\frac{\sum_{i=1}^{n}(x_{i1}-\bar{x})\,x_{i2}}{\sum_{i=1}^{n}(x_{i1}-\bar{x})^2},
Mis-specified models are still inconsistent (continued)
the plim of the estimator is

\text{plim}\,\tilde{\beta}_1 = \beta_1 + \beta_2\,\delta_1; \qquad \delta_1 \equiv \frac{\text{Cov}(x_1, x_2)}{\text{Var}(x_1)}.

The second term is the inconsistency, and the estimator converges to this inaccurate value (β_1 + β_2 δ_1) as the sample size grows.
In the k > 2 case, this result generalizes to all of the explanatory variables; none of the estimators is consistent if the model is mis-specified as above (see the sketch below).
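A minimal Stata sketch of this persistence (my own example; the coefficients, seed, and variable names are arbitrary). With x2 omitted and correlated with x1, the short-regression slope stays near β_1 + β_2 δ_1 = 1 + 2(0.5) = 2 rather than approaching the true β_1 = 1, no matter how large n gets:

set seed 360
foreach n of numlist 1000 100000 {
    quietly {
        clear
        set obs `n'
        generate x1 = rnormal(0, 1)
        generate x2 = 0.5*x1 + rnormal(0, 1)    // delta1 = Cov(x1,x2)/Var(x1) = 0.5
        generate y  = 1 + 1*x1 + 2*x2 + rnormal(0, 1)
        regress y x1                            // mis-specified: x2 omitted
    }
    display "n = `n'    b1 (short regression) = " %9.4f _b[x1]
}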
Asymptotic normality and large sample inference
This is the most consequential lesson from Chapter 5.
Knowing that an estimator is consistent is satisfying, but it doesn't imply anything about the distribution of the estimator, which is necessary for inference.
◦ The OLS estimators are normally distributed if the errors are assumed to be (with constant variance σ²), as are the values of (y|x_1 . . . x_k).
◦ But what if the errors are not normally distributed?
◦ Then neither are the values of y.
◦ As the text points out, there are numerous such examples, e.g., when y is bounded within a range (like 0-100) or is skewed (Example 3.5), and the normality assumption is unrealistic.
Asymptotic normality and large sample inference (continued)
However, inference is based on the estimators β̂_j having a well-defined mean and variance. When they are standardized, they have mean zero and standard deviation 1 (note: we maintain the homoskedasticity assumption).
Crucially, as the sample size approaches infinity, the distribution of the standardized estimator converges to standard normal.
This property applies to all averages from random samples and is known as the Central Limit Theorem (CLT). Its implication is that:

\frac{\hat{\beta}_j - \beta_j}{se(\hat{\beta}_j)} \xrightarrow{\ d\ } \text{Normal}(0,1) \ \ \forall\, j; \qquad \xrightarrow{\ d\ } \text{ means converges in distribution.}
Asymptotic normality and large sample inference (continued)
Another way of saying it is that the distribution of the OLS estimator is asymptotically normal.
One more feature of the OLS asymptotics is that the estimator, σ̂², consistently estimates σ², the population error variance, so it no longer matters that the parameter is replaced by its consistent estimator.
◦ Nor is it necessary to make a distinction between the standard normal and the t distribution for inference, because in large samples the t distribution converges to standard normal anyway.
◦ For the sake of precision, however, t_{n−k−1} is the exact distribution for the standardized estimators.
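A quick way to see that convergence (my own check, not in the slides): compare two-sided 5% critical values from the t and standard normal distributions in Stata.

display invnormal(.975)          // standard normal: about 1.96
display invttail(30, .025)       // t with 30 df:    about 2.04
display invttail(120, .025)      // t with 120 df:   about 1.98

With a few hundred degrees of freedom the t and normal critical values are indistinguishable for practical purposes.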
Asymptotic normality and large sample inference (concluded)
Assumption MLR.6 has been replaced with a much weaker assumption: merely that the error term has finite and homoskedastic variance.
As long as the sample size is "large", however, inference can be conducted the same way as under Assumption MLR.6.
◦ How many observations constitute "large" is an open question.
◦ The requisite sample size can be as low as 30 for the CLT to provide a good approximation, but if the errors are highly skewed ("non-normal") or if there are many regressors in the model (k "eats up" a lot of degrees of freedom), reliable inference with 30 observations is overly optimistic.
Precision of the OLS estimates
Finally, we investigate "how fast" the standard error shrinks as the sample size increases. The variance of β̂_j (its square root is the standard error) is:

\widehat{\text{Var}}(\hat{\beta}_j) = \frac{\hat{\sigma}^2}{SST_j\,(1-R_j^2)} = \frac{\hat{\sigma}^2}{n\,s_j^2\,(1-R_j^2)}, \quad \text{where}

the total sum of squares of x_j (SST_j) can be replaced according to the definition of x_j's sample variance (s_j^2):

s_j^2 = \frac{\sum_{i=1}^{n}(x_{ij}-\bar{x}_j)^2}{n} = \frac{SST_j}{n}.
Precision of the OLS estimates (continued)
As n gets large, these sample statistics each approach their population values:

\text{plim}\,\hat{\sigma}^2 = \sigma^2, \qquad \text{plim}\,s_j^2 = \sigma_j^2, \qquad \text{plim}\,R_j^2 = \rho_j^2,

and none of these parameters depends on the sample size. The variance gets smaller at the rate (1/n) because of the explicit "n" term in the denominator. I.e.,

\widehat{\text{Var}}(\hat{\beta}_j) = \frac{\sigma^2}{n\,\sigma_j^2\,(1-\rho_j^2)}; \qquad \frac{\partial\,\widehat{\text{Var}}(\hat{\beta}_j)}{\partial n} = -\frac{\widehat{\text{Var}}(\hat{\beta}_j)}{n}.
Precision of the OLS estimates (concluded)
The asymptotic standard error is just the square root, and it gets smaller at the rate of n^{-1/2}:

se(\hat{\beta}_j) = \frac{1}{\sqrt{n}} \cdot \frac{\sigma}{\sigma_j\,(1-\rho_j^2)^{1/2}}.

F tests for exclusion restrictions, as well as t tests, can be conducted for large samples as you learned in Chapter 4 under the assumption of normally distributed errors.
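For instance, in a simulated large sample with a deliberately skewed error, the Chapter 4 commands carry over unchanged; this is my own sketch, and every variable name and coefficient in it is made up. The test command reports the usual F statistic for an exclusion restriction.

set seed 360
clear
set obs 5000
generate educ   = rnormal(12, 2)
generate exper  = rnormal(10, 5)
generate tenure = rnormal(5, 3)
generate u      = rgamma(1, 2) - 2           // skewed, mean-zero error
generate lwage  = 1 + .08*educ + .02*exper + 0*tenure + u
regress lwage educ exper tenure
test exper tenure                            // joint F test of H0: both coefficients are 0

With n = 5000, the reported t and F statistics can be compared to the usual normal and F critical values even though u is far from normal.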
β has lots of consistent estimators
The OLS estimator, β̂, also has the lowest asymptotic variance among estimators that are linear in parameters and rely on functions of x, e.g., g(x).
An estimator that uses an alternative to g(x) = x can be called β̃_1, and has the form:

\tilde{\beta}_1 = \frac{\sum_{i=1}^{n}(z_i-\bar{z})\,y_i}{\sum_{i=1}^{n}(z_i-\bar{z})\,x_i}; \qquad z_i \equiv g(x_i), \quad g \not\equiv f(x) = x.

As long as z and x are correlated, this estimator converges in probability to the true value of β_1, i.e., it is consistent.
β has lots of consistent estimators (continued)
Depending on what kind of non-linear function "g" is, this can fail, because correlation only measures linear relationships.
And since x and u are mean independent,

E(u \mid x_1) = E(u \mid g(x)) = E(u \mid z) = 0; \quad \text{so } u \text{ and } z \text{ are mean independent as well.}

\tilde{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^{n}(z_i-\bar{z})\,u_i}{\sum_{i=1}^{n}(z_i-\bar{z})\,x_i}; \qquad \text{plim}\,\tilde{\beta}_1 = \beta_1 + \frac{\text{Cov}(z,u)}{\text{Cov}(z,x)} = \beta_1.
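A minimal Stata sketch of computing β̃_1 directly from these sums; it is my own example, with g(x) = x² chosen arbitrarily as the alternative to the identity function. In a large sample the estimate also lands near the true slope of 1, illustrating that β̃_1 is consistent too.

set seed 360
clear
set obs 10000
generate x = rnormal(12, 2)
generate u = rnormal(0, 1)
generate y = 2 + x + u                  // true slope is 1
generate z = x^2                        // z = g(x), correlated with x
summarize z, meanonly
generate zdev = z - r(mean)             // z_i - zbar
generate num  = zdev*y
generate den  = zdev*x
summarize num, meanonly
scalar numsum = r(sum)
summarize den, meanonly
scalar densum = r(sum)
display "btilde_1 = " %9.4f numsum/densum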
Asymptotic efficiency of OLS
But the variance of β̃_1 is no less than the variance of β̂_1:

\text{Var}(\tilde{\beta}_1) = E\left[(\tilde{\beta}_1 - \beta_1)^2\right] = E\left[\left(\frac{\sum_{i=1}^{n}(z_i-\bar{z})\,u_i}{\sum_{i=1}^{n}(z_i-\bar{z})\,x_i}\right)^2\right] = \frac{\sigma^2\,\text{Var}(z)}{\text{Cov}(z,x)^2}, \quad \text{since}

only the "own" products show up in the numerator. And,

\text{Var}(\hat{\beta}_1) = \frac{\sigma^2}{\text{Var}(x)}, \quad \text{as before.}
Asymptotic efficiency of OLS (continued)
So in order for Var(β̂_1) ≤ Var(β̃_1),

\frac{\sigma^2}{\text{Var}(x)} \leq \frac{\sigma^2\,\text{Var}(z)}{\text{Cov}(z,x)^2} \ \Leftrightarrow\ \text{Cov}(z,x)^2 \leq \text{Var}(x)\,\text{Var}(z).

This condition is guaranteed by the Cauchy-Schwarz Inequality, which states that the squared covariance between two variables cannot exceed the product of their variances.
So the OLS estimator, β̂_1, has a smaller variance than any other estimator with the same form:

\text{Avar}(\hat{\beta}_1) \leq \text{Avar}(\tilde{\beta}_1); \qquad \text{Avar denotes asymptotic variance.}
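A simulation sketch of this efficiency comparison (entirely my own construction, with g(x) = x³ and arbitrary sample settings): repeat the experiment many times and compare the spread of the two estimators. Both should be centered near the true slope of 1, but β̃_1 should show the larger standard deviation.

capture program drop olsvsz
program define olsvsz, rclass
    clear
    set obs 500
    generate x = rnormal(0, 1)
    generate u = rnormal(0, 1)
    generate y = 1 + x + u
    regress y x
    return scalar bhat = _b[x]
    generate z = x^3                      // alternative function of x
    summarize z, meanonly
    generate zdev = z - r(mean)
    generate num = zdev*y
    generate den = zdev*x
    summarize num, meanonly
    scalar ns = r(sum)
    summarize den, meanonly
    scalar ds = r(sum)
    return scalar btilde = ns/ds
end

set seed 360
simulate bhat=r(bhat) btilde=r(btilde), reps(1000): olsvsz
summarize bhat btilde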
Conclusion
According to the asymptotic properties of the OLS estimator:
◦ OLS is consistent,
◦ The estimator converges in distribution to standard normal,
◦ Inference can be performed based on the asymptotic convergence to the standard normal, and
◦ OLS is the most efficient among many consistent estimators of 𝛽𝛽.
A non-normal error term
. clear

. drawnorm x, n(10000) means(12) sds(2) clear
(obs 10000)

. generate u = rgamma(1,2)

. gen y=2+x+u

. reg y x

The error term is definitely not normally distributed, as the histogram (right) shows.

[Histogram: density of u, with u ranging from 0 to about 25.]
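(The histogram itself can be reproduced with a one-line command along these lines; this is my addition, not shown on the slide.)

histogram u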
Bootstrapping
To reveal the distribution of β̂_1 in the regression, y = β_0 + β_1 x + u, I resample my 10,000 observations many (2000) times.
◦ This would take a long time, were it not for the software.

Stata code, for n = 10:

bootstrap, reps(2000) size(10) saving(U:\ECON 360 - Spring 2015\BS 10.dta, every(1) replace) : reg y x
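A natural follow-up (my own, not shown on the slide) is to load the saved replicates and inspect the bootstrap distribution of the slope, which is what the next slides summarize:

use "U:\ECON 360 - Spring 2015\BS 10.dta", clear
summarize _b_x, detail
sktest _b_x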
Normality?
You can judge whether it looks like the normal distribution.
But a normal distribution is supposed to have 0 skewness (symmetry) and a kurtosis of 3.
◦ This one has 0.25 skewness and 6.746 kurtosis.

. summ _b_x, detail

                            _b[x]
      Percentiles      Smallest
 1%    -.0410598      -.9976056
 5%     .375233       -.7887098
10%     .5267513      -.7529542       Obs            2000
25%     .7741854      -.5366573       Sum of Wgt.    2000
50%     .9984144                      Mean         .9962621
                        Largest       Std. Dev.    .3981664
75%     1.201218       2.497071
90%     1.459825       2.522474       Variance     .1585365
95%     1.634268       2.966994       Skewness     .2467498
99%     2.093876       4.149807       Kurtosis     6.746221

[Histogram of the 2000 estimates (10 obs. each) of β̂_1: density against _b[x], roughly −1 to 4.]
Non-Normality
The statistical test for whether the distribution of the beta hats is Normal, called the Jarque-Bera statistic, rejects the null that the distribution is Normal.
◦ Code is: sktest _b_x
◦ Similar to a joint hypothesis test with 2 restrictions: H0: skewness = 0 and kurtosis = 3.
◦ In this case, the p value is <0.0001.
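For reference, the classical Jarque-Bera statistic is built from the sample skewness S and kurtosis K (Stata's sktest reports a closely related, adjusted chi-squared version):

JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right) \ \sim\ \chi^2(2) \ \text{under } H_0.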
Would a bigger sample size fail to reject H_0? n = 20
. summ _b_x, detail

                            _b[x]
      Percentiles      Smallest
 1%     .3718964      -.0062484
 5%     .6183816       .1094247
10%     .6945582       .1131481       Obs            2000
25%     .8423931       .1282413       Sum of Wgt.    2000
50%     .9921745                      Mean         .9891276
                        Largest       Std. Dev.    .2382368
75%     1.133664       1.754412
90%     1.279575       1.784783       Variance     .0567568
95%     1.377411       1.837642       Skewness     .0079499
99%     1.599728       2.046466       Kurtosis     3.905129

[Histogram of the 2000 estimates (20 obs. each): density against _b[x], roughly 0 to 2.]

The skewness is mostly gone, but the distribution is still too "peaked" to be Normal: the p value on the J-B statistic is still <0.0001.
n = 50?
. summ _b_x, detail

                            _b[x]
      Percentiles      Smallest
 1%     .6642559       .4981143
 5%     .7648403       .5408446
10%     .8089489       .5615669       Obs            2000
25%     .8994239       .5639632       Sum of Wgt.    2000
50%     .9992069                      Mean         .9980457
                        Largest       Std. Dev.    .1472866
75%     1.090446       1.502628
90%     1.181844       1.509088       Variance     .0216934
95%     1.240575       1.548912       Skewness     .1596157
99%     1.374213       1.551746       Kurtosis     3.348677

[Histogram of the 2000 estimates (50 obs. each): density against _b[x], roughly .5 to 1.5.]

The skewness comes back a little, but the kurtosis is coming down now: the p value on the J-B statistic is up to <0.0005.
n = 100?
. summ _b_x, detail

                            _b[x]
      Percentiles      Smallest
 1%     .7633944       .5340427
 5%     .8333735       .6678689
10%     .8760604       .6756338       Obs            2000
25%     .9362201       .6903498       Sum of Wgt.    2000
50%     .9993333                      Mean           1.0007
                        Largest       Std. Dev.    .0990559
75%     1.066902       1.29039
90%     1.127472       1.295925       Variance     .0098121
95%     1.16099        1.300377       Skewness    -.0652147
99%     1.235443       1.307498       Kurtosis     3.241735

[Histogram of the 2000 estimates (100 obs. each): density against _b[x], roughly .6 to 1.4.]

Skewness/Kurtosis tests for Normality
                                                      ------ joint ------
    Variable |   Obs   Pr(Skewness)   Pr(Kurtosis)   adj chi2(2)   Prob>chi2
-------------+---------------------------------------------------------------
        _b_x | 2.0e+03     0.2325         0.0388          5.67        0.0586

p > 0.05; the first "fail to reject"!
n = 250? Normality far from rejected.
. summ _b_x, detail

                            _b[x]
      Percentiles      Smallest
 1%     .8413554       .7718233
 5%     .8956159       .7786757
10%     .917204        .799898        Obs            2000
25%     .9550751       .8113453       Sum of Wgt.    2000
50%     .997361                       Mean         .9972047
                        Largest       Std. Dev.    .0636356
75%     1.0383         1.190229
90%     1.080251       1.191585       Variance     .0040495
95%     1.103124       1.210835       Skewness     .0216586
99%     1.15125        1.215675       Kurtosis     3.14645

[Histogram of the 2000 estimates (250 obs. each): density against _b[x], roughly .8 to 1.2.]

. sktest _b_x

Skewness/Kurtosis tests for Normality
                                                      ------ joint ------
    Variable |   Obs   Pr(Skewness)   Pr(Kurtosis)   adj chi2(2)   Prob>chi2
-------------+---------------------------------------------------------------
        _b_x | 2.0e+03     0.6915         0.1817          1.94        0.3791
That’s asymptotic normality
And I only had to run 10,000 regressions to show it!
