Statalist The Stata Listserver


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: Can't generate perfect theoretic standard normal.


From   Jeph Herrin <[email protected]>
To   [email protected]
Subject   Re: st: Can't generate perfect theoretic standard normal.
Date   Wed, 09 May 2007 19:18:52 -0400

However, perfection => infinity is not surprising.

Jeph Herrin wrote:
10x does not help much - the variance is closer to 1 but
kurtosis about the same:


set mem 1500m
set obs 100000000
* 100,000,000 obs!
generate perfuniform = _n/(_N+1)
generate perfnorm = invnorm(perfuniform)
summarize perfnorm, detail

. summarize perfnorm, detail

                          perfnorm
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -2.326348      -5.612001
 5%    -1.644854      -5.490852
10%    -1.281552      -5.418801       Obs            99999998
25%    -.6744897      -5.367129       Sum of Wgt.    99999998

50%    -3.74e-08                      Mean          -1.14e-07
                        Largest       Std. Dev.      .9999995
75%     .6744897       5.294704
90%     1.281551       5.294704       Variance        .999999
95%     1.644853       5.294704       Skewness      -3.31e-06
99%     2.326348       5.294704       Kurtosis       2.999973



D H wrote:
Apparently, I need an infinite sample size to obtain a perfect
theoretic standard normal distribution with zero skew, variance
exactly 1 and kurtosis exactly 3.

Consider the following code:

version 8.2
set memory 400m
* 10,000,000 obs!:
set obs 10000000
generate double perfuniform = _n/(_N+1)
generate double perfnorm = invnorm(perfuniform)
summarize perfnorm, detail

The last command yields:

perfnorm
-------------------------------------------------------------
Percentiles Smallest
1% -2.326346 -5.199338
5% -1.644853 -5.068958
10% -1.281551 -4.991217 Obs 10000000
25% -.6744897 -4.935367 Sum of Wgt. 10000000

50% 6.96e-17 Mean 7.79e-18
Largest Std. Dev. .9999986
75% .6744897 4.935367
90% 1.281551 4.991217 Variance .9999971
95% 1.644853 5.068958 Skewness 3.23e-16
99% 2.326346 5.199338 Kurtosis 2.999924

Now the skew is basically zero, but the kurtosis and variance are not
precisely Gaussian. For 20,000 observations, the above would give
similar skew, kurtosis=2.987949 and variance = .9991686. (n=2000:
small skew, kurt=2.937, Var=.99386)

I see a number of possible approaches to this issue.

1) Include the numbers 1 and 0 in the uniform distribution. This
doesn't work: the invnorm function simply produces missing values for
those observations.

2) Take a harder look at my allegedly perfect uniform distribution.
Is this where I went wrong?

3) Try another statistics package?? (Somehow I doubt that this would help.)

4) We can always rescale the dispersion. Divide the resulting
distribution by the standard deviation. Consider this addition to the
10,000,000 obs example:

gen double perfnorm2=perfnorm/r(sd)
sum perfnorm2, detail

In that case, skew stays close to zero, var=1 and kurtosis is
unaffected - still less than 3.

5) Set up a loop and iteratively modify a couple of observations to
improve the kurtosis. Set the proper variance with 4). I
experimented with that: one problem is that this patch doesn't really
produce a perfect theoretic Gaussian distribution, but rather a
kludge.

6) Ignore the problem and characterize it as an oddity. Note though
that this may imply that normally distributed standard random
variables in Stata will not be perfectly standard normals *on
average*: even their variance will be a little lower without
adjustment. These effects would presumably be swamped by ordinary
sampling randomness.

More generally, I wonder whether this is the proper way to generate a
standard normal random variable in Stata:

gen double perfuniform = _n/(_N+1)
gen double perfnorm = invnorm(perfuniform)
sum perfnorm
scalar adjsd=r(sd)
gen double randnorm = invnorm(uniform())/adjsd

This procedure would not correct the kurtosis though.

Might this effect matter in small samples? Sure: for n=20, the
unadjusted variance of the theoretic standard normal would be .794,
which seems low.

But recall point 2): I'm probably missing something. For example, the
uniform random number generator would not produce the sort of evenly
spread-out numbers in my perfuniform variable. So this adjustment may
be inappropriate. Monte Carlo work might clarify matters, as would
additional conceptual insight. In short, I would avoid using this
adjustment without further analysis.

7) Or perhaps I *am* generating perfectly normal distributions, but
that theoretic normals only have kurtosis -> 3 as n -> infinity. I
don't know.

Thanks for your attention.
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/


*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2024 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index