Statalist The Stata Listserver


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: Can't generate perfect theoretic standard normal.


From   Jeph Herrin <junk@spandrel.net>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Can't generate perfect theoretic standard normal.
Date   Thu, 10 May 2007 09:30:20 -0400

Before I forgot to use doubles. Just for fun, how
about 50x?

 clear
 set mem 15000m          /* 15gb */
 set obs 500000000
 * 500,000,000 obs!
 generate double perfuniform = _n/(_N+1)
 generate double perfnorm = invnorm(perfuniform)
 drop perfuniform        /* make room for -summarize- */
 summarize perfnorm, detail


                          perfnorm
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -2.326348      -5.884193
 5%    -1.644854      -5.768458
10%    -1.281552      -5.699726       Obs           500000000
25%    -.6744897       -5.65048       Sum of Wgt.   500000000

50%            0                      Mean           1.07e-17
                        Largest       Std. Dev.             1
75%     .6744897        5.65048
90%     1.281552       5.699726       Variance       .9999999
95%     1.644854       5.768458       Skewness       7.54e-16
99%     2.326348       5.884193       Kurtosis       2.999998





D H wrote:
Apparently, I need an infinite sample size to obtain a perfect
theoretic standard normal distribution with zero skew, variance
exactly 1 and kurtosis exactly 3.

Consider the following code:

version 8.2
set memory 400m
* 10,000,000 obs!:
set obs 10000000
generate double perfuniform = _n/(_N+1)
generate double perfnorm = invnorm(perfuniform)
summarize perfnorm, detail

The last command yields:

perfnorm
-------------------------------------------------------------
Percentiles Smallest
1% -2.326346 -5.199338
5% -1.644853 -5.068958
10% -1.281551 -4.991217 Obs 10000000
25% -.6744897 -4.935367 Sum of Wgt. 10000000

50% 6.96e-17 Mean 7.79e-18
Largest Std. Dev. .9999986
75% .6744897 4.935367
90% 1.281551 4.991217 Variance .9999971
95% 1.644853 5.068958 Skewness 3.23e-16
99% 2.326346 5.199338 Kurtosis 2.999924

Now the skew is basically zero, but the kurtosis and variance are not
precisely Gaussian. For 20,000 observations, the above would give
similar skew, kurtosis=2.987949 and variance = .9991686. (n=2000:
small skew, kurt=2.937, Var=.99386)

I see a number of possible approaches to this issue.

1) Include the numbers 1 and 0 in the uniform distribution. This
doesn't work: the invnorm function simply produces missing values for
those observations.

2) Take a harder look at my allegedly perfect uniform distribution.
Is this where I went wrong?

3) Try another statistics package?? (Somehow I doubt that this would help.)

4) We can always rescale the dispersion. Divide the resulting
distribution by the standard deviation. Consider this addition to the
10,000,000 obs example:

gen double perfnorm2=perfnorm/r(sd)
sum perfnorm2, detail

In that case, skew stays close to zero, var=1 and kurtosis is
unaffected - still less than 3.

5) Set up a loop and iteratively modify a couple of observations to
improve the kurtosis. Set the proper variance with 4). I
experimented with that: one problem is that this patch doesn't really
produce a perfect theoretic Gaussian distribution, but rather a
kludge.

6) Ignore the problem and characterize it as an oddity. Note though
that this may imply that normally distributed standard random
variables in Stata will not be perfectly standard normals *on
average*: even their variance will be a little lower without
adjustment. These effects would presumably be swamped by ordinary
sampling randomness.

More generally, I wonder whether this is the proper way to generate a
standard normal random variable in Stata:

gen double perfuniform = _n/(_N+1)
gen double perfnorm = invnorm(perfuniform)
sum perfnorm
scalar adjsd=r(sd)
gen double randnorm = invnorm(uniform())/adjsd

This procedure would not correct the kurtosis though.

Might this effect matter in small samples? Sure: for n=20, the
unadjusted variance of the theoretic standard normal would be .794,
which seems low.

But recall point 2): I'm probably missing something. For example, the
uniform random number generator would not produce the sort of evenly
spread-out numbers in my perfuniform variable. So this adjustment may
be inappropriate. Monte Carlo work might clarify matters, as would
additional conceptual insight. In short, I would avoid using this
adjustment without further analysis.

7) Or perhaps I *am* generating perfectly normal distributions, but
that theoretic normals only have kurtosis -> 3 as n -> infinity. I
don't know.

Thanks for your attention.
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/


*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index