# Re: st: Can't generate perfect theoretic standard normal.

 From Jeph Herrin To statalist@hsphsun2.harvard.edu Subject Re: st: Can't generate perfect theoretic standard normal. Date Wed, 09 May 2007 19:18:52 -0400

However, perfection => infinity is not surprising.

Jeph Herrin wrote:

10x does not help much - the variance is closer to 1 but

```
set mem 1500m
set obs 100000000
* 100,000,000 obs!
generate perfuniform = _n/(_N+1)
generate perfnorm = invnorm(perfuniform)
summarize perfnorm, detail
```

```
. summarize perfnorm, detail

                          perfnorm
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -2.326348      -5.612001
 5%    -1.644854      -5.490852
10%    -1.281552      -5.418801       Obs            99999998
25%    -.6744897      -5.367129       Sum of Wgt.    99999998

50%    -3.74e-08                      Mean          -1.14e-07
                        Largest       Std. Dev.      .9999995
75%     .6744897       5.294704
90%     1.281551       5.294704       Variance        .999999
95%     1.644853       5.294704       Skewness      -3.31e-06
99%     2.326348       5.294704       Kurtosis       2.999973
```

D H wrote:
Apparently, I need an infinite sample size to obtain a perfect
theoretic standard normal distribution with zero skew, variance
exactly 1 and kurtosis exactly 3.

Consider the following code:

```
version 8.2
set memory 400m
* 10,000,000 obs!
set obs 10000000
generate double perfuniform = _n/(_N+1)
generate double perfnorm = invnorm(perfuniform)
summarize perfnorm, detail
```

The last command yields:

```
                          perfnorm
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -2.326346      -5.199338
 5%    -1.644853      -5.068958
10%    -1.281551      -4.991217       Obs            10000000
25%    -.6744897      -4.935367       Sum of Wgt.    10000000

50%     6.96e-17                      Mean           7.79e-18
                        Largest       Std. Dev.      .9999986
75%     .6744897       4.935367
90%     1.281551       4.991217       Variance       .9999971
95%     1.644853       5.068958       Skewness       3.23e-16
99%     2.326346       5.199338       Kurtosis       2.999924
```

Now the skew is basically zero, but the kurtosis and variance are not
exactly the Gaussian values of 3 and 1. For 20,000 observations, the same
code gives a similar skew, kurtosis = 2.987949 and variance = .9991686
(n = 2,000: small skew, kurtosis = 2.937, variance = .99386).
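The grid above can be checked outside Stata. The sketch below is a minimal Python version, assuming the standard-library `statistics.NormalDist().inv_cdf` as a stand-in for Stata's `invnorm()`, and assuming that `summarize, detail` reports the variance with an n-1 divisor and skewness and kurtosis from n-divisor central moments. The helper names `quantile_grid` and `moments` are hypothetical. It should reproduce the figures quoted for 2,000 and 20,000 observations and show the variance and kurtosis approaching 1 and 3 only as n grows.

```python
from statistics import NormalDist, fmean

def quantile_grid(n):
    """Phi^{-1}(i/(n+1)) for i = 1..n, mirroring
    generate double perfnorm = invnorm(_n/(_N+1))."""
    inv = NormalDist().inv_cdf
    return [inv(i / (n + 1)) for i in range(1, n + 1)]

def moments(x):
    """Variance with an n-1 divisor; skewness and kurtosis from
    n-divisor central moments (assumed to match summarize, detail)."""
    n = len(x)
    m = fmean(x)
    d = [v - m for v in x]
    m2 = sum(t * t for t in d) / n
    m3 = sum(t ** 3 for t in d) / n
    m4 = sum(t ** 4 for t in d) / n
    return {
        "variance": m2 * n / (n - 1),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }

# Variance and kurtosis creep toward 1 and 3 only as n grows.
for n in (200, 2000, 20000):
    r = moments(quantile_grid(n))
    print(n, round(r["variance"], 6), round(r["kurtosis"], 6))
```

The grid is symmetric by construction, so the skewness is zero up to rounding; the shortfall in variance and kurtosis comes from the tails, which a finite grid never reaches.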

I see a number of possible approaches to this issue.

1) Include the numbers 1 and 0 in the uniform distribution. This
doesn't work: the invnorm function simply produces missing values for
those observations.

2) Take a harder look at my allegedly perfect uniform distribution.
Is this where I went wrong?

3) Try another statistics package?? (Somehow I doubt that this would help.)

4) We can always rescale the dispersion. Divide the resulting
distribution by the standard deviation. Consider this addition to the
10,000,000 obs example:

```
* r(sd) still holds the standard deviation from the preceding -summarize-
gen double perfnorm2 = perfnorm/r(sd)
sum perfnorm2, detail
```

In that case the skew stays close to zero and the variance becomes 1, but
the kurtosis is unaffected (it is scale-invariant) and still less than 3.

5) Set up a loop and iteratively modify a couple of observations to
improve the kurtosis. Set the proper variance with 4). I
experimented with that: one problem is that this patch doesn't really
produce a perfect theoretic Gaussian distribution, but rather a
kludge.

6) Ignore the problem and characterize it as an oddity. Note, though,
that this may imply that standard normal random variables generated in
Stata will not be perfectly standard normal *on average*: even their
variance will be a little lower without adjustment. These effects would
presumably be swamped by ordinary sampling randomness.
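Points 1) and 4) can be illustrated with a small Python sketch, again assuming `statistics.NormalDist().inv_cdf` as a stand-in for `invnorm()` and using n = 2,000 rather than 10,000,000 to keep pure Python fast. The endpoints p = 0 and p = 1 have no finite normal quantile (Stata returns missing; Python raises `StatisticsError`), and dividing the grid by its standard deviation fixes the variance but leaves the kurtosis where it was. The `kurtosis` helper is hypothetical.

```python
from statistics import NormalDist, StatisticsError, fmean, stdev

inv = NormalDist().inv_cdf

# Point 1): p = 0 and p = 1 have no finite normal quantile.
# Stata's invnorm() gives missing there; Python's inv_cdf raises instead.
undefined = []
for p in (0.0, 1.0):
    try:
        inv(p)
    except StatisticsError:
        undefined.append(p)
print("no quantile for p in", undefined)

# Point 4): divide the quantile grid by its standard deviation.
n = 2000
z = [inv(i / (n + 1)) for i in range(1, n + 1)]

def kurtosis(x):
    """m4 / m2**2 with n-divisor central moments; unchanged by rescaling."""
    m = fmean(x)
    m2 = sum((v - m) ** 2 for v in x) / len(x)
    m4 = sum((v - m) ** 4 for v in x) / len(x)
    return m4 / m2 ** 2

sd = stdev(z)
z_rescaled = [v / sd for v in z]
print(stdev(z_rescaled), kurtosis(z), kurtosis(z_rescaled))
```

Because kurtosis is a ratio of moments that scale as sd^4, any rescaling cancels out of it, which is why 4) cannot repair the kurtosis on its own.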

More generally, I wonder whether this is the proper way to generate a
standard normal random variable in Stata:

```
gen double perfuniform = _n/(_N+1)
gen double perfnorm = invnorm(perfuniform)
sum perfnorm
```

This procedure would not correct the kurtosis though.

Might this effect matter in small samples? Sure: for n=20, the
unadjusted variance of the theoretic standard normal would be .794,
which seems low.
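The n = 20 figure can be checked the same way. This Python sketch assumes `statistics.NormalDist` as a stand-in for `invnorm()` and `statistics.variance` (n-1 divisor) as a stand-in for `summarize`. It also computes, from the identity that the integral of z^2 phi(z) from minus infinity to a equals Phi(a) - a phi(a), how much of a standard normal's variance lies beyond the two extreme grid points; one way to read the low variance is that those two points must stand in for that entire tail region.

```python
from statistics import NormalDist, variance

nd = NormalDist()

# Same quantile grid as perfnorm = invnorm(_n/(_N+1)), with n = 20
n = 20
z = [nd.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
v = variance(z)                     # n-1 divisor, like summarize
print(round(v, 3))

# Share of Var(Z) = 1 coming from |Z| beyond the extreme grid points:
# 2 * integral_{-inf}^{a} z^2 phi(z) dz = 2 * (Phi(a) - a * phi(a)), a = z[0]
a = z[0]                            # smallest grid point, about -1.67
tail_share = 2 * (nd.cdf(a) - a * nd.pdf(a))
print(round(tail_share, 3))
```

Roughly 40% of the theoretical variance sits beyond about plus or minus 1.67, while the grid represents that region with just two points placed at its inner edge.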

But recall point 2): I'm probably missing something. For example, the
uniform random number generator would not produce the sort of evenly [...]
be inappropriate. Monte Carlo work might clarify matters, as would
additional conceptual insight. In short, I would avoid using this [...]
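A minimal Monte Carlo sketch of the kind suggested here, in Python with `random.Random.gauss` standing in for Stata's random normal draws (the seed and replication count are arbitrary choices): average the n-1 sample variance over many independent samples of size 20. Since that estimator is unbiased for independent draws, the average should sit near 1 rather than near the deterministic grid's .794, which suggests the shortfall in point 6) is a property of the quantile grid, not of random normal variables.

```python
import random

def sample_variance(x):
    """Unbiased sample variance (n-1 divisor)."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)

rng = random.Random(12345)   # arbitrary fixed seed for reproducibility
n, reps = 20, 20000          # n = 20 as in the small-sample example
avg_var = sum(
    sample_variance([rng.gauss(0.0, 1.0) for _ in range(n)])
    for _ in range(reps)
) / reps
print(round(avg_var, 3))     # near 1, unlike the grid's .794
```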

7) Or perhaps I *am* generating perfectly normal distributions, and
theoretic normals only have kurtosis -> 3 as n -> infinity. I
don't know.
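A note on Jeph Herrin's 100,000,000-observation run quoted above: that code omits `double`, so both variables are stored in Stata's default float type. Assuming float is IEEE single precision with round-to-nearest on storage, the last two values of `_n/(_N+1)` round to exactly 1 (so `invnorm()` is missing, leaving 99999998 observations), several values just below them collapse onto the same float (the tied largest values of 5.294704), while the lower tail keeps its precision (smallest -5.612001). That asymmetry plausibly explains the small negative mean and skewness in that log. A Python sketch, using `struct` to emulate single precision and `statistics.NormalDist` in place of `invnorm()`; `to_float32` is a hypothetical helper:

```python
import struct
from statistics import NormalDist

def to_float32(x):
    """Round a double to the nearest IEEE single-precision value
    (assumed to mimic storage in a Stata float variable)."""
    return struct.unpack("f", struct.pack("f", x))[0]

N = 100_000_000
inv = NormalDist().inv_cdf

# Top ten values of perfuniform = _n/(_N+1) after float storage
top = [to_float32(n / (N + 1)) for n in range(N - 9, N + 1)]
n_ones = sum(u == 1.0 for u in top)           # invnorm(1) is missing in Stata
largest = inv(max(u for u in top if u < 1.0))
smallest = inv(to_float32(1 / (N + 1)))
print(n_ones, round(largest, 6), round(smallest, 6))
```

With `generate double`, as in the 10,000,000-observation example, the extremes come out symmetric.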

*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
