Statalist The Stata Listserver

[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: Can't generate perfect theoretic standard normal.

From   "D H" <>
Subject   st: Can't generate perfect theoretic standard normal.
Date   Wed, 9 May 2007 15:46:52 -0700

Apparently, I need an infinite sample size to obtain a perfect
theoretic standard normal distribution with zero skew, variance
exactly 1 and kurtosis exactly 3.

Consider the following code:

version 8.2
set memory 400m
* 10,000,000 obs!:
set obs 10000000
generate double perfuniform = _n/(_N+1)
generate double perfnorm = invnorm(perfuniform)
summarize perfnorm, detail

The last command yields:

     Percentiles      Smallest
1%    -2.326346      -5.199338
5%    -1.644853      -5.068958
10%    -1.281551      -4.991217       Obs            10000000
25%    -.6744897      -4.935367       Sum of Wgt.    10000000

50%     6.96e-17                      Mean           7.79e-18
                       Largest       Std. Dev.      .9999986
75%     .6744897       4.935367
90%     1.281551       4.991217       Variance       .9999971
95%     1.644853       5.068958       Skewness       3.23e-16
99%     2.326346       5.199338       Kurtosis       2.999924

Now the skew is basically zero, but the kurtosis and variance are not
precisely Gaussian.  For 20,000 observations, the above would give
similar skew, kurtosis=2.987949 and variance = .9991686.  (n=2000:
small skew, kurt=2.937, Var=.99386)

I see a number of possible approaches to this issue.

1)  Include the numbers 1 and 0 in the uniform distribution.  This
doesn't work: the invnorm function simply produces missing values for
those observations.

2)  Take a harder look at my allegedly perfect uniform distribution.
Is this where I went wrong?

3)  Try another statistics package??  (Somehow I doubt that this would help.)

4)  We can always rescale the dispersion.  Divide the resulting
distribution by the standard deviation.  Consider this addition to the
10,000,000 obs example:

gen double perfnorm2=perfnorm/r(sd)
sum perfnorm2, detail

In that case, skew stays close to zero, var=1 and kurtosis is
unaffected - still less than 3.

5)  Set up a loop and iteratively modify a couple of observations to
improve the kurtosis.  Set the proper variance with 4).  I
experimented with that: one problem is that this patch doesn't really
produce a perfect theoretic Gaussian distribution, but rather a

6)  Ignore the problem and characterize it as an oddity.  Note though
that this may imply that normally distributed standard random
variables in Stata will not be perfectly standard normals *on
average*: even their variance will be a little lower without
adjustment.  These effects would presumably be swamped by ordinary
sampling randomness.

More generally, I wonder whether this is the proper way to generate a
standard normal random variable in Stata:

gen double perfuniform = _n/(_N+1)
gen double perfnorm = invnorm(perfuniform)
sum perfnorm
scalar adjsd=r(sd)
gen double randnorm = invnorm(uniform())/adjsd

This procedure would not correct the kurtosis though.

Might this effect matter in small samples?  Sure: for n=20, the
unadjusted variance of the theoretic standard normal would be .794,
which seems low.

But recall point 2): I'm probably missing something.  For example, the
uniform random number generator would not produce the sort of evenly
spread-out numbers in my perfuniform variable.  So this adjustment may
be inappropriate.  Monte Carlo work might clarify matters, as would
additional conceptual insight.  In short, I would avoid using this
adjustment without further analysis.

7)  Or perhaps I *am* generating perfectly normal distributions, but
that theoretic normals only have kurtosis -> 3 as n -> infinity.  I
don't know.

Thanks for your attention.
*   For searches and help try:

© Copyright 1996–2017 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index