


Re: st: Questions for random data generation and value label


From   "Joseph Coveney" <stajc2@gmail.com>
To   <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Questions for random data generation and value label
Date   Tue, 12 Mar 2013 10:28:27 +0900

Mark Yu Xue wrote:

Let me use an example to describe my question more clearly.

There is an actual dataset that has three variables: Var1, Var2, Var3.
Each of them has continuous numeric values. I get the max, min, SD,
and mean for each of them, save them in several macros, and then
clear the memory.

Then, I want to generate a synthetic dataset that also includes three
variables: SynVar1, SynVar2, SynVar3. They should keep the same max, min,
SD, and mean as Var1, Var2, Var3, respectively, in the actual data.

--------------------------------------------------------------------------------

If you have the actual data available, then you can try fitting a Johnson
distribution to each variable (with one of the user-written commands -jnsn- or
-jnsw-), and then generate the artificial dataset from the parameters of the
Johnson distribution (using the user-written command -ajv-).  All three
user-written commands are in the same package, "JNSN", which you can download
from SSC.  Type -findit jnsn- to see more.  

These commands will not reproduce the exact mean, SD, minimum and maximum of
the original variable each time, but Johnson distributions have been considered
useful for creating artificial data that follow the same arbitrary (unknown)
distribution as actual data of interest, for example, in order to characterize
the behavior of candidate estimators or tests.

The commands' help files might be a little busy-looking your first time through
them, but the commands' use together is rather simple, with just two required
lines of code:  first either -jnsn- or -jnsw-, and then -ajv- using the returned
scalars and macros of the first command.  I've illustrated their use in a simple
example below.

Joseph Coveney

. sysuse auto
(1978 Automobile Data)

. jnsn mpg
Johnson's system of transformations


Mean and moments for mpg
    Mean = 21.297
Variance = 33.472
Skewness = 0.949
Kurtosis = 3.975


Johnson distribution type: SB
 gamma = 2.248
 delta = 1.541
    xi = 9.616
lambda = 56.418


Note: Program terminated normally

. return list

scalars:
             r(lambda) =  56.41802121562024
                 r(xi) =  9.615504048256971
              r(delta) =  1.54090335776377
              r(gamma) =  2.247612125156365

macros:
              r(fault) : "Program terminated normally"
       r(johnson_type) : "SB"

. ajv , distribution(`r(johnson_type)') generate(fake_mpg) lambda(`r(lambda)')
> xi(`r(xi)') gamma(`r(gamma)') delta(`r(delta)') seed(12345) n(100)

. summarize mpg fake_mpg

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         mpg |        74     21.2973    5.785503         12         41
    fake_mpg |       100    20.84794    5.561717   12.62255   37.59033
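
To see why the artificial minimum and maximum differ from the observed ones, it may help to write out the SB draw by hand. The following is only a sketch using built-in functions and the standard Johnson SB transformation, with the fitted parameters from -jnsn- typed in as rounded local macros. The variable names z and sb_mpg are made up for illustration. It is not a description of how -ajv- works internally, and the seed here is not expected to reproduce fake_mpg.

```stata
* Sketch only: draw 100 Johnson SB variates with built-in functions.
* Parameter values are the r() results from -jnsn mpg-, rounded by hand.
clear
set obs 100
set seed 12345

local gamma  2.247612
local delta  1.540903
local xi     9.615504
local lambda 56.418021

* SB system: z = gamma + delta*ln((x - xi)/(xi + lambda - x)), z ~ N(0,1)
* Solving for x gives x = xi + lambda*invlogit((z - gamma)/delta)
generate double z = rnormal()
generate double sb_mpg = `xi' + `lambda' * invlogit((z - `gamma') / `delta')

summarize sb_mpg
```

Under this transformation every draw lies strictly between xi and xi + lambda (about 9.62 and 66.03 here), so the artificial variable is bounded by the fitted distribution, not by the observed range of 12 to 41, and its sample mean and SD vary from draw to draw.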
