Yu Xue <snowrain@gmail.com>

statalist@hsphsun2.harvard.edu

Re: st: Questions for random data generation and value label

Tue, 12 Mar 2013 11:33:14 -0500

Thanks everyone for answering my question, especially Joseph! I think
you offered the right solution to my problem, although the result is
still not accurate.

Mark

On Mon, Mar 11, 2013 at 8:28 PM, Joseph Coveney <stajc2@gmail.com> wrote:
> Mark Yu Xue wrote:
>
> Let me use an example to describe my question more clearly.
>
> There is an actual dataset that has three variables: Var1, Var2, Var3.
> Each of them has continuous numeric values. And I get the max, min,
> SD, mean for each of them, and save them in several macros, and then
> clear the memory.
>
> Then, I want to generate a synthetic dataset, which also includes three
> variables: SynVar1, SynVar2, SynVar3. And they keep the same max, min,
> SD, mean of Var1, Var2, Var3, respectively, as in the actual data.
>
> --------------------------------------------------------------------------------
>
> If you have the actual data available, then you can try fitting a Johnson
> distribution to each variable (with one of the user-written commands -jnsn- or
> -jnsw-), and then generate the artificial dataset from the parameters of the
> Johnson distribution (using the user-written command -ajv-). All three
> user-written commands are in the same package, "JNSN", which you can download
> from SSC. Type -findit jnsn- to see more.
>
> These commands will not get you the exact-same mean, SD, minimum and maximum of
> the original variable each time, but Johnson distributions have been considered
> useful in creating artificial data following the same arbitrary (unknown)
> distribution of actual data of interest, for example, in order to characterize
> the behavior of candidate estimators or tests.
>
> The commands' help files might be a little busy-looking your first time through
> them, but the commands' use together is rather simple, with just two required
> lines of code: first either -jnsn- or -jnsw-, and then -ajv- using the returned
> scalars and macros of the first command. I've illustrated their use in a simple
> example below.
>
> Joseph Coveney
>
> . sysuse auto
> (1978 Automobile Data)
>
> . jnsn mpg
> Johnson's system of transformations
>
>
> Mean and moments for mpg
>     Mean     = 21.297
>     Variance = 33.472
>     Skewness = 0.949
>     Kurtosis = 3.975
>
>
> Johnson distribution type: SB
>     gamma  = 2.248
>     delta  = 1.541
>     xi     = 9.616
>     lambda = 56.418
>
>
> Note: Program terminated normally
>
> . return list
>
> scalars:
>     r(lambda) = 56.41802121562024
>     r(xi)     = 9.615504048256971
>     r(delta)  = 1.54090335776377
>     r(gamma)  = 2.247612125156365
>
> macros:
>     r(fault)        : "Program terminated normally"
>     r(johnson_type) : "SB"
>
> . ajv , distribution(`r(johnson_type)') generate(fake_mpg) lambda(`r(lambda)')
> xi(`r(xi)') gamma(`r(gamma)') delta(`r(delta)') seed(12345) n(100)
>
> . summarize mpg fake_mpg
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>          mpg |        74     21.2973    5.785503         12         41
>     fake_mpg |       100    20.84794    5.561717   12.62255   37.59033
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/

