

From: "Joseph Coveney" <stajc2@gmail.com>
To: <statalist@hsphsun2.harvard.edu>
Subject: Re: st: Questions for random data generation and value label
Date: Tue, 12 Mar 2013 10:28:27 +0900

Mark Yu Xue wrote:

Let me use an example to describe my question more clearly. There is an actual data that has three variables: Var1, Var2, Var3. Each of them has continuous numeric values. And I get the max, min, SD, mean for each of them, and save them in several macros, and then clear the memory. Then, I want to generate a synthetic data, which also include three variables: SynVar1, SynVar2, SynVar3. And they keep the same max, min, SD, mean of Var1, Var2, Var3, respectively as in actual data.

--------------------------------------------------------------------------------

If you have the actual data available, then you can try fitting a Johnson distribution to each variable (with one of the user-written commands -jnsn- or -jnsw-), and then generate the artificial dataset from the parameters of the Johnson distribution (using the user-written command -ajv-). All three user-written commands are in the same package, "JNSN", which you can download from SSC. Type -findit jnsn- to see more.

These commands will not get you the exact-same mean, SD, minimum and maximum of the original variable each time, but Johnson distributions have been considered useful in creating artificial data following the same arbitrary (unknown) distribution of actual data of interest, for example, in order to characterize the behavior of candidate estimators or tests.

The commands' help files might be a little busy-looking your first time through them, but the commands' use together is rather simple, with just two required lines of code: first either -jnsn- or -jnsw-, and then -ajv- using the returned scalars and macros of the first command. I've illustrated their use in a simple example below.

Joseph Coveney

    . sysuse auto
    (1978 Automobile Data)

    . jnsn mpg

    Johnson's system of transformations

    Mean and moments for mpg

    Mean     = 21.297
    Variance = 33.472
    Skewness = 0.949
    Kurtosis = 3.975

    Johnson distribution type: SB
    gamma  = 2.248
    delta  = 1.541
    xi     = 9.616
    lambda = 56.418

    Note: Program terminated normally

    . return list

    scalars:
                 r(lambda) =  56.41802121562024
                     r(xi) =  9.615504048256971
                  r(delta) =  1.54090335776377
                  r(gamma) =  2.247612125156365

    macros:
                  r(fault) : "Program terminated normally"
           r(johnson_type) : "SB"

    . ajv , distribution(`r(johnson_type)') generate(fake_mpg) lambda(`r(lambda)') xi(`r(xi)') gamma(`r(gamma)') delta(`r(delta)') seed(12345) n(100)

    . summarize mpg fake_mpg

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
             mpg |        74     21.2973    5.785503         12         41
        fake_mpg |       100    20.84794    5.561717   12.62255   37.59033
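For readers who want to see roughly what the generation step amounts to, here is a minimal sketch that draws from a Johnson SB distribution by hand, using only built-in Stata functions. It is not the -ajv- source and makes no claim about how -ajv- is implemented. It assumes the standard SB parameterization, z = gamma + delta*ln((x - xi)/(xi + lambda - x)) with z standard normal, and plugs in the rounded estimates from the -jnsn- output above. The local macro names and the variable name z are chosen for illustration only. Even with the same seed, the draws should not be expected to match the fake_mpg values shown above.

```stata
* Sketch only: hand-rolled Johnson SB draws, not the -ajv- implementation.
* Parameter values are the rounded -jnsn- estimates for mpg shown above.
clear
set obs 100
set seed 12345

local gamma  = 2.248
local delta  = 1.541
local xi     = 9.616
local lambda = 56.418

* z is standard normal; inverting the SB transformation maps it
* into the open interval (xi, xi + lambda)
generate double z = rnormal()
generate double fake_mpg = `xi' + `lambda' * invlogit((z - `gamma') / `delta')

summarize fake_mpg
```

Because SB is a bounded family, every generated value lies strictly between xi and xi + lambda (about 9.6 and 66.0 here). The sample minimum and maximum will therefore vary from run to run inside those bounds rather than reproduce the observed 12 and 41.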

