
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Questions for random data generation and value label


From   Yu Xue <[email protected]>
To   [email protected]
Subject   Re: st: Questions for random data generation and value label
Date   Tue, 12 Mar 2013 11:33:14 -0500

Thanks, everyone, for answering my question, especially Joseph! I think
you offered a workable solution to my problem, although the result
still does not match the original statistics exactly.

Mark
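
One possible way to close the remaining gap for the mean and SD (not the
minimum and maximum) is to rescale the simulated variable linearly after it
has been generated. The lines below are only a sketch under stated
assumptions: fake_mpg is a stand-in drawn with -rnormal()- so that the block
runs on its own, and the names t_mean, t_sd and fake_adj are made up for
illustration. In practice fake_mpg would come from -ajv-, and the targets
would come from the saved macros.

```stata
* Sketch: make a simulated variable reproduce a target mean and SD exactly.
sysuse auto, clear
set seed 12345

* Stand-in for a simulated variable (assumption: replace with -ajv- output)
generate double fake_mpg = rnormal(21, 6)

* Targets taken from the actual variable
quietly summarize mpg
scalar t_mean = r(mean)
scalar t_sd   = r(sd)

* Standardize the simulated variable, then rescale to the targets
quietly summarize fake_mpg
generate double fake_adj = (fake_mpg - r(mean)) / r(sd) * scalar(t_sd) + scalar(t_mean)

* Mean and SD now agree up to rounding; Min and Max generally still differ
summarize mpg fake_adj
```

A linear rescaling has only two free constants, so it can fix the mean and
SD but cannot also pin down the minimum and maximum.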

On Mon, Mar 11, 2013 at 8:28 PM, Joseph Coveney <[email protected]> wrote:
> Mark Yu Xue wrote:
>
> Let me use an example to describe my question more clearly.
>
> I have an actual dataset with three variables: Var1, Var2, Var3.
> Each holds continuous numeric values. I compute the max, min,
> SD, and mean of each, save them in macros, and then
> clear the memory.
>
> Then I want to generate a synthetic dataset that also includes three
> variables: SynVar1, SynVar2, SynVar3. Each should have the same max, min,
> SD, and mean as Var1, Var2, Var3, respectively, in the actual data.
>
> --------------------------------------------------------------------------------
>
> If you have the actual data available, then you can try fitting a Johnson
> distribution to each variable (with one of the user-written commands -jnsn- or
> -jnsw-), and then generate the artificial dataset from the parameters of the
> Johnson distribution (using the user-written command -ajv-).  All three
> user-written commands are in the same package, "JNSN", which you can download
> from SSC.  Type -findit jnsn- to see more.
>
> These commands will not get you the exact-same mean, SD, minimum and maximum of
> the original variable each time, but Johnson distributions have been considered
> useful in creating artificial data following the same arbitrary (unknown)
> distribution of actual data of interest, for example, in order to characterize
> the behavior of candidate estimators or tests.
>
> The commands' help files might be a little busy-looking your first time through
> them, but the commands' use together is rather simple, with just two required
> lines of code:  first either -jnsn- or -jnsw-, and then -ajv- using the returned
> scalars and macros of the first command.  I've illustrated their use in a simple
> example below.
>
> Joseph Coveney
>
> . sysuse auto
> (1978 Automobile Data)
>
> . jnsn mpg
> Johnson's system of transformations
>
>
> Mean and moments for mpg
>     Mean = 21.297
> Variance = 33.472
> Skewness = 0.949
> Kurtosis = 3.975
>
>
> Johnson distribution type: SB
>  gamma = 2.248
>  delta = 1.541
>     xi = 9.616
> lambda = 56.418
>
>
> Note: Program terminated normally
>
> . return list
>
> scalars:
>              r(lambda) =  56.41802121562024
>                  r(xi) =  9.615504048256971
>               r(delta) =  1.54090335776377
>               r(gamma) =  2.247612125156365
>
> macros:
>               r(fault) : "Program terminated normally"
>        r(johnson_type) : "SB"
>
> . ajv , distribution(`r(johnson_type)') generate(fake_mpg) lambda(`r(lambda)') xi(`r(xi)') gamma(`r(gamma)') delta(`r(delta)') seed(12345) n(100)
>
> . summarize mpg fake_mpg
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>          mpg |        74     21.2973    5.785503         12         41
>     fake_mpg |       100    20.84794    5.561717   12.62255   37.59033
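
To make the SB parameters above more concrete, here is a minimal sketch that
draws from a Johnson SB distribution using only built-in Stata functions. It
relies on the standard definition of the SB family: if z is standard normal,
then x = xi + lambda*invlogit((z - gamma)/delta). This is not a claim about
how -ajv- is implemented, and it will not reproduce -ajv-'s draws even with
the same seed. The names sb_gamma, sb_delta, sb_xi, sb_lambda and fake_sb are
made up for illustration; the parameter values are copied from the -return
list- output above.

```stata
* Sketch: Johnson SB draws from fitted parameters, built-in functions only.
clear
set seed 12345
set obs 100

* Parameter values copied from the r() results shown in the example
scalar sb_gamma  = 2.247612125156365
scalar sb_delta  = 1.54090335776377
scalar sb_xi     = 9.615504048256971
scalar sb_lambda = 56.41802121562024

* Standard normal draw, then the inverse of the SB transformation
generate double z = rnormal()
generate double fake_sb = scalar(sb_xi) + scalar(sb_lambda) * invlogit((z - scalar(sb_gamma)) / scalar(sb_delta))

summarize fake_sb
```

Under this definition every draw lies strictly between xi and xi + lambda
(about 9.6 and 66.0 here), so the fitted distribution bounds the simulated
values but does not force the sample minimum and maximum to equal 12 and 41.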
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

