Re: st: Questions for random data generation and value label

From   Maarten Buis <>
Subject   Re: st: Questions for random data generation and value label
Date   Thu, 14 Mar 2013 10:26:10 +0100

On Thu, Mar 14, 2013 at 12:50 AM, Yu Xue wrote:
> which shows how to generate random data with some specific parameters
> without mentioning the type of distribution.

You misunderstood that website: it does specify the distribution from
which it draws the random numbers. In the first example it draws the
random variables from a uniform distribution and in the second example
from a normal distribution. So first you need to tell us which
distribution you want to draw from. The mean, standard deviation, min
and max are not sufficient to define a distribution; see the discussion
and example below. So the answer you need to give us is one of
"normal", "uniform", "gamma", "Laplace", "beta", or ...

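A small illustration of why those four numbers are not enough (our own example, not from the earlier discussion): the uniform distribution on [0, 1] and a discrete variable Y that takes only the values 0, 1/2 and 1 share the same minimum, maximum, mean and standard deviation.

```latex
X \sim \mathrm{Uniform}(0,1):\quad \min X = 0,\ \max X = 1,\ E[X] = \tfrac{1}{2},\ \operatorname{Var}(X) = \tfrac{(1-0)^2}{12} = \tfrac{1}{12}

P(Y=0) = P(Y=1) = \tfrac{1}{6},\qquad P\left(Y=\tfrac{1}{2}\right) = \tfrac{2}{3}

E[Y] = 0\cdot\tfrac{1}{6} + \tfrac{1}{2}\cdot\tfrac{2}{3} + 1\cdot\tfrac{1}{6} = \tfrac{1}{2},\qquad
E[Y^2] = \tfrac{1}{4}\cdot\tfrac{2}{3} + 1\cdot\tfrac{1}{6} = \tfrac{1}{3},\qquad
\operatorname{Var}(Y) = \tfrac{1}{3} - \tfrac{1}{4} = \tfrac{1}{12}
```

Both variables have min 0, max 1, mean 1/2 and standard deviation 1/sqrt(12), yet one is continuous and the other takes only three values.
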
> If I have to specify the type of distribution in order for you to
> answer my question, I will specify a normal distribution.

As we have explained before, you cannot have a normal distribution and
specify bounds. By definition, the normal distribution is a
distribution for variables that can take any value from -infinity to
+infinity. From the way you write, it seems like you just picked a
distribution that sounded familiar. That is not a good criterion. You
really need to consider what you want to use your random draws for and
what properties they should have.
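
One way to see why bounds and normality cannot be combined: the normal density is strictly positive everywhere, so any finite lower bound a or upper bound b is crossed with positive probability (here \phi and \Phi are the standard normal density and cumulative distribution function).

```latex
X \sim N(\mu, \sigma^2),\qquad \phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2} > 0 \ \text{for all } z

P(X < a) = \Phi\left(\frac{a-\mu}{\sigma}\right) > 0,\qquad
P(X > b) = 1 - \Phi\left(\frac{b-\mu}{\sigma}\right) > 0
```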

> Min in "seq_num" and "seq_num1" are very different, which is what I
> called "not accurate" before.

We have said before that if you want strict adherence to the minimum and
maximum, you could consider drawing from a beta distribution. However,
the fact that the mean, standard deviation, min and max correspond to
the values you specify is not enough to guarantee that the distribution
is appropriate, as can be seen in the example below. The example requires
the -qplot- package, which you can find and install using -findit-.
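
The scalars in the example below amount to a method-of-moments fit. Price is rescaled to [0, 1] using its observed min and max, and then alpha and beta are chosen so that the beta distribution's mean and variance match the rescaled ones. Written out in our own notation, with \mu and \sigma standing for r(mean) and r(sd) of price:

```latex
m = \frac{\mu - \min}{\max - \min},\qquad s = \frac{\sigma}{\max - \min}

B \sim \mathrm{Beta}(\alpha,\beta):\quad E[B] = \frac{\alpha}{\alpha+\beta} = m,\qquad
\operatorname{Var}(B) = \frac{m(1-m)}{\alpha+\beta+1} = s^2

\Rightarrow\quad \alpha + \beta = \frac{m(1-m)}{s^2} - 1,\qquad
\alpha = m\left(\frac{m(1-m)}{s^2} - 1\right),\qquad
\beta = (1-m)\left(\frac{m(1-m)}{s^2} - 1\right)

X = \min + (\max - \min)\, B \in [\min, \max]
```

Both parameters are positive only when s^2 < m(1-m), so that condition is worth checking before drawing.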

*------------------ begin example ------------------
sysuse auto, clear

// rescale the mean and standard deviation of price to the [0,1] interval
sum price
tempname m sd min max
scalar `m'   = ( r(mean) - r(min) ) / ( r(max) - r(min) )
scalar `sd'  =  r(sd) / ( r(max) - r(min) )
scalar `min' = r(min)
scalar `max' = r(max)

// method-of-moments parameters of the beta distribution
tempname alpha beta
scalar `alpha' = `m'*((`m'*(1-`m'))/(`sd'^2)-1)
scalar `beta' = (1-`m')*((`m'*(1-`m'))/(`sd'^2)-1)

// 19 draws from the beta distribution, rescaled to [min, max] of price
forvalues i = 1/19 {
	gen sim`i' = rbeta(`alpha', `beta')*(`max' - `min') + `min'
}

// means and standard deviations differ as much as one would
// expect with random draws, and the min and max of price are
// strictly respected
sum price sim*

// still, the distributions of the simulated variables differ
// considerably from the distribution of price
qplot price sim*, ///
   trscale(invibeta(`alpha',`beta',@)*(`max' - `min') + `min') ///
   ms(oh none ..) c(. l ..) lc(gs10 ..) legend(off)
*------------------- end example -------------------
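
As we read the -trscale()- option in the example, the @ stands for qplot's plotting positions p_i, so the horizontal axis shows the quantiles of the fitted, rescaled beta distribution (Stata's invibeta(a,b,p) is the inverse cumulative beta distribution). Observations that really came from that distribution would fall close to a straight line; the systematic departures for price are what the comment in the example refers to.

```latex
x_{(i)} = \min + (\max - \min)\, F^{-1}_{\mathrm{Beta}(\alpha,\beta)}(p_i),\qquad
y_{(i)} = \text{the } i\text{-th smallest value of the plotted variable}
```
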
(For more on examples I sent to the Statalist see: )

Maarten L. Buis
Reichpietschufer 50
10785 Berlin