



Re: st: Questions for random data generation and value label


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Questions for random data generation and value label
Date   Mon, 11 Mar 2013 20:43:19 +0000

I think everyone who replied understood your wording.

The common element to our replies is that your wording is _not_ enough
to define a soluble problem.

It so happens that the uniform distribution can be defined by its
range, but the converse is not true. Just knowing the range of a
variable does not define it as uniform (it might be uniform or
triangular or beta, for example).
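To make this concrete, here is a small sketch in Python. The thread is about Stata; Python's standard `random` module is used only so the example is self-contained, and the helper `summarize` is hypothetical. It draws from a continuous uniform and a symmetric triangular distribution on the same range [0, 10]. Both stay inside that range, yet their SDs differ: (b - a)/sqrt(12), about 2.89, versus (b - a)/sqrt(24), about 2.04. Incidentally, the formula a+int((b-a+1)*runiform()) quoted further down produces integers; a continuous uniform on [a, b) in Stata would be a+(b-a)*runiform().

```python
import random
import statistics

def summarize(draw, n=100_000, seed=1):
    """Return (min, max, sd) of n draws produced by draw(rng)."""
    rng = random.Random(seed)
    xs = [draw(rng) for _ in range(n)]
    return min(xs), max(xs), statistics.stdev(xs)

a, b = 0.0, 10.0

# Continuous uniform on [a, b]: theoretical SD = (b - a) / sqrt(12)
uni = summarize(lambda rng: rng.uniform(a, b))

# Symmetric triangular on [a, b], mode at the midpoint: SD = (b - a) / sqrt(24)
tri = summarize(lambda rng: rng.triangular(a, b, (a + b) / 2))

print("uniform    min=%.2f max=%.2f sd=%.3f" % uni)
print("triangular min=%.2f max=%.2f sd=%.3f" % tri)
```

Same range, different distributions: the range alone does not pick one out.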

It so happens that the normal distribution can be defined by its mean
and SD, but the converse is not true. Just knowing the mean and SD of
a variable does not define it as normal.
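Conversely, very different distributions can share a mean and SD. In the Python sketch below (a standard-library illustration, not Stata code; the values m = 50 and s = 10 and the helper `summarize` are arbitrary choices for the example), a normal and a uniform on [m - sqrt(3)s, m + sqrt(3)s] both have mean m and SD s. Yet about 4.6% of normal draws lie more than 2 SDs from the mean, while no uniform draw can, because sqrt(3) < 2.

```python
import math
import random
import statistics

def summarize(xs, m, s):
    """Mean, SD and share of values more than 2 SDs (2*s) from m."""
    return (
        statistics.fmean(xs),
        statistics.stdev(xs),
        sum(abs(x - m) > 2 * s for x in xs) / len(xs),
    )

m, s, n = 50.0, 10.0, 100_000
rng = random.Random(42)

normal = [rng.gauss(m, s) for _ in range(n)]

# A uniform on [m - sqrt(3)*s, m + sqrt(3)*s] has mean m and SD s exactly
half_width = math.sqrt(3) * s
uniform = [rng.uniform(m - half_width, m + half_width) for _ in range(n)]

for name, xs in (("normal", normal), ("uniform", uniform)):
    mean, sd, tail = summarize(xs, m, s)
    print(f"{name:8s} mean={mean:.2f} sd={sd:.2f} beyond 2 SD={tail:.4f}")
```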

Similarly, knowing the mean, SD and range of a distribution does not
define it as anything in particular. In fact, that would be an unusual
way to define a distribution, as it specifies two scale (dispersion,
spread) parameters: the SD and the range.
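If one nevertheless wants *some* distribution with a given min, max, mean and SD, one arbitrary choice (an assumption for illustration, not something proposed in this thread) is a beta distribution rescaled to [min, max], with shape parameters from the method of moments. A feasibility check is built in: on [min, max] the SD must be below sqrt((mean - min)(max - mean)). The Python sketch below uses hypothetical helper names. It matches the population mean and SD and keeps every draw inside the bounds, but the sample min and max will generally fall strictly inside [min, max], and another family (a truncated normal, say) could match the same four numbers with a different shape.

```python
import random
import statistics

def scaled_beta_params(lo, hi, mean, sd):
    """Method-of-moments shape parameters for a beta rescaled to [lo, hi]."""
    width = hi - lo
    mu = (mean - lo) / width   # mean on the unit interval
    var = (sd / width) ** 2    # variance on the unit interval
    # On [0, 1] any distribution with mean mu has variance <= mu * (1 - mu)
    if not (0 < mu < 1) or var >= mu * (1 - mu):
        raise ValueError("no beta distribution has this mean and SD on [lo, hi]")
    k = mu * (1 - mu) / var - 1
    return mu * k, (1 - mu) * k

def draw_scaled_beta(lo, hi, mean, sd, n, seed=0):
    """Draw n values from the rescaled beta matching mean and sd."""
    alpha, beta = scaled_beta_params(lo, hi, mean, sd)
    rng = random.Random(seed)
    return [lo + (hi - lo) * rng.betavariate(alpha, beta) for _ in range(n)]

xs = draw_scaled_beta(lo=0.0, hi=100.0, mean=30.0, sd=15.0, n=100_000)
print(min(xs), max(xs), statistics.fmean(xs), statistics.stdev(xs))
```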

If we're wrong, you need to show why.

In addition, you haven't addressed a question raised in replies. Does
your variable have finite support in principle?
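If the answer is yes and a normal shape is still wanted, Joerg's suggestion of a truncated normal is one option. A caveat is that truncation changes the moments: the mean and SD of the result are not the mu and sigma of the parent normal. In the Python sketch below (simple rejection sampling, assumed adequate here; it becomes slow when [lo, hi] lies far in a tail), a parent N(50, 20) truncated to [40, 100] ends up with a mean of roughly 59.8 and an SD of roughly 13.3. Hitting a target mean and SD would mean solving numerically for mu and sigma, and the sample min and max would still not equal lo and hi exactly.

```python
import random
import statistics

def truncated_normal(mu, sigma, lo, hi, n, seed=0):
    """Rejection sampler: draw from N(mu, sigma) and keep values in [lo, hi]."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        x = rng.gauss(mu, sigma)
        if lo <= x <= hi:
            out.append(x)
    return out

# Parent normal N(50, 20) truncated to [40, 100]
xs = truncated_normal(mu=50.0, sigma=20.0, lo=40.0, hi=100.0, n=50_000)
print(f"min={min(xs):.2f} max={max(xs):.2f} "
      f"mean={statistics.fmean(xs):.2f} sd={statistics.stdev(xs):.2f}")
```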

Nick

On Mon, Mar 11, 2013 at 8:28 PM, Yu Xue <snowrain@gmail.com> wrote:
> Thanks Maarten, David, Nick, Joerg!
>
> Let me use an example to describe my question more clearly.
>
> I have an actual dataset with three variables: Var1, Var2, Var3.
> Each of them has continuous numeric values. I compute the max, min,
> SD and mean of each, save them in several macros, and then
> clear the memory.
>
> Then I want to generate a synthetic dataset, which also includes three
> variables: SynVar1, SynVar2, SynVar3, each with the same max, min,
> SD and mean as Var1, Var2, Var3, respectively, in the actual data.
>
> I hope I have described it clearly.
> Thank you very much
>
>
> On Mon, Mar 11, 2013 at 12:48 PM, Joerg Luedicke
> <joerg.luedicke@gmail.com> wrote:
>> The normal distribution has support (-infinity, +infinity), so it is not
>> clear what you mean by 'range' here. Do you want to draw from a
>> truncated normal distribution?
>>
>> Joerg
>>
>> On Mon, Mar 11, 2013 at 12:49 PM, Yu Xue <snowrain@gmail.com> wrote:
>>> Thanks Maarten!
>>>
>>> What I want is a normal distribution. Is there a way to randomly
>>> generate a variable with a specific mean, SD, and range?
>>> Thanks!!
>>> Mark
>>>
>>> On Mon, Mar 11, 2013 at 10:35 AM, Maarten Buis <maartenlbuis@gmail.com> wrote:
>>>> On Mon, Mar 11, 2013 at 4:20 PM, Yu Xue wrote:
>>>>> I already checked "-help random_number_functions-", but I still cannot
>>>>> find the answer to my question.
>>>>>
>>>>> I know that I can use a formula like this:
>>>>> Var=a+int((b-a+1)*runiform()), to keep values within a specific range [a,b],
>>>>> and another formula, Var=invnorm(uniform())*SD+mean, to get a
>>>>> specific standard deviation and mean.
>>>>> But I do not know how to generate a "Var" with a specific range, SD, and mean all at once.
>>>>> Please note that I am not drawing a sample from the actual data;
>>>>> what I want to generate is synthetic data (totally fake data).
>>>>
>>>> What distribution do you want to draw your new variable from? Do you
>>>> want it to be normally (Gaussian) distributed, gamma distributed, beta
>>>> distributed, Fisk distributed, Laplace distributed, ... The number of
>>>> choices is huge, but without choosing your distribution you cannot
>>>> draw your random numbers.
>>>>

