Re: st: Simulate and corr2data - solutions & comment


From   Allan Reese <[email protected]>
To   Stata distribution list <[email protected]>
Subject   Re: st: Simulate and corr2data - solutions & comment
Date   Wed, 21 Jan 2004 11:22:57 +0000 (GMT)

Richard.A.Williams.5 wrote on Wed Jan 21 10:40:43:

> There is a seed option in corr2data, e.g.
> corr2data x y, n(2000) corr(C) seed(123)

> It isn't in the manual, so I assume it is a relatively recent addition;
> it shows up in help.

It is indeed recent ;-)  I was working with Stata 8.2 updated to 15 Dec
2003 when I reported the problem.  The change and revised documentation are
in the 6 Jan 2004 ado update.

> Given corr2data's intended purpose, I don't think this [giving the same
> results every time]  is really a bug.  corr2data is meant to generate
> data where only the means, correlations, sds and N are required for the
> analysis.

As it is documented, it is a feature.  But I suspect a misapplied logic
here.  corr2data creates a pair of variables with given means and
covariances.  But any use of that sample must look at other features of
the sample; if you "only need the means ..." then you do not need the
observations.

Richard quotes the new help text

"If you are not sure that a statistical result only depends on the
specified matrices and not on meaningless aspects of the representation of
the data, you can generate different datasets, each having the same
correlation (covariance) matrix and means, by specifying different seed()
values.  If the statistical result differs beyond what is attributable to
roundoff error, the statistical result is meaningless."

Apart from the problem of parsing the English, I can't make head or tail
of this advice.  At the core seems to be a reminder that the mathematical
binormal distribution is characterized completely by the first two
moments; beyond that I don't know what "meaningless aspects of the
representation of the data" might be.

> corr2data is great if, say, you want to replicate a published regression
> analysis where the means, sds and correlations are in the paper.

English!  You ain't *replicating* the analysis, but simulating it.  You
replicate the method to understand the process and perhaps investigate its
robustness and sensitivity - for which purpose you require repeated
different samples.


>>If corr2data can not produce independent samples, can I have comments
...
> Again, not clear on the goal -- but if corr2data with the seed option
> does not give you what you want (and I suspect it doesn't) you could
> create a large data set or data sets with corr2data and then draw
> random samples from them.

An excellent suggestion, and this is exactly what in practice I did.  Use
corr2data once to simulate 10000 points and save that file.  Then the
program myprog to run under simulate contains:

use "10000pts", clear
gen rand = uniform()
sort rand
keep in 1/`n'

where n is an option in the call, e.g.,

simulate "myprog , n(50)" ...
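
Spelled out in full, the approach might look something like the sketch
below.  The correlation matrix C, the returned statistic rho and the
reps() count are illustrative assumptions, not the values actually used;
the simulate line follows the same quoted-command form as above.

* one-off step: build and save the pool of 10000 points
* (C is an assumed 2x2 correlation matrix)
matrix C = (1, 0.5 \ 0.5, 1)
corr2data x y, n(10000) corr(C) clear
save "10000pts", replace

* hypothetical myprog: draw n points at random, return their correlation
capture program drop myprog
program define myprog, rclass
    syntax , n(integer)
    use "10000pts", clear
    gen rand = uniform()
    sort rand
    keep in 1/`n'
    correlate x y
    return scalar rho = r(rho)
end

simulate "myprog , n(50)" rho=r(rho), reps(1000)

Each replication then sees a different random subsample of the pool, so
the sample correlation varies from run to run rather than being fixed
at the specified value.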

If you want to include corr2data in a simulation loop, it appears that you
would need to save the current seed in a global macro inside "myprog".
The following does NOT work (today!), again suggesting a break in
StataCorp's usual joined-up thinking.

corr2data x y, n(2000) corr(C) seed($lastseed)
global lastseed = c(seed)
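
One possible alternative, offered only as an untested sketch: draw a
fresh integer from the main random-number stream on each call and hand
that to seed().  The local macro s and the range of seed values are
arbitrary choices for illustration.

* hypothetical: pick a new integer seed for each call
local s = 1 + int(1e6*uniform())
corr2data x y, n(2000) corr(C) seed(`s') clear

Every sample generated this way still has exactly the specified means
and correlations, so this is not a substitute for drawing random
subsamples from a saved pool.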

I remain unconvinced that the default action of producing the *same*
arbitrary data sample whenever corr2data is called is sensible or helpful.
The usual practice with any random process is to generate (pseudo-)
independent samples, unless the user puts in a fixed seed.


R. Allan Reese                       Email:     [email protected]
Associate Manager GRI                Direct voice:   +44 1482 466845
Graduate School                      Voice messages: +44 1482 466844
Hull University, Hull HU6 7RX, UK.   Fax:            +44 1482 466436
====================================================================
Be careful in handling the battery of the remote control transmitter
If swallowed consult a physician immediately for emergency treatment
               [Safety instructions: Hitachi CP-X275 data projector]
