Re: st: Simulate and corr2data


From   Richard Williams <[email protected]>
To   [email protected]
Subject   Re: st: Simulate and corr2data
Date   Tue, 20 Jan 2004 19:27:09 -0500

At 09:48 PM 1/20/2004 +0000, Allan Reese wrote:
> Followed example for simulate except to use corr2data rather than gen to
> create the dataset.  Imagine my chagrin when each repetition gave the same
> answer!  So tried corr2data from the command prompt and found it gives the
> same sample time after time, regardless of any setting of seed.  Does
> anyone have a fix please?  Is this an unintended bug?  Looks counter
> intuitive and ought to happen only if "set seed" used to restart the
> sequence.

Given corr2data's intended purpose, I don't think this is really a bug. corr2data is meant to generate data where only the means, correlations, sds and N are required for the analysis -- if any other feature of the data is required, corr2data will not handle it. Hence, it doesn't matter what the individual data values are, so long as they reproduce the desired correlations, etc. As the online docs say,

"corr2data is designed to enable analyses of correlation (covariance) matrices by commands that expect variables rather than a correlation (covariance) matrix. corr2data creates variables with exactly the correlation (covariance) that you want to analyze. Apart from means and covariances, all aspects of the data are meaningless. Only analyses that depend on the correlations (covariances) and means produce meaningful results. Thus, you may perform a principal components analysis (pca), a factor analysis (factor), or an ordinary regression analysis (regress), etc.

"If you are not sure that a statistical result only depends on the specified matrices and not on meaningless aspects of the representation of the data, you can generate different datasets, each having the same correlation (covariance) matrix and means, by specifying different seed() values. If the statistical result differs beyond what is attributable to roundoff error, the statistical result is meaningless."
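The check described in the quoted passage can be illustrated outside Stata. What follows is only a minimal Python sketch of the idea, not corr2data's actual algorithm: the helper names (standardize, exact_corr_pair, ols_slope) are made up for this example, and the construction (standardize one variable, residualize and standardize a second, then mix them) is just one assumed way of forcing an exact sample correlation. Two different seeds give different data values, yet the regression slope, which depends only on the correlation and sds, agrees up to roundoff, while a statistic such as the median does not.

```python
import math
import random
import statistics


def standardize(v):
    """Rescale v to sample mean 0 and sample sd 1 (n-1 denominator)."""
    m = statistics.fmean(v)
    s = statistics.stdev(v)
    return [(a - m) / s for a in v]


def exact_corr_pair(n, r, seed):
    """Return lists x, y whose sample correlation equals r up to roundoff.

    Hypothetical helper for illustration; not how corr2data is implemented.
    """
    rng = random.Random(seed)
    x = standardize([rng.gauss(0, 1) for _ in range(n)])
    e = [rng.gauss(0, 1) for _ in range(n)]
    # Center e, then remove its projection on x so it is uncorrelated with x.
    me = statistics.fmean(e)
    e = [a - me for a in e]
    b = sum(a * c for a, c in zip(e, x)) / sum(c * c for c in x)
    e = standardize([a - b * c for a, c in zip(e, x)])
    # Mix x and e so that corr(x, y) = r and sd(y) = 1.
    y = [r * a + math.sqrt(1 - r * r) * c for a, c in zip(x, e)]
    return x, y


def ols_slope(x, y):
    """Slope from a simple regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


for seed in (1, 2):
    x, y = exact_corr_pair(100, 0.5, seed)
    # The slope matches across seeds; the median of y generally does not.
    print(seed, ols_slope(x, y), statistics.median(y))
```

If a statistic changes from seed to seed the way the median does here, it depends on the meaningless part of the data and should not be interpreted.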

corr2data is great if, say, you want to replicate a published regression analysis where the means, sds and correlations are in the paper. It is pretty obvious that the data are fake though, when you see that something that was a dummy variable in the original data takes on values like .432987 in your corr2data pseudo-replication. In SPSS, you achieve the same goal via various matrix commands where you just input the correlations, etc.; in Stata you create fake data with the desired structure. I discuss this on pp. 7-9 of this handout:

http://www.nd.edu/~rwilliam/xsoc593/lectures/OLS-Stata.pdf
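As a small illustration of why published means, sds and correlations are enough for this kind of replication, here is a Python sketch of the two-variable case. The function name and the numbers are hypothetical, chosen only for the example; the formulas are the standard ones for a simple regression of y on x.

```python
import math


def simple_regression_from_summary(mean_x, mean_y, sd_x, sd_y, r, n):
    """Simple regression of y on x computed from summary statistics alone.

    Hypothetical helper for illustration; sds use the n-1 denominator.
    """
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    # Standard error of the slope with n - 2 residual degrees of freedom.
    se_slope = (sd_y / sd_x) * math.sqrt((1 - r * r) / (n - 2))
    return slope, intercept, se_slope


# Made-up summary statistics, not taken from any paper.
slope, intercept, se_slope = simple_regression_from_summary(
    mean_x=3.0, mean_y=10.0, sd_x=2.0, sd_y=4.0, r=0.5, n=102
)
print(slope, intercept, se_slope)
```

No individual observation enters the calculation, which is why any dataset with the right means, sds and correlations reproduces the same regression.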



> If corr2data cannot produce independent samples, can I have comments
> please on whether it is equivalent to use bootstrap with a large N and
> relatively small sample size.  I am interested in sample statistics that
> depend on the overall distribution of points, and the default use of
> bootstrap to select multiple samples of size N from N points must rely
> upon random points being used once, twice or more - or have I completely
> misunderstood?

Again, not clear on the goal -- but if corr2data with the seed option does not give you what you want (and I suspect it doesn't) you could create a large data set or data sets with corr2data and then draw random samples from them. That might be easier than figuring out the gen commands that would be required to create a data set drawn from a population with certain desired characteristics.
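The "large data set, then random samples" idea can be sketched as follows. This is a Python illustration under stated assumptions, not Stata code: make_population is a hypothetical stand-in for a large dataset built once with corr2data (here its correlation is only approximately 0.5, since the values are plain random draws), and sample_correlations draws small samples without replacement and records the correlation in each.

```python
import math
import random
import statistics


def corr(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def make_population(size, r, seed):
    """Hypothetical stand-in for one large dataset created up front."""
    rng = random.Random(seed)
    pop = []
    for _ in range(size):
        x = rng.gauss(0, 1)
        y = r * x + math.sqrt(1 - r * r) * rng.gauss(0, 1)
        pop.append((x, y))
    return pop


def sample_correlations(pop, n, reps, seed):
    """Draw reps samples of size n without replacement; return each sample's correlation."""
    rng = random.Random(seed)
    results = []
    for _ in range(reps):
        rows = rng.sample(pop, n)
        xs = [p[0] for p in rows]
        ys = [p[1] for p in rows]
        results.append(corr(xs, ys))
    return results


pop = make_population(20000, 0.5, seed=1)
rs = sample_correlations(pop, n=50, reps=200, seed=2)
# The population is fixed, but the sample statistic now varies from draw to draw.
print(statistics.fmean(rs), statistics.stdev(rs))
```

Because the population is created once and only the sampling step is repeated, each repetition gives a different answer, which is the behaviour that was missing when corr2data itself was rerun inside each repetition.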

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
FAX: (574)288-4373
HOME: (574)289-5227
EMAIL: [email protected]
WWW (personal): http://www.nd.edu/~rwilliam
WWW (department): http://www.nd.edu/~soc
