The Stata listserver

RE: st: RE: generating data sets with specific parameters


From   "" <>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: RE: generating data sets with specific parameters
Date   Thu, 19 Aug 2004 23:01:06 -0400

Dear Statalist,

I must say that I only just now saw Dr. Williams's response to my query about
generating a data set, and only AFTER having sent a thank-you to the three
people who had graciously replied earlier. So a special thank you to Dr.
Williams for his very helpful reply and guidance.


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Richard Williams
Sent: Thursday, August 19, 2004 11:14 AM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: generating data sets with specific parameters

At 06:56 PM 8/18/2004 -0500, Scott Merryman wrote:
>You might try -corr2data-

-corr2data- is good if you want the parameters to come out EXACTLY as 
specified, e.g. if you say the mean is 10 and the SD is 2 then that is what 
they will indeed be in the data set that is created.  It is like creating a 
population with the specified parameters.  -drawnorm-, on the other hand, draws 
a random sample from a population with the specified parameters; hence, 
because of sampling variability, the numbers in the sample will differ a 
bit from the numbers you specify.

-corr2data- is especially good if, say, you have a published correlation 
matrix and want to replicate the results.  In SPSS, you would typically 
just input the means, correlations and SDs; in Stata you create a fake data 
set.  -drawnorm- is probably better if you are doing some sort of
simulation.

Note that -corr2data- always produces the exact same data set unless you 
use the seed() option of -corr2data- (not the separate -set seed- command).

Key caution: If your goal is to use -corr2data- to analyze a published 
correlation matrix, remember, these are simulated data, not the 
original.  The fake data reproduce only the means, SDs and correlations, 
not the original cases, so there is only so much you can do, e.g. you can't 
compute interaction terms, take logs, analyze a subsample, or anything like 
that.  You can do stuff like run OLS regressions using different subsets of 
the variables.  The ONLINE help for -corr2data- goes over this a bit.

I discuss the use of -corr2data- on pages 7-9 of

http://www.nd.edu/~rwilliam/stats1/OLS-Stata.pdf


-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
FAX:    (574)288-4373
HOME:   (574)289-5227
EMAIL:  Richard.A.Williams.5@ND.Edu
WWW (personal):    http://www.nd.edu/~rwilliam
WWW (department):    http://www.nd.edu/~soc

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/





© Copyright 1996–2014 StataCorp LP