Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Richard Williams <richardwilliams.ndu@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RES: generating a variable with pre-specified correlations with other two (given) variables
Date: Wed, 31 Aug 2011 19:23:10 -0500
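* correlations of x and y are given; the last row/column holds the desired correlations with z
mat mCorr = (1, .6, .4 \ .6, 1, .5 \ .4, .5, 1)
* replaces the data in memory with 100,000 artificial cases having exactly these correlations
corr2data x y z, cstorage(full) corr(mCorr) n(100000) clear
corr
reg z x y

You can get the correlations (and also the means and SDs) for the real x and y from your data set and just plug them in, then plug in the desired correlations of x and y with z.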
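A minimal sketch of that plug-in step, assuming the real variables are called x and y and that the targets for z (Corr(x,z) = .4, Corr(y,z) = .5, mean 0, sd 1) are placeholders; the x and y below are simulated hypothetical stand-ins only so the example runs on its own. Because corr2data replaces the data in memory, save your real data set first.

```stata
* Hypothetical stand-ins for the real x and y (replace with your own data)
clear
set seed 2011
set obs 500
generate x = rnormal(10, 2)
generate y = 3 + 0.5*x + rnormal()

* Observed means, sds, and correlation of x and y
quietly summarize x
local mx = r(mean)
local sx = r(sd)
quietly summarize y
local my = r(mean)
local sy = r(sd)
quietly correlate x y
local rxy = r(rho)

* Placeholder targets for z: Corr(x,z) = .4, Corr(y,z) = .5, mean 0, sd 1
matrix mCorr = (1, `rxy', .4 \ `rxy', 1, .5 \ .4, .5, 1)
matrix mMeans = (`mx', `my', 0)
matrix mSds = (`sx', `sy', 1)

* Replace the data with artificial x, y, z that have exactly these moments
local n = _N
corr2data x y z, cstorage(full) corr(mCorr) means(mMeans) sds(mSds) n(`n') clear
correlate x y z
summarize x y z
regress y x z
```

The resulting x and y are artificial but share the real variables' means, SDs and correlation. If the chosen correlations are mutually inconsistent (the matrix is not positive semidefinite), corr2data will refuse them.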
As a sidelight, any random variable u you generate yourself will have a nonzero (albeit small) sample correlation with x and y because of sampling variability, unless you generate u as part of the corr2data command, in which case you can force its correlation with x and y to be exactly 0.
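A quick way to see this, as a hypothetical sketch: u1 comes out of corr2data with its correlations with x and y forced to 0, while u2 is drawn with rnormal() afterwards. The .6 correlation between x and y is borrowed from the example above; n(1000) and the seed are arbitrary choices.

```stata
* Hypothetical illustration: u1 is created by corr2data, u2 by rnormal()
clear
set seed 4321

* x and y correlated .6; u1 uncorrelated with both by construction
matrix C = (1, .6, 0 \ .6, 1, 0 \ 0, 0, 1)
corr2data x y u1, corr(C) n(1000) clear

* u2 is independent of x and y only in expectation
generate u2 = rnormal()

correlate x y u1 u2
```

In the correlate output the u1 column should show 0 (up to rounding) against x and y, while u2 will typically show small nonzero values.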
At 03:23 PM 8/31/2011, fjc wrote:
Hi,

Thank you all for the quick and useful responses.

1. I can work with covariances instead of correlations, so the methods proposed by Tirthankar and Richard work fine.

2. Still, if I wanted to stick to correlations, I think one can apply the same ideas (as suggested in the previous responses). Let z be given by

(0) z = a * x + b * y + c * u

where x and y are the two variables in the dataset and u is a zero-mean random variable independent of x and y. From (0) one gets:

(1) Corr(x,z) = a * sd(x)/sd(z) + b * sd(y)/sd(z) * Corr(x,y)
(2) Corr(y,z) = b * sd(y)/sd(z) + a * sd(x)/sd(z) * Corr(x,y)
(3) Var(z) = a^2 * Var(x) + b^2 * Var(y) + c^2 * Var(u) + 2 * a * b * Cov(x,y)

Once we have chosen Corr(x,z), Corr(y,z) and Var(z), we can solve the system above for a, b, and c. In fact, equations (1) and (2) can be solved for a and b to get:

a = [sd(z)/sd(x)] * [Corr(x,z) - Corr(x,y)*Corr(y,z)] / (1 - Corr(x,y)^2)
b = [sd(z)/sd(y)] * [Corr(y,z) - Corr(x,y)*Corr(x,z)] / (1 - Corr(x,y)^2)

Then we can use (3) to obtain the value of c. Finally, we can use (0) to generate z.

Thanks again,
Francisco

On Wed, Aug 31, 2011 at 3:59 PM, Richard Williams <richardwilliams.ndu@gmail.com> wrote:
> At 07:41 AM 8/31/2011, fjc wrote:
>>
>> Thanks, Tirthankar.
>>
>> This answers my question as originally posted.
>>
>> Now, something I didn't say in my earlier post (and I think I should
>> have) is that after I generate the new variable (z) I would like to
>> run a regression of y on x and z. But if I generate z in the way you
>> propose, I will get perfect collinearity. Is there any other way to
>> generate z without getting this collinearity?
>
> Slightly tweaking the earlier example, does this do what you want?
>
> mat mCorr = (1, .6, .4 \ .6, 1, .5 \ .4, .5, 1)
> corr2data x y z, cstorage(full) corr(mCorr) n(100000) clear
> corr
> reg z x y
>
> Again, mCorr is a combo of the given correlations for x and y with the
> desired correlations for z.
> If you want, you can also specify standard
> deviations and means, both observed (for x and y) and desired (for z). I am
> faking all the data, although the correlations etc. can come from real data.
> If you want to do some combo of fake and real (e.g. generate a z using the
> real x and real y) it can probably be done but would take a bit more work.
>
> -------------------------------------------
> Richard Williams, Notre Dame Dept of Sociology
> OFFICE: (574)631-6668, (574)631-6463
> HOME: (574)289-5227
> EMAIL: Richard.A.Williams.5@ND.Edu
> WWW: http://www.nd.edu/~rwilliam
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
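The step "use (3) to obtain the value of c" can be made explicit. A worked sketch of the algebra, obtained by rearranging (3) and then substituting the expressions for a and b given in Francisco's message:

```latex
\begin{aligned}
c^2\,\operatorname{Var}(u) &= \operatorname{Var}(z) - a^2\operatorname{Var}(x) - b^2\operatorname{Var}(y) - 2ab\,\operatorname{Cov}(x,y) \\
&= \operatorname{Var}(z)\left[1 - \frac{\operatorname{Corr}(x,z)^2 + \operatorname{Corr}(y,z)^2 - 2\operatorname{Corr}(x,y)\operatorname{Corr}(x,z)\operatorname{Corr}(y,z)}{1 - \operatorname{Corr}(x,y)^2}\right]
\end{aligned}
```

Hence c = sqrt(c^2 Var(u)) / sd(u). The fraction in brackets is the R-squared of z on x and y, so a real c exists only if that fraction does not exceed 1; for |Corr(x,y)| < 1 this is the same as requiring the 3 x 3 correlation matrix of x, y and z to be positive semidefinite, which is also what a correlation matrix given to corr2data must satisfy. With the correlations from the example (.6, .4, .5) the fraction is .17/.64 = .265625, so c^2 Var(u) = .734375 Var(z).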
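Francisco's recipe, which keeps the real x and y instead of faking them, can be sketched in Stata as follows. Assumptions: the data set holds variables x and y (simulated here as hypothetical stand-ins so the code runs on its own), u is drawn as a standard normal so Var(u) = 1, and the targets Corr(x,z) = .4, Corr(y,z) = .5 and sd(z) = 1 are placeholders.

```stata
* Hypothetical stand-ins for the real x and y (replace with your own data)
clear
set seed 2011
set obs 10000
generate x = rnormal()
generate y = 0.6*x + 0.8*rnormal()

* Placeholder targets for z
scalar rxz = .4
scalar ryz = .5
scalar sdz = 1

* Observed moments of x and y
quietly summarize x
scalar sdx = r(sd)
quietly summarize y
scalar sdy = r(sd)
quietly correlate x y
scalar rxy = r(rho)

* Solve (1) and (2) for a and b
scalar a = (sdz/sdx) * (rxz - rxy*ryz) / (1 - rxy^2)
scalar b = (sdz/sdy) * (ryz - rxy*rxz) / (1 - rxy^2)

* Solve (3) for c, with Var(u) = 1 because u ~ N(0,1)
scalar c2 = sdz^2 - a^2*sdx^2 - b^2*sdy^2 - 2*a*b*rxy*sdx*sdy
if scalar(c2) < 0 {
    display as error "targets are inconsistent: c^2 < 0"
    exit 498
}
scalar c = sqrt(c2)

* Generate z from (0) and check the result
generate u = rnormal()
generate z = scalar(a)*x + scalar(b)*y + scalar(c)*u
correlate x y z
regress y x z
```

Because u is drawn after x and y are fixed, correlate will show values near .4 and .5 rather than exactly those, for the sampling-variability reason Richard mentions; regress y x z runs without perfect collinearity as long as c is not 0.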