



Re: st: RES: generating a variable with pre-specified correlations with other two (given) variables


From   Richard Williams <richardwilliams.ndu@gmail.com>
To   statalist@hsphsun2.harvard.edu, statalist@hsphsun2.harvard.edu
Subject   Re: st: RES: generating a variable with pre-specified correlations with other two (given) variables
Date   Wed, 31 Aug 2011 19:23:10 -0500

fjc, I'll trust your math, but it seems awfully complicated to me. Why not just do something like

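* target correlation matrix; rows and columns are in the order x y z
mat mCorr = (1, .6, .4 \ .6, 1, .5 \ .4, .5, 1)
* create 100,000 observations with exactly these sample correlations
corr2data x y z, cstorage(full) corr(mCorr) n(100000) clear
corr
reg z x y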

You can get the correlations (and also means & sds) for the real x and y from your data set and just plug them in, and then plug in the desired correlations for x and y with z.
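A minimal sketch of that plug-in step, assuming the real x and y are already in memory. The targets for z (correlations .4 and .5, mean 0, sd 1) and the matrix names mMeans and mSds are only illustrative. Note that clear replaces the real data, so all three variables end up simulated, and corr2data will complain if the assembled matrix is not a valid correlation matrix.

```stata
* observed moments of the real x and y
quietly correlate x y
local rxy = r(rho)
quietly summarize x
local mx = r(mean)
local sx = r(sd)
quietly summarize y
local my = r(mean)
local sy = r(sd)

* observed corr(x,y) combined with the desired (illustrative) values for z
mat mCorr  = (1, `rxy', .4 \ `rxy', 1, .5 \ .4, .5, 1)
mat mMeans = (`mx', `my', 0)
mat mSds   = (`sx', `sy', 1)

corr2data x y z, cstorage(full) corr(mCorr) means(mMeans) sds(mSds) n(100000) clear
corr x y z, means
```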

As a sidelight, any random variable u you generate yourself will have a nonzero (albeit small) correlation with x and y, because of sampling variability. (Unless you generate u as part of the corr2data command, in which case you can force it to have 0 correlation with x and y.)
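To illustrate the sidelight, here is a small sketch with made-up values: u is created by corr2data together with x and y, so its sample correlations with them are 0, while u2 is drawn separately and picks up small nonzero correlations. The names mCorrU and u2 and the seed are arbitrary.

```stata
* x, y and u created jointly: corr(x,u) and corr(y,u) are forced to 0
mat mCorrU = (1, .6, 0 \ .6, 1, 0 \ 0, 0, 1)
corr2data x y u, cstorage(full) corr(mCorrU) n(100000) clear
corr x y u

* a separately drawn normal variable is only approximately uncorrelated
set seed 12345
gen double u2 = rnormal()
corr x y u2
```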

At 03:23 PM 8/31/2011, fjc wrote:
Hi,

Thank you all for the quick and useful responses.

1. I can make do with covariances instead of correlations, so the methods
proposed by Tirthankar and Richard work fine.

2. Still, if I wanted to stick to correlations, I think one can apply
the same ideas (as suggested in the previous responses):

Let z be given by

(0) z = a * x + b * y + c * u,

where x and y are the two variables in the dataset and u is a
zero-mean random variable independent of x and y.

From (0) one gets:

(1)  Corr(x,z) = a * sd(x)/sd(z) + b * sd(y)/sd(z) * Corr(x,y)

(2)  Corr(y,z) = b * sd(y)/sd(z) + a * sd(x)/sd(z) * Corr(x,y)

(3) Var(z) = a^2 * Var(x) + b^2 * Var(y) + c^2 * Var(u) + 2 * a * b * Cov(x,y)

Once we have chosen Corr(x,z), Corr(y,z) and Var(z), we can solve the
system above for a, b, and c. Actually, equations (1) and (2) can be
solved for a and b to get:

a = [sd(z)/sd(x)] * [Corr(x,z) - Corr(x,y)*Corr(y,z)] / (1 - Corr(x,y)^2)

b = [sd(z)/sd(y)] * [Corr(y,z) - Corr(x,y)*Corr(x,z)] / (1 - Corr(x,y)^2)

Then we can use (3) to obtain the value of c.

Finally, we can use (0) to generate z.
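
The steps above could be sketched in Stata roughly as follows. This assumes x and y are in memory and takes u to be standard normal, so Var(u) = 1 in (3). The target values and the scalar names (t_rxz, k_a, and so on) are hypothetical, chosen so they are unlikely to clash with variable names. Because u is drawn separately, the resulting sample correlations will be close to the targets but not exact.

```stata
* illustrative targets: Corr(x,z), Corr(y,z) and sd(z)
scalar t_rxz = 0.4
scalar t_ryz = 0.5
scalar t_sdz = 1

* observed quantities for x and y
quietly correlate x y
scalar o_rxy = r(rho)
quietly summarize x
scalar o_sdx = r(sd)
quietly summarize y
scalar o_sdy = r(sd)

* a and b from the closed-form solution of (1) and (2)
scalar k_a = (t_sdz/o_sdx) * (t_rxz - o_rxy*t_ryz) / (1 - o_rxy^2)
scalar k_b = (t_sdz/o_sdy) * (t_ryz - o_rxy*t_rxz) / (1 - o_rxy^2)

* c from (3) with Var(u) = 1; the targets are infeasible if this is not positive
scalar k_c2 = t_sdz^2 - k_a^2*o_sdx^2 - k_b^2*o_sdy^2 - 2*k_a*k_b*o_rxy*o_sdx*o_sdy
assert k_c2 > 0
scalar k_c = sqrt(k_c2)

* generate z from (0)
set seed 12345
generate double u = rnormal()
generate double z = k_a*x + k_b*y + k_c*u
correlate x y z
```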

Thanks again,

Francisco.


On Wed, Aug 31, 2011 at 3:59 PM, Richard Williams
<richardwilliams.ndu@gmail.com> wrote:
> At 07:41 AM 8/31/2011, fjc wrote:
>>
>> Thanks, Tirthankar.
>>
>> This answers my question as originally posted.
>>
>> Now, something I didn't say in my earlier post (and I think I should
>> have) is that after I generate the new variable (z) I would like to
>> run a regression of y on x and z. But if I generate z in the way you
>> propose, I will get perfect collinearity. Is there any other way to
>> generate z without getting this collinearity?
>
> Slightly tweaking the earlier example, does this do what you want?
>
> mat mCorr = (1, .6, .4\ .6, 1, .5 \ .4, .5, 1)
> corr2data x y z, cstorage(full) corr(mCorr) n(100000) clear
> corr
> reg z x y
>
> Again, mCorr is a combo of the given correlations for x and y with the
> desired correlations for z. If you want, you can also specify standard
> deviations and means, both observed (for x and y) and desired (for z). I am
> faking all the data, although the correlations etc. can come from real data.
> If you want to do some combo of fake and real (e.g. generate a z using the
> realx and realy) it can probably be done but would take a bit more work.
>
>
> -------------------------------------------
> Richard Williams, Notre Dame Dept of Sociology
> OFFICE: (574)631-6668, (574)631-6463
> HOME:   (574)289-5227
> EMAIL:  Richard.A.Williams.5@ND.Edu
> WWW:    http://www.nd.edu/~rwilliam
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME:   (574)289-5227
EMAIL:  Richard.A.Williams.5@ND.Edu
WWW:    http://www.nd.edu/~rwilliam



