



Re: st: RES: generating a variable with pre-specified correlations with other two (given) variables


From   Richard Williams <[email protected]>
To   [email protected], [email protected]
Subject   Re: st: RES: generating a variable with pre-specified correlations with other two (given) variables
Date   Wed, 31 Aug 2011 08:46:32 -0500

At 07:00 AM 8/31/2011, Tirthankar Chakravarty wrote:
This question has come up a few times before: you want to
create a variable with a given pattern of correlation with _existing_
variables, which -corr2data- does not do. When the means are
normalised to zero, you can do this by solving a system of linear
equations in the appropriate expectations.

Suppose you generate a variable as

Z = a*X + b*Y  ---(0)

where a and b are constants to be determined. Then you can derive the
following identities under the zero mean assumption:

Cov(Z, X) = a*Var(X) + b*Cov(X, Y)  ---(1)
Cov(Z, Y) = b*Var(Y) + a*Cov(X, Y)  ---(2)

Here you know everything except a and b (you set Cov(Z, X) and Cov(Z, Y)),
so this is a system of two equations in two unknowns. Solve for a and b and
generate your variable as in equation (0).

So for example, if I have Cov(X, Y) = .6 and Var(X) = Var(Y) = 1, and I want
Cov(Z, X) = .4 and Cov(Z, Y) = .5, then a = 0.15625 and b = 0.40625.
/************************************/
mat mCov = (1, .6\ .6, 1)
// generate x and y
corr2data x y, cstorage(full) cov(mCov) n(100000) clear
// generate z based on current sample of x and y
g z = .15625*x+.40625*y
corr, covariance
/************************************/
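The values of a and b can also be obtained without hand algebra by writing equations (1) and (2) in matrix form and solving them with Stata's matrix commands. The following is a minimal sketch, taking Cov(Z, X) = .4 and Cov(Z, Y) = .5 as the targets, which are the values that a = .15625 and b = .40625 correspond to. The matrix names S, c and ab are illustrative only.

```stata
* covariance matrix of X and Y (the left-hand coefficients in (1) and (2))
matrix S = (1, .6 \ .6, 1)
* target covariances of Z with X and with Y (assumed values)
matrix c = (.4 \ .5)
* solve S*(a \ b) = c for the two unknowns
matrix ab = invsym(S)*c
matrix list ab
```

The first row of ab is a and the second is b, which should reproduce the coefficients used in the -g z- line above.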

I am going to tweak your example a bit. Instead of doing the algebra (and possibly screwing it up) let Stata do the work. Make mCov a combo of the correlations you observe in your data and the correlations you want for the new variable:

mat mCov = (1, .6, .4\ .6, 1, .5 \ .4, .5, 1)
corr2data x y z, cstorage(full) cov(mCov) n(100000) clear
reg z x y

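Here are the regression results. Because -corr2data- reproduces the specified matrix exactly in the sample, the coefficients on x and y are the same a and b that the algebra gives: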

. reg z x y

      Source |       SS       df       MS              Number of obs =  100000
-------------+------------------------------           F(  2, 99997) =18084.56
       Model |  26562.2344     2  13281.1172           Prob > F      =  0.0000
    Residual |  73436.7656 99997  .734389687           R-squared     =  0.2656
-------------+------------------------------           Adj R-squared =  0.2656
       Total |  99998.9999 99999  .999999999           Root MSE      =  .85697

------------------------------------------------------------------------------
           z |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |     .15625   .0033875    46.13   0.000     .1496106    .1628894
           y |     .40625   .0033875   119.93   0.000     .3996106    .4128894
       _cons |  -1.06e-08     .00271    -0.00   1.000    -.0053115    .0053115
------------------------------------------------------------------------------

You could now do something like

gen newvar = .15625*realx + .40625 * realy
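One caveat, offered as a sketch and not as part of the original suggestion: the linear combination alone matches the covariances .4 and .5, but its variance is only .265625 (the R-squared in the output above), so its correlations with the two variables come out larger than .4 and .5. If correlations are what you need, one option is to add a noise term that is uncorrelated with both variables and carries the remaining variance. The example below assumes realx and realy are standardized with correlation .6. It simulates stand-ins for them so that it runs on its own, and the names e0 and e are hypothetical.

```stata
* stand-ins for the real data: mean 0, sd 1, correlation .6 (assumed)
matrix R = (1, .6 \ .6, 1)
set seed 12345
corr2data realx realy, corr(R) n(1000) clear

* draw noise, then remove any sample correlation with realx and realy
generate double e0 = rnormal()
regress e0 realx realy
predict double e, residuals

* rescale the noise to sd 1 and give it the leftover variance 1 - .265625
summarize e
generate double newvar = .15625*realx + .40625*realy + sqrt(1 - .265625)*e/r(sd)

* the correlations of newvar with realx and realy should be close to .4 and .5
correlate newvar realx realy
```

With real data whose correlation is not exactly .6, or that are not standardized, the coefficients and the leftover variance would need to be recomputed from the observed values.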

You can easily make this more complicated, e.g. include the standard deviations and the means, add more Xs, etc. The -reg- command will do all the algebra for you.
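As a sketch of the means and standard deviations case: -corr2data- accepts means() and sds() options alongside corr(), so the same regression trick gives coefficients and a constant on the original scale. The means and standard deviations below are made-up illustrative values, not taken from any real data, and the matrix names C, m and s are arbitrary.

```stata
* correlations: observed for x and y, desired for z (same values as above)
matrix C = (1, .6, .4 \ .6, 1, .5 \ .4, .5, 1)
* hypothetical means and standard deviations for x, y and z
matrix m = (10, 20, 0)
matrix s = (2, 3, 1)
corr2data x y z, corr(C) means(m) sds(s) n(100000) clear
* the coefficients and _cons now reflect the chosen scales
reg z x y
```

In practice the means and standard deviations for x and y would come from -summarize- on the real variables.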



-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME:   (574)289-5227
EMAIL:  [email protected]
WWW:    http://www.nd.edu/~rwilliam
