
# Re: st: RES: generating a variable with pre-specified correlations with other two (given) variables

From: Richard Williams
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RES: generating a variable with pre-specified correlations with other two (given) variables
Date: Wed, 31 Aug 2011 08:46:32 -0500

At 07:00 AM 8/31/2011, Tirthankar Chakravarty wrote:

```
This question has appeared a few times before - in that you want to
create a variable with a pattern of correlation with _existing_
variables, which -corr2data- does not do. In an example where means
are normalised to zero, this can be had by solving a system of linear
equations in appropriate expectations.

Suppose you generate a variable as

Z = a*X + b*Y  ---(0)

where a and b are constants to be determined. Then you can derive the
following identities under the zero mean assumption:

Cov(Z, X) = a*Var(X) + b*Cov(X, Y)  ---(1)
Cov(Z, Y) = b*Var(Y) + a*Cov(X, Y)  ---(2)

Here everything except a and b is known: Var(X), Var(Y), and Cov(X, Y)
come from your data, and you choose the targets Cov(Z, X) and Cov(Z, Y).
This is a system of two equations in two unknowns, a and b. Solve them
and generate your variable as in equation (0).

So, for example, if Cov(X, Y) = .6, Var(X) = Var(Y) = 1, and the targets
are Cov(Z, X) = .4 and Cov(Z, Y) = .5, then a = 0.15625 and b = 0.40625.
/************************************/
mat mCov = (1, .6\ .6, 1)
// generate x and y
corr2data x y, cstorage(full) cov(mCov) n(100000) clear
// generate z based on current sample of x and y
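gen z = .15625*x + .40625*y
// expect Cov(z,x) = .4 and Cov(z,y) = .5, up to float precision,
// because corr2data fixes the sample moments of x and y
corr, covariance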
/************************************/
```
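The system (1)-(2) has a closed-form solution by Cramer's rule, valid whenever X and Y are not perfectly correlated (so the denominator is nonzero). The numeric line assumes target covariances Cov(Z, X) = .4 and Cov(Z, Y) = .5, the values in the mCov matrix of the reply that follows, together with Var(X) = Var(Y) = 1 and Cov(X, Y) = .6; under those assumptions it reproduces the a and b quoted above.

$$
\begin{aligned}
a &= \frac{\operatorname{Var}(Y)\operatorname{Cov}(Z,X) - \operatorname{Cov}(X,Y)\operatorname{Cov}(Z,Y)}{\operatorname{Var}(X)\operatorname{Var}(Y) - \operatorname{Cov}(X,Y)^2}, \\
b &= \frac{\operatorname{Var}(X)\operatorname{Cov}(Z,Y) - \operatorname{Cov}(X,Y)\operatorname{Cov}(Z,X)}{\operatorname{Var}(X)\operatorname{Var}(Y) - \operatorname{Cov}(X,Y)^2}, \\[6pt]
a &= \frac{1(0.4) - 0.6(0.5)}{1 - 0.6^2} = \frac{0.1}{0.64} = 0.15625, \qquad
b = \frac{1(0.5) - 0.6(0.4)}{1 - 0.6^2} = \frac{0.26}{0.64} = 0.40625.
\end{aligned}
$$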
I am going to tweak your example a bit. Instead of doing the algebra (and possibly screwing it up), let Stata do the work. Make mCov combine the correlations you observe in your data with the correlations you want for the new variable:
```
// order x, y, z: .6 = corr(x,y) as in your data; .4, .5 = targets for z
mat mCov = (1, .6, .4\ .6, 1, .5 \ .4, .5, 1)
corr2data x y z, cstorage(full) cov(mCov) n(100000) clear
reg z x y

```

Here are the regression results:

```
. reg z x y

      Source |       SS       df       MS              Number of obs =  100000
-------------+------------------------------           F(  2, 99997) =18084.56
       Model |  26562.2344     2  13281.1172           Prob > F      =  0.0000
    Residual |  73436.7656 99997  .734389687           R-squared     =  0.2656
       Total |  99998.9999 99999  .999999999           Root MSE      =  .85697

------------------------------------------------------------------------------
           z |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |     .15625   .0033875    46.13   0.000     .1496106    .1628894
           y |     .40625   .0033875   119.93   0.000     .3996106    .4128894
       _cons |  -1.06e-08     .00271    -0.00   1.000    -.0053115    .0053115
------------------------------------------------------------------------------

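```

You could now do something like this:

```
* realx and realy stand for your own variables; these coefficients
* assume both have SD 1 and correlate .6, as in mCov
gen newvar = .15625*realx + .40625*realy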

```
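A short check on why -reg- returns exactly a and b: -corr2data- makes the sample means and covariances of x, y, and z match mCov exactly, so the OLS normal equations for the slopes of z on x and y are equations (1) and (2) from the quoted message, with Cov(Z, X) = .4 and Cov(Z, Y) = .5. The R-squared carries a useful warning as well. Assuming x and y are standardized with correlation .6, the fitted combination has variance

$$
\operatorname{Var}(aX + bY) = a^2 + b^2 + 2ab(0.6) = 0.0244140625 + 0.1650390625 + 0.076171875 = 0.265625,
$$

which matches the R-squared of .2656 above. A newvar built this way therefore hits the target covariances of .4 and .5, but its correlations with realx and realy would be roughly .4/√.265625 ≈ .776 and .5/√.265625 ≈ .970. If what you need is correlations of .4 and .5 with a unit-variance variable, a common adjustment (not something proposed in the thread) is to add independent noise with variance 1 − .265625 = .734375, the population counterpart of the residual MS; with real data this matches the targets only in expectation, because the noise will not be exactly uncorrelated with realx and realy in the sample.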
You can easily make this more complicated, e.g., include the standard deviations and the means, or add more Xs. The -reg- command will do all the algebra for you.
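For the general case, here is a sketch of the algebra -reg- carries out; the notation is introduced here and is not from the thread. Collect the existing variables in a vector X with means μ_X and covariance matrix Σ_XX (assumed invertible), and let σ_XZ hold the target covariances (target correlation times the SD of each X_j times the desired SD of Z). The coefficients, intercept, and leftover variance are

$$
\beta = \Sigma_{XX}^{-1}\,\sigma_{XZ}, \qquad
\alpha = \mu_Z - \beta^{\top}\mu_X, \qquad
\sigma_e^2 = \sigma_Z^2 - \beta^{\top}\sigma_{XZ}.
$$

Z = α + β'X reproduces the target mean and covariances; adding independent noise e with variance σ²_e also pins Var(Z) at σ²_Z, which is feasible only when σ²_e ≥ 0, that is, when the full covariance matrix of (X, Z) is positive semidefinite. With two standardized Xs and zero means, β solves equations (1) and (2) and α = 0.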
```

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME:   (574)289-5227
EMAIL:  Richard.A.Williams.5@ND.Edu
WWW:    http://www.nd.edu/~rwilliam

```