

From: Richard Williams <richardwilliams.ndu@gmail.com>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: RES: generating a variable with pre-specified correlations with other two (given) variables

Date: Wed, 31 Aug 2011 08:46:32 -0500

At 07:00 AM 8/31/2011, Tirthankar Chakravarty wrote:

> This question has appeared a few times before, in that you want to create a variable with a pattern of correlation with _existing_ variables, which -corr2data- does not do. In an example where means are normalised to zero, this can be had by solving a system of linear equations in appropriate expectations. Suppose you generate a variable as
>
> $$Z = aX + bY \tag{0}$$
>
> where a and b are constants to be determined. Then you can derive the following identities under the zero-mean assumption:
>
> $$\operatorname{Cov}(Z, X) = a\,\operatorname{Var}(X) + b\,\operatorname{Cov}(X, Y) \tag{1}$$
>
> $$\operatorname{Cov}(Z, Y) = b\,\operatorname{Var}(Y) + a\,\operatorname{Cov}(X, Y) \tag{2}$$
>
> Here you know everything (you set Cov(Z, X) and Cov(Z, Y)), and this is a system of two equations in two unknowns, a and b. Solve them and generate your variables as in equation (0). So, for example, if I have Cov(X, Y) = .6 and Var(X) = Var(Y) = 1, then a = 0.15625 and b = 0.40625.
>
> ```stata
> mat mCov = (1, .6 \ .6, 1)
> // generate x and y
> corr2data x y, cstorage(full) cov(mCov) n(100000) clear
> // generate z based on current sample of x and y
> g z = .15625*x + .40625*y
> corr, covariance
> ```
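
The worked example in the quoted message does not state the target covariances. Assuming Cov(Z, X) = .4 and Cov(Z, Y) = .5 (the values in the 3x3 mCov matrix used in the reply, which reproduce the quoted a and b), equations (1) and (2) form a 2x2 linear system that Cramer's rule solves directly:

$$
\begin{pmatrix} \operatorname{Var}(X) & \operatorname{Cov}(X,Y) \\ \operatorname{Cov}(X,Y) & \operatorname{Var}(Y) \end{pmatrix}
\begin{pmatrix} a \\ b \end{pmatrix}
=
\begin{pmatrix} \operatorname{Cov}(Z,X) \\ \operatorname{Cov}(Z,Y) \end{pmatrix}
\;\Longrightarrow\;
\begin{pmatrix} 1 & 0.6 \\ 0.6 & 1 \end{pmatrix}
\begin{pmatrix} a \\ b \end{pmatrix}
=
\begin{pmatrix} 0.4 \\ 0.5 \end{pmatrix}
$$

$$
a = \frac{0.4 - 0.6 \times 0.5}{1 - 0.6^2} = \frac{0.10}{0.64} = 0.15625,
\qquad
b = \frac{0.5 - 0.6 \times 0.4}{1 - 0.6^2} = \frac{0.26}{0.64} = 0.40625
$$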

```stata
mat mCov = (1, .6, .4 \ .6, 1, .5 \ .4, .5, 1)
corr2data x y z, cstorage(full) cov(mCov) n(100000) clear
reg z x y
```

Here are the regression results:

```
. reg z x y

      Source |       SS       df       MS              Number of obs =  100000
-------------+------------------------------           F(  2, 99997) =18084.56
       Model |  26562.2344     2  13281.1172           Prob > F      =  0.0000
    Residual |  73436.7656 99997  .734389687           R-squared     =  0.2656
-------------+------------------------------           Adj R-squared =  0.2656
       Total |  99998.9999 99999  .999999999           Root MSE      =  .85697

------------------------------------------------------------------------------
           z |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |     .15625   .0033875    46.13   0.000     .1496106    .1628894
           y |     .40625   .0033875   119.93   0.000     .3996106    .4128894
       _cons |  -1.06e-08     .00271    -0.00   1.000    -.0053115    .0053115
------------------------------------------------------------------------------
```

You could now do something like

```stata
gen newvar = .15625*realx + .40625*realy
```
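
The regression recovers the same coefficients because, with the exact sample moments that -corr2data- produces, the OLS normal equations for the slopes are equations (1) and (2) with Cov(Z, X) = .4 and Cov(Z, Y) = .5 plugged in. The output also suggests a check before applying the fitted coefficients to real data: Z = aX + bY hits the target covariances, but its variance is well below 1, so its correlations with X and Y do not equal .4 and .5. A sketch of the arithmetic, assuming Var(X) = Var(Y) = 1 and Cov(X, Y) = .6 as in the example:

$$
\operatorname{Var}(aX + bY) = a\,\operatorname{Cov}(Z,X) + b\,\operatorname{Cov}(Z,Y) = 0.15625(0.4) + 0.40625(0.5) = 0.265625
$$

$$
\operatorname{Corr}(aX + bY,\, X) = \frac{0.4}{\sqrt{0.265625}} \approx 0.776,
\qquad
\operatorname{Corr}(aX + bY,\, Y) = \frac{0.5}{\sqrt{0.265625}} \approx 0.970
$$

The variance 0.265625 matches the R-squared of .2656 above, and 1 - 0.265625 = 0.734375 is the residual variance (the residual mean square .7344, up to the degrees-of-freedom adjustment). One way to turn the covariances into correlations is to add an independent noise term with that variance, which brings Var(Z) back to 1; in a finite sample the noise will not be exactly uncorrelated with X and Y, so the resulting correlations are only approximate. Note also that .15625 and .40625 are specific to unit variances and Cov(X, Y) = .6; for real variables with other moments, a and b would need to be recomputed from those variables' own covariance matrix.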

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME: (574)289-5227
EMAIL: Richard.A.Williams.5@ND.Edu
WWW: http://www.nd.edu/~rwilliam

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

**References**:

- **st: generating a variable with pre-specified correlations with other two (given) variables**, *From:* fjc <fjc120@gmail.com>
- **st: RES: generating a variable with pre-specified correlations with other two (given) variables**, *From:* "Henrique Neder" <hdneder@ufu.br>
- **Re: st: RES: generating a variable with pre-specified correlations with other two (given) variables**, *From:* Tirthankar Chakravarty <tirthankar.chakravarty@gmail.com>
