Re: st: RES: generating a variable with pre-specified correlations with other two (given) variables


From   Tirthankar Chakravarty <tirthankar.chakravarty@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RES: generating a variable with pre-specified correlations with other two (given) variables
Date   Wed, 31 Aug 2011 05:47:30 -0700

Throw in some zero-mean noise, independent of x and y, when constructing z
(runiform() is uniform on [0,1), so subtract .5 to centre it):

g z = .15625*x+.40625*y + (runiform() - .5)
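
One caveat: independent noise leaves Cov(z, x) and Cov(z, y) unchanged in
expectation, but it raises Var(z), so the correlations of z with x and y
shrink. If it is the correlations you want to pin down, one option is to
scale the noise so that Var(z) comes out at 1. Below is a minimal sketch,
not a definitive recipe. It assumes the setup from the earlier example
(Var(x) = Var(y) = 1, Cov(x, y) = .6) and target correlations of .4 with x
and .5 with y, which is what a = .15625 and b = .40625 correspond to; the
seed and the scalar name vfit are arbitrary choices made for illustration.

```stata
clear
set seed 12345                      // arbitrary seed, for reproducibility only
mat mCov = (1, .6 \ .6, 1)
// x and y with unit variances and Cov(x, y) = .6, as in the earlier example
corr2data x y, cstorage(full) cov(mCov) n(100000)
// variance of the systematic part a*x + b*y (vfit is a hypothetical name)
scalar vfit = .15625^2 + .40625^2 + 2*.15625*.40625*.6
// zero-mean normal noise, scaled so that Var(z) is approximately 1
g z = .15625*x + .40625*y + rnormal(0, sqrt(1 - vfit))
// correlations of z with x and y should be close to .4 and .5
corr x y z
// z is no longer an exact linear combination of x and y
regress y x z
```

Because the noise is independent of x and y only in the population, the
sample correlations will be close to, not exactly equal to, the targets.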

T

On Wed, Aug 31, 2011 at 5:41 AM, fjc <fjc120@gmail.com> wrote:
> Thanks, Tirthankar.
>
> This answers my question as originally posted.
>
> Now, something I didn't say in my earlier post (and I think I should
> have) is that after I generate the new variable (z) I would like to
> run a regression of y on x and z. But if I generate z in the way you
> propose, I will get perfect collinearity. Is there any other way to
> generate z without getting this collinearity?
>
> Francisco.
>
> P.S. The reason I want to run this regression is the following.
> Suppose I have an initial regression of y on x, and the coefficient
> on x turns out not to be significantly different from zero at some
> chosen significance level. I then want to construct an example in
> which adding a new (artificial) variable z as a covariate makes the
> coefficient on x significantly different from zero at that same
> level. Based on the formula for the t-statistic, I think I can do
> this if I can control the correlations between the artificial
> variable and the original ones. The exercise is just for expository
> purposes; I do not want to attach any deep meaning to it.
>
>
> On Wed, Aug 31, 2011 at 9:00 AM, Tirthankar Chakravarty
> <tirthankar.chakravarty@gmail.com> wrote:
>> This question has come up a few times before: you want to create a
>> variable with a given pattern of correlation with _existing_
>> variables, which -corr2data- does not do. In an example where means
>> are normalised to zero, this can be done by solving a system of linear
>> equations in the appropriate second moments.
>>
>> Suppose you generate a variable as
>>
>> Z = a*X+ b*Y ---(0)
>>
>> where a and b are constants to be determined. Then you can derive the
>> following identities under the zero mean assumption:
>>
>> Cov(Z, X) = a*Var(X) + b*Cov(X, Y)  ---(1)
>> Cov(Z, Y) = b*Var(Y) + a*Cov(X, Y)  ---(2)
>>
>> Here you know everything except a and b (you set Cov(Z, X) and
>> Cov(Z, Y)), so this is a system of two linear equations in two
>> unknowns. Solve it and generate your variable as in equation (0).
>>
>> So for example, if I have Cov(X, Y) = .6 and Var(X)=Var(Y)=1, and I
>> want Cov(Z, X) = .4 and Cov(Z, Y) = .5, then a = 0.15625, b = 0.40625.
>> /************************************/
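>> // covariance matrix of (x, y): unit variances, Cov(X, Y) = .6
>> mat mCov = (1, .6\ .6, 1)
>> // generate x and y
>> corr2data x y, cstorage(full) cov(mCov) n(100000) clear
>> // generate z based on current sample of x and y
>> g z = .15625*x+.40625*y
>> // check Cov(z, x) and Cov(z, y) against equations (1) and (2)
>> corr, covariance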
>> /************************************/
>>
>> All these calculations are assuming zero means - more tedious algebra
>> will allow you to generalise.
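>>
>> The values of a and b can also be obtained with Stata's matrix
>> commands rather than by hand. This is only a sketch: the matrix names
>> A, c and ab are made up for illustration, and the targets .4 and .5
>> are the values that equations (1) and (2) return for a = 0.15625 and
>> b = 0.40625 when Var(X) = Var(Y) = 1 and Cov(X, Y) = .6.
>>
>> ```stata
>> // coefficient matrix of equations (1) and (2):
>> // (Var(X), Cov(X,Y) \ Cov(X,Y), Var(Y))
>> mat A = (1, .6 \ .6, 1)
>> // right-hand side: target Cov(Z, X) \ target Cov(Z, Y)
>> mat c = (.4 \ .5)
>> // solve A*(a \ b) = c for the column vector (a \ b)
>> mat ab = inv(A)*c
>> mat list ab
>> ```
>>
>> The first row of ab is a and the second is b. With other data,
>> replace the entries of A by the variances and covariance of your own
>> x and y.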
>>
>> T
>>
>> On Wed, Aug 31, 2011 at 3:53 AM, Henrique Neder <hdneder@ufu.br> wrote:
>>> Try corr2data:
>>>
>>> matrix C = (1,0,.80,-.80\0,1,0,0\.80,0,1,-.80\-.80,0,-.80,1)
>>> corr2data hsperc corzer1 corpos1 corneg1, n(4137) corr(C)
>>>
>>> Henrique Neder
>>>
>>>
>>> -----Original message-----
>>> From: owner-statalist@hsphsun2.harvard.edu
>>> [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of fjc
>>> Sent: Tuesday, August 30, 2011 23:00
>>> To: statalist@hsphsun2.harvard.edu
>>> Subject: st: generating a variable with pre-specified correlations with
>>> other two (given) variables
>>>
>>> Dear Statalisters:
>>>
>>> I have a dataset with two variables, x and y.
>>>
>>> I would like to generate a new artificial variable, z, with
>>> pre-specified correlations with x and y (no particular distribution
>>> required).
>>>
>>> Any help would be greatly appreciated.
>>>
>>> Best,
>>>
>>> Francisco.
>>>
>>> P.S. I'm using Stata 11 (on Windows XP)
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/statalist/faq
>>> *   http://www.ats.ucla.edu/stat/stata/
>>
>>
>>
>> --
>> Tirthankar Chakravarty
>> tchakravarty@ucsd.edu
>> tirthankar.chakravarty@gmail.com
>>
>>
>
>



-- 
Tirthankar Chakravarty
tchakravarty@ucsd.edu
tirthankar.chakravarty@gmail.com


