German--
Wonderful explanation! I will study again at greater length, but
am relieved as well as informed...
Best,
--Herb
Professor of Sociology and
Director, Population Studies Center
230 McNeil Building
3718 Locust Walk CR
University of Pennsylvania
Philadelphia, PA 19104-6298
hsmith@pop.upenn.edu
215.898.7768 (office)
215.898.2124 (fax)
On Fri, 20 Oct 2006, German Rodriguez wrote:
> Herb,
>
> The short answer is that there's nothing wrong with your code; the
> regression coefficients become partial correlations once they are
> standardized in just the right way.
>
> Let (y,x,z) ~ MVN(m,V). Partition m=(m1\m2) and V=(V11, V12 \ V21, V22) so y
> has mean m1 and variance V11 and the column vector x\z has mean m2 and
> variance V22.
>
> Then the conditional distribution of y|x\z is MVN with mean m1 + V12 V22^-1
> (x\z - m2) and variance V11 - V12 V22^-1 V21 [nicer-looking formulas in
> Wikipedia, see link below].
>
> We can do these calculations in Mata. In your example the unconditional
> means are all zero, so we work just with V, ordered as (y, x, z).
>
> : V = (1, .2, .5 \ .2, 1, .2 \ .5, .2, 1)
>
> : b = V[1,(2,3)] * invsym( V[(2\3),(2,3)] )
>
> : b
>                  1             2
>     +-----------------------------+
>   1 |  .1041666667   .4791666667  |
>     +-----------------------------+
>
> The two regression coefficients are .104 and .479, just like your simulation
> shows. So the question now is why one agrees with the partial correlation
> and the other doesn't.
>
> The partial correlation yx.z comes from the conditional distribution of y
> and x given z, which has variance (I'll type rather than extract the values
> for clarity)
>
> : gz = (1, .2 \ .2, 1) - (.5 \ .2) * (.5 , .2)
>
> : gz
> [symmetric]
>          1     2
>     +-------------+
>   1 |  .75        |
>   2 |   .1   .96  |
>     +-------------+
>
> : corr(gz)[1,2]
> .1178511302
>
> So the partial correlation is indeed 0.118. Note that given z the
> (conditional) variances of y and x are different.
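>
> The same matrix can be extracted from V instead of typed by hand. This is
> only a sketch: it assumes V is ordered (y, x, z) as above, and the names
> keep and given are illustrative, not part of the original session.
>
> ```
> mata:
> V = (1, .2, .5 \ .2, 1, .2 \ .5, .2, 1)   // order (y, x, z)
> keep  = (1, 2)                            // y and x
> given = 3                                 // condition on z
> // V11 - V12 V22^-1 V21 for the chosen partition
> gz = V[keep', keep] - V[keep', given] * invsym(V[given', given]) * V[given', keep]
> gz                                        // should match the typed gz
> corr(gz)[1,2]                             // partial correlation yx.z
> end
> ```
>
> Setting keep = (1, 3) and given = 2 should give the matrix used for yz.x.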
>
> Now look at yz.x, which requires a different conditional distribution
>
> : gx = (1, .5 \ .5, 1) - (.2 \ .2) * (.2 , .2)
>
> : gx
> [symmetric]
>          1     2
>     +-------------+
>   1 |  .96        |
>   2 |  .46   .96  |
>     +-------------+
>
> : corr(gx)[1,2]
> .4791666667
>
> And the partial correlation is indeed .479. Note that given x, the
> (conditional) variances of y and z happen to be the same. And therein lies a
> clue.
>
> Suppose we standardize each regression coefficient by multiplying it by the
> ratio of the standard deviation of the predictor to that of the outcome,
> both conditional on the other predictor.
>
> For yz.x we do nothing because the ratio is one. For yx.z we compute
>
> : b[1] * sqrt(gz[2,2]/gz[1,1])
> .1178511302
>
> And we have the partial correlation! So all is well.
>
> As an aside, my favorite way of computing partial correlations like yx.z is
> to regress y on z and compute residuals y.z, then regress x on z and compute
> residuals x.z (read the dot as 'net of'). If you regress y.z on x.z you get
> a constant of zero and a slope equal to the coefficient of x in the
> regression of y on both x and z. And the correlation between y.z and x.z is
> the same as the partial correlation yx.z.
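>
> A rough Mata sketch of this residual approach on simulated data follows.
> The seed, the sample size and all variable names are arbitrary assumptions,
> and rnormal() needs a release of Stata that provides it. Up to sampling
> error the slope should be near .104 and the correlation near .118.
>
> ```
> mata:
> rseed(20061020)                           // arbitrary seed
> V = (1, .2, .5 \ .2, 1, .2 \ .5, .2, 1)   // order (y, x, z)
> D = rnormal(100000, 3, 0, 1) * cholesky(V)'   // draws with variance V
> y = D[., 1]
> x = D[., 2]
> Z = (D[., 3], J(rows(D), 1, 1))           // z plus a constant
> ry = y - Z * qrsolve(Z, y)                // y.z: residuals of y on z
> rx = x - Z * qrsolve(Z, x)                // x.z: residuals of x on z
> qrsolve((rx, J(rows(D), 1, 1)), ry)       // slope, then constant
> correlation((ry, rx))[1,2]                // partial correlation yx.z
> end
> ```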
>
> Cheers,
> Germán
>
> P.S. for more readable MVN formulas see
> http://en.wikipedia.org/wiki/Multivariate_normal_distribution
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Herb Smith
> Sent: Friday, October 20, 2006 7:11 AM
> To: statalist@hsphsun2.harvard.edu
> Subject: st: What have I forgotten...?
>
> I have simulated three variables, X, Y, and Z, with means of 0, variances
> of 1, and a correlation matrix of
>
>           Y     Z
>
>      X   .2    .2
>
>      Y         .5
>
> I calculate (pen and paper, or -dis-) partial correlations of r_sub_yz.x =
> .479167 and r_sub_yx.z = .117851
>
> If I generate a large enough sample, I can reproduce my correlation matrix
> with -corr- and the anticipated partial correlations with -pcorr- (not to
> mention the anticipated means and standard deviations, as per -summ-)
>
> But, when I -regress- y x z (with or without -, beta-) I get
>
> b_sub_yz.x ~ .479 (as I rather imagined I would), but
>
> b_sub_yx.z ~ .104 (not ~.118)
>
> I am forgetting something elementary about the (non?)-correspondence
> between partial correlation coefficients and standardized regression
> coefficients (I should think); else there is something weird in my code...
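>
> For reference, a minimal do-file sketch of the setup described above. It is
> not the code actually used; the seed, the sample size and the matrix name C
> are assumptions made for illustration.
>
> ```
> clear all
> matrix C = (1, .2, .5 \ .2, 1, .2 \ .5, .2, 1)   // order y, x, z
> drawnorm y x z, n(100000) corr(C) seed(2006)
> summarize y x z        // means near 0, standard deviations near 1
> correlate y x z        // should reproduce C
> pcorr y x z            // partial correlations of y with x and with z
> regress y x z          // coefficients on x and z
> regress y x z, beta    // standardized coefficients
> ```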
>
> Thanks in advance,
>
> --Herb
>
> Herbert L. Smith
> Professor of Sociology and
> Director, Population Studies Center
> 230 McNeil Building
> 3718 Locust Walk CR
> University of Pennsylvania
> Philadelphia, PA 19104-6298
>
> hsmith@pop.upenn.edu
>
> 215.898.7768 (office)
> 215.898.2124 (fax)
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
>