Statalist The Stata Listserver



st: RE: What have I forgotten...?


From   "German Rodriguez" <grodri@Princeton.EDU>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: What have I forgotten...?
Date   Fri, 20 Oct 2006 13:06:46 -0400

Herb,

The short answer is that there's nothing wrong with your code; the
regression coefficients just need the right standardization to become
partial correlations.

Let (y,x,z) ~ MVN(m,V). Partition m=(m1\m2) and V=(V11, V12 \ V21, V22) so y
has mean m1 and variance V11 and the column vector x\z has mean m2 and
variance V22. 

Then the conditional distribution of y|x\z is MVN with mean m1 + V12 V22^-1
(x\z-m2) and variance V11 - V12 V22^-1 V21 [nicer-looking formulas in
Wikipedia, see link below].

We can do these calculations in Mata. In your example the unconditional
means are all zero, so we work with V alone, taking the variables in the
order (y, x, z):

: V = (1, .2, .5 \ .2, 1, .2 \ .5, .2,  1)

: b = V[1,(2,3)] * invsym( V[(2\3),(2,3)] )

: b
                 1             2
    +-----------------------------+
  1 |  .1041666667   .4791666667  |
    +-----------------------------+

The two regression coefficients are .104 and .479, just like your simulation
shows. So the question now is why one agrees with the partial correlation
and the other doesn't.

The partial correlation yx.z comes from the conditional distribution of y
and x given z, which has variance (I'll type rather than extract the values
for clarity)

: gz = (1, .2 \ .2, 1) - (.5 \ .2) * (.5 , .2)

: gz
[symmetric]
         1     2
    +-------------+
  1 |  .75        |
  2 |   .1   .96  |
    +-------------+

: corr(gz)[1,2]
  .1178511302

So the partial correlation is indeed 0.118. Note that given z the
(conditional) variances of y and x are different.

Now look at yz.x, which requires a different conditional distribution
 
: gx = (1, .5 \ .5, 1) - (.2 \ .2) * (.2 , .2)

: gx
[symmetric]
         1     2
    +-------------+
  1 |  .96        |
  2 |  .46   .96  |
    +-------------+

: corr(gx)[1,2]
  .4791666667

And the partial correlation is indeed .479. Note that given x, the
(conditional) variances of y and z happen to be the same. And therein lies a
clue.
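
As a cross-check on both values (not part of the argument above): for a
positive-definite V, the partial correlation between variables i and j given
all the others is -P[i,j]/sqrt(P[i,i]*P[j,j]), where P is the inverse of V.
A minimal Mata sketch, assuming the same V in the order (y, x, z); the names
P, d and R are only illustrative:

```stata
mata:
// Correlation matrix in the order (y, x, z), as in the example above
V = (1, .2, .5 \ .2, 1, .2 \ .5, .2, 1)

// Precision (inverse covariance) matrix
P = invsym(V)

// Rescale: partial r_ij = -P[i,j] / sqrt(P[i,i] * P[j,j])
// (the diagonal of R comes out as -1 and carries no meaning here)
d = sqrt(diagonal(P))
R = -P :/ (d * d')

R[1,2]    // partial correlation yx.z
R[1,3]    // partial correlation yz.x
end
```

The two displayed values should agree with the .118 and .479 obtained from
gz and gx.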

Suppose we standardize each regression coefficient by multiplying it by the
ratio of the conditional standard deviation of the predictor to that of the
outcome, both taken given the other predictor.

For yz.x we do nothing because the ratio is one. For yx.z we compute

: b[1] * sqrt(gz[2,2]/gz[1,1])
  .1178511302

And we have the partial correlation! So all is well.
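
The same rescaling can be applied to both coefficients in one pass. The
sketch below is only one way to generalize the steps above: it rebuilds gz
and gx from V by indexing instead of typing the values, and it assumes V is
ordered (y, x, z).

```stata
mata:
V = (1, .2, .5 \ .2, 1, .2 \ .5, .2, 1)       // order (y, x, z)
b = V[1,(2,3)] * invsym(V[(2\3),(2,3)])        // slopes of x and z

// Covariance of (y, x) given z, and of (y, z) given x
gz = V[(1\2),(1,2)] - V[(1\2),3] * V[3,(1,2)] / V[3,3]
gx = V[(1\3),(1,3)] - V[(1\3),2] * V[2,(1,3)] / V[2,2]

// Slope times sd(predictor | other) / sd(outcome | other)
b[1] * sqrt(gz[2,2] / gz[1,1])    // yx.z
b[2] * sqrt(gx[2,2] / gx[1,1])    // yz.x
end
```

The first value should reproduce .1178511302 and the second .4791666667,
since the ratio for yz.x is one.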

As an aside, my favorite way of computing partial correlations like yx.z is
to regress y on z and compute residuals y.z, then regress x on z and compute
residuals x.z (read the dot as 'net of'). If you regress y.z on x.z you get
a constant of zero and a slope equal to the coefficient of x in the
regression of y on both x and z. And the correlation between y.z and x.z is
the same as the partial correlation yx.z.
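
A rough do-file sketch of this residual approach on simulated data. The
seed, the sample size and the residual names y_z and x_z are arbitrary
choices made for illustration, and -drawnorm- is just one way to generate
the data:

```stata
clear
set seed 20061020                      // arbitrary seed, for reproducibility
matrix C = (1, .2, .5 \ .2, 1, .2 \ .5, .2, 1)   // order (y, x, z)
drawnorm y x z, n(100000) corr(C)      // sample size is an arbitrary choice

regress y z
predict double y_z, residuals          // y net of z
regress x z
predict double x_z, residuals          // x net of z

regress y_z x_z      // slope equals the coefficient of x in the next command
regress y x z
correlate y_z x_z    // equals the partial correlation of y and x from pcorr
pcorr y x z
```

The slope and the correlation agree within the sample, not just in
expectation. The standard errors from the residual regression differ
slightly because it does not account for the degree of freedom used by z.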

Cheers,
Germán

P.S. for more readable MVN formulas see
http://en.wikipedia.org/wiki/Multivariate_normal_distribution 

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Herb Smith
Sent: Friday, October 20, 2006 7:11 AM
To: statalist@hsphsun2.harvard.edu
Subject: st: What have I forgotten...?

I have simulated three variables, X, Y, and Z, with means of 0, variances
of 1, and a correlation matrix of

        Y       Z

X       .2      .2
Y               .5

I calculate (pen and paper, or -dis-) partial correlations of r_sub_yz.x =
.479167 and r_sub_yx.z = .117851
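
These values follow from the usual first-order formula
r_ab.c = (r_ab - r_ac*r_bc) / sqrt((1 - r_ac^2)*(1 - r_bc^2)). The exact
-display- commands were not posted; one way to check the arithmetic, plugging
in the correlations above, would be:

```stata
* r_ab.c = (r_ab - r_ac*r_bc) / sqrt((1 - r_ac^2)*(1 - r_bc^2))
display (.5 - .2*.2) / sqrt((1 - .2^2)*(1 - .2^2))    // r_yz.x
display (.2 - .5*.2) / sqrt((1 - .5^2)*(1 - .2^2))    // r_yx.z
```

The two results should be approximately .479167 and .117851.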

If I generate a large enough sample, I can reproduce my correlation matrix
with -corr- and the anticipated partial correlations with -pcorr- (not to
mention the anticipated means and standard deviations, as per -summ-)

But, when I -regress- y x z (with or without -, beta-) I get

b_sub_yz.x ~ .479 (as I rather imagined I would), but

b_sub_yx.z ~ .104 (not ~.118)
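
The simulation code itself was not posted. A sketch that should reproduce
the same pattern, assuming -drawnorm- with an arbitrary seed and sample
size, might look like this:

```stata
clear
set seed 12345                         // arbitrary seed
matrix C = (1, .2, .2 \ .2, 1, .5 \ .2, .5, 1)   // order (x, y, z)
drawnorm x y z, n(100000) corr(C)      // arbitrary sample size

summarize x y z       // means near 0, standard deviations near 1
correlate x y z       // should be close to C
pcorr y x z           // partial correlations near .118 (x) and .479 (z)
regress y x z, beta   // coefficients near .104 (x) and .479 (z)
```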

I am forgetting something elementary about the (non?)-correspondence
between partial correlation coefficients and standardized regression
coefficients (I should think); else there is something weird in my code...

Thanks in advance,

--Herb

Herbert L. Smith
Professor of Sociology and
Director, Population Studies Center
230 McNeil Building
3718 Locust Walk CR
University of Pennsylvania
Philadelphia, PA  19104-6298

hsmith@pop.upenn.edu

215.898.7768 (office)
215.898.2124 (fax)
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/




