
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: RE: correlate by group and collapse
Date: Thu, 11 Jul 2002 09:54:07 +0100

Roger Harbord

> > I want to collapse my dataset by a group variable and retain the
> > correlation coefficient of two variables. In other words, I'd like
> > to be able to do something like:
> >
> > . collapse (correlation) var1 var2, by(group)
> >
> > or maybe:
> >
> > . by group: egen corr12=corr(var1 var2)
> > . collapse corr12, by(group)
> >
> > However, collapse doesn't have correlation among its stats (it only
> > allows a selection of univariate statistics) and egen doesn't have a
> > corr function.
> >
> > I know I can do:
> >
> > . by group: correlate var1 var2
> >
> > - but I want to save the results and do further analysis on them
> > rather than just displaying them.
> >
> > The best I've come up with is (supposing I have 100 groups):
> >
> > This seems kind of clumsy though, and it took me a while to work out
> > that I needed _noheader_ and _quietly_ to stop my screen filling with
> > output. It also becomes quite lengthy if I want several pairwise
> > correlations. Is there a better way?
> >
> > I think I'd like egen to have a _corr_ and/or a _cov_ function - I
> > would have thought it would be of wider interest than the calculation
> > of U.S. marginal income tax rates, which is already implemented as
> > egen function mtr! I've checked the extensions to egen in the STB
> > package _egenodd_ and tried a couple of _findit_'s, but I didn't find
> > anything suitable.

Nick Winter

> I've attached below a program to do this with egen. Save the whole
> thing as "_gcorr.ado" (that is, DO NOT separate out the GenCorr part as
> a separate file).
>
> The syntax is:
>
> [by varlist:] egen newvar = corr(var1 var2) [if exp] [in range] [ , covariance ]
>
> The ", covariance" option generates covariances; otherwise it does
> correlations.
>
> Nick Winter
>
> **************** BEGINNING OF _gcorr.ado
> end

< snip >

> **************** END OF _gcorr.ado

Nick's -egen- solution solves this problem excellently.
This is just a sidenote to opine that Roger's original solution is
not so bad as he implies, and that it can be extended to the full
problem with -forvalues- and -foreach-.

And a simple but more general point is this: master -forvalues- and
-foreach- and you have a tool for other problems and need not be
dependent on programmers, who can indeed seem capricious sometimes in
what they do and do not supply. (I gather that marginal tax rate is an
everyday tool for lots of users.)

Setting aside the -collapse-, Roger's -for- solution was

    gen corr12=.
    for num 1/100, noheader: qui correlate var1 var2 if group==X \ qui replace corr12=r(rho) if study==X

The equivalent with -forval- is

    gen corr12 = .
    qui forval i = 1/100 {
        corr var1 var2 if group == `i'
        replace corr12 = r(rho) if study == `i'
    }

This may seem no gain, but as was said in another thread recently, my
main reservation about -for- is that it doesn't grow gracefully when
extended to more complicated problems, whereas -foreach- and -forval-
typically do. (I've kept the distinction between -group- and -study-,
which is immaterial to the main point here, whether or not it's a
typo.)

Now this can be extended to lots of variables:

    qui foreach x of var <varlist> {
        foreach y of var <varlist> {
            gen r`x'`y' = .
            forval i = 1/100 {
                corr `x' `y' if group == `i'
                replace r`x'`y' = r(rho) if study == `i'
            }
        }
    }

Embedded in that is an assumption that variable names are short enough
that names like -r`x'`y'- remain legal after substitution. At worst,
that problem could be fixed by mass renaming.

Also, there is wastefulness here by a factor of about 2, as
correlations are symmetric and self-correlations of 1 are of no
interest. This could be tackled in various ways, one of which is to
ignore that problem. Another is to check for the existence of r`y'`x'
before we calculate r`x'`y'.

There is a tutorial on -forvalues- and -foreach- in
Stata Journal 2(2), 202-222 (2002).
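To illustrate the existence check mentioned above, here is a minimal
sketch, not part of the original post. It assumes three placeholder
variables var1, var2 and var3, groups numbered 1 to 100, and the same
-group- and -study- variables as in the loops above. It uses
-capture confirm variable- to test whether the mirror-image variable
has already been created, and -continue- to skip the pair if so.

```stata
* Sketch only: var1 var2 var3 are placeholders for your own variables,
* and groups are assumed to be numbered 1 to 100.
local vars var1 var2 var3

quietly foreach x of local vars {
    foreach y of local vars {
        * self-correlations are always 1, so skip them
        if "`x'" == "`y'" {
            continue
        }
        * if the mirror-image variable (y before x) already exists,
        * confirm succeeds and _rc is 0, so skip this pair
        capture confirm variable r`y'`x'
        if _rc == 0 {
            continue
        }
        gen r`x'`y' = .
        forval i = 1/100 {
            corr `x' `y' if group == `i'
            * -group- and -study- kept distinct, as in the post
            replace r`x'`y' = r(rho) if study == `i'
        }
    }
}
```

With the three placeholder variables this creates rvar1var2, rvar1var3
and rvar2var3 only. One caveat: -confirm variable- accepts
abbreviations unless told otherwise, so with names that are prefixes
of one another the check could match the wrong variable. Recent
versions of Stata have an -exact- option for this, as far as I know.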
The slides of a talk on -forvalues- and -foreach- are accessible to
non-subscribers at

http://www.stata.com/support/meeting/8uk/fortitude.pdf

or

http://fmwww.bc.edu/RePEc/usug2002/fortitude.pdf

(Exactly the same file.)

Incidentally,

1. -statsby-

    statsby "corr var1 var2" corr = r(rho) , by(group)

is a good solution if only one correlation is of interest, but not, I
think, for many. The reason is that -statsby- includes a built-in
collapsing of the data set, so you would need to read in the original
data set repeatedly to do it repeatedly.

2. -egen- extras. Most of -egenodd- is already in Stata 7. Other
user-written packages of -egen- functions are accessible via -findit-.

Nick
n.j.cox@durham.ac.uk

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

References:
    st: RE: correlate by group and collapse
    From: "Nick Winter" <nwinter@policystudies.com>


© Copyright 1996–2014 StataCorp LP