
st: RE: correlate by group and collapse


From   "Nick Winter" <nwinter@policystudies.com>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: correlate by group and collapse
Date   Wed, 10 Jul 2002 16:34:55 -0400

> -----Original Message-----
> From: Roger Harbord [mailto:Roger.Harbord@bristol.ac.uk] 
> Sent: Wednesday, July 10, 2002 3:44 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: st: correlate by group and collapse
> 
> Dear Statalisters,
> 
> I want to collapse my dataset by a group variable and retain the 
> correlation coefficient of two variables.  In other 
> words, I'd like to be able to do something like:
> . collapse (correlation) var1 var2, by(group)
> or maybe:
> . by group: egen corr12=corr(var1 var2)
> . collapse corr12, by(group)
> 
> However, collapse doesn't have correlation among its stats (it only 
> allows a selection of univariate statistics) and egen doesn't have a 
> corr function.
> I know I can do:
> . by group: correlate var1 var2
> - but I want to save the results and do further analysis on them
> rather than just displaying them.
> 
> The best I've come up with is (supposing I have 100 groups):
> . gen corr12=.
> . for num 1/100, noheader: qui correlate var1 var2 if group==X \ 
>       qui replace corr12=r(rho) if group==X
> . collapse corr12, by(group)
> 
> This seems kind of clumsy though, and it took me a while to work out 
> that I needed _noheader_ and _quietly_ to stop my screen filling with 
> output. It also becomes quite lengthy if I want several pairwise 
> correlations. Is there a better way? 
> 
> I think I'd like egen to have a _corr_ and/or a _cov_ function - I 
> would have thought it would be of wider interest than the calculation 
> of U.S. marginal income tax rates, which is already implemented as egen
> function mtr! I've checked the extensions to egen in the STB package
> _egenodd_ and tried a couple of _findit_'s, but I didn't find anything
> suitable.

I've attached below a program to do this with egen.  Save the whole
thing as "_gcorr.ado" (that is, DO NOT split the GenCorr part off into
a separate file).

The syntax is:

	[by varlist:] egen newvar = corr(var1 var2) [if exp] [in range] [, covariance]

The ", covariance" option generates covariances; otherwise it does
correlations.
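
As a rough sketch of how this would fit the original question, assuming _gcorr.ado has been saved somewhere on the ado-path and the data contain var1, var2 and group (the names used in the question; corr12 and cov12 are just illustrative names for the new variables):

```stata
* Assumes _gcorr.ado is installed and the data hold var1, var2 and group.
* One correlation and one covariance per group, repeated on every row of the group:
bysort group: egen corr12 = corr(var1 var2)
bysort group: egen cov12  = corr(var1 var2), covariance

* Both new variables are constant within group, so the default
* (mean) statistic of collapse simply keeps one copy per group.
collapse (mean) corr12 cov12, by(group)
```

For several pairwise correlations, the egen line would be repeated once per pair of variables before the collapse.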

Nick Winter

**************** BEGINNING OF _gcorr.ado **************************************
*! NJGW 10jul2002
*! syntax:  [by varlist:] egen newvar = corr(var1 var2) [if exp] [in range] [, covariance]
*! computes correlation (or covariance) between var1 and var2, optionally by: varlist,
*!    and stores the result in newvar.
program define _gcorr
	version 7

	gettoken type 0 : 0
	gettoken g    0 : 0
	gettoken eqs  0 : 0
	syntax varlist(min=2 max=2) [if] [in] [, BY(string) Covariance ]

	if `"`by'"'!="" {
		local by `"by `by':"'
	}

	quietly { 
		gen `type' `g' = .
		`by' GenCorr `varlist' `if' `in', thevar(`g') `covariance'
	}
	capture label var `g' "Correlation `varlist'"
end

program define GenCorr, byable(recall)
	syntax varlist [if] [in] , thevar(string) [ covariance ]
	marksample touse
	if "`covariance'"=="" {
		local stat "r(rho)"
	}
	else {
		local stat "r(cov_12)"
	}
	cap corr `varlist' if `touse' , `covariance'
	if !_rc {
		qui replace `thevar'=``stat'' if `touse'
	}
end

**************** END OF _gcorr.ado **************************************


