Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Joe Canner <jcanner1@jhmi.edu> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | RE: Re: st: RE: Selecting correlations with highest absolute value |
Date | Wed, 9 Oct 2013 23:40:20 +0000 |
Dara,

Red Owl beat me to the answer I was going to give. If you have a good reason to use -pwcorr- instead of -corr-, then you might need something more complicated in which you loop over all your variables, accumulating pairwise correlations.

foreach x of varlist tbmale...etc {
    foreach y of varlist tbmale...etc {
        corr `x' `y'
        matrix corrvector=corrvector \ vec(r(C))
    }
}
matvsort corrvector sortedvector
matrix list sortedvector

I don't have the ability to test this at the moment and I don't have matrix syntax memorized, so this might need some tweaking, particularly the matrix command inside the loops. I'm also not sure if you will need to initialize -corrvector- before starting the loops. Let us know if you have any problems and I'm sure someone can help.

Joe
________________________________________
From: owner-statalist@hsphsun2.harvard.edu [owner-statalist@hsphsun2.harvard.edu] on behalf of Dara Shifrer [Dara.Shifrer@rice.edu]
Sent: Wednesday, October 09, 2013 7:04 PM
To: statalist@hsphsun2.harvard.edu
Subject: Fwd: Re: st: RE: Selecting correlations with highest absolute value

Joe, thank you very much for your quick response to my quest to find the most highly correlated pairs of variables. I think I understand what your code does (finds correlations, linearly transforms the correlation matrix into a column vector, sorts this matrix, and then lists the sorted columns of correlations) but I'm not sure why it isn't working for me (see code below). I haven't used Stata's matrix commands before and may be missing something obvious. Thanks for any additional help anyone can provide!
Dara

pwcorr tbmale tdedc3 tbrace td9tchr td9slry tb9yrsh tb9yrsnh td10tchr td10slry ///
    tb10yrsh tb10yrsnh td11tchr td11slry tb11yrsh tb11yrsnh ///
    tp10pswm ta10a2w skd10size skd10blck skd10hisp skd10pvty skd10lep ///
    skd10biesl skd10gt skd10sped skd11size skd11blck skd11hisp skd11pvty skd11lep ///
    skd11biesl skd11gt skd11sped skd12size skd12blck skd12hisp skd12pvty skd12lep ///
    skd12biesl skd12gt skd12sped ta11elgb5 ta11ctgr ta11grd ta11chrt ta11sclvl ///
    ta11a1rg ta11a2rg ta11a2lrg ta11a2mrg ta11a2m9rg ta11a2m10rg ta11a2m11rg ///
    ta11a2rrg ta11a2r9rg ta11a2r10rg ta11a2r11rg ta11a2srg ta11a2s10rg ta11a2s11rg ///
    ta11a2ssrg ta11a2ss10rg ta11a2ss11rg ///
    ta11a3rg ta11a3arg ta11a3arrg ta11a3amrg ta11a3aparg ta11a3aperg ta11a3brg ta11a3crg ///
    trt12rtn tp12pswm tka12tme tka12tmebl tka12tms tka12tmsbl tka12tre tka12trebl tka12trs ///
    tka12trsbl tka12talg1 tka12talg1bl tka12tbio tka12tbiobl tka12te1r ///
    tka12te1rbl tka12te1w tka12te1wbl tka12twgeo tka12twgeobl ///
    tka12smegn tka12smsgn tka12sregn tka12srsgn tka12slegn tka12slsgn ///
    tka12ssegn tka12shegn tka12shsgn

.... lots of correlations excluded...

             | tka~megn tka~msgn tka~regn tka~rsgn tka~legn tka~lsgn tka~segn
-------------+---------------------------------------------------------------
  tka12smegn |   1.0000
  tka12smsgn |   0.1390   1.0000
  tka12sregn |   0.6082   0.1509   1.0000
  tka12srsgn |   0.1211   0.5660   0.1929   1.0000
  tka12slegn |   0.5454  -0.0638   0.5637   0.1009   1.0000
  tka12slsgn |   0.2572   0.5671   0.2427   0.5295   0.2006   1.0000
  tka12ssegn |   0.4479  -0.1376   0.3819  -0.1273   0.4028  -0.1095   1.0000
  tka12shegn |   0.4143   0.0340   0.4330  -0.2011   0.4543  -0.2584   0.5530
  tka12shsgn |   0.5705   0.4077   0.3127   0.6170   0.2309   0.4094   0.2407

             | tka~hegn tka~hsgn
-------------+------------------
  tka12shegn |   1.0000
  tka12shsgn |   0.0918   1.0000

. matrix corrvector=vec(r(C))
. matvsort corrvector sortedvector
. matrix list sortedvector

sortedvector[4,1]
                       c1
tka12shsgn:tka12shsgn   1
tka12shsgn:tka12shsgn   1
tka12shsgn:tka12shsgn   1
tka12shsgn:tka12shsgn   1

Postdoctoral Fellow, Houston Education Research Consortium
Kinder Institute for Urban Research
Rice University
Dara.Shifrer@rice.edu

On 10/8/2013 1:39 PM, Joe Canner wrote:
> Dara,
>
> Here's one quick-n-dirty possibility. (It requires installing -matvsort- from SSC.)
>
> . corr varlist
> . matrix corrvector=vec(r(C))
> . matvsort corrvector sortedvector
> . matrix list sortedvector
>
> Regards,
> Joe Canner
> Johns Hopkins University School of Medicine
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Dara Shifrer
> Sent: Tuesday, October 08, 2013 3:16 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: st: Selecting correlations with highest absolute value
>
>
> In SAS, I was able to quickly determine which pairs of variables were
> most highly correlated using the 'best' option with the 'proc corr'
> command ("BEST=n prints n correlation coefficients for
> each variable. Correlations are ordered from highest to lowest in
> absolute value."). After extensive searching, I have not been able to
> locate a Stata command that does something similar.
>
> If this is not possible in Stata, maybe Stata experts have suggestions
> for my ultimate purpose: constructing equations to facilitate a smoother
> and faster running of Stata's 'ice' command.
>
> Any help would be greatly appreciated,
> Dara Shifrer
>
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
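For anyone reading this thread later: the -matrix list- output in Dara's message contains only tka12shsgn:tka12shsgn entries, which suggests that r(C) did not hold the full correlation matrix after -pwcorr- in that session. One possible way around this, in the spirit of Joe's looping suggestion, is to run -correlate- on each pair of variables and collect the results in a dataset instead of a matrix. The following is an untested sketch, not code from the thread: the three-variable list is a placeholder for the full varlist, and the names memhold, pairs, absrho, and top are made up for illustration.

```stata
* Sketch only: replace this short placeholder list with the full varlist
local vars tbmale tdedc3 tbrace

* Temporary results file with one row per pair of variables
tempname memhold
tempfile pairs
postfile `memhold' str32 var1 str32 var2 double rho using `pairs'

local k : word count `vars'
forvalues i = 1/`k' {
    local x : word `i' of `vars'
    local next = `i' + 1
    * Inner loop starts at i+1, so each pair is visited once
    * and the trivial self-correlations are skipped
    forvalues j = `next'/`k' {
        local y : word `j' of `vars'
        * With two variables, -correlate- uses all observations that are
        * complete on that pair and leaves the coefficient in r(rho)
        quietly correlate `x' `y'
        post `memhold' ("`x'") ("`y'") (r(rho))
    }
}
postclose `memhold'

* Sort the pairs by absolute correlation, largest first, and list the top ones
preserve
use `pairs', clear
generate double absrho = abs(rho)
gsort -absrho
local top = min(10, _N)
list var1 var2 rho in 1/`top', noobs
restore
```

Because each call to -correlate- sees only two variables, the coefficients should match what -pwcorr- reports, and no matrix needs to be initialized before the loops. With as many variables as in the -pwcorr- call above, the double loop makes several thousand calls, so it may take a little while to run.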