The Stata listserver

st: -matsusort- added to -matvsort- package on SSC


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: -matsusort- added to -matvsort- package on SSC
Date   Fri, 24 Sep 2004 16:26:43 +0100

Roger Harbord's posting on -matvsort- had 
me scuttling to look and see what on Earth 
it did. That reminded me indirectly of a utility 
I have long wanted, on and off, but have never
seen or got round to writing: a program to 
sort the rows and/or columns of a matrix 
according to some summary of the elements 
in the rows and/or columns. 

Here's a very simple example. Suppose 
we look at the car size variables in 
the auto data: 

. sysuse auto
(1978 Automobile Data)

. corr head trunk weight length displacement 
(obs=74)

             | headroom    trunk   weight   length displa~t
-------------+---------------------------------------------
    headroom |   1.0000
       trunk |   0.6620   1.0000
      weight |   0.4835   0.6722   1.0000
      length |   0.5163   0.7266   0.9460   1.0000
displacement |   0.4745   0.6086   0.8949   0.8351   1.0000

For display, we might want to reorder that matrix, for 
example to get clusters of high correlations and low 
correlations together, as far as possible. (We might 
also want fewer than 4 d.p.) The first could be 
achieved by detailed inspection and re-typing the variable 
names in different order, but an automated solution is 
also desirable, especially for much bigger problems. 

A first step is to get the correlations into a matrix
in the sense of Stata's -matrix- commands. 

There are several ways to do that. One is -matcorr- from 
STB-56: 

. matcorr head trunk weight length displacement , matrix(corr) 
(obs=74)

< same matrix, naturally > 
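
Another route needs no extra package, on the assumption that 
your Stata release leaves the correlation matrix behind as 
r(C) after -correlate-. Current releases do; older ones may not, 
so treat this as a sketch rather than a guaranteed recipe. 
The matrix name corr is just the one used in this example. 

```stata
* Sketch: assumes -correlate- saves the correlation matrix in r(C)
sysuse auto, clear
quietly correlate headroom trunk weight length displacement
matrix corr = r(C)                  // row and column names are the variable names
matrix list corr, format(%9.3f)
```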

Then -matsusort- (now added to the -matvsort- package on 
SSC, thanks to Kit Baum) sorts the rows according to 
their means. That is, 

	for each row { 
		calculate the mean of the row elements 
	}
	sort the rows according to the order of their means 

The -decrease- option controls which way 
they are sorted, and the -columns- option 
sorts the columns instead of the rows. 

. matsusort corr scorr, dec 

. matsusort scorr scorr, col dec 

We now have more control e.g. over format from -matrix list-: 

. mat li scorr , format(%9.3f) 

symmetric scorr[5,5]
                    length        weight  displacement         trunk      headroom
      length         1.000
      weight         0.946         1.000
displacement         0.835         0.895         1.000
       trunk         0.727         0.672         0.609         1.000
    headroom         0.516         0.483         0.474         0.662         1.000

. mat li scorr , format(%9.3f) nohalf

symmetric scorr[5,5]
                    length        weight  displacement         trunk      headroom
      length         1.000         0.946         0.835         0.727         0.516
      weight         0.946         1.000         0.895         0.672         0.483
displacement         0.835         0.895         1.000         0.609         0.474
       trunk         0.727         0.672         0.609         1.000         0.662
    headroom         0.516         0.483         0.474         0.662         1.000

The sorting by means is just the default. There is a handle allowing you 
to sort according to _any_ summary measure produced by -summarize-. It's 
unlikely that anyone would choose to sort by kurtosis, but the generality
is cheap. 

You can get this bundled with the other stuff previously in -matvsort- by

. ssc inst matvsort 

or 

. ssc inst matvsort, replace 

Nick 
n.j.cox@durham.ac.uk 



