Allan Garland <[email protected]> asks:
> Is there a relatively easy way to implement a hierarchical cluster
> analysis in Stata 9 on the variables (not the observations), using a
> different measure of distance between the variables?
>
> The "clv" program appears to use an approach to assessing the distances
> similar to that of principal components. Using the built-in cluster
> commands on the variables requires transposing the rows and columns of
> the data. I looked at that routine and it wasn't simple enough (for me)
> to see how to alter Jean-Benoit Hardouin's code to do this (I'm an
> intermediate at program writing).
>
> In any case, what I want to implement in Stata is what Frank Harrell
> describes in his textbook (FE Harrell Jr. (2001). Regression Modeling
> Strategies. New York, Springer) where he promotes the value of doing HCA
> (for data reduction) using as a measure of distance/similarity between
> variables the Hoeffding's D (W Hoeffding. A Non-Parametric Test of
> Independence. Annals of Mathematical Statistics 19(4):546-557, 1948).
> I've written code that calculates the matrix of D's between all pairs of
> variables, and am HOPING that someone can point me in the direction of
> Stata programming code that will let me do something simple --- i.e.
> just plug the matrix of D's in and thus obtain a HCA.
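
Stata 9 has the -clustermat- command, which performs hierarchical
clustering directly on a matrix. Since you have already created the
matrix of Hoeffding's D values, you can feed it into -clustermat-.
One caution: -clustermat- treats the matrix entries as dissimilarities,
whereas D measures dependence (larger values mean the variables are
more alike), so convert D to a dissimilarity before clustering. In the
Stata 9 manuals, see "[MV] clustermat" (starting on page 83 of the MV
manual).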
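
A minimal sketch of how the matrix might be plugged in, under several assumptions that are not from the original question: the matrix is named D, it is symmetric with the variable names as row and column names, D is scaled so that 1 is its maximum, and 1 - D is an acceptable dissimilarity. The names Dis, hoefd, and vname are made up for illustration, and average linkage is only one possible choice.

```stata
* Sketch only: assumes a symmetric p x p matrix D is already defined,
* holding Hoeffding's D scaled so that 1 means perfect dependence.
local p = rowsof(D)
local vnames : rownames D

* Turn the similarity D into a dissimilarity (assumed transform: 1 - D).
matrix Dis = J(`p', `p', 1) - D

* Force a zero diagonal: each variable is at distance 0 from itself.
forvalues i = 1/`p' {
    matrix Dis[`i', `i'] = 0
}

* Carry the variable names over so they can label the dendrogram.
matrix rownames Dis = `vnames'
matrix colnames Dis = `vnames'

* -clear- replaces the dataset in memory, so save your data first.
* labelvar() stores the matrix row names in a new string variable.
clustermat averagelinkage Dis, shape(full) clear name(hoefd) labelvar(vname)

* Dendrogram with one leaf per original variable.
cluster dendrogram hoefd, labels(vname)
```

If your D is on Hoeffding's original scale rather than rescaled to a maximum of 1, the 1 - D step would need a different transform; check "[MV] clustermat" for the exact option list in your release.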
Ken Higbee [email protected]
StataCorp 1-800-STATAPC