
From: Jean-Benoit Hardouin <[email protected]>
To: [email protected]
Subject: Re: st: creating Hierarchical cluster analysis with a different measure of distance
Date: Mon, 18 Jul 2005 20:33:10 +0200

Dear Allan,

The -clv- module is based on an algorithm that selects groups of variables so as to maximize the sum of the first eigenvalues of each group (the criterion is close to the one used in the SAS Varclus procedure and, more generally, to PCA). It is not easy to modify the code to obtain an HCA based on a similarity matrix, because the criterion used by -clv- does not have the properties of a distance (and so the algorithm is not suited to that). If you use Stata 9, follow Ken Higbee's advice and use the very nice official -clustermat- command. For preceding versions of Stata, I have written the -hcaccprox- module, which performs an HCA on the variables based on specific similarity measures: this module cannot directly use similarity measures other than the ones I have programmed, but modifying its code is easier for your purpose.

Best,

Jean-Benoit Hardouin

[email protected] wrote:

Allan Garland <[email protected]> asks:

Is there a relatively easy way to implement a hierarchical cluster analysis in Stata 9 on the variables (not the observations), using a different measure of distance between the variables?

The "clv" program appears to use an approach to assessing the distances similar to that of principal components. Using the built-in cluster commands on the variables requires transposing the rows and columns of the data. I looked at that routine and it wasn't simple enough (for me) to see how to alter Jean-Benoit Hardouin's code to do this (I'm an intermediate at program writing).

In any case, what I want to implement in Stata is what Frank Harrell describes in his textbook (FE Harrell Jr. (2001). Regression Modeling Strategies. New York, Springer) where he promotes the value of doing HCA (for data reduction) using as a measure of distance/similarity between variables the Hoeffding's D (W Hoeffding. A Non-Parametric Test of Independence. Annals of Mathematical Statistics 19(4):546-557, 1948). I've written code that calculates the matrix of D's between all pairs of variables, and am HOPING that someone can point me in the direction of Stata programming code that will let me do something simple --- i.e. just plug the matrix of D's in and thus obtain a HCA.

Stata 9 has the -clustermat- command that performs hierarchical clustering on a matrix. Since you have already created a matrix with the Hoeffding's D distances you can feed that into -clustermat-. In the Stata 9 manuals look at "[MV] clustermat" (starting on page 83 of the MV manual).

Ken Higbee [email protected]

StataCorp 1-800-STATAPC
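
For readers following the -clustermat- suggestion, here is a minimal sketch of what "plugging in" the matrix might look like. Several things in it are assumptions and not part of the thread: the matrix name D, the made-up 3 x 3 values standing in for a real Hoeffding's D matrix, the conversion 1 - D from similarity to dissimilarity, the choice of average linkage, and the names hoeffhca and vname. If the matrix already holds dissimilarities, the conversion step should be skipped.

```stata
* Illustrative values only: a stand-in for a user-computed symmetric
* matrix of Hoeffding's D between variables x1, x2, x3 (assumed names).
matrix D = (1, .6, .1 \ .6, 1, .2 \ .1, .2, 1)
matrix rownames D = x1 x2 x3
matrix colnames D = x1 x2 x3

* -clustermat- works on dissimilarities. Assumption: treat D as a
* similarity and use 1 - D, forcing the diagonal to zero.
local p = rowsof(D)
matrix Ddis = J(`p', `p', 1) - D
forvalues i = 1/`p' {
    matrix Ddis[`i', `i'] = 0
}

* Carry the variable names over so the dendrogram leaves can be labeled.
local names : rownames D
matrix rownames Ddis = `names'
matrix colnames Ddis = `names'

* Hierarchical clustering on the matrix. The clear option drops the
* variables currently in memory, so save the data first if needed.
* labelvar() creates a string variable holding the matrix row names.
clustermat averagelinkage Ddis, shape(full) name(hoeffhca) clear labelvar(vname)

* Draw the tree, labeling each leaf with its variable name.
cluster dendrogram hoeffhca, labels(vname)
```

Any of the other linkage methods documented in "[MV] clustermat" (for example singlelinkage or completelinkage) can be substituted for averagelinkage; which one suits a Hoeffding's D analysis is a judgment the thread does not address.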

*

* For searches and help try:

* http://www.stata.com/support/faqs/res/findit.html

* http://www.stata.com/support/statalist/faq

* http://www.ats.ucla.edu/stat/stata/

--
Jean-Benoit Hardouin
Biostatistician
Observatoire Régional de la Santé du Centre
BP 2439
1, rue Porte Madeleine
45032 Orléans Cedex 1
tel : 02 38 74 48 80
fax : 02 38 74 48 81
Email : [email protected]
