Statalist



Re: st: Cluster analyis on hand made distance matrix


From: khigbee@stata.com
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Cluster analyis on hand made distance matrix
Date: Mon, 10 Mar 2008 11:45:15 -0500

Ulrich Kohler <kohler@wzb.eu> asks:

> I have two "hand made" distance matrices, SQdist1 and SQdist2. The two
> matrices are essentially identical, except that they are ordered
> differently.
>
> If I perform a cluster analysis using singlelinkage for the two distance
> matrices, I get identical results:
> 
> <cut>
>
> (The same is true for median-linkage and centroid linkage.)
> 
> However, if I use Ward's linkage I get different results for the two
> distance matrices:
> 
> . clustermat wards SQdist1, name(cluster1) add
> . clustermat wards SQdist2, name(cluster2) add
> . sum *_hgt
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
> cluster1_hgt |        53    .7051013     .861406   .1666667   4.414418
> cluster2_hgt |        53    .7051013    .8751653   .1666667   4.645984
> 
> Although the difference doesn't seem large, it has led to quite
> different groupings in a practical application. Unfortunately, I am not
> an expert in cluster analysis. So, please, can anybody explain to me why
> this happens? If the order of the distance matrix matters for
> cluster analysis, what is the "correct" order of the distance matrix,
> then?

The hierarchical cluster analysis methods start with N groups
(each observation is a group).  At each step in the process the two
closest groups are merged, and this continues until all
observations are in one group.  The sequence of merges can be viewed
as a dendrogram (cluster tree).
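
The merging procedure described above can be sketched in a few lines of
Python.  This is only an illustration, not Stata's implementation: the
function name, the toy matrix, and the choice of single linkage (group
distance = smallest member-to-member distance) are assumptions made for
the example.

```python
def single_linkage_heights(dist):
    """Merge the two closest groups until one group remains.

    `dist` is a symmetric matrix (list of lists) of pairwise distances.
    Returns the distance at which each of the N-1 merges happens.
    """
    # Start: every observation is its own group.
    groups = [[i] for i in range(len(dist))]
    heights = []
    while len(groups) > 1:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                # Single linkage: smallest member-to-member distance.
                gap = min(dist[i][j] for i in groups[a] for j in groups[b])
                if best is None or gap < best[0]:
                    best = (gap, a, b)
        gap, a, b = best
        heights.append(gap)
        groups[a] = groups[a] + groups[b]  # merge group b into group a
        del groups[b]
    return heights


# Toy data (made up for illustration): four items.
toy = [
    [0, 2, 6, 10],
    [2, 0, 5, 9],
    [6, 5, 0, 4],
    [10, 9, 4, 0],
]
merge_heights = single_linkage_heights(toy)
print(merge_heights)  # [2, 4, 5]
```

Four items give three merges; the recorded heights are what a
dendrogram would plot on its vertical axis.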

My guess is that there are ties in determining the closest two
groups at one or more steps in the process, and that the order in
which the data are presented changes which of the tied pairs gets
selected for merging at that step.
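
To illustrate how such a tie can matter, here is a small Python sketch
(not Stata's implementation) of Ward's linkage using the Lance-Williams
update on a squared-distance matrix.  The function names, the toy
points 0, 1, 2, 3, and the rule of taking the first tied pair found in
a row-by-row scan are all assumptions made for illustration; how
clustermat actually breaks ties is not something this sketch claims to
reproduce.

```python
def ward_heights(sq_dist):
    """Ward's linkage on a squared-distance matrix (list of lists).

    Ties for the closest pair are broken by keeping the first pair found
    in a row-by-row scan (an assumption of this sketch).
    Returns the merge heights in merge order.
    """
    n = len(sq_dist)
    d = [row[:] for row in sq_dist]  # working copy, updated after each merge
    size = [1] * n                   # number of items in each cluster
    active = list(range(n))          # clusters still available for merging
    heights = []
    while len(active) > 1:
        best = None
        for a in range(len(active)):
            for b in range(a + 1, len(active)):
                i, j = active[a], active[b]
                # Strict "<" keeps the first of any tied pairs.
                if best is None or d[i][j] < best[0]:
                    best = (d[i][j], i, j)
        h, i, j = best
        heights.append(h)
        # Lance-Williams update for Ward: cluster i absorbs cluster j.
        for k in active:
            if k in (i, j):
                continue
            total = size[i] + size[j] + size[k]
            new = ((size[i] + size[k]) * d[i][k]
                   + (size[j] + size[k]) * d[j][k]
                   - size[k] * d[i][j]) / total
            d[i][k] = d[k][i] = new
        size[i] += size[j]
        active.remove(j)
    return heights


def sq_dist_matrix(points):
    """Squared distances between points on a line."""
    return [[(p - q) ** 2 for q in points] for p in points]


# Same four points, presented in two different orders.
heights_a = ward_heights(sq_dist_matrix([0, 1, 2, 3]))
heights_b = ward_heights(sq_dist_matrix([1, 2, 0, 3]))
print(heights_a)
print(heights_b)
```

With these four equally spaced points, three pairs are tied at the
first step.  The first ordering merges at heights of roughly 1, 1, 8
and the second at roughly 1, 3, 6: the same sum (and so the same
mean), but a different maximum, which is the pattern in the summary
table quoted above.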

If Uli would like me to explore this further, he can send me the
SQdist1 and SQdist2 matrices and I will report back what I find.

Ken Higbee    khigbee@stata.com
StataCorp     1-800-STATAPC



