st: Clustermat puzzle

From   Brendan Halpin
Subject   st: Clustermat puzzle
Date   Sat, 24 Mar 2012 19:42:57 +0000

I have a small matrix of pairwise distances (all integers) that I'm
passing to clustermat (Ward's method). I notice that if I scale the
distances by a constant, I get different results. On investigation it
seems that if I scale them by anything other than an integer power of
two I get one solution, and by a power of two, another.

Code below demonstrates the problem. Experimentation with the code shows
that a factor of 0.11 times a power of two (e.g. 0.44, 1.76) also
returns the original solution.

While clustering is often vulnerable to small changes in the data, it
shouldn't be affected by a simple scale change. Presumably something
subtle is happening with the internal representations of the distances.
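
One way to see how a scale factor could matter at the level of internal
representation: multiplying a double by a power of two changes only its
binary exponent, so the product is exact, whereas other factors can add
their own rounding. A minimal sketch with illustrative values (7, 3, 10
and the factors are assumptions, not taken from the distance matrix).
Whether clustermat's internal updates are affected in this way is not
confirmed here.

```stata
* Illustrative values only; not taken from the distance matrix.

* Power-of-two factor: scaling before or after a division gives the
* same double, because only the exponent changes.
display (7*0.125)/3 == 0.125*(7/3)

* Factor of 3: scaling before or after the division can round
* differently. %21x shows the exact hexadecimal representation.
display %21x (1*3)/10
display %21x 3*(1/10)
display (1*3)/10 == 3*(1/10)
```

In IEEE double precision the first comparison should display 1 and the
last should display 0, so two quantities that tie at one scale need not
tie at another.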


Code to download the distance matrix and compare solutions:


mkmat d1-d42, mat(D)

clustermat wards D, name(D) add
cluster generate a4=groups(4)

capture program drop cltest
program define cltest
    args mult
    * tempvar for the new variables, tempname for the matrix/cluster name
    tempvar n4 diff
    tempname M
    * Rescale the distances and recluster
    matrix `M' = D * `mult'
    clustermat wards `M', name(`M') add
    cluster generate `n4' = groups(4)
    * Compare with the baseline four-group solution a4
    tab `n4' a4
    gen `diff' = `n4' - a4
    su `diff'
    di _newline
    if r(mean)!=0 {
        di "Cluster solutions differ, factor " `mult'
    }
    else {
        di "Cluster solutions identical, factor " `mult'
    }
    cluster drop `M'
end

cltest 2
cltest 3
cltest 1/40
cltest 0.125
cltest 0.44
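
The comparison in cltest subtracts group numbers, which assumes the
groups are numbered the same way in both solutions and that a mean
difference of zero implies no differences. A minimal sketch of a check
that ignores the numbering, using a hypothetical helper called sameparts
(the name and the returned r(same) are assumptions, not part of Stata or
of the code above):

```stata
* Hypothetical helper: r(same) is 1 when two grouping variables define
* the same partition of the observations, however the groups are numbered.
capture program drop sameparts
program define sameparts, rclass sortpreserve
    args g1 g2
    tempvar bad1 bad2
    * Within each group of g1, g2 must be constant ...
    bysort `g1' (`g2'): gen byte `bad1' = `g2'[1] != `g2'[_N]
    * ... and within each group of g2, g1 must be constant.
    bysort `g2' (`g1'): gen byte `bad2' = `g1'[1] != `g1'[_N]
    quietly count if `bad1' | `bad2'
    return scalar same = (r(N) == 0)
end
```

Inside cltest this could be called as sameparts `n4' a4, followed by a
test on r(same). The sortpreserve option restores the original sort order
on exit.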

Brendan Halpin,   Department of Sociology,   University of Limerick,   Ireland
Tel: w +353-61-213147  f +353-61-202569  h +353-61-338562;  Room F1-009 x 3147    ULSociology on Facebook:         twitter:@ULSociology