
# st: Clustermat puzzle

From: brendan.halpin@ul.ie (Brendan Halpin)
To: statalist@hsphsun2.harvard.edu
Subject: st: Clustermat puzzle
Date: Sat, 24 Mar 2012 19:42:57 +0000

I have a small matrix of pairwise distances (all integers) that I'm
passing to clustermat (Ward's method). I notice that if I scale the
distances by a constant, I get different results. On investigation, it
seems that scaling by anything other than an integer power of two gives
one solution, and scaling by a power of two gives another.

The code below demonstrates the problem. Experimentation with the code
shows that a factor equal to a power of two times 0.11 (e.g. 0.44, 1.76)
also returns the original solution.

While clustering is often vulnerable to small changes in the data, it
shouldn't be affected by a simple scale change. Presumably something
subtle is happening with the internal representations of the distances.

Brendan

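use http://teaching.sociology.ul.ie/bhalpin/dist

* Store the distances as matrix D and cluster them on the original scale
mkmat d1-d42, mat(D)
clustermat wardslinkage D, add
cluster generate a4=groups(4)

capture program drop cltest
program define cltest
    args mult
    tempname M
    * tempvar (not tempname) so the variables are dropped when the program ends
    tempvar n4 diff
    * Rescale the distances and repeat the Ward's clustering under the name `M'
    matrix `M' = D * `mult'
    clustermat wardslinkage `M', add name(`M')
    cluster generate `n4'=groups(4)
    tab `n4' a4
    gen `diff' = `n4' - a4
    su `diff'
    di _newline
    * Check min and max: nonzero differences could still average to zero
    if r(min)!=0 | r(max)!=0 {
        di "Cluster solutions differ, factor " `mult'
    }
    else {
        di "Cluster solutions identical, factor " `mult'
    }
    cluster drop `M'
end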

cltest 2
cltest 3
cltest 1/40
cltest 0.125
cltest 0.44
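
* A toy check of the floating-point explanation suggested above. This is an
* illustrative sketch under stated assumptions, not clustermat's algorithm:
* the comparison x/3 + y/3 versus z/3 is a made-up stand-in for the weighted
* averages in a Ward's-type update, and the range 1/30 is arbitrary.
* Multiplying by a power of two is exact in binary floating point, so for
* factors 2 and 0.125 every comparison should come out as on the original
* scale; other factors involve rounding and may change some outcomes.
foreach f in 2 0.125 3 0.025 0.44 {
    local changed = 0
    forvalues a = 1/30 {
        forvalues b = 1/30 {
            local c = `a' + `b'
            * Outcome (-1, 0 or 1) of the comparison on the original scale
            local s0 = sign((`a'/3 + `b'/3) - `c'/3)
            * The same comparison after rescaling each value by the factor
            local s1 = sign((`a'*`f'/3 + `b'*`f'/3) - `c'*`f'/3)
            if `s0' != `s1' {
                local ++changed
            }
        }
    }
    di "factor `f': " `changed' " of 900 comparisons changed"
}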

--
Brendan Halpin,   Department of Sociology,   University of Limerick,   Ireland
Tel: w +353-61-213147  f +353-61-202569  h +353-61-338562;  Room F1-009 x 3147