The Stata listserver

Re: st: k-means cluster analysis question

From   [email protected]
To   [email protected]
Subject   Re: st: k-means cluster analysis question
Date   Thu, 04 Mar 2004 10:56:44 -0600

Agricola Odoi <[email protected]> asks:

> I am running k-means cluster analysis. The clusters have already been
> identified but I would like to calculate the distance between each of the
> clusters. Does anyone know how to do this in STATA?

First, I will assume that when you did -cluster kmeans ...- that
you used the -keepcenters- option to add the k cluster mean
points to the bottom of your data.

The -cluster measures- command (see "[CL] cluster programming
utilities" in the manual or -help clprog-) can compute the
distances between those mean points.

For example, if there were 100 observations in my dataset and I ran

    cluster kmeans ... , k(3) keepcenters ...

to obtain the 3-group k-means cluster solution, then observations
101-103 would contain the 3 group means.  I could then run

    cluster measures ... in 101/103, compare(101/103) gen(d1 d2 d3) ...

The new variables d1, d2, and d3 would then contain the desired
distances.  In particular, d1 (in observations 101/103) would
contain the distances between the mean of group 1 and the means
of each of the three groups, d2 the distances from the mean of
group 2, and so on.
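
As a concrete illustration of the steps above, here is a minimal
sketch meant to be run from a do-file.  The dataset (Stata's shipped
auto data), the variables price, mpg, and weight, the cluster name
km3, and the new variable names d1-d3 are all choices made for this
example and do not come from the original question.  The measure is
left at its default, and the variables are not standardized, so treat
the numbers as illustrative only.

```stata
* Sketch only: dataset, variables, and names are illustrative assumptions
sysuse auto, clear
local n = _N                      // number of original observations

* keepcenters appends the 3 group means as observations n+1 through n+3
cluster kmeans price mpg weight, k(3) name(km3) keepcenters

local first = `n' + 1
local last  = `n' + 3

* Distances among the 3 appended group means only
cluster measures price mpg weight in `first'/`last', ///
    compare(`first'/`last') generate(d1 d2 d3)

list d1 d2 d3 in `first'/`last'
```

Storing _N in a local before clustering avoids hard-coding the
observation numbers of the appended means.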

If you also wanted the distances between the various individual
observations and the group means you would change that last
command to

    cluster measures ... , compare(101/103) gen(d1 d2 d3) ...

i.e., leave off the -in 101/103-.  Then the 7th observation in
variable d2 would be the distance between the 7th observation and
the 2nd group mean (just as an example).
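
A similar hedged sketch for the observation-to-mean distances, again
meant for a do-file.  The auto data, the variables price, mpg, and
weight, the cluster name km3, and the new variable names dc1-dc3 are
assumptions made for illustration, not part of the original question.

```stata
* Sketch only: dataset, variables, and names are illustrative assumptions
sysuse auto, clear
local n = _N                      // number of original observations

* Append the 3 group means as observations n+1 through n+3
cluster kmeans price mpg weight, k(3) name(km3) keepcenters

local first = `n' + 1
local last  = `n' + 3

* No -in- qualifier: every observation is compared with the 3 group means
cluster measures price mpg weight, ///
    compare(`first'/`last') generate(dc1 dc2 dc3)

* Group assignment and distances to each group mean for the 7th observation
list km3 dc1 dc2 dc3 in 7
```

Here dc2 in observation 7 plays the role of d2 in the example above:
the distance between the 7th observation and the 2nd group mean.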

Ken Higbee    [email protected]
StataCorp     1-800-STATAPC
