Re: st: Cluster analysis - cluster kmeans-
Herve STOLOWY <firstname.lastname@example.org> asks:
> I have a group of 21 observations with one variable (a score) and
> would like to create three "homogeneous" groups.
> I found the -cluster kmeans- command. Here are my command lines:
> gsort - finance_aggregate
> cluster kmeans finance_aggregate, k(3)
> Each time I run these commands, I get a different result (i.e., a
> different clustering: the three groups are different). I looked
> at the help file but don't understand. (It might be related to
> the start option but I am not sure).
> Is there a way to obtain the same result every time?
You can -set seed 183289- (or any other number you like) before
each call of -cluster kmeans- so that the same set of random
starting values is selected each time. Or, as you were
guessing, you can use the -start()- option to do the same thing
(with several suboptions controlling the k starting groups). See
-help cluster kmeans- for details.
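
As a minimal sketch of both approaches (not a definitive recipe), the
following do-file fragment builds a made-up dataset of 21 scores and
checks that resetting the seed reproduces the grouping. The data, the
seed 42 used to generate them, and the names run1 through run4 are
assumptions for illustration only; the -krandom()- and -segments-
suboptions are as described in -help cluster kmeans-.

```stata
* Hypothetical data: 21 made-up scores standing in for finance_aggregate
clear
set obs 21
set seed 42
generate finance_aggregate = 100*runiform()

* Approach 1: reset the seed before each call
set seed 183289
cluster kmeans finance_aggregate, k(3) name(run1)
set seed 183289
cluster kmeans finance_aggregate, k(3) name(run2)

* Same seed and same data order should give the same grouping
assert run1 == run2

* Approach 2: control the starting centers through start()
* krandom(#) picks k random observations using the given seed
cluster kmeans finance_aggregate, k(3) start(krandom(183289)) name(run3)

* segments involves no randomness: it splits the data, in their
* current order, into k nearly equal partitions and starts from
* the partition means, so sort first
sort finance_aggregate
cluster kmeans finance_aggregate, k(3) start(segments) name(run4)
```

Note that the result can also depend on the order of the observations,
so keep the sort order fixed between runs.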
SR Millis <email@example.com> said:
> You're going to need more than 1 variable. Cluster
> analysis is a multivariable technique. In addition, a
> sample size of only 21 is often too small for cluster
> analysis.
While cluster analysis is a multivariate technique, it also works
with a single variable. That is no problem. Having only 21
observations might or might not be a problem; it depends on the
data. After you do your cluster analysis, you might want to look
at some summaries or graphs of the resulting three groups.
. set seed 12345
. cluster kmeans myvar, k(3) name(myclus)
. bysort myclus: summarize myvar
. twoway dot myvar myclus
and possibly also
. cluster stop
(or similarly -anova myvar myclus-) to get a feel for how
distinct the groups are.
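
With a single variable, one further check is whether the ranges of the
three groups overlap. The fragment below is only a sketch on made-up
data (the 21 generated values of myvar are an assumption, not the
original poster's scores). It lists the count, minimum, maximum, mean,
and standard deviation per group, then requests the Calinski-Harabasz
index explicitly.

```stata
* Hypothetical data: 21 made-up values for myvar
clear
set obs 21
set seed 12345
generate myvar = 100*runiform()

cluster kmeans myvar, k(3) name(myclus)

* Per-group range and spread; well-separated groups have
* non-overlapping [min, max] intervals
tabstat myvar, by(myclus) statistics(n min max mean sd)

* Calinski-Harabasz pseudo-F for the myclus solution
cluster stop myclus, rule(calinski)
```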
Ken Higbee firstname.lastname@example.org