If you think of classification as
a kind of modelling, then one question
is how well (in this case) group medians predict the
original values. That could give you, for
example, a vector of correlations, one for
each variable. As the number
of groups approaches the number of
observations, such correlations all
approach 1. So goodness of fit is
optimised by having singleton classes,
as might be guessed on other grounds.
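
A minimal sketch of that point, in Python rather than Stata and for
a single variable only. The assumptions are mine: the data are simulated
Gaussian values, and groups are formed by cutting the sorted values into
k contiguous blocks, which is only a stand-in for whatever cluster
kmedians would produce. The function names are made up for illustration.

```python
import math
import random
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def contiguous_groups(values, k):
    """Label each value with one of k groups by cutting the sorted
    values into contiguous blocks of near-equal size (a stand-in for
    a real clustering, not k-medians itself)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * k // n
    return labels


def median_fit(values, labels):
    """Correlation between the original values and their group medians."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    med = {g: median(vs) for g, vs in groups.items()}
    fitted = [med[g] for g in labels]
    return pearson(values, fitted)


random.seed(1)
x = [random.gauss(0, 1) for _ in range(40)]  # simulated, not real data

# k = 1 is left out: a constant predictor has no defined correlation.
fits = {k: median_fit(x, contiguous_groups(x, k)) for k in (2, 5, 10, 20, 40)}
for k, r in fits.items():
    print(k, round(r, 4))
```

With k equal to the number of observations every group is a singleton,
each median is the observation itself, and the correlation is 1.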
To keep the clustering/modelling analogy
going:
1. Any single-valued figure of merit
arguably should quantify the trade-off
between parsimony or simplicity and
goodness-of-fit. Measures of model fit
that go beyond R^2, such as C_p, AIC, BIC
and DIC, show that this can be done
in many different ways. Naturally,
one is not obliged to try to pack
all the information into a single
measure.
2. I'd argue that the main point
of a cluster analysis is to find
clusters that make sense. The key
then lies in (e.g.) more informal
assessment relating the clusters
to the underlying science
(subject-matter knowledge, pertinent
theory) and/or a graphical check
of cluster structure. (If clusters
are genuine, then there should be
some graphical way of seeing much
of the structure.)
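
On point 1, the usual textbook forms of two of those measures show
the trade-off directly. This is a sketch of the general definitions,
not anything specific to cluster analysis: \hat{L} is the maximised
likelihood of a fitted model, p its number of estimated parameters and
n the number of observations. How to count p for a clustering is itself
an assumption that would need arguing.

```latex
\mathrm{AIC} = -2 \ln \hat{L} + 2p, \qquad
\mathrm{BIC} = -2 \ln \hat{L} + p \ln n
```

In each case the first term rewards fit and the second penalises
complexity; the two measures differ only in how heavily each extra
parameter is charged.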
Nick
n.j.cox@durham.ac.uk
Carlos de Los Rios
> I am performing a "cluster kmedian" analysis, and I am wondering if
> there is any tool that can measure the "goodness of fit" of the number
> of groups predetermined.