The Stata listserver

st: RE: RE:cluster analysis


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: RE:cluster analysis
Date   Thu, 23 Sep 2004 09:56:06 +0100

If you think that classification is 
a kind of modelling, then one question 
is how well (in this case) group medians predict the 
original values. That could give you (for 
example) a vector of correlations. As the number
of groups approaches the number of 
observations, such correlations all 
approach 1. So goodness of fit is 
optimised by having singleton classes, 
as might be guessed on other grounds. 
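
A minimal sketch of that point, in Python rather than Stata, and
with assumptions flagged: the data are made up, and the grouping
rule (contiguous blocks of sorted values, the hypothetical helper
contiguous_groups) is only a stand-in for a real k-medians
clustering. It replaces each value by its group median and
correlates the two, for every number of groups from 2 up to the
number of observations.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def contiguous_groups(n, k):
    """Stand-in for a clustering: split n sorted observations into
    k contiguous blocks of near-equal size (assumes 1 <= k <= n)."""
    return [i * k // n for i in range(n)]


def fitted_medians(values, labels):
    """Replace each value by the median of its group."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    med = {g: median(vs) for g, vs in groups.items()}
    return [med[g] for g in labels]


# Made-up one-variable data, sorted so that blocks are contiguous.
values = sorted([2.0, 3.0, 3.5, 7.0, 8.0, 8.5, 15.0, 16.0, 18.0, 30.0])
n = len(values)

# Correlation between original values and group medians, by number
# of groups. k = 1 is skipped: a constant predictor has no variance.
fit_by_k = {}
for k in range(2, n + 1):
    labels = contiguous_groups(n, k)
    fit_by_k[k] = pearson(values, fitted_medians(values, labels))

for k, r in fit_by_k.items():
    print(k, round(r, 4))
```

With singleton classes (k equal to n) each group median is the
observation itself, so the correlation is 1 by construction. With
several variables the same loop would be run once per variable,
giving the vector of correlations mentioned above.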

To keep the clustering/modelling analogy 
going: 

1. Any single-valued figure of merit 
arguably should quantify the trade-off
between parsimony or simplicity and 
goodness-of-fit. Measures of model fit
that go beyond R^2, such as C_p, AIC, BIC 
and DIC, show that this can be done 
in many different ways. Naturally, 
one is not obliged to try to pack 
all the information into a single 
measure. 

2. I'd argue that the main point 
of a cluster analysis is to find 
clusters that make sense. The key 
then lies in (e.g.) more informal 
assessment relating the clusters 
to the underlying science 
(subject-matter knowledge, pertinent 
theory) and/or a graphical check 
of cluster structure. (If clusters
are genuine, then there should be 
some graphical way of seeing much 
of the structure.) 
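
The trade-off in point 1 can be sketched as follows, again in
Python and again under loud assumptions: the criterion used here
(within-group sum of squares plus a penalty for each group) is just
one generic choice, not C_p, AIC, BIC or DIC themselves; the
penalty weights are arbitrary; the data are made up; and the
contiguous-block grouping is a stand-in for a real clustering.

```python
def contiguous_groups(n, k):
    """Stand-in for a clustering: split n sorted observations into
    k contiguous blocks of near-equal size (assumes 1 <= k <= n)."""
    return [i * k // n for i in range(n)]


def within_group_ss(values, labels):
    """Sum of squared deviations of each value from its group mean."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    total = 0.0
    for vs in groups.values():
        m = sum(vs) / len(vs)
        total += sum((v - m) ** 2 for v in vs)
    return total


def best_k(values, penalty):
    """Number of groups minimising lack of fit + penalty * k.
    The additive form and the penalty weight are assumptions."""
    n = len(values)
    scores = {
        k: within_group_ss(values, contiguous_groups(n, k)) + penalty * k
        for k in range(1, n + 1)
    }
    return min(scores, key=scores.get)


# Made-up data with three visibly separate clumps.
values = sorted([0.8, 1.0, 1.2, 4.9, 5.0, 5.1, 8.8, 9.0, 9.2])

k_no_penalty = best_k(values, 0.0)    # fit alone
k_moderate = best_k(values, 1.0)      # some weight on parsimony
k_heavy = best_k(values, 1000.0)      # parsimony dominates

print(k_no_penalty, k_moderate, k_heavy)
```

For these toy data the three choices are 9 groups (singletons),
3 groups and 1 group respectively. The answer is driven by the
penalty as much as by the data, which is one reason not to rely on
a single measure.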

Nick 
n.j.cox@durham.ac.uk 

Carlos de Los Rios wrote:
 
> I am performing a "cluster kmedian" analysis, and I am wondering if
> there is any tool that can measure the "goodness of fit" of the number
> of groups predetermined.
