Statalist


st: RE: RE: kmeans clustering -initial starting points


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: RE: kmeans clustering -initial starting points
Date   Mon, 8 Jun 2009 15:34:39 +0100

Thanks for the references. My own prejudices are different: the main
problems with cluster analyses are in general being confident that any
kind of cluster analysis is a good idea and in particular that the
results you got are not just an artefact of arbitrary choices. But
that's as may be. 

Nick 
n.j.cox@durham.ac.uk 


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Alp Eren
Yurtseven
Sent: 08 June 2009 14:42
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: kmeans clustering -initial starting points

Hi,

Quoting de Jong and Marsili (2006):

The main problem with cluster analysis is to decide
on the number of clusters, to balance the need to represent
the data appropriately and, at the same time,
to keep the results manageable. A priori, we consider
two up to six groups manageable for finding plausible
interpretations and for future applications of the taxonomy.
Indeed previous taxonomies use the same range
of groups (see Table 1). To find a solution within this
range, we combined hierarchical and non-hierarchical
techniques (Milligan and Sokol, 1980; Punj and Stewart,
1983). For each number of groups (k), between two
and six, we perform a k-means "non-hierarchical" cluster
analysis, in which the firms are iteratively classified
based on their distance to some initial starting points
of dimension k. While some k-means methods use randomly
selected starting points, we employ the centroids
of an initial hierarchical solution for this purpose. [4]

[4] To generate the initial solutions we carried out a hierarchical
analysis by using the Ward's method, which is based on squared Euclidian
distances. Ward's method generally provides good results compared to
other clustering methods (Milligan and Cooper, 1987). Homogeneous
groups are built so as to minimise the distance in scores of firms
within a single cluster and to maximise the distance in scores between
companies from the various clusters. A visual inspection of the
dendrogram, plotting the initial solutions of the hierarchical analysis,
suggests a taxonomy with four clusters.
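
For readers who want to see the two-stage idea in the passage above spelled out, here is a minimal sketch in plain Python. It is not the authors' code and not Stata syntax: the function names (ward_groups, kmeans, ward_seeded_kmeans) and the toy data are invented for illustration, the Ward step is a naive implementation that is only sensible for small data sets, and a real analysis would normally rely on a statistics package instead.

```python
# Sketch only: seed k-means with the centroids of a Ward hierarchical solution.
# All names are hypothetical; this is not the code used by de Jong and Marsili.

def centroid(points):
    """Mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ward_groups(data, k):
    """Naive agglomerative clustering with Ward's criterion, stopped at k groups."""
    groups = [[p] for p in data]
    while len(groups) > k:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                ni, nj = len(groups[i]), len(groups[j])
                # Increase in within-group sum of squares if groups i and j merge.
                cost = ni * nj / (ni + nj) * sqdist(centroid(groups[i]),
                                                    centroid(groups[j]))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return groups

def kmeans(data, starts, max_iter=100):
    """Lloyd's algorithm from the given starting centroids."""
    centers = list(starts)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda c: sqdist(p, centers[c]))
                      for p in data]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # keep the old centre if a cluster becomes empty
                centers[c] = centroid(members)
    return labels, centers

def ward_seeded_kmeans(data, k):
    """Stage 1: Ward groups; stage 2: k-means started from their centroids."""
    starts = [centroid(g) for g in ward_groups(data, k)]
    return kmeans(data, starts)

# Toy data: two well-separated groups of three points each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = ward_seeded_kmeans(data, 2)
```

To mimic the procedure in the quotation, one would call ward_seeded_kmeans for each k from two to six and compare the solutions. Because the starting points are computed from the data and not drawn at random, repeated runs on the same data give the same answer, although that says nothing about whether the clusters themselves are meaningful.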

de Jong, J.P.J., Marsili, O., 2006. The fruit flies of innovation:
a taxonomy of innovative small firms. Research Policy 35, 213-229.

Punj, G., Stewart, D.W., 1983. Cluster analysis in marketing research:
review and suggestions for application. Journal of Marketing
Research 20, 134-148.

