
st: RE: RE: kmeans clustering -initial starting points

From	"Nick Cox" <[email protected]>
To	<[email protected]>
Subject	st: RE: RE: kmeans clustering -initial starting points
Date	Mon, 8 Jun 2009 15:34:39 +0100

Thanks for the references. My own prejudices are different: the main
problems with cluster analyses are in general being confident that any
kind of cluster analysis is a good idea and in particular that the
results you got are not just an artefact of arbitrary choices. But
that's as may be. 

Nick 
[email protected] 


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Alp Eren Yurtseven
Sent: 08 June 2009 14:42
To: [email protected]
Subject: st: RE: kmeans clustering -initial starting points

Hi,

Quoting de Jong and Marsili,

The main problem with cluster analysis is to decide
on the number of clusters, to balance the need to represent
the data appropriately and, at the same time,
to keep the results manageable. A priori, we consider
two up to six groups manageable for finding plausible
interpretations and for future applications of the taxonomy.
Indeed previous taxonomies use the same range
of groups (see Table 1). To find a solution within this
range, we combined hierarchical and non-hierarchical
techniques (Milligan and Sokol, 1980; Punj and Stewart,
1983). For each number of groups (k), between two
and six, we perform a k-means "non-hierarchical" cluster
analysis, in which the firms are iteratively classified
based on their distance to some initial starting points
of dimension k. While some k-means methods use randomly
selected starting points, we employ the centroids
of an initial hierarchical solution for this purpose. [4]

[4] To generate the initial solutions we carried out a hierarchical
analysis by using the Ward's method, which is based on squared Euclidian
distances. Ward's method generally provides good results compared to
other clustering methods (Milligan and Cooper, 1987). Homogeneous
groups are built so as to minimise the distance in scores of firms
within a single cluster and to maximise the distance in scores between
companies from the various clusters. A visual inspection of the
dendrogram, plotting the initial solutions of the hierarchical analysis,
suggests a taxonomy with four clusters.
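
The two-stage idea in the passage above (Ward's hierarchical solution
first, then k-means started from its centroids) can be sketched in a few
lines. This is only an illustration under stated assumptions, not the
authors' implementation: it is plain Python with a naive Ward
agglomeration that is practical only for small data sets, and the
function names (ward_groups, kmeans, ward_seeded_kmeans) are
hypothetical ones chosen for this sketch.

```python
from math import fsum


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(fsum(col) / n for col in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return fsum((x - y) ** 2 for x, y in zip(a, b))


def ward_groups(points, k):
    """Naive Ward agglomeration (hypothetical helper for this sketch).

    Repeatedly merges the two clusters whose merger increases the total
    within-cluster sum of squares least, until k clusters remain.
    """
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Ward's merge cost: n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2
                cost = (len(a) * len(b) / (len(a) + len(b))
                        * sq_dist(centroid(a), centroid(b)))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def kmeans(points, centers, max_iter=100):
    """Lloyd-style k-means started from the supplied centers."""
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = centroid(members)
    return labels, centers


def ward_seeded_kmeans(points, k):
    """k-means whose starting points are the centroids of a Ward solution."""
    starts = [centroid(group) for group in ward_groups(points, k)]
    return kmeans(points, starts)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, centers = ward_seeded_kmeans(data, 2)
    print(labels)
    print(centers)
```

Because the starting points are computed from the data rather than
drawn at random, repeated runs give the same partition. In Stata the
same idea would presumably be expressed with -cluster wardslinkage-,
-cluster generate- to obtain a grouping variable, and the start()
option of -cluster kmeans-; check the manual entries for the exact
syntax.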

de Jong, J.P.J., Marsili, O., 2006. The fruit flies of innovation:
a taxonomy of innovative small firms. Research Policy 35, 213-229.

Punj, G., Stewart, D.W., 1983. Cluster analysis in marketing research:
review and suggestions for application. Journal of Marketing
Research 20, 134-148.



