
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


Re: st: New command for clustering -clustpop-

From   Nick Cox
Subject   Re: st: New command for clustering -clustpop-
Date   Thu, 21 Apr 2011 19:39:33 +0100

Although there are plenty of exceptions, most cluster analysis
implementations that I've heard of are essentially exploratory in
spirit. Any inferential calculations are contingent not only on how
repeated sampling is set up, but also on the particular cluster
analysis method chosen. What has been called the classification
crunch amounts to this: If you have well-distinguished clusters, some
simple sensible graphical method will show you what they are. If you
don't, you should lay in supplies for endless experimentation with how
cluster dissimilarity is defined, how observations or clusters should
be grouped into larger clusters, and so forth.

To revisit a well-worn joke, statistical people can be clustered into
those who take cluster analysis very seriously and those who don't.
But it could be replied that this applies to most other statistical
methods too. Also, cluster analysis with a stronger hypothesis testing
element tends to be called something else, say discriminant analysis.

Just my proverbial tuppenceworth, Nick

On Thu, Apr 21, 2011 at 7:24 PM, Airey, David C wrote:
> .
> What do other software packages usually do with cluster analyses?
>> Thanks to Kit Baum, a new command of interest to -cluster- users has
>> been uploaded to ssc.
>> Users of -cluster- are no doubt aware that each run of the command
>> on the same data produces a different clustering.
>> A run of cluster is, in effect, a sample (n=1) from the population of
>> possible cluster groupings.
>> -clustpop- expands the sample size by running -cluster- many times to
>> estimate the population group assignments.  The most frequent group
>> assignment is taken as the estimate of the population and a
>> statistical test of significance is performed to ensure the lower
>> bound of the proportion is greater than 0.5.  In other words, users
>> can be confident, at a given alpha level, that this group assignment
>> occurs for the majority of cases in the population.  Cases which do not
>> meet this criterion are set to missing.
>> I expect that most users will prefer this method to using the
>> -cluster- command alone.
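
The procedure described in the announcement above (run the clustering many
times, keep each case's most frequent group, and set the case to missing
unless the lower bound of that group's proportion exceeds 0.5) can be
sketched roughly as follows. This is a minimal illustration in Python, not
the -clustpop- code. Two things are assumptions here: the announcement does
not say which test is used, so a one-sided Wilson score lower bound stands
in for it, and group labels are assumed to be already aligned across runs
(relabelling between runs is not addressed). The function names are
hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist


def wilson_lower(successes, n, alpha=0.05):
    """One-sided Wilson score lower bound for a binomial proportion.

    Assumption: the announcement does not name the test, so this is a stand-in.
    """
    z = NormalDist().inv_cdf(1 - alpha)
    p = successes / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom


def majority_assignment(runs, alpha=0.05):
    """Return one group label per case, or None (missing) if not a clear majority.

    runs: list of label lists, one list per clustering run, with labels
    assumed to mean the same group in every run.
    """
    n_runs = len(runs)
    result = []
    for case_labels in zip(*runs):  # labels for one case across all runs
        label, count = Counter(case_labels).most_common(1)[0]
        keep = wilson_lower(count, n_runs, alpha) > 0.5
        result.append(label if keep else None)
    return result
```

With 20 runs, a case assigned to the same group every time is kept, while a
case split 11 to 9 between two groups is set to missing, because 11 out of
20 is not enough to put the lower bound above 0.5.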
