Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: New command for clustering -clustpop-
Date: Thu, 21 Apr 2011 19:39:33 +0100

Although there are plenty of exceptions, most cluster analysis implementations that I've heard of are essentially exploratory in spirit. Any inferential calculations are contingent not only on how repeated sampling is set up, but also on the particular cluster analysis method chosen.

What has been called the classification crunch amounts to this: If you have well-distinguished clusters, some simple sensible graphical method will show you what they are. If you don't, you should lay in supplies for endless experimentation with how cluster dissimilarity is defined, how observations or clusters should be grouped into larger clusters, and so forth.

To revisit a well-worn joke, statistical people can be clustered into those who take cluster analysis very seriously and those who don't. But it could be replied that this applies to most other statistical methods too.

Also, cluster analysis with a stronger hypothesis testing element tends to be called something else, say discriminant analysis.

Just my proverbial tuppenceworth,
Nick

On Thu, Apr 21, 2011 at 7:24 PM, Airey, David C <david.airey@vanderbilt.edu> wrote:
> .
>
> What do other software packages usually do with cluster analyses?
>
>> Thanks to Kit Baum, a new command of interest to -cluster- users has
>> been uploaded to ssc.
>>
>> Users of -cluster- are no doubt familiar that each run of the command
>> on the same data produces different clustering.
>>
>> A run of cluster is, in effect, a sample (n=1) from the population of
>> possible cluster groupings.
>>
>> -clustpop- expands the sample size by running -cluster- many times to
>> estimate the population group assignments. The most frequent group
>> assignment is taken as the estimate of the population and a
>> statistical test of significance is performed to ensure the lower
>> bound of the proportion is greater than 0.5. In other words, users
>> can be confident, at a given alpha level, that this group assignment
>> occurs for the majority of cases in the population. Cases which do not
>> meet the criteria are set to missing.
>>
>> I expect that most users will prefer this method to using the
>> -cluster- command alone.
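
The rule described in the quoted announcement (take the most frequent group assignment over many runs, keep it only if the lower bound of its proportion exceeds 0.5) can be sketched roughly as follows. This is an illustration under stated assumptions, not the -clustpop- implementation: the announcement does not say which interval is used, so a Wilson score lower bound is assumed here; the function names are hypothetical; and group labels are assumed to have been made comparable across runs already, which is itself a non-trivial step because cluster labels are arbitrary from one run to the next.

```python
import math
from collections import Counter


def wilson_lower_bound(successes, trials, z=1.645):
    """Lower limit of the Wilson score interval for a proportion.

    z is a standard normal quantile; 1.645 corresponds to a one-sided
    test at alpha = 0.05. The choice of the Wilson interval is an
    assumption made for this sketch.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z * z / trials
    centre = p_hat + z * z / (2 * trials)
    margin = z * math.sqrt(
        p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)
    )
    return (centre - margin) / denom


def modal_assignment(labels, z=1.645):
    """Return the most frequent label for one observation, or None.

    labels holds the group assigned to a single observation in each
    repeated clustering run (assumed already aligned across runs).
    None plays the role of a missing value when the lower bound of the
    modal label's proportion does not exceed 0.5.
    """
    label, count = Counter(labels).most_common(1)[0]
    lower = wilson_lower_bound(count, len(labels), z)
    return label if lower > 0.5 else None


if __name__ == "__main__":
    stable = [1] * 90 + [2] * 10      # assigned to group 1 in 90 of 100 runs
    unstable = [1] * 55 + [2] * 45    # assigned to group 1 in 55 of 100 runs
    print(modal_assignment(stable))
    print(modal_assignment(unstable))
```

Note that any such calculation remains conditional on the clustering method and on how the repeated runs are generated, which is the caveat raised in the reply above.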