


Re: st: cluster analysis validation

From   Paul Millar
Subject   Re: st: cluster analysis validation
Date   Thu, 19 Apr 2012 17:41:56 -0400


Cluster analysis is empirical: the group assignments are based on
minimizing the "distance" between cases, given the number of groups,
so a different sample entails different groups.  With methods that
start from randomly chosen cases (for example, -cluster kmeans-), you
can also get different results from repeated runs on the same data,
because the starting cases differ from run to run unless you fix the
random-number seed with -set seed-.  I have written a routine that
tests the reliability of a group assignment by sampling the group
assignments and then estimating whether the probability of assignment
to a particular group is > 0.5 in the population of group assignments.
See -help clustpop- after -ssc install clustpop-.
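
To illustrate the point about random starting cases, here is a minimal
sketch in Python (not Stata, and not the algorithm used by -clustpop-,
whose internals are not described in this post).  It assumes a plain
k-means with randomly drawn starting cases; the names kmeans, align,
and assignment_shares and the toy data are made up for the
illustration.  It reruns the clustering under several seeds, relabels
each run against the first one, and reports for each case the share of
runs in which it lands in its most frequent group.

```python
import random
from collections import Counter


def sqdist(p, q):
    """Squared Euclidean distance between two cases."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed, iters=25):
    """Plain k-means; the k starting centers are cases drawn at random."""
    rng = random.Random(seed)          # the seed fixes the starting cases
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every case to its nearest center.
        labels = [min(range(k), key=lambda j: sqdist(p, centers[j]))
                  for p in points]
        # Move each center to the mean of its members (empty groups stay put).
        for j in range(k):
            members = [p for p, g in zip(points, labels) if g == j]
            if members:
                centers[j] = tuple(sum(c) / len(members)
                                   for c in zip(*members))
    return labels


def align(labels, reference):
    """Rename each group to the reference label most common among its members.

    Group numbers are arbitrary across runs, so they must be matched up first.
    This simple rule can map two groups to the same reference label.
    """
    mapping = {}
    for g in set(labels):
        overlap = Counter(r for l, r in zip(labels, reference) if l == g)
        mapping[g] = overlap.most_common(1)[0][0]
    return [mapping[l] for l in labels]


def assignment_shares(points, k, seeds):
    """For each case: (most frequent group, share of runs assigned to it)."""
    runs = [kmeans(points, k, s) for s in seeds]
    aligned = [align(r, runs[0]) for r in runs]
    shares = []
    for i in range(len(points)):
        label, n = Counter(run[i] for run in aligned).most_common(1)[0]
        shares.append((label, n / len(runs)))
    return shares


# Hypothetical toy data: two loose groups measured on two variables.
points = [(0.0, 0.2), (0.3, 1.1), (1.2, 0.1), (0.9, 1.3),
          (9.8, 10.1), (10.4, 11.2), (11.1, 9.7), (10.9, 11.0)]

for case, (group, share) in zip(points, assignment_shares(points, 2, range(20))):
    print(case, group, share)
```

Cases whose share is at or below 0.5 are the ones whose group
membership depends mostly on the starting cases rather than on the
data.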

You can also do this after pooling the data, as suggested earlier.
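
On the question quoted below, about placing new observations into the
existing groups: a hierarchical method such as Ward's linkage gives no
prediction rule of its own.  One common workaround, sketched here in
Python under that assumption, is to compute the mean ("center") of
each old group and assign every new observation to the group with the
nearest center.  The names centroids and classify and the toy data are
hypothetical, and the variables are assumed to be on the same scale in
both datasets.

```python
import math


def centroids(points, labels):
    """Mean of each group on every variable, keyed by group label."""
    groups = {}
    for p, g in zip(points, labels):
        groups.setdefault(g, []).append(p)
    return {g: tuple(sum(c) / len(m) for c in zip(*m))
            for g, m in groups.items()}


def classify(obs, cents):
    """Return the nearest group and the Euclidean distance to every center."""
    dists = {g: math.dist(obs, c) for g, c in cents.items()}
    best = min(dists, key=dists.get)
    return best, dists


# Hypothetical old data: three continuous variables, group labels from
# the earlier cluster analysis.
old_points = [(0, 0, 0), (2, 0, 0), (10, 10, 10), (12, 10, 10)]
old_labels = [1, 1, 2, 2]
cents = centroids(old_points, old_labels)

# A new observation: which old group is closest, and by how much?
group, dists = classify((1, 1, 0), cents)
print(group, dists)
```

The distances returned for each new observation give a rough check on
fit: a new case that is far from every center, or nearly equidistant
from two of them, is poorly described by the old groups.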

- Paul

On Mon, Apr 16, 2012 at 6:57 PM, Dasinger, Lisa wrote:
> I've run a cluster analysis in Stata 11.2 based on three continuous
> variables using -cluster wardslinkage- with the default
> similarity/dissimilarity measure to generate 6 groups.  I'd like to know
> if there is a way to apply the same cluster analysis to a new set of
> data.  In other words, is there a way to run a new dataset through the
> old cluster analysis and see how new observations are classified, akin
> to running a regression equation and then taking a new dataset to obtain
> out of sample predictions?
> If so, is there a way to evaluate how well the "old" analysis fits the
> new data, e.g., by determining how similar/dissimilar each new
> observation is to the observations in the cluster in which each is
> placed, and whether the new observation is placed in the "best" cluster,
> meaning the one that minimizes the distance between the observation and
> the "center" of the existing cluster?
> I am new to cluster analysis and am looking for a way to validate the
> "old" cluster analysis.
> Lisa
> Lisa Dasinger, Ph.D.
> Data Reporting Manager
> Claims Analytics
> Zenith Insurance Company
> Pleasanton Regional Office
> 4309 Hacienda Drive, Suite 200
> Pleasanton, CA 94588

- Paul Millar, Ph.D.
School of Criminology and Criminal Justice
Nipissing University
North Bay, Ontario, Canada
