Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | brendan.halpin@ul.ie (Brendan Halpin) |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: cluster analysis validation |
Date | Tue, 17 Apr 2012 00:54:03 +0100 |
On Mon, Apr 16 2012, Dasinger, Lisa wrote:

> I've run a cluster analysis in Stata 11.2 based on three continuous
> variables using -cluster wardslinkage- with the default
> similarity/dissimilarity measure to generate 6 groups. I'd like to know
> if there is a way to apply the same cluster analysis to a new set of
> data. In other words, is there a way to run a new dataset through the
> old cluster analysis and see how new observations are classified, akin
> to running a regression equation and then taking a new dataset to obtain
> out of sample predictions?

I would suggest pooling the two data sets, running a new cluster
analysis, and analysing the resultant 6*2 table (cluster classification
by old/new). That would test the extent to which the two data sets are
similarly distributed across a joint classification. If that's
acceptable (and the combined data set is small enough) it is a clean and
easy solution.

If you are concerned that the joint classification is not compatible
with the old classification, you can compare the old cluster membership
with the joint cluster membership, for the old data. A good measure of
agreement is the Adjusted Rand Index. Comparing cluster solutions is
tricky because they don't have "labels" -- there is no way of saying
that a given group in classification A is the same as any particular
group in classification B, apart from having shared membership. The ARI
takes that into account.

In theory you can relate the new data to the old classification by
calculating the distance from each new observation to the old cluster
centroids, but I don't know an easy way of doing that with Stata.

Regards,

Brendan

PS: I have code to estimate the ARI, and to re-arrange cluster solutions
to maximise similarity. If you are interested, check out:

    net from http://teaching.sociology.ul.ie/sadi
    net install sadi

and look at the -ari- and -permtab- commands.
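For readers who want to see the two ideas above in concrete form (assigning new observations to the nearest old centroid, and the Adjusted Rand Index), here is a minimal sketch in plain Python rather than Stata. The function names (`centroids`, `assign_to_nearest`, `adjusted_rand_index`) and the toy data are hypothetical and chosen only for illustration; this is not the code behind the -ari- command. It assumes squared Euclidean distance and the usual Hubert and Arabie form of the index. Nearest-centroid assignment only approximates the old classification: it need not reproduce what a hierarchical Ward's-linkage run would give.

```python
from collections import Counter
from math import comb


def centroids(rows, labels):
    """Mean of each variable within each cluster label."""
    sums, counts = {}, Counter(labels)
    for row, g in zip(rows, labels):
        acc = sums.setdefault(g, [0.0] * len(row))
        for j, x in enumerate(row):
            acc[j] += x
    return {g: [s / counts[g] for s in acc] for g, acc in sums.items()}


def assign_to_nearest(rows, cents):
    """Label each row with the cluster whose centroid is closest
    (squared Euclidean distance; an assumption of this sketch)."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(cents, key=lambda g: d2(row, cents[g])) for row in rows]


def adjusted_rand_index(a, b):
    """Adjusted Rand Index between two labelings of the same observations.
    Label values themselves do not matter, only shared membership."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two labelings of the same >= 2 observations")
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:  # both labelings trivial and identical
        return 1.0
    return (pairs_joint - expected) / (max_index - expected)


# Toy data (made up): three variables, two old clusters.
old_x = [[1.0, 1.0, 1.0], [1.2, 0.8, 1.1], [5.0, 5.0, 5.0], [5.2, 4.9, 5.1]]
old_g = [1, 1, 2, 2]
new_x = [[0.9, 1.1, 1.0], [4.8, 5.3, 5.0]]

cents = centroids(old_x, old_g)
new_g = assign_to_nearest(new_x, cents)                  # [1, 2]
same_up_to_labels = adjusted_rand_index(old_g, [2, 2, 1, 1])  # 1.0
```

The last line shows why the index suits unlabelled cluster solutions: swapping the group numbers leaves the agreement at 1.
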
-- 
Brendan Halpin, Department of Sociology, University of Limerick, Ireland
Tel: w +353-61-213147 f +353-61-202569 h +353-61-338562; Room F1-009 x 3147
mailto:brendan.halpin@ul.ie  ULSociology on Facebook: http://on.fb.me/fjIK9t
http://teaching.sociology.ul.ie/bhalpin/wordpress  twitter:@ULSociology
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/