
Re: st: clustering with a new dataset

From   Frank Gallo
Subject   Re: st: clustering with a new dataset
Date   Mon, 04 May 2009 04:16:42 -0400


Hi Walt,

I am a Stata beginner, so I have little to offer regarding the procedures available in Stata; maybe other listers can help. Conceptually, however, it sounds as though she wants to cross-validate her model.

A well-fitting cluster (or factor) model is tentative and requires post hoc validation. With a large enough sample, the researcher can set aside a random holdout subsample and use it for cross-validation (i.e., within-sample replication). Even if a model fits the data well, that does not mean it is the correct model, or even the best model, for explaining the phenomenon of interest. There may be equivalent models that fit the sample data, or other data sources, equally well. If the researcher uncovers equivalent models, no statistical technique can discriminate among them; only substantive knowledge about the phenomenon can tell the researcher which equivalent model is best. The researcher may judge a model "good" on both theoretical and statistical grounds and thus provisionally accept it.

Cross-validation on independent samples from the same population (which seems to be your case) can enhance the utility of the model. To offer the client some insight, you may compare the models by examining the overall fit indices (e.g., chi-square, RMSEA) and the significance of the path coefficients. I hope this helps.
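One way to make the cross-validation idea concrete for the cluster case is sketched below. It is in Python rather than Stata and is only an illustration under stated assumptions, not a procedure taken from this thread. Ward's linkage has no built-in way to score new observations, so the sketch swaps in a nearest-centroid rule: compute last year's cluster means, assign each new respondent to the closest mean, and then measure how well that assignment agrees with a fresh clustering of this year's data using the Rand index. The function names (centroids, assign_nearest, rand_index) and the toy data are hypothetical, and the sketch assumes both surveys use the same variables on the same scale.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def centroids(rows, labels):
    """Return {cluster label: list of variable means} for labeled data."""
    sums, counts = {}, {}
    for row, lab in zip(rows, labels):
        acc = sums.setdefault(lab, [0.0] * len(row))
        for j, x in enumerate(row):
            acc[j] += x
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def assign_nearest(rows, cents):
    """Assign each row to the label of its closest centroid."""
    return [min(cents, key=lambda lab: dist(row, cents[lab])) for row in rows]


def rand_index(a, b):
    """Share of observation pairs on which two partitions agree.

    A pair agrees if both partitions put it in the same cluster or both
    split it. Label names need not match. Requires at least two observations.
    """
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


if __name__ == "__main__":
    # Toy data (made up): last year's survey with its cluster labels.
    old_rows = [[1, 1], [1, 2], [8, 8], [9, 8]]
    old_labels = [1, 1, 2, 2]
    # Toy data (made up): this year's survey, same two variables.
    new_rows = [[0, 1], [9, 9], [2, 2]]
    # Hypothetical labels from re-running the clustering on the new data.
    new_labels = [5, 7, 5]

    carried_over = assign_nearest(new_rows, centroids(old_rows, old_labels))
    print(carried_over, rand_index(carried_over, new_labels))
```

A Rand index near 1 would suggest that last year's cluster structure reappears in the new sample; a low value would suggest the solution did not replicate. The raw Rand index is not corrected for chance agreement, so it is best read alongside the cluster profiles themselves.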


On May 3, 2009, at 9:08 PM, Data Analytics Corp. wrote:


I ran a cluster analysis last year for a client using "cluster ward varlist" where the variables in varlist came from a survey. This worked fine and the client was happy. This year, she returned with a new dataset (same variables, just new values from a new survey) and wants last year's clusters applied to this year's data. I can't see how to do this - in fact it doesn't seem to make sense. Any suggestions, or should I tell her that I can just rerun the old commands and MAYBE the same clusters will appear?




Walter R. Paczkowski, Ph.D.
Data Analytics Corp.
44 Hamilton Lane
Plainsboro, NJ 08536
(V) 609-936-8999
(F) 609-936-3733
