
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: st: RE: Cluster analysis on survey data
Date: Fri, 29 Aug 2008 17:07:47 +0100

Thanks for this detailed reply. The characterization you were given, that cluster analysis looks at medians, is not very illuminating. Cluster analysis is not a single method, or even a family of methods, but a clutch of loosely similar techniques. How clusters are defined, whether by using medians in some sense or by some other way of summarizing, is up to the user, and there are lots and lots of ways to do it, which is part of the charm, or some say the capriciousness, of the field.

Nick
n.j.cox@durham.ac.uk

Tullar, Jessica M

First, thank you both for responding. To answer your questions...

As far as the method to describe the kinds of people that report medical debt and medical bankruptcy: it was explained to me that regression (logit with survey weights was my first choice for how to answer this question; mlogit is an even better idea) looked at means, while cluster analysis (the alternative method) looked at "medians". The implication of compared to... did not come into the discussion, but it is a good point, which I will bring back to my group.

As to the other suggestion, that cluster analysis did not need survey weights: you are correct that if all we are concerned about is description, and hence the closeness of observations to one another, then whether they represent more or fewer individuals doesn't seem particularly concerning. However, my concerns arose from reading the chapter on cluster analysis in Reading and Understanding More Multivariate Statistics (Grimm and Yarnold 2000). The authors discuss the importance of the representativeness of your sample (survey data would not be representative unless weighted) and note that some cluster analysis methods are interested in equal-sized groups (again dependent upon analyzing a truly representative sample).
However, if we don't use methods that look at the size of groups, and focus instead on the distance between observations, then describing those groups using cluster analysis without weights does not seem too inappropriate, as long as the focus and description are clear about the non-representativeness.

As an aside, sorry about the unexplained reference in the original request. BRFSS is the Behavioral Risk Factor Surveillance System, a large ongoing telephone survey run through the U.S. Centers for Disease Control.
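
The distinction drawn above, between forming groups from distances alone and describing how large those groups are, can be illustrated with a small sketch. It is written in Python rather than Stata, and everything in it is an assumption made for illustration: the four toy points, the made-up weights, the choice of single linkage, and the helper name `single_linkage_two_groups`. None of it comes from the BRFSS data or from the thread. It only shows that the grouping step never consults the weights, while the share of each group changes once weights are applied.

```python
from math import dist

# Hypothetical toy data: four observations on two variables,
# each with a made-up survey weight (not taken from any real survey).
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
weights = [10.0, 10.0, 1.0, 1.0]


def single_linkage_two_groups(pts):
    """Agglomerate observations into two groups using single linkage.

    Only pairwise Euclidean distances are used; weights play no role here.
    """
    clusters = [[i] for i in range(len(pts))]
    while len(clusters) > 2:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(dist(pts[i], pts[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


groups = single_linkage_two_groups(points)

# Group sizes as shares of the sample, ignoring weights.
unweighted_share = [len(g) / len(points) for g in groups]

# Group sizes as shares of the weighted total.
total_weight = sum(weights)
weighted_share = [sum(weights[i] for i in g) / total_weight for g in groups]

print(groups)
print(unweighted_share)
print(weighted_share)
```

In this toy case the two groups each hold half of the sample, but the first group accounts for most of the weighted total. So the membership of the groups can be described without weights, whereas any statement about how common each group is would depend on them.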

**References**:

- **Re: st: RE: Cluster analysis on survey data**
  *From:* "Tullar, Jessica M" <Jessica.M.Tullar@uth.tmc.edu>

