The Stata listserver

RE: st: RE: Re: Cluster Analysis

From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: RE: Re: Cluster Analysis
Date   Mon, 20 Oct 2003 14:39:57 +0100

Janet Noble

> > > I have inherited a dataset consisting of 40 variables (binary,
> > > categorical and continuous) which may be diagnostic predictors of
> > > the state of a tumour. I was going to use cluster analysis using
> > > Gower's coefficient of similarity (Cluster Analysis, Everitt et al.)
> > > to find which variables best predict the tumour state, but this
> > > does not appear to be implemented in Stata 8. Is there any other
> > > coefficient, or a completely different analysis, I could use in
> > > Stata? I really do not want to buy another statistical package, but
> > > I will if I have to.
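Gower's coefficient, as mentioned in the question, compares two observations (not two variables) one variable at a time. It then averages the partial similarities. For a continuous variable the partial similarity is 1 - |x - y| / range. For a categorical variable it is 1 if the categories match and 0 otherwise. Comparisons involving a missing value are left out.

The Python below is a minimal sketch of that idea. It assumes binary variables are treated as ordinary two-category variables (simple matching). Gower's original proposal instead handles dichotomous variables asymmetrically, giving joint absences zero weight. The function names and the three records are hypothetical.

```python
import math


def gower_similarity(a, b, kinds, ranges):
    """Gower's similarity between two records a and b.

    kinds[k] is "num" for a continuous variable or "cat" for a binary or
    categorical one; ranges[k] is the range of variable k over the whole
    sample (used only for "num"). A comparison involving a missing value
    (None) gets weight 0 and is skipped.
    """
    total, weight = 0.0, 0
    for x, y, kind, r in zip(a, b, kinds, ranges):
        if x is None or y is None:
            continue
        if kind == "num":
            # 1 - |x - y| / range; a constant variable (range 0) always agrees
            s = 1.0 if r == 0 else 1.0 - abs(x - y) / r
        else:
            # simple matching: 1 if the categories agree, 0 otherwise
            s = 1.0 if x == y else 0.0
        total += s
        weight += 1
    return total / weight if weight else math.nan


def gower_matrix(rows, kinds):
    """All pairwise Gower similarities between the records in rows."""
    ranges = []
    for k, kind in enumerate(kinds):
        vals = [row[k] for row in rows if row[k] is not None]
        ranges.append(max(vals) - min(vals) if kind == "num" else 0.0)
    return [[gower_similarity(r1, r2, kinds, ranges) for r2 in rows]
            for r1 in rows]


# Hypothetical records: (binary flag, category, continuous measurement)
rows = [(1, "a", 2.0), (0, "a", 4.0), (1, "b", None)]
kinds = ["cat", "cat", "num"]
S = gower_matrix(rows, kinds)
```

Here the first two records agree only on the category, so their similarity is 1/3. Taking 1 - S gives a dissimilarity that could be passed to a hierarchical clustering routine. That would group observations, which is a different question from which variables predict tumour state.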

> >Given the clear identification of a response
> >variable, this doesn't sound like a cluster analysis
> >problem at all. Which predictive model(s) is or
> >are appropriate will depend on how tumour state
> >is quantified.

> The first part of the analysis was to see if any of the "diagnostic
> variables" clustered together and were effectively measuring the
> "same thing", which was why I considered cluster analysis. The
> second component was to see whether these clusters, if they existed,
> were related to both the site and the state of the tumour. Previous
> work has just considered the effect of diagnostic variables one at a
> time on site or state. I thought that attempting to analyse the
> complete dataset would be more useful.

My own bias on problems like this is twofold: 

1. You are interested, naturally enough, in correlations 
among predictors. Whether observations are clustered 
together is a different issue. It is easy to think 
of continua with high correlations, continua with 
low correlations, cluster structure with high 
correlations and cluster structure with low 
correlations.

2. If cluster structure exists, it will be evident 
in plots of the first few principal components. 
The fact that some of your variables
are categorical or binary would complicate a PCA without 
making it impossible. 
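Both points can be illustrated in Python with NumPy on simulated data. All data and settings here are hypothetical.

The first part builds the four situations from point 1 and records the correlation in each. Four clusters placed at the corners of a square give obvious groups, yet a correlation near zero.

The second part computes principal component scores from standardized variables via a singular value decomposition, as in point 2. It then checks whether the first component separates two hidden groups of observations.

Only continuous variables are used. Binary and categorical variables would first need to be coded numerically (for example as indicator variables), which is not shown here.

```python
import numpy as np

rng = np.random.default_rng(42)


def corr(X):
    """Pearson correlation between the two columns of X."""
    return np.corrcoef(X, rowvar=False)[0, 1]


def blobs(centres, n, sd=0.5):
    """n points shared equally among 2-D centres, with round Gaussian noise."""
    centres = np.asarray(centres, dtype=float)
    labels = np.arange(n) % len(centres)
    return centres[labels] + rng.normal(scale=sd, size=(n, 2))


# Point 1: correlation and cluster structure vary independently.
n = 400
x = rng.normal(size=n)
scenarios = {
    # one continuous cloud, strongly correlated
    "continuum, high r": np.column_stack([x, x + 0.3 * rng.normal(size=n)]),
    # one continuous cloud, uncorrelated
    "continuum, low r": rng.normal(size=(n, 2)),
    # two clusters on the diagonal: between-cluster spread creates high r
    "clusters, high r": blobs([(0, 0), (5, 5)], n),
    # four clusters on a square: clear groups, yet r is near zero
    "clusters, low r": blobs([(0, 0), (5, 0), (0, 5), (5, 5)], n),
}
r_by_case = {name: corr(X) for name, X in scenarios.items()}


def pc_scores(X, k=2):
    """Scores on the first k principal components of the standardized columns."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:k].T


# Point 2: six continuous variables, two hidden groups of observations
# whose means differ by 2 units on every variable.
group = np.repeat([0, 1], 200)
X6 = rng.normal(size=(400, 6)) + 2.0 * group[:, None]
pc1 = pc_scores(X6)[:, 0]
# The sign of a component is arbitrary, so score agreement either way round.
agree = np.mean((pc1 > 0) == (group == 1))
split_accuracy = max(agree, 1 - agree)
```

To look for cluster structure visually, as point 2 suggests, one could scatter-plot the first two columns of `pc_scores(X6)`, for example with matplotlib.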

[email protected] 

