Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


st: RE: cluster analysis


From   Nick Cox <n.j.cox@durham.ac.uk>
To   "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject   st: RE: cluster analysis
Date   Tue, 24 Jan 2012 13:43:06 +0000

As I understand it, you want to cluster data for two variables into two groups. 

Any clustering that makes sense will be evident on a scatter plot and allow scientific interpretation. 
K-means sounds like overkill to me for such a problem, but tastes differ.
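
To make the scatter plot suggestion concrete, here is a minimal sketch. It assumes the quoted -cluster kmeans- call has already been run and that, with no generate() option given, the group membership variable takes the name supplied in name(), that is, ca2. The legend labels are illustrative only.

```stata
* Sketch only: assumes ca2 holds the k-means group (1 or 2) for each observation.
* The /// line continuations work in a do-file, not when typed interactively.
twoway (scatter X1 X2 if id_X3==1 & ca2==1)   ///
       (scatter X1 X2 if id_X3==1 & ca2==2),  ///
       legend(order(1 "Cluster 1" 2 "Cluster 2"))
```

If the two groups do not separate visibly on this plot, a formal test is unlikely to rescue the partition.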

I know that many economists don't believe anything without a P-value attached. 

A more formal approach to such data would presumably start with a discriminant analysis. 
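
As a rough sketch of what that could look like, and not necessarily the only route: -manova- reports Wilks' lambda for the two variables across the groups, and -discrim lda- followed by -estat manova- gives the same kind of statistic in a discriminant analysis setting. The sketch assumes the grouping variable is named ca2, as created by the quoted -cluster kmeans- call.

```stata
* Sketch only: ca2 is assumed to be the k-means grouping variable.

* Wilks' lambda (reported alongside other multivariate tests)
manova X1 X2 = ca2 if id_X3==1

* Linear discriminant analysis with the same groups
discrim lda X1 X2 if id_X3==1, group(ca2)
estat manova
```

Bear in mind that the groups were chosen by k-means from the same data precisely to be well separated, so any P-value obtained this way will overstate the evidence for two groups.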

Nick 
n.j.cox@durham.ac.uk 

Gianluca Cafiso wrote:

I have run this cluster analysis:

cluster kmeans X1 X2 if id_X3==1, k(2) name(ca2) s(prandom) keepcen 
cluster list ca2 
cluster query ca2
return list 
sreturn list

However, I cannot retrieve the following information about the cluster analysis:

1 - the initial mean values used as group centers
(I specify how they are chosen with s(prandom), but I want to see the values too)
2 - the value of the dissimilarity measure (L2, Euclidean)

Furthermore:

- Is there a way to test statistically whether my partition makes sense?
(I mean: do the data really fall into 2 groups?)
A statistician friend of mine suggested looking at Wilks' lambda.
Does anybody know whether it makes sense with Stata's cluster algorithm and, if so,
how to get it?

