st: RE: cluster analysis

From   Nick Cox
Subject   st: RE: cluster analysis
Date   Tue, 24 Jan 2012 13:43:06 +0000

As I understand it, you want to cluster data for two variables into two groups. 

Any clustering that makes sense will be evident on a scatter plot and allow scientific interpretation. 
K-means sounds like overkill to me for such a problem, but tastes differ.
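
A minimal sketch of such a scatter plot, assuming the -cluster kmeans- command from the quoted message has already been run, so that the grouping variable ca2 exists (by default it takes the name given in name()). The legend labels are illustrative only.

```stata
* Assumes ca2 was created by the cluster kmeans call in the quoted message.
* Run from a do-file: /// continues a command across lines.
twoway (scatter X1 X2 if ca2 == 1 & id_X3 == 1) ///
       (scatter X1 X2 if ca2 == 2 & id_X3 == 1), ///
       legend(order(1 "Group 1" 2 "Group 2"))
```

If the two groups do not separate visibly in this plot, a formal test is unlikely to be convincing either.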

I know that many economists don't believe anything without a P-value attached. 

A more formal approach to such data would presumably start with a discriminant analysis. 
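
One way this might look in Stata, as a sketch only: it treats the k-means groups in ca2 as if they were known in advance, which is an assumption and not something the data guarantee.

```stata
* Linear discriminant analysis of X1 and X2, using the k-means groups
* in ca2 as the grouping variable (assumed to exist already)
discrim lda X1 X2 if id_X3 == 1, group(ca2)

* Classification table: how well the discriminant rule recovers the groups
estat classtable
```

Because the groups were formed from the same two variables, the classification will tend to look better than it would for groups defined independently.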


Gianluca Cafiso

I have run this cluster analysis:

cluster kmeans X1 X2 if id_X3==1, k(2) name(ca2) s(prandom) keepcen 
cluster list ca2 
cluster query ca2
return list 
sreturn list

However, I cannot obtain the following information about the cluster analysis:

1 - the initial mean values used as group centers
(I specify how they are defined with "prandom", but I want to see the values too)
2 - the value of the dissimilarity measure (L2, Euclidean)
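
A possible workaround for both items, sketched under stated assumptions. Instead of prandom, a random two-group partition is built by hand and passed through start(group()), so the starting centers are simply the group means of that partition and can be listed beforehand. The names g0, ca2b, m1, m2 and d_center and the seed are hypothetical, and reading item 2 as "the Euclidean distance from each observation to its final group center" is only one interpretation.

```stata
* Hand-made random partition (g0 is a hypothetical name)
set seed 12345
generate byte g0 = 1 + (runiform() < 0.5) if id_X3 == 1

* Means of X1 and X2 within each starting group = the starting centers
tabstat X1 X2 if id_X3 == 1, by(g0) statistics(mean)

* K-means started from those group means, Euclidean (L2) distance
cluster kmeans X1 X2 if id_X3 == 1, k(2) name(ca2b) start(group(g0)) measure(L2)

* Euclidean distance from each observation to the mean of its final group
bysort ca2b: egen double m1 = mean(X1) if id_X3 == 1
bysort ca2b: egen double m2 = mean(X2) if id_X3 == 1
generate double d_center = sqrt((X1 - m1)^2 + (X2 - m2)^2)
```

The result will not reproduce a particular prandom run, since the random partition is drawn differently, but the starting values are now visible.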


- Is there a way to test statistically whether my partition makes sense?
(I mean: do the data really fall into 2 groups?)
A statistician friend of mine suggested looking at Wilks' lambda.
Does anybody know whether it makes sense with Stata's cluster algorithm and, if so,
how to get it?
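
A minimal sketch of how Wilks' lambda could be obtained, assuming the grouping variable ca2 created by the -cluster kmeans- call above. -manova- reports Wilks' lambda together with Pillai's trace, the Lawley-Hotelling trace and Roy's largest root.

```stata
* One-way MANOVA of X1 and X2 on the k-means groups
manova X1 X2 = ca2 if id_X3 == 1
```

The P-values should be read with caution: the groups were chosen to separate these same variables, so the test will tend to favor a difference even when the data contain no real group structure.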
