st: RE: cluster analysis with foreach

From   "Nick Cox" <>
To   <>
Subject   st: RE: cluster analysis with foreach
Date   Wed, 9 Jun 2010 17:47:19 +0100

I have comments on three levels. 

First, your example implies that you have two variables, -state- and
-power_city-. I don't know what real scientific purpose clustering would
serve here. Clustering on virtually any criterion will just tend to
group together lots of smaller values of consumption, necessarily more
similar than others, as larger cities will typically be more spread out
in what for most if not all states will be very skewed distributions. 

Second, you set yourself up for the task of collating 50 or so cluster
analyses. ("or so" depending on DC, Puerto Rico, etc.) 

Third, should you persist, your syntax might be something like 

egen group = group(state), label 
su group, meanonly 

forval i = 1/`r(max)' { 
	cluster ... if group == `i'
}
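For illustration only, here is a minimal sketch of how that loop might be filled in with -cluster kmeans-. It assumes the variables are named -state- and -power_city- as in your example; the toy data are simulated, and the choice of k(3), the seed, and the names -km`i'-, -cl`i'- and -cl- are all hypothetical. Note that -generate()- and -name()- need a distinct name on each pass, or the second pass will fail.

```stata
* Toy data: 3 hypothetical states with 20 cities each (simulated values)
clear
set seed 12345
set obs 60
generate state = "S" + string(ceil(_n/20))
generate power_city = exp(rnormal())

* Map the string variable to integers 1, 2, ... and store the largest
egen group = group(state), label
summarize group, meanonly
local G = r(max)

* One k-means analysis per state; k(3) is an arbitrary choice
generate cl = .
forvalues i = 1/`G' {
	cluster kmeans power_city if group == `i', k(3) ///
		name(km`i') generate(cl`i') start(krandom(12345))
	* Collect the within-state cluster ids in a single variable
	replace cl = cl`i' if group == `i'
}

tabulate state cl
```

The cluster ids in -cl- are only meaningful within a state: cluster 1 in one state has nothing to do with cluster 1 in another. A state with fewer cities than k would make -cluster kmeans- fail, so real data may need a check on group size first.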


Maximiliano Manuel Silva Correa

I'm stuck trying to run a cluster analysis routine through different
sections of my data. Suppose we have power consumption data about
different cities of the US. What I'd like to do is to run a cluster
analysis routine (cluster kmeans, for example) by state, because I
would like to see in every state which cities have similar power
consumption.

It would be something like

foreach s in states{

cluster(kmeans, power_city)

}

(states is a string variable)

Could someone show me the syntax here, or send similar examples?
