Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: Clustering help

From	William Buchanan <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: Clustering help
Date	Tue, 19 Mar 2013 04:45:55 -0700

The user should provide a bit more information about their design.  In the context of the US educational system, schools are "clustered" within districts (LEAs), which are in turn "clustered" within states (SEAs); if charter schools are also included, it is possible to have "crossed" levels.  I assume that many other places around the world have similar hierarchical structures that vary to some degree (e.g., schools clustered within cities, or within regions/states).  Given that the user is working in the context of a developing nation, it might be more meaningful to use geopolitical boundaries based on the local policy context (e.g., school governance/funding, resource distribution/availability).  If there is no substantive policy and/or organizational structure, then Nick's suggestion would still be a better solution than creating the clusters arbitrarily.  Depending on the research question, the user might also consider creating clusters based on distance from a specific point in the data (e.g., schools grouped in 5-mile increments from that point).
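The distance-band idea in the last sentence can be sketched as follows. This is a minimal Python illustration rather than Stata code. It assumes the haversine great-circle formula, a mean Earth radius of 3958.8 miles, and 5-mile bands. The school names and coordinates are hypothetical.

```python
import math

# Assumption: mean Earth radius in miles; any consistent radius/unit would do.
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def distance_band(lat, lon, ref_lat, ref_lon, width_miles=5.0):
    """Band 0 = within width_miles of the reference point, band 1 = the next ring, etc."""
    return int(haversine_miles(ref_lat, ref_lon, lat, lon) // width_miles)


# Hypothetical schools (lat, lon) and a hypothetical reference point at (0, 0).
schools = {"A": (0.0, 0.0), "B": (0.05, 0.0), "C": (1.0, 0.0)}
bands = {name: distance_band(lat, lon, 0.0, 0.0) for name, (lat, lon) in schools.items()}
print(bands)
```

Schools that share a band value form one "cluster". The bands are rings around the chosen point, so clusters produced this way will generally differ in size.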

HTH,
Billy

On Mar 19, 2013, at 4:15, Simon Falck <[email protected]> wrote:

> I think Nick's suggestion is reasonable. 
> 
> However, you could also consult the theory on how clusters can be defined and how the number of clusters can be determined. In principle, there is no optimal number of clusters. According to Mardia et al. (1979; see references below), the number of clusters k can be estimated as k = (n/2)^(1/2). Thus, if you have 35 schools, the number of clusters is (35/2)^(1/2) ≈ 4.2, i.e. about 4. 
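The rule of thumb above is easy to check numerically. The snippet below is a small Python illustration, not Stata. Rounding to the nearest integer is an assumption, because the rule itself yields a real number.

```python
import math


def mardia_k(n):
    """Rule-of-thumb number of clusters, k ≈ sqrt(n / 2), rounded to the nearest integer."""
    return round(math.sqrt(n / 2))


print(math.sqrt(35 / 2))  # about 4.18 for 35 schools
print(mardia_k(35))
```

For the 35 schools in the original question, the rule suggests about 4 clusters rather than the 7 the poster wanted.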
> 
> How many schools each cluster should contain can be determined using a range of (statistical) methods. For instance, you could use Ward's method, which at each step merges the pair of clusters that gives the smallest increase in within-cluster variance (sum of squares), and thus maximizes the (empirical) homogeneity within each cluster of schools. This method implies that the schools within each cluster will be relatively "similar", and that you do not interfere in the "selection procedure", i.e. in choosing how many schools there "should be" in each cluster.
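Ward's agglomerative procedure can be sketched in a few lines. This naive Python version is for illustration only and is not Stata's implementation. In Stata itself, hierarchical Ward clustering is available through `cluster wardslinkage`. The sketch treats lat/long as planar (x, y) coordinates, which is an assumption that is reasonable only over small areas. The example points are hypothetical.

```python
from itertools import combinations


def ward_clusters(points, k):
    """Naive Ward agglomerative clustering: merge clusters until k remain.

    points: list of (x, y) tuples (assumed planar). Returns lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def centroid(members):
        n = len(members)
        return (sum(points[i][0] for i in members) / n,
                sum(points[i][1] for i in members) / n)

    def merge_cost(a, b):
        # Increase in total within-cluster sum of squares if a and b are merged.
        (ax, ay), (bx, by) = centroid(a), centroid(b)
        na, nb = len(a), len(b)
        return na * nb / (na + nb) * ((ax - bx) ** 2 + (ay - by) ** 2)

    while len(clusters) > k:
        # combinations yields pairs with i < j, so deleting j leaves index i valid.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: merge_cost(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


pts = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0)]  # hypothetical coordinates
print(ward_clusters(pts, 3))  # [[0, 1], [2, 3], [4]]
```

As the paragraph notes, the resulting cluster sizes come out of the data (here 2, 2 and 1) rather than being fixed in advance.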
> 
> For more information, see e.g.
> 
> Mardia, K. V., Kent, J. T., and Bibby, J. M. (1979). Multivariate Analysis. London: Academic Press. pp. 360-384.
> Romesburg, H. C. (2004). Cluster Analysis for Researchers. North Carolina: Lulu Press. pp. 31-34.
> 
> Simon
> 
> 
> On 18 mar 2013, at 18:37, Nick Cox <[email protected]> wrote:
> 
>> I'd plot a map and identify clusters by eye. (Seriously.)
>> 
>> Nick
>> 
>> On Mon, Mar 18, 2013 at 7:15 AM, Ron Wendt <[email protected]> wrote:
>> 
>>> I'm looking to cluster some geocoded data into a specific
>>> number of clusters, all of the same size.  For example, I want to make
>>> 7 clusters of 5 schools each.
>>> The best I've found so far is: cluster kmeans lat long, k(7).
>>> However, this doesn't let me specify the number of schools that should
>>> be in each cluster.  Is there another/better way to do this?
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
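Neither `cluster kmeans` nor Ward's method forces clusters of equal size, which is what the original question asks for. One simple heuristic, which nobody in this thread proposed, is to take cluster centers from an ordinary k-means run and then assign each school greedily to the nearest center that still has room. The Python sketch below is illustrative, not a Stata command. Its greedy rule is not guaranteed to minimize total distance; an exact solution would need an assignment or min-cost-flow formulation. It also assumes planar coordinates, a number of schools that is a multiple of the number of centers, and the hypothetical example points and centers shown.

```python
import math


def balanced_assign(points, centers):
    """Greedy equal-size assignment of points to given centers (heuristic, not optimal).

    Each center receives exactly len(points) // len(centers) points.
    """
    n, k = len(points), len(centers)
    if n % k != 0:
        raise ValueError("number of points must be a multiple of number of centers")
    capacity = n // k
    # All (distance, point index, center index) triples, closest pairs first.
    pairs = sorted(
        (math.dist(p, c), i, j)
        for i, p in enumerate(points)
        for j, c in enumerate(centers)
    )
    labels = [None] * n
    counts = [0] * k
    for _, i, j in pairs:
        # Assign the point if it is still free and the center is not yet full.
        if labels[i] is None and counts[j] < capacity:
            labels[i] = j
            counts[j] += 1
    return labels


# Hypothetical example: 4 schools, 2 centers, so 2 schools per cluster.
schools = [(0, 0), (0, 1), (0, 2), (10, 0)]
centers = [(0, 0), (10, 0)]
print(balanced_assign(schools, centers))
```

In this example the third school is nearest the first center. Because that center is already full, the greedy rule moves the school to the second center. This is the trade-off that any equal-size constraint imposes.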
