st: Package "clstop_lbt" available on SSC

From	Dirk Enzmann <[email protected]>
To	[email protected]
Subject	st: Package "clstop_lbt" available on SSC
Date	Sun, 03 Feb 2013 18:16:54 +0100

Thanks to Kit Baum, the new package "clstop_lbt" is available on SSC.

"clstop_lbt" adds the rule "lbt" to the post-estimation command -cluster stop- to determine the number of kmeans clusters using Steinley & Brusco's (2011) lower bound technique (LBT). It is used via -cluster stop, rule(lbt)-.
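
A minimal usage sketch follows. It assumes Stata's bundled auto dataset and an arbitrary choice of five variables, neither of which comes from the package itself. Only the rule(lbt) call is taken from the description above; whether the rule accepts further options is not covered here.

```stata
* Install the package from SSC (one-time step)
ssc install clstop_lbt

* Illustrative data only: Stata's bundled auto dataset
sysuse auto, clear

* Fit kmeans solutions for k = 2, 3, 4 and request the LBT rule for each.
* Five variables are used so that the number of variables exceeds k.
* Variables are left unstandardized here to keep the sketch short.
forvalues k = 2/4 {
    cluster kmeans price mpg weight length turn, k(`k') name(km`k') start(krandom(12345))
    cluster stop km`k', rule(lbt)
}
```

The seed in start(krandom()) is arbitrary and only makes the starting centers reproducible.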

"clstop_lbt" creates the normalized index LBT, which measures the closeness of the observed within-cluster sum of squares (SSE) to the minimum value of SSE, expressed relative to the total sum of squares (SST): LBT = (SSE - SSE_min)/SST. The method to determine the lower bound of SSE (i.e. SSE_min) is given in Steinley & Brusco (2011, p. 289). If the number of variables is less than or equal to the number of clusters k, LBT is equal to the ratio SSE/SST (in this case, the LBT cannot be used).

"clstop_lbt" can also be used to determine whether there is more than one cluster. In this case the ratio SSE(2)/SST of a two-cluster solution should be less than the lower bound ratio (LBR) obtainable when there is only one cluster: assuming a (multivariate) normal distribution, the LBR(normal) is 1-2/pi = .3634; assuming a univariate distribution, the LBR(univariate) is .25.
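
The comparison against LBR(normal) can be sketched by hand as below. The SSE(2) and SST values are made up purely for illustration, and the scalar names are hypothetical; they are not results returned by the package.

```stata
* Hypothetical values, for illustration only
scalar sse2 = 410.5
scalar sst  = 1000
scalar ratio = scalar(sse2)/scalar(sst)

* Lower bound ratio under a single normal cluster: 1 - 2/pi
scalar lbr_normal = 1 - 2/_pi

display "SSE(2)/SST  = " %6.4f scalar(ratio)
display "LBR(normal) = " %6.4f scalar(lbr_normal)

* 1 if the two-cluster ratio falls below the bound, 0 otherwise
display "ratio below LBR(normal)? " (scalar(ratio) < scalar(lbr_normal))
```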

A simulation study by Steinley & Brusco (2011) shows that the LBT index outperforms the CH (Calinski/Harabasz) index in accuracy and precision. However, the LBT requires that the number of variables exceed the number of clusters. When there are as many or fewer variables than clusters, Steinley & Brusco recommend using the CH index (which is the default when using -cluster stop-).


Reference:

Steinley, D. & Brusco, M. J. (2011). Choosing the number of clusters in K-means clustering. Psychological Methods, 16, 285-297. [http://psycnet.apa.org/journals/met/16/3/285/]


Dirk

========================================
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Rothenbaumchaussee 33
D-20148 Hamburg
Germany

phone: +49-(0)40-42838.7498 (office)
       +49-(0)40-42838.4591 (Mrs Billon)
fax:   +49-(0)40-42838.2344
email: [email protected]
http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html
========================================