Thanks to Kit Baum, the new package "clstop_lbt" is now available on SSC.
"clstop_lbt" adds the stopping rule "lbt" to the post-estimation command
-cluster stop-, which then determines the number of kmeans clusters using
Steinley & Brusco's (2011) lower bound technique (LBT). The rule is
invoked with -cluster stop, rule(lbt)-.
"clstop_lbt" computes the normalized index LBT, which measures how close
the observed within-cluster sum of squares (SSE) is to its minimum value
(SSE_min), expressed relative to the total sum of squares (SST):
LBT = (SSE - SSE_min)/SST. The method for determining the lower bound of
SSE (i.e. SSE_min) is given in Steinley & Brusco (2011, p. 289). If the
number of variables is equal to or less than the number of clusters k,
LBT equals the ratio SSE/SST (the lower bound is then zero); in this
case the LBT cannot be used.
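A minimal Python sketch of the LBT formula follows. It computes SSE and SST from a data matrix (a list of rows) and cluster labels, and then LBT = (SSE - SSE_min)/SST. SSE_min is passed in as an argument, because the bound itself (Steinley & Brusco, 2011, p. 289) is not reproduced here; the function names sse_sst and lbt_index are hypothetical and are not part of clstop_lbt.

```python
def sse_sst(data, labels):
    """Return (SSE, SST) for the rows of `data` partitioned by `labels`."""
    n, p = len(data), len(data[0])
    grand = [sum(row[j] for row in data) / n for j in range(p)]
    # SST: squared deviations from the grand mean, summed over all variables
    sst = sum((row[j] - grand[j]) ** 2 for row in data for j in range(p))
    groups = {}
    for row, g in zip(data, labels):
        groups.setdefault(g, []).append(row)
    sse = 0.0
    for rows in groups.values():
        m = len(rows)
        centroid = [sum(r[j] for r in rows) / m for j in range(p)]
        # SSE: squared deviations from each cluster's own centroid
        sse += sum((r[j] - centroid[j]) ** 2 for r in rows for j in range(p))
    return sse, sst


def lbt_index(sse, sse_min, sst, p, k):
    """LBT = (SSE - SSE_min) / SST for a k-cluster solution on p variables.

    sse_min is assumed to come from Steinley & Brusco's (2011) bound,
    which this sketch does not compute.
    """
    if p <= k:
        # No usable bound: SSE_min is treated as 0, so LBT reduces to SSE/SST.
        sse_min = 0.0
    return (sse - sse_min) / sst


data = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
sse, sst = sse_sst(data, [0, 0, 1, 1])
print(sse, sst)                            # 1.0 101.0
print(lbt_index(sse, 0.7, sst, p=2, k=2))  # p <= k, so this is SSE/SST
```

With two variables and two clusters the supplied bound is ignored, which mirrors the p <= k case described above.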
"clstop_lbt" can also be used to decide whether there is more than one
cluster at all. In this case the ratio SSE(2)/SST of the two-cluster
solution should be less than the lower bound ratio (LBR), i.e. the
smallest ratio obtainable when there is only one cluster. Assuming a
(multivariate) normal distribution, LBR(normal) = 1 - 2/pi = .3634;
assuming a uniform distribution, LBR(uniform) = .25.
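The two LBR values can be checked numerically. The hypothetical Python sketch below places evenly spaced quantiles of a univariate standard normal and of a uniform distribution on a grid, splits each at its centre (the best two-cluster split for these symmetric one-cluster distributions), and compares SSE(2)/SST with 1 - 2/pi and .25. The helper suggests_more_than_one_cluster is an illustrative assumption, not the package's code.

```python
import math
from statistics import NormalDist

LBR_NORMAL = 1 - 2 / math.pi  # approximately .3634
LBR_UNIFORM = 0.25


def split_ratio(values, cut):
    """SSE/SST for two groups: values below `cut` and values at or above it."""
    mean = sum(values) / len(values)
    sst = sum((v - mean) ** 2 for v in values)
    sse = 0.0
    for group in ([v for v in values if v < cut], [v for v in values if v >= cut]):
        g_mean = sum(group) / len(group)
        sse += sum((v - g_mean) ** 2 for v in group)
    return sse / sst


def suggests_more_than_one_cluster(sse2, sst, lbr=LBR_NORMAL):
    """True if the two-cluster ratio falls below the one-cluster bound."""
    return sse2 / sst < lbr


# Evenly spaced probabilities stand in for a large one-cluster sample.
n = 100_000
probs = [(i + 0.5) / n for i in range(n)]
nd = NormalDist()
normal_grid = [nd.inv_cdf(q) for q in probs]

normal_ratio = split_ratio(normal_grid, 0.0)  # approximately 1 - 2/pi
uniform_ratio = split_ratio(probs, 0.5)       # approximately .25
print(normal_ratio, uniform_ratio)
```

A two-cluster solution whose SSE(2)/SST lies below the relevant bound is hard to reconcile with a single normal (or uniform) cluster.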
A simulation study by Steinley & Brusco (2011) shows that the LBT index
outperforms the CH (Calinski/Harabasz) index in accuracy and precision.
However, the LBT requires that the number of variables exceed the number
of clusters. When the number of variables is equal to or less than the
number of clusters, Steinley & Brusco recommend using the CH index
(which is the default rule of -cluster stop-).
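The recommendation above can be expressed as a small selection routine. The hypothetical Python sketch below assumes that, because LBT measures closeness to the lower bound, the k with the smallest LBT is preferred, and it falls back to the Calinski/Harabasz index CH = [(SST - SSE)/(k - 1)] / [SSE/(n - k)] (larger is better) whenever some candidate k is not below the number of variables p. Using one index for the whole candidate range keeps the values comparable; this is a design choice of the sketch, not documented behaviour of clstop_lbt or -cluster stop-. Candidates are assumed to satisfy k >= 2.

```python
def calinski_harabasz(sse, sst, n, k):
    """CH = [(SST - SSE)/(k - 1)] / [SSE/(n - k)]; larger values are better."""
    return ((sst - sse) / (k - 1)) / (sse / (n - k))


def choose_k(sse_by_k, sse_min_by_k, sst, n, p):
    """Pick k by smallest LBT if every candidate k < p, else by largest CH.

    sse_by_k / sse_min_by_k map each candidate k (>= 2) to its observed SSE
    and its lower bound SSE_min (hypothetical inputs for illustration).
    """
    ks = sorted(sse_by_k)
    if all(p > k for k in ks):
        return min(ks, key=lambda k: (sse_by_k[k] - sse_min_by_k[k]) / sst)
    return max(ks, key=lambda k: calinski_harabasz(sse_by_k[k], sst, n, k))


# Made-up SSE values and bounds for k = 2, 3, 4 (n = 50 observations).
sse = {2: 45.0, 3: 40.0, 4: 35.0}
low = {2: 40.0, 3: 38.0, 4: 30.0}
print(choose_k(sse, low, 100.0, 50, p=5))  # 3 (LBT is usable)
print(choose_k(sse, low, 100.0, 50, p=3))  # 2 (falls back to CH)
```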