Stata 11 help for cluster_stop

help cluster stop                                  dialogs: cluster stop  clustermat stop
-------------------------------------------------------------------------------

Title

[MV] cluster stop -- Cluster-analysis stopping rules

Syntax

Cluster analysis of data

cluster stop [clname] [, options]

Cluster analysis of a dissimilarity matrix

clustermat stop [clname] , variables(varlist) [options]

options               description
-------------------------------------------------------------------------
rule(calinski)        use Calinski/Harabasz pseudo-F index stopping rule; the default
rule(duda)            use Duda/Hart Je(2)/Je(1) index stopping rule
* rule(rule_name)     use rule_name stopping rule
groups(numlist)       compute stopping rule for specified groups
matrix(matname)       save the results in matrix matname
+ variables(varlist)  compute the stopping rule using varlist
-------------------------------------------------------------------------
* rule(rule_name) is not shown in the dialog box. See [MV] cluster programming subroutines for information on how to add stopping rules to the cluster stop command.
+ variables(varlist) is required with a clustermat solution and optional with a cluster solution.

Menu

Statistics > Multivariate analysis > Cluster analysis > Postclustering > Cluster analysis stopping rules

Description

Cluster-analysis stopping rules are used to determine the number of clusters. A stopping-rule value (also called an index) is computed for each cluster solution (e.g., at each level of the hierarchy in a hierarchical cluster analysis). Larger values (or smaller, depending on the particular stopping rule) indicate more distinct clustering.

The cluster stop and clustermat stop commands currently provide two stopping rules: the Calinski and Harabasz (1974) pseudo-F index and the Duda and Hart (1973) Je(2)/Je(1) index. For both rules, larger values indicate more distinct clustering. The Duda-Hart Je(2)/Je(1) values are reported together with pseudo-T-squared values; for the pseudo-T-squared, smaller values indicate more distinct clustering.
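To make the pseudo-F concrete, the sketch below computes the standard Calinski-Harabasz definition, [trace(B)/(g-1)] / [trace(W)/(n-g)], for a single g-group solution, where B and W are the between-cluster and within-cluster sums of squares. It assumes Euclidean distances on the clustering variables and is not Stata's internal code; the helper names centroid, sq_dist, and calinski_harabasz are hypothetical. It also shows why the one-group solution is excluded: with g = 1 the numerator divides by zero.

```python
# Sketch of the Calinski-Harabasz pseudo-F (hypothetical helpers, not Stata code).
# Assumes Euclidean sums of squares on the clustering variables.
from typing import Dict, List, Sequence

Point = Sequence[float]


def centroid(points: Sequence[Point]) -> List[float]:
    """Componentwise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def calinski_harabasz(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Pseudo-F = [trace(B) / (g - 1)] / [trace(W) / (n - g)].

    points: one row per observation (the clustering variables).
    labels: cluster membership of each row for one g-group solution.
    """
    groups: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    n, g = len(points), len(groups)
    # Undefined for the degenerate one-group solution (g - 1 = 0).
    if g < 2 or g >= n:
        raise ValueError("pseudo-F requires 2 <= groups < observations")
    overall = centroid(points)
    between = 0.0  # trace(B): size-weighted spread of cluster centroids
    within = 0.0   # trace(W): spread of points around their own centroid
    for members in groups.values():
        c = centroid(members)
        between += len(members) * sq_dist(c, overall)
        within += sum(sq_dist(p, c) for p in members)
    return (between / (g - 1)) / (within / (n - g))
```

For the one-variable data 0, 1, 10, 11, grouping {0, 1} and {10, 11} gives a pseudo-F of 200, while the poor grouping {0, 10} and {1, 11} gives 0.02, matching "larger values indicate more distinct clustering."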

clname specifies the name of the cluster analysis. The default is the most recently performed cluster analysis, which can be reset using the cluster use command; see [MV] cluster utility.

Additional stopping rules can be added; see [MV] cluster programming subroutines, which illustrates this by showing a program that adds the step-size stopping rule.

Options

rule(calinski | duda | rule_name) indicates the stopping rule. rule(calinski), the default, specifies the Calinski-Harabasz pseudo-F index. rule(duda) specifies the Duda-Hart Je(2)/Je(1) index.

rule(calinski) is allowed for both hierarchical and nonhierarchical cluster analyses. rule(duda) is allowed only for hierarchical cluster analyses.
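The Duda-Hart rule judges whether one cluster should be split into two, which is why it needs the parent/child structure of a hierarchy. The sketch below uses the usual definitions: Je(1) is the sum of squared errors of the cluster kept whole, Je(2) is the summed squared errors of its two subclusters, and pseudo-T-squared = (Je(1) - Je(2)) / (Je(2) / (N1 + N2 - 2)). It assumes Euclidean distances; the helper names sse and duda_hart are hypothetical, and which pair of clusters Stata evaluates for each number of groups is not specified in this entry.

```python
# Sketch of the Duda-Hart Je(2)/Je(1) ratio and pseudo-T-squared
# (hypothetical helpers, not Stata code). Assumes Euclidean distances.
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sse(points: Sequence[Point]) -> float:
    """Sum of squared Euclidean distances from each point to the centroid."""
    n = len(points)
    c = [sum(p[j] for p in points) / n for j in range(len(points[0]))]
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)


def duda_hart(part1: Sequence[Point], part2: Sequence[Point]) -> Tuple[float, float]:
    """Return (Je(2)/Je(1), pseudo-T-squared) for one cluster split in two."""
    n1, n2 = len(part1), len(part2)
    if n1 == 0 or n2 == 0 or n1 + n2 < 3:
        raise ValueError("need two non-empty parts with at least 3 points total")
    merged: List[Point] = list(part1) + list(part2)
    je1 = sse(merged)              # error if the two parts stay merged
    je2 = sse(part1) + sse(part2)  # error after splitting into two parts
    if je2 == 0:
        raise ValueError("Je(2) is zero; pseudo-T-squared is undefined")
    ratio = je2 / je1
    t2 = (je1 - je2) / (je2 / (n1 + n2 - 2))
    return ratio, t2
```

Splitting 0, 1, 10, 11 into {0, 1} and {10, 11} gives Je(2)/Je(1) = 1/101 and pseudo-T-squared = 200: the split removes almost all of the error, so the merged cluster is not a distinct one.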

rule(rule_name) invokes a stopping rule that has been added to the cluster stop command; see [MV] cluster programming subroutines for how to add one. That entry illustrates the process by showing a program that adds a rule(stepsize) option, which implements the simple step-size stopping rule mentioned in Milligan and Cooper (1985).

groups(numlist) specifies the cluster groupings for which the stopping rule is to be computed. groups(3/20) specifies that the measure be computed for the three-group solution, the four-group solution, ..., and the 20-group solution.

With rule(duda), the default is groups(1/15). With rule(calinski) for a hierarchical cluster analysis, the default is groups(2/15). groups(1) is not allowed with rule(calinski) because the measure is not defined for the degenerate one-group cluster solution. The groups() option is unnecessary (and not allowed) for a nonhierarchical cluster analysis.

If there are ties in the hierarchical cluster-analysis structure, some (or possibly all) of the requested stopping-rule solutions may not be computable. cluster stop passes over, without comment, the groups() for which ties in the hierarchy cause the stopping rule to be undefined.

matrix(matname) saves the results in a matrix named matname.

With rule(calinski), the matrix has two columns: the first column gives the number of clusters, and the second column gives the corresponding Calinski-Harabasz pseudo-F stopping-rule index.

With rule(duda), the matrix has three columns: the first column gives the number of clusters, the second column gives the corresponding Duda-Hart Je(2)/Je(1) stopping-rule index, and the third column provides the corresponding pseudo-T-squared values.
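If the saved matrix is exported for use outside Stata (the export step is assumed and not shown), each row maps naturally onto a list. The hypothetical helpers best_calinski and best_duda below only restate the reading given in this entry, larger pseudo-F or Je(2)/Je(1) and smaller pseudo-T-squared meaning more distinct clustering, by picking the extreme row. In practice the whole sequence of values is usually inspected rather than a single extreme. Because ties in the hierarchy can make some requested groups() uncomputable, some rows may be missing; the helpers work with whatever rows are present.

```python
# Hypothetical helpers that read rows laid out like the matrix(matname) results.
from typing import Sequence, Tuple


def best_calinski(rows: Sequence[Sequence[float]]) -> int:
    """rows follow the rule(calinski) layout: [number of clusters, pseudo-F].

    Returns the number of clusters with the largest pseudo-F.
    """
    if not rows:
        raise ValueError("no stopping-rule results")
    return int(max(rows, key=lambda r: r[1])[0])


def best_duda(rows: Sequence[Sequence[float]]) -> Tuple[int, int]:
    """rows follow the rule(duda) layout:
    [number of clusters, Je(2)/Je(1), pseudo-T-squared].

    Returns (clusters with the largest Je(2)/Je(1),
             clusters with the smallest pseudo-T-squared).
    """
    if not rows:
        raise ValueError("no stopping-rule results")
    by_ratio = max(rows, key=lambda r: r[1])
    by_t2 = min(rows, key=lambda r: r[2])
    return int(by_ratio[0]), int(by_t2[0])
```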

variables(varlist) specifies the variables to be used in the computation of the stopping rule. By default, the variables used for the cluster analysis are used. variables() is required for cluster solutions produced by clustermat.

Examples

. cluster stop
. cluster stop myclus, rule(duda)
. cluster stop, rule(calinski) groups(2/20) matrix(z)

Saved results

cluster stop and clustermat stop with rule(calinski) save the following in r():

Scalars
    r(calinski_#)     Calinski-Harabasz pseudo-F for # groups

Macros
    r(rule)           calinski
    r(label)          C-H pseudo-F
    r(longlabel)      Calinski & Harabasz pseudo-F

cluster stop and clustermat stop with rule(duda) save the following in r():

Scalars
    r(duda_#)         Duda-Hart Je(2)/Je(1) value for # groups
    r(dudat2_#)       Duda-Hart pseudo-T-squared value for # groups

Macros
    r(rule)           duda
    r(label)          D-H Je(2)/Je(1)
    r(longlabel)      Duda & Hart Je(2)/Je(1)
    r(label2)         D-H pseudo-T-squared
    r(longlabel2)     Duda & Hart pseudo-T-squared

References

Calinski, T., and J. Harabasz. 1974. A dendrite method for cluster analysis. Communications in Statistics 3: 1-27.

Duda, R. O., and P. E. Hart. 1973. Pattern Classification and Scene Analysis. New York: Wiley.

Milligan, G. W., and M. C. Cooper. 1985. An examination of procedures for determining the number of clusters in a dataset. Psychometrika 50: 159-179.

Also see

Manual: [MV] cluster stop

Help: [MV] cluster, [MV] clustermat


© Copyright 1996–2009 StataCorp LP