help cluster stop dialogs: cluster stop
clustermat stop
-------------------------------------------------------------------------------
Title
[MV] cluster stop -- Cluster-analysis stopping rules
Syntax
Cluster analysis of data
cluster stop [clname] [, options]
Cluster analysis of a dissimilarity matrix
clustermat stop [clname] , variables(varlist) [options]
  options              description
-------------------------------------------------------------------------
  rule(calinski)       use Calinski/Harabasz pseudo-F index stopping rule;
                         the default
  rule(duda)           use Duda/Hart Je(2)/Je(1) index stopping rule
* rule(rule_name)      use rule_name stopping rule
  groups(numlist)      compute stopping rule for specified groups
  matrix(matname)      save the results in matrix matname
+ variables(varlist)   compute the stopping rule using varlist
-------------------------------------------------------------------------
* rule(rule_name) is not shown in the dialog box. See [MV] cluster
programming subroutines for information on how to add stopping rules to
the cluster stop command.
+ variables(varlist) is required with a clustermat solution and optional
with a cluster solution.
Menu
Statistics > Multivariate analysis > Cluster analysis > Postclustering >
Cluster analysis stopping rules
Description
Cluster-analysis stopping rules are used to determine the number of
clusters. A stopping-rule value (also called an index) is computed for
each cluster solution (e.g., at each level of the hierarchy in a
hierarchical cluster analysis). Larger values (or smaller, depending on
the particular stopping rule) indicate more distinct clustering.
The cluster stop and clustermat stop commands currently provide two
stopping rules, the Calinski and Harabasz (1974) pseudo-F index and the
Duda and Hart (1973) Je(2)/Je(1) index. For both rules, larger values
indicate more distinct clustering. The Duda-Hart Je(2)/Je(1) values are
reported together with pseudo-T-squared values; smaller pseudo-T-squared
values indicate more distinct clustering.
clname specifies the name of the cluster analysis. The default is the
most recently performed cluster analysis, which can be reset using the
cluster use command; see [MV] cluster utility.
Additional stopping rules may be added; see [MV] cluster programming
subroutines, which illustrates this ability by showing a program that
adds the step-size stopping rule.
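As a minimal sketch of the workflow described above, the following runs a hierarchical cluster analysis and then applies the default stopping rule. The auto dataset, the choice of variables, the cluster name w1, and the three-group choice are illustrative assumptions only.

```stata
* Illustrative data and variables (assumptions for this sketch)
sysuse auto, clear

* Hierarchical cluster analysis, stored under the name w1
cluster wardslinkage price mpg weight, name(w1)

* Default rule: Calinski-Harabasz pseudo-F for the 2- to 15-group solutions
cluster stop w1

* After inspecting the indices, generate a grouping variable for the
* chosen number of groups (3 is used here only as a placeholder)
cluster generate g3 = groups(3), name(w1)
```

The stopping-rule values are a guide rather than a decision; the number passed to groups() in cluster generate is whatever the analyst settles on after reviewing them.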
Options
rule(calinski | duda | rule_name) indicates the stopping rule.
rule(calinski), the default, specifies the Calinski-Harabasz pseudo-F
index. rule(duda) specifies the Duda-Hart Je(2)/Je(1) index.
rule(calinski) is allowed for both hierarchical and nonhierarchical
cluster analyses. rule(duda) is allowed only for hierarchical
cluster analyses.
You can add stopping rules to the cluster stop command and select them
with the rule(rule_name) option. [MV] cluster programming subroutines
illustrates how to do this by showing a program that adds a
rule(stepsize) option, which implements the simple step-size stopping
rule mentioned in Milligan and Cooper (1985).
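The following sketch shows which built-in rules apply to which kind of analysis. The dataset, variables, cluster names (avg, km3), and the random-start seed are assumptions chosen for illustration.

```stata
sysuse auto, clear

* Hierarchical analysis: both built-in rules are allowed
cluster averagelinkage price mpg weight, name(avg)
cluster stop avg, rule(calinski)
cluster stop avg, rule(duda)

* Nonhierarchical (kmeans) analysis: only rule(calinski) is allowed
cluster kmeans price mpg weight, k(3) name(km3) start(krandom(12345))
cluster stop km3, rule(calinski)
```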
groups(numlist) specifies the cluster groupings for which the stopping
rule is to be computed. groups(3/20) specifies that the measure be
computed for the three-group solution, the four-group solution, ...,
and the 20-group solution.
With rule(duda), the default is groups(1/15). With rule(calinski)
for a hierarchical cluster analysis, the default is groups(2/15).
groups(1) is not allowed with rule(calinski) because the measure is
not defined for the degenerate one-group cluster solution. The
groups() option is unnecessary (and not allowed) for a
nonhierarchical cluster analysis.
If there are ties in the hierarchical cluster-analysis structure,
some (or possibly all) of the requested stopping-rule solutions may
not be computable. cluster stop passes over, without comment, the
groups() for which ties in the hierarchy cause the stopping rule to
be undefined.
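A short sketch of groups() in use, under the assumption of the auto dataset and arbitrary cluster names. For the nonhierarchical case, where groups() is not allowed, one possible approach is to run a separate kmeans analysis for each candidate number of groups and compare the reported indices.

```stata
sysuse auto, clear
cluster completelinkage price mpg weight, name(cl)

* Pseudo-F for the 2- through 8-group solutions
cluster stop cl, rule(calinski) groups(2/8)

* Duda-Hart index for selected solutions; 1 is allowed with rule(duda)
cluster stop cl, rule(duda) groups(1 2 4 6)

* Nonhierarchical case: no groups() option, so compare separate runs
forvalues k = 2/4 {
    cluster kmeans price mpg weight, k(`k') name(km`k') start(krandom(12345))
    cluster stop km`k'
}
```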
matrix(matname) saves the results in a matrix named matname.
With rule(calinski), the matrix has two columns, the first giving the
number of clusters and the second giving the corresponding
Calinski-Harabasz pseudo-F stopping-rule index.
With rule(duda), the matrix has three columns: the first column gives
the number of clusters, the second column gives the corresponding
Duda-Hart Je(2)/Je(1) stopping-rule index, and the third column
provides the corresponding pseudo-T-squared values.
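The saved matrix can be inspected or processed further. The sketch below, which assumes the auto dataset and the hypothetical matrix names ch and dh, lists the matrices and scans the pseudo-F column for its largest nonmissing value.

```stata
sysuse auto, clear
cluster wardslinkage price mpg weight, name(w1)

* Column 1 = number of clusters, column 2 = pseudo-F index
cluster stop w1, rule(calinski) groups(2/10) matrix(ch)
matrix list ch

* Find the row with the largest nonmissing pseudo-F
local best = 1
forvalues i = 2/`=rowsof(ch)' {
    if ch[`i',2] < . & ch[`i',2] > ch[`best',2] {
        local best = `i'
    }
}
display "Largest pseudo-F at " ch[`best',1] " groups"

* With rule(duda), a third column holds the pseudo-T-squared values
cluster stop w1, rule(duda) groups(1/10) matrix(dh)
matrix list dh
```

The scan assumes the first row holds a nonmissing index; picking the maximum automatically is only a convenience and does not replace looking at the full sequence of values.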
variables(varlist) specifies the variables to be used in the computation
of the stopping rule. By default, the variables used for the cluster
analysis are used. variables() is required for cluster solutions
produced by clustermat.
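For a clustermat solution, a minimal sketch might look like the following. The dataset, the variables, and the names D and dmat are assumptions; the dissimilarity matrix here is built from the same variables later passed to variables(), which need not be the case in practice.

```stata
sysuse auto, clear

* Dissimilarity matrix between observations (default measure)
matrix dissimilarity D = price mpg weight

* Cluster the matrix; add attaches the solution to the data in memory
clustermat averagelinkage D, name(dmat) add

* variables() is required for a clustermat solution
clustermat stop dmat, variables(price mpg weight)
```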
Examples
. cluster stop
. cluster stop myclus, rule(duda)
. cluster stop, rule(calinski) groups(2/20) matrix(z)
Saved results
cluster stop and clustermat stop with rule(calinski) save the following
in r():
Scalars
    r(calinski_#)    Calinski-Harabasz pseudo-F for # groups
Macros
    r(rule)          calinski
    r(label)         C-H pseudo-F
    r(longlabel)     Calinski & Harabasz pseudo-F
cluster stop and clustermat stop with rule(duda) save the following in
r():
Scalars
    r(duda_#)        Duda-Hart Je(2)/Je(1) value for # groups
    r(dudat2_#)      Duda-Hart pseudo-T-squared value for # groups
Macros
    r(rule)          duda
    r(label)         D-H Je(2)/Je(1)
    r(longlabel)     Duda & Hart Je(2)/Je(1)
    r(label2)        D-H pseudo-T-squared
    r(longlabel2)    Duda & Hart pseudo-T-squared
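The saved results can be used directly after the command runs. This sketch assumes the auto dataset and a cluster analysis named w1, and reads the values for the three-group solution as an arbitrary example.

```stata
sysuse auto, clear
cluster wardslinkage price mpg weight, name(w1)

* Calinski-Harabasz results
quietly cluster stop w1, rule(calinski) groups(2/5)
display "`r(label)' for 3 groups: " r(calinski_3)
return list

* Duda-Hart results, including the pseudo-T-squared values
quietly cluster stop w1, rule(duda) groups(1/5)
display "`r(label)' for 3 groups: " r(duda_3)
display "`r(label2)' for 3 groups: " r(dudat2_3)
```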
References
Calinski, T., and J. Harabasz. 1974. A dendrite method for cluster
analysis. Communications in Statistics 3: 1-27.
Duda, R. O., and P. E. Hart. 1973. Pattern Classification and Scene
Analysis. New York: Wiley.
Milligan, G. W., and M. C. Cooper. 1985. An examination of procedures
for determining the number of clusters in a dataset. Psychometrika
50: 159-179.
Also see
Manual: [MV] cluster stop
Help: [MV] cluster, [MV] clustermat