Stata 11 help for cluster

help cluster
-------------------------------------------------------------------------------

Title

[MV] cluster -- Introduction to cluster-analysis commands

Syntax

Cluster analysis of data

cluster subcommand ...

Cluster analysis of a dissimilarity matrix

clustermat subcommand ...

Description

Stata's cluster-analysis routines provide several hierarchical and partition clustering methods, postclustering summarization methods, and cluster-management tools. This entry presents an overview of the cluster and clustermat commands (also see [MV] clustermat), as well as Stata's cluster-analysis management tools. The hierarchical clustering methods may be applied to the data by using the cluster command or to a user-supplied dissimilarity matrix by using the clustermat command.

The cluster command has the following subcommands, which are detailed in their respective manual entries.

    Topic                                   cluster subcommand
    -----------------------------------------------------------------
    Partition-clustering methods for observations
      (see [MV] cluster kmeans and kmedians)
      Kmeans                                cluster kmeans
      Kmedians                              cluster kmedians

    Hierarchical clustering methods for observations
      (see [MV] cluster linkage)
      Single linkage                        cluster singlelinkage
      Average linkage                       cluster averagelinkage
      Complete linkage                      cluster completelinkage
      Weighted-average linkage              cluster waveragelinkage
      Median linkage                        cluster medianlinkage
      Centroid linkage                      cluster centroidlinkage
      Ward's linkage                        cluster wardslinkage

    Postclustering commands
      Stopping rules                        cluster stop
      Dendrograms (cluster trees)           cluster dendrogram
                                              (synonym: cluster tree)
      Generate summary variables            cluster generate

    User utilities
      Cluster notes                         cluster notes
      Other user utilities                  cluster dir
        (see [MV] cluster utility)          cluster list
                                            cluster drop
                                            cluster use
                                            cluster rename
                                            cluster renamevar

    Programmer utilities                    cluster query
      (see [MV] cluster programming)        cluster set
                                            cluster delete
                                            cluster parsedistance
                                            cluster measures

    (Dis)similarity measures
      (see [MV] measure_option)
    -----------------------------------------------------------------

The clustermat command has the following subcommands, which are detailed along with the related cluster command in the cluster linkage help file. Also see [MV] clustermat.

    Topic                                   clustermat subcommand
    -----------------------------------------------------------------
    Hierarchical clustering of a dissimilarity matrix
      (see [MV] cluster linkage)
      Single linkage                        clustermat singlelinkage
      Complete linkage                      clustermat completelinkage
      Average linkage                       clustermat averagelinkage
      Weighted-average linkage              clustermat waveragelinkage
      Median linkage                        clustermat medianlinkage
      Centroid linkage                      clustermat centroidlinkage
      Ward's linkage                        clustermat wardslinkage
    -----------------------------------------------------------------

Partition-clustering methods for observations

(see [MV] cluster kmeans and kmedians; help cluster kmeans and kmedians)

Kmeans cluster analysis

cluster kmeans [varlist] [if] [in] , k(#) [ options ]

Kmedians cluster analysis

cluster kmedians [varlist] [if] [in] , k(#) [ options ]

    options                 description
    -------------------------------------------------------------------------
    Main
    * k(#)                  perform cluster analysis resulting in # groups
      measure(measure)      similarity or dissimilarity measure; default is
                              L2 (Euclidean)
      name(clname)          name of resulting cluster analysis

    Options
      start(start_option)   obtain k initial group centers by using
                              start_option; see Options for details
      keepcenters           append the k final group means or medians to
                              the data

    Advanced
      generate(groupvar)    name of grouping variable
      iterate(#)            maximum number of iterations; default is
                              iterate(10000)
    -------------------------------------------------------------------------
    * k(#) is required.
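
As a rough illustration of the partition-clustering syntax above, the following sketch assumes the auto dataset that ships with Stata. The variable list, the analysis names (km3, kmed3), the grouping-variable name (km3grp), and the seed are arbitrary choices made for this example only.

```stata
* Illustrative only: auto.dta and all names below are assumptions
sysuse auto, clear

* Kmeans with 3 groups, default L2 (Euclidean) measure;
* krandom(#) picks 3 random observations as starting centers, seeded for repeatability
cluster kmeans price mpg weight length, k(3) name(km3) generate(km3grp) start(krandom(12345))

* Kmedians with 3 groups and the absolute-value (L1) measure;
* generate() is omitted, so the grouping variable takes the name given in name()
cluster kmedians price mpg weight length, k(3) measure(L1) name(kmed3)

* Inspect the resulting group sizes
tabulate km3grp
tabulate kmed3
```

Because the variables are on very different scales, a real analysis would probably standardize them first; that step is left out here to keep the sketch short.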

Hierarchical clustering for observations

(see [MV] cluster linkage; help cluster linkage)

cluster linkage [varlist] [if] [in] [, cluster_options ]

    linkage                 description
    -------------------------------------------------------------------------
    singlelinkage           single-linkage cluster analysis
    averagelinkage          average-linkage cluster analysis
    completelinkage         complete-linkage cluster analysis
    waveragelinkage         weighted-average linkage cluster analysis
    medianlinkage           median-linkage cluster analysis
    centroidlinkage         centroid-linkage cluster analysis
    wardslinkage            Ward's linkage cluster analysis
    -------------------------------------------------------------------------

    cluster_options         description
    -------------------------------------------------------------------------
    Main
      measure(measure)      similarity or dissimilarity measure
      name(clname)          name of resulting cluster analysis

    Advanced
      generate(stub)        prefix for generated variables; default prefix
                              is clname
    -------------------------------------------------------------------------
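
A minimal sketch of two hierarchical analyses on the same data, again assuming the auto dataset shipped with Stata. The analysis names (ward, avg) and the stub (al) are hypothetical.

```stata
* Illustrative only: auto.dta and all names below are assumptions
sysuse auto, clear

* Ward's linkage with squared Euclidean distance; variables ward_id, ward_ord,
* and ward_hgt are created because the default prefix is clname
cluster wardslinkage price mpg weight length, measure(L2squared) name(ward)

* Average linkage with absolute-value distance; generate() changes the prefix,
* so the created variables are al_id, al_ord, and al_hgt
cluster averagelinkage price mpg weight length, measure(L1) name(avg) generate(al)

* Show what is stored with each analysis
cluster list ward avg
```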

Postclustering commands

Stopping rules (see [MV] cluster stop; help cluster stop)

Cluster stopping rules for hierarchical clustering are provided with the cluster stop command.

Cluster analysis of data

cluster stop [clname] [, options]

Cluster analysis of a dissimilarity matrix

clustermat stop [clname] , variables(varlist) [options]

    options                 description
    -------------------------------------------------------------------------
      rule(calinski)        use Calinski/Harabasz pseudo-F index stopping
                              rule; the default
      rule(duda)            use Duda/Hart Je(2)/Je(1) index stopping rule
    * rule(rule_name)       use rule_name stopping rule
      groups(numlist)       compute stopping rule for specified groups
      matrix(matname)       save the results in matrix matname
    + variables(varlist)    compute the stopping rule using varlist
    -------------------------------------------------------------------------
    * rule(rule_name) is not shown in the dialog box.  See [MV] cluster
        programming subroutines for information on how to add stopping
        rules to the cluster stop command.
    + variables(varlist) is required with a clustermat solution and optional
        with a cluster solution.
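
The stopping rules might be used as in the sketch below, which assumes the auto dataset and a hypothetical analysis name (ward) and matrix name (dudares). The ranges passed to groups() are arbitrary.

```stata
* Illustrative only: auto.dta and all names below are assumptions
sysuse auto, clear
cluster wardslinkage price mpg weight length, name(ward)

* Calinski/Harabasz pseudo-F (the default rule) for 2 through 8 groups;
* larger values suggest more distinct clustering
cluster stop ward, rule(calinski) groups(2/8)

* Duda/Hart Je(2)/Je(1) index for 1 through 8 groups, saving the results in a matrix
cluster stop ward, rule(duda) groups(1/8) matrix(dudares)
matrix list dudares
```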

Dendrograms (see [MV] cluster dendrogram; help cluster dendrogram)

cluster dendrogram [clname] [if] [in] [, options ]

    options                 description
    -------------------------------------------------------------------------
    Main
      quick                 do not center parent branches
      labels(varname)       name of variable containing leaf labels
      cutnumber(#)          display top # branches only
      cutvalue(#)           display branches above # (dis)similarity
                              measure only
      showcount             display number of observations for each branch
      countprefix(string)   prefix the branch count with string; default
                              is "n="
      countsuffix(string)   suffix the branch count with string; default
                              is empty string
      countinline           put branch count inline with branch label
      vertical              orient dendrogram vertically (default)
      horizontal            orient dendrogram horizontally

    Plot
      line_options          affect rendition of the plotted lines

    Add plots
      addplot(plot)         add other plots to the dendrogram

    Y axis, X axis, Titles, Legend, Overall
      twoway_options        any option other than by() documented in
                              [G] twoway_options
    -------------------------------------------------------------------------

Note: cluster tree is a synonym for cluster dendrogram.

In addition to the restrictions imposed by if and in, the observations are automatically restricted to those that were used in the cluster analysis.
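
A short sketch of drawing dendrograms, assuming the auto dataset and a hypothetical analysis name (comp). The number of branches shown and the count prefix are arbitrary.

```stata
* Illustrative only: auto.dta and all names below are assumptions
sysuse auto, clear
cluster completelinkage price mpg weight length, name(comp)

* Show only the top 10 branches, with the number of observations in each
cluster dendrogram comp, cutnumber(10) showcount

* Same tree drawn horizontally through the synonym, with a custom count prefix
cluster tree comp, cutnumber(10) showcount countprefix("obs=") horizontal
```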

Generate summary variables (see [MV] cluster generate; help cluster generate)

The cluster generate command generates summary or grouping variables after a cluster analysis. The groups() function generates variables indicating cluster membership into the specified number(s) of clusters after a hierarchical cluster analysis. The cut() function generates a variable indicating cluster membership based on cutting the dendrogram at the specified (dis)similarity value.

Generate grouping variables for specified numbers of clusters

cluster generate { newvar | stub } = groups(numlist) [, options ]

Generate grouping variable by cutting the dendrogram

cluster generate newvar = cut(#) [, name(clname) ]

    options                 description
    -------------------------------------------------------------------------
      name(clname)          name of cluster analysis to use in producing
                              new variables
      ties(error)           produce error message for ties; default
      ties(skip)            ignore requests that result in ties
      ties(fewer)           produce results for largest number of groups
                              smaller than your request
      ties(more)            produce results for smallest number of groups
                              larger than your request
    -------------------------------------------------------------------------
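
The two forms of cluster generate could be used as sketched here. The auto dataset, the analysis name (avg), the new variable names, and the cut height of 3000 are all assumptions for illustration; a sensible cut value depends on the measure and the data.

```stata
* Illustrative only: auto.dta and all names below are assumptions
sysuse auto, clear
cluster averagelinkage price mpg weight length, name(avg)

* One grouping variable for the 3-group solution
cluster generate g3 = groups(3), name(avg)

* A stub with a numlist creates grp2, grp3, grp4, and grp5;
* ties(fewer) falls back to fewer groups if a request hits a tie
cluster generate grp = groups(2/5), name(avg) ties(fewer)

* Group membership from cutting the dendrogram at dissimilarity 3000
cluster generate gcut = cut(3000), name(avg)

tabulate g3 gcut
```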

User utilities

Cluster notes (see [MV] cluster notes; help cluster notes)

The cluster notes command provides the ability to add, view, and delete notes for a cluster analysis.

Add a note to a cluster analysis

cluster notes clname : text

List all cluster notes

cluster notes

List cluster notes associated with specified cluster analysis

cluster notes clnamelist

Drop cluster notes

cluster notes drop clname [in numlist]
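
The note-management commands above might be combined as in this sketch, which assumes the auto dataset and a hypothetical analysis name (comp). The note text is made up.

```stata
* Illustrative only: auto.dta and all names below are assumptions
sysuse auto, clear
cluster completelinkage price mpg weight length, name(comp)

* Attach two notes to the analysis
cluster notes comp : variables were not standardized
cluster notes comp : compare against average linkage

* List the notes for this analysis, drop the first one, then list all notes
cluster notes comp
cluster notes drop comp in 1
cluster notes
```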

User utilities (see [MV] cluster utility; help cluster utility)

cluster user utility subcommands allow you to view and manipulate cluster objects. cluster dir gives a directory-style listing of the currently defined clusters. cluster list gives a detailed listing of clusters. cluster drop removes the named clusters. cluster use marks a cluster analysis as the most recent one. cluster rename allows the renaming of a cluster analysis. cluster renamevar properly renames variables attached to a cluster analysis.

Directory-style listing of currently defined clusters

cluster dir

Detailed listing of clusters

cluster list [clnamelist] [, list_options ]

Drop the named clusters

cluster drop { clnamelist | _all }

Mark a cluster analysis as the most recent one

cluster use clname

Rename a cluster

cluster rename oldclname newclname

Rename variables attached to a cluster

cluster renamevar oldvarname newvarname [, name(clname) ]

cluster renamevar oldstub newstub , prefix [ name(clname) ]

    list_options            description
    -------------------------------------------------------------------------
    Options
      notes                 list cluster notes
      type                  list cluster analysis type
      method                list cluster analysis method
      dissimilarity         list cluster analysis dissimilarity measure
      similarity            list cluster analysis similarity measure
      vars                  list variable names attached to the cluster
                              analysis
      chars                 list any characteristics attached to the
                              cluster analysis
      other                 list any "other" information

    * all                   list all items and information attached to the
                              cluster; the default
    -------------------------------------------------------------------------
    * all is not shown in the dialog box.
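
A sketch of the user utilities in sequence, assuming the auto dataset. The analysis names (sl, km, kmeans3) and the new variable name (sl_height) are hypothetical.

```stata
* Illustrative only: auto.dta and all names below are assumptions
sysuse auto, clear
cluster singlelinkage price mpg, name(sl)
cluster kmeans price mpg, k(3) name(km)

* Directory-style and detailed listings
cluster dir
cluster list sl, type method dissimilarity vars

* Make sl the most recent analysis, so later commands default to it
cluster use sl

* Rename an analysis, then rename one of the variables attached to sl
cluster rename km kmeans3
cluster renamevar sl_hgt sl_height, name(sl)

* Remove an analysis that is no longer needed
cluster drop kmeans3
```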

Programmer utilities (see [MV] cluster programming utilities; help cluster programming)

The query, set, and delete subcommands of cluster provide programmers a method of obtaining, setting, and deleting the underlying information and structures of a cluster analysis. The parsedistance subcommand provides parsing of distance options for programmers. The measures subcommand computes (dis)similarities.

Obtain various attributes of a cluster analysis

cluster query [clname]

Set various attributes of a cluster analysis

cluster set [clname] [, set_options ]

Delete attributes from a cluster analysis

cluster delete clname [, delete_options ]

Check similarity and dissimilarity measure name

cluster parsedistance measure

Compute similarity and dissimilarity measure

cluster measures varlist [if] [in] , compare(numlist) generate(newvarlist) [measures_options]

    set_options             description
    -------------------------------------------------------------------------
      addname               add clname to the master list of cluster
                              analyses
      type(type)            set the cluster type for clname
      method(method)        set the name of the clustering method for the
                              cluster analysis
      similarity(measure)   set the name of the similarity measure used for
                              the cluster analysis
      dissimilarity(measure) set the name of the dissimilarity measure used
                              for the cluster analysis
      var(tag varname)      set tag that points to varname
      char(tag charname)    set tag that points to charname
      other(tag text)       set tag with text attached to the tag marker
      note(text)            add a note to the clname
    -------------------------------------------------------------------------

    delete_options          description
    -------------------------------------------------------------------------
      zap                   delete all possible settings for clname
      delname               remove clname from the master list of current
                              cluster analyses
      type                  delete the cluster type entry from clname
      method                delete the cluster method entry from clname
      dissimilarity         delete the dissimilarity entries from clname
      similarity            delete the similarity entries from clname
      notes(numlist)        delete the specified numbered notes from clname
      allnotes              remove all notes from clname
      var(tag)              remove tag from clname
      allvars               remove all the entries pointing to variables
                              for clname
      varzap(tag)           same as var(), but also delete the referenced
                              variable
      allvarzap             same as allvars, but also delete the variables
      char(tag)             remove tag that points to a Stata
                              characteristic from clname
      allchars              remove all entries pointing to Stata
                              characteristics for clname
      charzap(tag)          same as char(), but also delete the
                              characteristic
      allcharzap            same as allchars, but also delete the
                              characteristics
      other(tag)            delete tag and its associated text from clname
      allothers             delete all entries from clname that have been
                              set using other()
    -------------------------------------------------------------------------

    measures_options        description
    -------------------------------------------------------------------------
    * compare(numlist)      use numlist as the comparison observations
    * generate(newvarlist)  generate newvarlist variables
      measure               (dis)similarity measure; default is L2
      propvars              interpret observations implied by if and in as
                              proportions of binary observations
      propcompares          interpret comparison observations as
                              proportions of binary observations
    -------------------------------------------------------------------------
    * compare(numlist) and generate(newvarlist) are required.
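
A brief sketch of three of the programmer utilities, assuming the auto dataset and a hypothetical analysis name (sl). The comparison observations (1 and 2) and the new variable names (d1, d2) are arbitrary.

```stata
* Illustrative only: auto.dta and all names below are assumptions
sysuse auto, clear
cluster singlelinkage price mpg, name(sl)

* Retrieve the attributes of the analysis and display what was returned
cluster query sl
return list

* Check a measure name and display the parsed result
cluster parsedistance Euclidean
sreturn list

* L1 distance from every observation to observations 1 and 2;
* the measure is given as a bare option name
cluster measures price mpg, compare(1 2) generate(d1 d2) L1
summarize d1 d2
```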

(Dis)similarity measures (see [MV] measure_option; help measure_option)

Measures are divided into those for continuous data, those for binary data, and one for a mix of binary and continuous data. Measure names are not case sensitive. Full definitions are presented in the Continuous measure definitions and Binary measure definitions sections of help measure_option.

    measure                 description
    -------------------------------------------------------------------------
    cont_measure            similarity or dissimilarity measure for
                              continuous data
    binary_measure          similarity measure for binary data
    mixed_measure           dissimilarity measure for a mix of binary and
                              continuous data
    -------------------------------------------------------------------------

    cont_measure            description
    -------------------------------------------------------------------------
    L2                      Euclidean distance (Minkowski with argument 2)
      Euclidean             alias for L2
      L(2)                  alias for L2
    L2squared               squared Euclidean distance
      Lpower(2)             alias for L2squared
    L1                      absolute-value distance (Minkowski with
                              argument 1)
      absolute              alias for L1
      cityblock             alias for L1
      manhattan             alias for L1
      L(1)                  alias for L1
      Lpower(1)             alias for L1
    Linfinity               maximum-value distance (Minkowski with
                              infinite argument)
      maximum               alias for Linfinity
    L(#)                    Minkowski distance with # argument
    Lpower(#)               Minkowski distance with # argument raised to
                              # power
    Canberra                Canberra distance
    correlation             correlation coefficient similarity measure
    angular                 angular separation similarity measure
      angle                 alias for angular
    -------------------------------------------------------------------------

    binary_measure          description
    -------------------------------------------------------------------------
    matching                simple matching similarity coefficient
    Jaccard                 Jaccard binary similarity coefficient
    Russell                 Russell and Rao similarity coefficient
    Hamann                  Hamann similarity coefficient
    Dice                    Dice similarity coefficient
    antiDice                anti-Dice similarity coefficient
    Sneath                  Sneath and Sokal similarity coefficient
    Rogers                  Rogers and Tanimoto similarity coefficient
    Ochiai                  Ochiai similarity coefficient
    Yule                    Yule similarity coefficient
    Anderberg               Anderberg similarity coefficient
    Kulczynski              Kulczynski similarity coefficient
    Pearson                 Pearson's phi similarity coefficient
    Gower2                  similarity coefficient with same denominator
                              as Pearson
    -------------------------------------------------------------------------

    mixed_measure           description
    -------------------------------------------------------------------------
    Gower                   Gower's dissimilarity coefficient
    -------------------------------------------------------------------------
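
To show how measure names from the tables above are passed through the measure() option, here is a sketch assuming the auto dataset. The analysis names and the 0/1 indicator variables (heavy, pricey, thirsty) are constructed purely for this example, and their cutoffs are arbitrary.

```stata
* Illustrative only: auto.dta and all names below are assumptions
sysuse auto, clear

* Continuous measures: absolute-value distance and Minkowski distance with argument 3
cluster completelinkage price mpg weight, measure(L1) name(cl_l1)
cluster completelinkage price mpg weight, measure(L(3)) name(cl_l3)

* Binary measure: build 0/1 indicators, then use simple matching similarity
generate byte heavy   = weight > 3000
generate byte pricey  = price > 6000
generate byte thirsty = mpg < 20
cluster averagelinkage foreign heavy pricey thirsty, measure(matching) name(cl_match)

* Capitalization of the measure name does not matter
cluster averagelinkage price mpg weight, measure(CANBERRA) name(cl_can)

cluster dir
```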

Hierarchical clustering of a dissimilarity matrix

(see [MV] cluster linkage; help cluster linkage)

clustermat linkage matname [if] [in] [, clustermat_options ]

    linkage                 description
    -------------------------------------------------------------------------
    singlelinkage           single-linkage cluster analysis
    averagelinkage          average-linkage cluster analysis
    completelinkage         complete-linkage cluster analysis
    waveragelinkage         weighted-average linkage cluster analysis
    medianlinkage           median-linkage cluster analysis
    centroidlinkage         centroid-linkage cluster analysis
    wardslinkage            Ward's linkage cluster analysis
    -------------------------------------------------------------------------

    clustermat_options      description
    -------------------------------------------------------------------------
    Main
      shape(shape)          shape (storage method) of matname
      add                   add cluster information to data currently in
                              memory
      clear                 replace data in memory with cluster information
      labelvar(varname)     place dissimilarity matrix row names in varname
      name(clname)          name of resulting cluster analysis

    Advanced
      force                 perform clustering after fixing matname
                              problems
      generate(stub)        prefix for generated variables
    -------------------------------------------------------------------------

    shape                   matname is stored as a
    -------------------------------------------------------------------------
    full                    square symmetric matrix; the default
    lower                   vector of rowwise lower triangle (with
                              diagonal)
    llower                  vector of rowwise strict lower triangle (no
                              diagonal)
    upper                   vector of rowwise upper triangle (with
                              diagonal)
    uupper                  vector of rowwise strict upper triangle (no
                              diagonal)
    -------------------------------------------------------------------------
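
Clustering from a dissimilarity matrix could look like the sketch below. It assumes the auto dataset, keeps only the first 20 observations so that the matrix stays small, and builds the matrix with matrix dissimilarity; the matrix name (D) and analysis name (cmat) are hypothetical.

```stata
* Illustrative only: auto.dta and all names below are assumptions
sysuse auto, clear
keep in 1/20

* Build a 20 x 20 Euclidean (default L2) dissimilarity matrix between observations
matrix dissimilarity D = price mpg weight

* Cluster the matrix; add attaches the results to the 20 observations in memory,
* which works here because the matrix has one row per observation
clustermat completelinkage D, shape(full) name(cmat) add

* Postclustering commands then work as usual;
* variables() is required by clustermat stop
cluster dendrogram cmat, labels(make) horizontal
clustermat stop cmat, variables(price mpg weight)
```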

Also see

Manual: [MV] cluster

Help: [MV] clustermat, [MV] cluster programming utilities


© Copyright 1996–2009 StataCorp LP