Stata 13 help for cluster kmedians

Title

[MV] cluster kmeans and kmedians -- Kmeans and kmedians cluster analysis

Syntax

Kmeans cluster analysis

cluster kmeans [varlist] [if] [in] , k(#) [ options ]

Kmedians cluster analysis

cluster kmedians [varlist] [if] [in] , k(#) [ options ]

    options                  Description
    -------------------------------------------------------------------------
    Main
    * k(#)                   perform cluster analysis resulting in # groups
      measure(measure)       similarity or dissimilarity measure; default is
                               L2 (Euclidean)
      name(clname)           name of resulting cluster analysis

    Options
      start(start_option)    obtain k initial group centers by using
                               start_option
      keepcenters            append the k final group means or medians to
                               the data

    Advanced
      generate(groupvar)     name of grouping variable
      iterate(#)             maximum number of iterations; default is
                               iterate(10000)
    -------------------------------------------------------------------------
    * k(#) is required.

Menu

cluster kmeans

Statistics > Multivariate analysis > Cluster analysis > Cluster data > Kmeans

cluster kmedians

Statistics > Multivariate analysis > Cluster analysis > Cluster data > Kmedians

Description

cluster kmeans and cluster kmedians perform kmeans and kmedians partition cluster analysis, respectively. See [MV] cluster for a listing of the cluster commands.

Options

        +------+
    ----+ Main +-------------------------------------------------------------

k(#) is required and indicates that # groups are to be formed by the cluster analysis.

measure(measure) specifies the similarity or dissimilarity measure. The default is measure(L2), Euclidean distance. This option is not case sensitive. See [MV] measure_option for detailed descriptions of the supported measures.

name(clname) specifies the name to attach to the resulting cluster analysis. If name() is not specified, Stata finds an available cluster name, displays it for your reference, and attaches the name to your cluster analysis.
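The Main options can be tried together as follows. This is a minimal sketch, assuming the labtech dataset used in the Examples section; the cluster name k8 is chosen here only for illustration, and the name Stata picks for the second analysis is whatever it finds available.

```stata
* Sketch only: k8 is an illustrative cluster name.
webuse labtech, clear

* Named analysis with the default measure, measure(L2)
cluster kmeans x1 x2 x3 x4, k(8) name(k8)

* No name(): Stata finds an available name and displays it
cluster kmeans x1 x2 x3 x4, k(8) measure(L1)

* List the cluster analyses now attached to the data, then describe k8
cluster dir
cluster list k8
```

Because the default start option is krandom, the groups formed may differ from run to run unless a seed is supplied.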

        +---------+
    ----+ Options +----------------------------------------------------------

start(start_option) indicates how the k initial group centers are to be obtained. The available start_options are

krandom[(seed#)], the default, specifies that k unique observations be chosen at random, from among those to be clustered, as starting centers for the k groups. Optionally, a random-number seed may be specified to cause the command set seed seed# (see [R] set seed) to be applied before the k random observations are chosen.

firstk[, exclude] specifies that the first k observations from among those to be clustered be used as the starting centers for the k groups. With the exclude option, these first k observations are not included among the observations to be clustered.

lastk[, exclude] specifies that the last k observations from among those to be clustered be used as the starting centers for the k groups. With the exclude option, these last k observations are then not included among the observations to be clustered.

random[(seed#)] specifies that k random initial group centers be generated. The values are randomly chosen from a uniform distribution over the range of the data. Optionally, a random-number seed may be specified to cause the command set seed seed# (see [R] set seed) to be applied before the k group centers are generated.

prandom[(seed#)] specifies that k partitions be formed randomly among the observations to be clustered. The group means or medians from the k groups defined by this partitioning are to be used as the starting group centers. Optionally, a random-number seed may be specified to cause the command set seed seed# (see [R] set seed) to be applied before the k partitions are chosen.

everykth specifies that k partitions be formed by assigning observations 1, 1+k, 1+2k, ... to the first group; assigning observations 2, 2+k, 2+2k, ... to the second group; and so on, to form k groups. The group means or medians from these k groups are to be used as the starting group centers.

segments specifies that k nearly equal partitions be formed from the data. Approximately the first N/k observations are assigned to the first group, the second N/k observations are assigned to the second group, and so on. The group means or medians from these k groups are to be used as the starting group centers.

group(varname) provides an initial grouping variable, varname, that defines k groups among the observations to be clustered. The group means or medians from these k groups are to be used as the starting group centers.
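Several of the start_options described above can be compared on the same data. The following is a sketch under stated assumptions: it uses the labtech dataset from the Examples section, and the cluster names (km6seed, km6every, km6seg, km6grp), the seed 4321, and the variable initgrp are all hypothetical names introduced for illustration.

```stata
* Sketch only: all cluster names, the seed, and initgrp are illustrative.
webuse labtech, clear

* Reproducible random start: set seed 4321 is applied before the draw
cluster kmedians x1 x2 x3 x4, k(6) start(krandom(4321)) name(km6seed)

* Deterministic starts that depend on the current order of the data
cluster kmedians x1 x2 x3 x4, k(6) start(everykth) name(km6every)
cluster kmedians x1 x2 x3 x4, k(6) start(segments) name(km6seg)

* User-supplied initial grouping: observations 1, 7, 13, ... form group 1,
* observations 2, 8, 14, ... form group 2, and so on
generate byte initgrp = mod(_n - 1, 6) + 1
cluster kmedians x1 x2 x3 x4, k(6) start(group(initgrp)) name(km6grp)

* Cross-tabulate two of the resulting partitions
tabulate km6every km6grp
```

The initgrp variable reproduces the assignment rule described for everykth, so the cross-tabulation gives a quick check on whether the two starts lead to the same final groups. Because everykth and segments depend on observation order, sorting the data differently beforehand may change their results.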

keepcenters specifies that the group means or medians from the k groups that are produced are to be appended to the data.
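A quick way to see what keepcenters does is to count observations before and after clustering. This sketch assumes the labtech dataset and assumes that the centers are appended as extra observations at the end of the data; the cluster name k8c is illustrative.

```stata
* Sketch only: k8c is an illustrative cluster name.
webuse labtech, clear

* Number of observations before clustering
count

cluster kmeans x1 x2 x3 x4, k(8) name(k8c) keepcenters

* Number of observations after clustering; compare with the count above
count

* Inspect the last 8 observations, where the appended centers are
* assumed to be stored
list x1 x2 x3 x4 k8c in -8/l
```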

        +----------+
    ----+ Advanced +---------------------------------------------------------

generate(groupvar) provides the name of the grouping variable to be created by cluster kmeans or cluster kmedians. By default, this will be the name specified in name().

iterate(#) specifies the maximum number of iterations to allow in the kmeans or kmedians clustering algorithm. The default is iterate(10000).
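The two Advanced options can be combined in one call. This is a sketch, assuming the labtech dataset; the names k4run and grp4 and the iteration cap of 50 are arbitrary choices for illustration.

```stata
* Sketch only: k4run, grp4, and the cap of 50 are illustrative.
webuse labtech, clear

* Store group membership in grp4 rather than in a variable named k4run,
* and allow at most 50 iterations
cluster kmeans x1 x2 x3 x4, k(4) name(k4run) generate(grp4) iterate(50)

* Group sizes
tabulate grp4
```

Naming the grouping variable separately can be useful when the cluster name would otherwise collide with an existing variable.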

Examples

    Setup
        . webuse labtech

    Perform kmeans cluster analysis, creating eight groups
        . cluster kmeans x1 x2 x3 x4, k(8)

    Same as above, but using absolute-value distance instead of Euclidean
    distance, naming cluster analysis k8abs
        . cluster kmeans x1 x2 x3 x4, k(8) measure(L1) name(k8abs)

    Perform kmedians cluster analysis, creating six groups by using the
    Canberra distance metric
        . cluster kmedians x1 x2 x3 x4, k(6) measure(Canberra)

    Create six groups, using the first 6 observations in the dataset as
    starting centers
        . cluster kmedians x1 x2 x3 x4, k(6) start(firstk)

    Same as above, but do not include the first 6 observations in the
    cluster analysis
        . cluster kmedians x1 x2 x3 x4, k(6) start(firstk, exclude)


© Copyright 1996–2014 StataCorp LP