Stata 15 help for cluster dendrogram

[MV] cluster dendrogram -- Dendrograms for hierarchical cluster analysis

Syntax

cluster dendrogram [clname] [if] [in] [, options ]

    options                 Description
    -------------------------------------------------------------------------
    Main
      quick                 do not center parent branches
      labels(varname)       name of variable containing leaf labels
      cutnumber(#)          display top # branches only
      cutvalue(#)           display branches above # (dis)similarity measure
                              only
      showcount             display number of observations for each branch
      countprefix(string)   prefix the branch count with string; default is
                              "n="
      countsuffix(string)   suffix the branch count with string; default is
                              empty string
      countinline           put branch count in line with branch label
      vertical              orient dendrogram vertically (default)
      horizontal            orient dendrogram horizontally

    Plot
      line_options          affect rendition of the plotted lines

    Add plots
      addplot(plot)         add other plots to the dendrogram

    Y axis, X axis, Titles, Legend, Overall
      twoway_options        any option other than by() documented in
                              [G-3] twoway_options
    -------------------------------------------------------------------------

Note: cluster tree is a synonym for cluster dendrogram.

In addition to the restrictions imposed by if and in, the observations are automatically restricted to those that were used in the cluster analysis.
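As a minimal sketch of the if qualifier, the commands below draw the dendrogram for one group only. They reuse the labtech dataset, the cluster name L2clnk, and the g3 grouping variable from the Examples section; which observations fall in group 3 depends on the clustering.

```stata
* Setup taken from the Examples section
webuse labtech, clear
cluster completelinkage x1 x2 x3 x4, name(L2clnk)
cluster generate g3 = group(3)

* Restrict the dendrogram to the observations assigned to group 3
cluster dendrogram L2clnk if g3==3, labels(labt)
```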

Menu

Statistics > Multivariate analysis > Cluster analysis > Postclustering > Dendrograms

Description

cluster dendrogram produces dendrograms (also called cluster trees) for a hierarchical clustering. See [MV] cluster for a list of the available cluster commands.

Dendrograms graphically present the information concerning which observations are grouped together at various levels of (dis)similarity. At the bottom of the dendrogram, each observation is considered its own cluster. Vertical lines extend up for each observation, and at various (dis)similarity values, these lines are connected to the lines from other observations with a horizontal line. The observations continue to combine until, at the top of the dendrogram, all observations are grouped together.

The height of the vertical lines and the range of the (dis)similarity axis give visual clues about the strength of the clustering. Long vertical lines indicate more distinct separation between the groups. Long vertical lines at the top of the dendrogram indicate that the groups represented by those lines are well separated from one another. Shorter lines indicate groups that are not as distinct.

Options

        +------+
    ----+ Main +-------------------------------------------------------------

quick switches to a different style of dendrogram in which the vertical lines go straight up from the observations instead of the default action of being recentered after each merge of observations in the dendrogram hierarchy. Some people prefer this representation, and it is quicker to render.

labels(varname) specifies that varname is to be used in place of observation numbers for labeling the observations at the bottom of the dendrogram.

cutnumber(#) displays only the top # branches of the dendrogram. With large dendrograms, the lower levels of the tree can become too crowded. With cutnumber(), you can limit your view to the upper portion of the dendrogram. Also see the cutvalue() option.

cutvalue(#) displays only those branches of the dendrogram that are above the # (dis)similarity measure. With large dendrograms, the lower levels of the tree can become too crowded. With cutvalue(), you can limit your view to the upper portion of the dendrogram. Also see the cutnumber() option.

showcount requests that the number of observations associated with each branch be displayed below the branches. showcount is most useful with cutnumber() and cutvalue() because, otherwise, the number of observations for each branch is one. When this option is specified, a label for each branch is constructed by using a prefix string, the branch count, and a suffix string.

countprefix(string) specifies the prefix string for the branch count label. The default is countprefix(n=). This option implies the use of the showcount option.

countsuffix(string) specifies the suffix string for the branch count label. The default is an empty string. This option implies the use of the showcount option.

countinline requests that the branch count be put in line with the corresponding branch label. The branch count is placed below the branch label by default. This option implies the use of the showcount option.

vertical and horizontal specify whether the x and y coordinates are to be swapped before plotting -- vertical (the default) does not swap the coordinates, whereas horizontal does.
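The Main options can be combined. The sketch below assumes the labtech setup from the Examples section; the cutoff of 10 branches and the label strings "(" and " obs)" are illustrative choices, not defaults.

```stata
* Setup taken from the Examples section
webuse labtech, clear
cluster completelinkage x1 x2 x3 x4, name(L2clnk)

* Top 10 branches, with a count label built as prefix + count + suffix
* and placed below each branch label (the default position)
cluster dendrogram L2clnk, cutnumber(10) showcount countprefix("(") countsuffix(" obs)")

* Same tree drawn horizontally with the count in line with the branch label;
* countinline implies showcount, so showcount is not repeated
cluster dendrogram L2clnk, cutnumber(10) countprefix("(") countsuffix(" obs)") countinline horizontal
```

Because countprefix(), countsuffix(), and countinline each imply showcount, the second command produces branch counts without naming showcount explicitly.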

        +------+
    ----+ Plot +-------------------------------------------------------------

line_options affect the rendition of the lines; see [G-3] line_options.
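As a hedged illustration, the command below passes two standard line_options, lcolor() and lwidth(), to change how the branches are drawn. The color and width values are arbitrary choices, and the setup is the one from the Examples section.

```stata
* Setup taken from the Examples section
webuse labtech, clear
cluster completelinkage x1 x2 x3 x4, name(L2clnk)

* Draw the branches in navy with a medium-thick line
cluster dendrogram L2clnk, labels(labt) lcolor(navy) lwidth(medthick)
```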

        +-----------+
    ----+ Add plots +--------------------------------------------------------

addplot(plot) allows adding more graph twoway plots to the graph; see [G-3] addplot_option.

        +-----------------------------------------+
    ----+ Y axis, X axis, Titles, Legend, Overall +--------------------------

twoway_options are any of the options documented in [G-3] twoway_options, excluding by(). These include options for titling the graph (see [G-3] title_options) and for saving the graph to disk (see [G-3] saving_option).
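A short sketch of titling and saving follows, again assuming the labtech setup from the Examples section. The title text, the axis title, and the file name dendro are made up for illustration.

```stata
* Setup taken from the Examples section
webuse labtech, clear
cluster completelinkage x1 x2 x3 x4, name(L2clnk)

* Add a graph title and a y-axis title, then save the graph as dendro.gph
* in the current working directory, overwriting any existing file
cluster dendrogram L2clnk, labels(labt) title("Complete-linkage clustering") ytitle("Dissimilarity") saving(dendro, replace)
```

With the default vertical orientation, the (dis)similarity measure is on the y axis, so ytitle() is the option that relabels it.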

Examples

    Setup
        . webuse labtech
        . cluster completelinkage x1 x2 x3 x4, name(L2clnk)
        . cluster generate g3 = group(3)

    Draw dendrograms
        . cluster dendrogram L2clnk, horizontal labels(labt)
        . cluster dendrogram L2clnk, labels(labt) quick

    Tree is a synonym for dendrogram; show only top 5 branches
        . cluster tree, cutnumber(5) showcount

    Show only branches with dissimilarity greater than 75.3
        . cluster dendrogram, cutvalue(75.3)
        . cluster tree, cutvalue(75.3) showcount countinline

