Stata 15 help for mdslong

[MV] mdslong -- Multidimensional scaling of proximity data in long format

Syntax

mdslong depvar [if] [in] [weight], id(var1 var2) [options]

    options                    Description
    -------------------------------------------------------------------------
    Model
    * id(var1 var2)            identify comparison pairs (object1,object2)
      method(method)           method for performing MDS
      loss(loss)               loss function
      transform(tfunction)     permitted transformations of dissimilarities
      normalize(norm)          normalization method; default is
                                 normalize(principal)
      s2d(standard)            convert similarity to dissimilarity:
                                 dissim(ij) = sqrt(sim(ii)+sim(jj)-2sim(ij));
                                 the default
      s2d(oneminus)            convert similarity to dissimilarity:
                                 dissim(ij) = 1-sim(ij)
      force                    correct problems in proximity information
      dimension(#)             configuration dimensions; default is
                                 dimension(2)
      addconstant              make distance matrix positive semidefinite
                                 (classical MDS only)

    Reporting
      neigen(#)                maximum number of eigenvalues to display;
                                 default is neigen(10) (classical MDS only)
      config                   display table with configuration coordinates
      noplot                   suppress configuration plot

    Minimization
      initialize(initopt)      start with configuration given in initopt
      tolerance(#)             tolerance for configuration matrix; default is
                                 tolerance(1e-4)
      ltolerance(#)            tolerance for loss criterion; default is
                                 ltolerance(1e-8)
      iterate(#)               perform maximum of # iterations; default is
                                 iterate(1000)
      protect(#)               perform # optimizations and report best
                                 solution; default is protect(1)
      nolog                    suppress the iteration log
      trace                    display current configuration in iteration log
      gradient                 display current gradient matrix in iteration
                                 log

      sdprotect(#)             advanced; see Options below
    -------------------------------------------------------------------------
    * id(var1 var2) is required.
    by and statsby are allowed; see prefix.
    aweights and fweights are allowed for methods modern and nonmetric; see
      weights.
    The maximum number of compared objects allowed is the maximum matrix
      size; see [R] matsize.
    sdprotect(#) does not appear in the dialog box.
    See [MV] mds postestimation for features available after estimation.

    method                     Description
    -------------------------------------------------------------------------
    classical                  classical MDS; default if neither loss() nor
                                 transform() is specified
    modern                     modern MDS; default if loss() or transform()
                                 is specified, except when loss(stress) and
                                 transform(monotonic) are specified
    nonmetric                  nonmetric (modern) MDS; default when
                                 loss(stress) and transform(monotonic) are
                                 specified
    -------------------------------------------------------------------------

    loss                       Description
    -------------------------------------------------------------------------
    stress                     stress criterion, normalized by distances;
                                 the default
    nstress                    stress criterion, normalized by disparities
    sstress                    squared stress criterion, normalized by
                                 distances
    nsstress                   squared stress criterion, normalized by
                                 disparities
    strain                     strain criterion (with transform(identity) is
                                 equivalent to classical MDS)
    sammon                     Sammon mapping
    -------------------------------------------------------------------------

    tfunction                  Description
    -------------------------------------------------------------------------
    identity                   no transformation; disparity = dissimilarity;
                                 the default
    power                      power alpha: disparity = dissimilarity^alpha
    monotonic                  weakly monotonic increasing functions
                                 (nonmetric scaling); only with loss(stress)
    -------------------------------------------------------------------------

    norm                       Description
    -------------------------------------------------------------------------
    principal                  principal orientation; location=0; the default
    classical                  Procrustes rotation toward classical solution
    target(matname)[, copy]    Procrustes rotation toward matname; ignore
                                 naming conflicts if copy is specified
    -------------------------------------------------------------------------

    initopt                    Description
    -------------------------------------------------------------------------
    classical                  start with classical solution; the default
    random[(#)]                start at random configuration, setting seed
                                 to #
    from(matname)[, copy]      start from matname; ignore naming conflicts
                                 if copy is specified
    -------------------------------------------------------------------------

Menu

Statistics > Multivariate analysis > Multidimensional scaling (MDS) > MDS of proximity-pair data

Description

mdslong performs multidimensional scaling (MDS) for two-way proximity data in long format with an explicit measure of similarity or dissimilarity between objects. mdslong performs classical MDS as well as modern metric and nonmetric MDS.

For MDS with two-way proximity data in a matrix, see [MV] mdsmat. If you are looking for MDS on a dataset, based on dissimilarities between observations over variables, see [MV] mds.

Options

    +-------+
----+ Model +------------------------------------------------------------

id(var1 var2) is required. The pair of variables var1 and var2 should uniquely identify comparisons. var1 and var2 are string or numeric variables that identify the objects to be compared. var1 and var2 should be of the same data type; if they are value labeled, they should be labeled with the same value label. Using value-labeled variables or string variables is generally helpful in identifying the points in plots and tables.

Example data layout for mdslong proxim, id(i1 i2).

    proxim    i1    i2
    ------------------
         7     1     2
        10     1     3
        12     1     4
         4     2     3
         6     2     4
         3     3     4
    ------------------

If you have multiple measurements per pair, we suggest that you specify the mean of the measures as the proximity and the inverse of the variance as the weight.
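The suggestion above can be sketched as follows. This is a minimal illustration, not an official recipe: the variable names rating, sdev, and w and the toy ratings are assumptions made for the example, and every pair is given three measurements with nonzero variance so that the inverse variance is defined.

```stata
* Toy data: three repeated ratings for each pair of four objects
clear
input rating i1 i2
 6 1 2
 7 1 2
 8 1 2
 9 1 3
10 1 3
11 1 3
10 1 4
12 1 4
14 1 4
 3 2 3
 4 2 3
 5 2 3
 5 2 4
 6 2 4
 7 2 4
 2 3 4
 3 3 4
 4 3 4
end

* One row per pair: mean rating as the proximity, plus its standard deviation
collapse (mean) proxim=rating (sd) sdev=rating, by(i1 i2)

* Inverse of the variance as the weight
generate double w = 1/sdev^2

* Weights are allowed only with modern or nonmetric MDS
mdslong proxim [aweight=w], id(i1 i2) method(modern) nolog noplot
```

A pair measured only once, or with identical repeated measurements, has a missing or zero variance and would need separate handling before the weight is computed.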

method(method) specifies the method for MDS.

method(classical) specifies classical metric scaling, also known as "principal coordinates analysis" when used with Euclidean proximities. Classical MDS obtains equivalent results to modern MDS with loss(strain) and transform(identity) without weights. The calculations for classical MDS are fast; consequently, classical MDS is generally used to obtain starting values for modern MDS. If the options loss() and transform() are not specified, mdslong computes the classical solution. Likewise, if method(classical) is specified, loss() and transform() are not allowed.

method(modern) specifies modern scaling. If method(modern) is specified but not loss() or transform(), then loss(stress) and transform(identity) are assumed. All values of loss() and transform() are valid with method(modern).

method(nonmetric) specifies nonmetric scaling, which is a type of modern scaling. If method(nonmetric) is specified, loss(stress) and transform(monotonic) are assumed. Other values of loss() and transform() are not allowed.
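As a small sketch of how the method is inferred when method() is omitted, the following runs mdslong twice on the toy layout shown under id() and displays e(method) after each run. The data are only the illustrative values from that layout; according to the stored-results table, e(method) should contain classical after the first run and modern after the second.

```stata
* Toy dissimilarities for four objects
clear
input proxim i1 i2
 7 1 2
10 1 3
12 1 4
 4 2 3
 6 2 4
 3 3 4
end

* Neither loss() nor transform() is specified: classical MDS
mdslong proxim, id(i1 i2) noplot
display "`e(method)'"

* loss() is specified without method(): modern MDS with transform(identity)
mdslong proxim, id(i1 i2) loss(stress) nolog noplot
display "`e(method)'"
```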

loss(loss) specifies the loss criterion.

loss(stress) specifies that the stress loss function be used, normalized by the squared Euclidean distances. This criterion is often called Kruskal's stress-1. Optimal configurations for loss(stress) and for loss(nstress) are equivalent up to a scale factor, but the iteration paths may differ. loss(stress) is the default.

loss(nstress) specifies that the stress loss function be used, normalized by the squared disparities, that is, transformed dissimilarities. Optimal configurations for loss(stress) and for loss(nstress) are equivalent up to a scale factor, but the iteration paths may differ.

loss(sstress) specifies that the squared stress loss function be used, normalized by the fourth power of the Euclidean distances.

loss(nsstress) specifies that the squared stress loss function be used, normalized by the fourth power of the disparities, that is, transformed dissimilarities.

loss(strain) specifies the strain loss criterion. Classical scaling is equivalent to loss(strain) and transform(identity) but is computed by a faster noniterative algorithm. Specifying loss(strain) still allows transformations.

loss(sammon) specifies the Sammon (1969) loss criterion.

transform(tfunction) specifies the class of allowed transformations of the dissimilarities; transformed dissimilarities are called disparities.

transform(identity) specifies that the only allowed transformation is the identity; that is, disparities are equal to dissimilarities. transform(identity) is the default.

transform(power) specifies that disparities are related to the dissimilarities by a power function,

disparity = dissimilarity^alpha, alpha>0

transform(monotonic) specifies that the disparities are a weakly monotonic function of the dissimilarities. This is also known as nonmetric MDS. Tied dissimilarities are handled by the primary method; that is, ties may be broken but are not necessarily broken. transform(monotonic) is valid only with loss(stress).

normalize(norm) specifies a normalization method for the configuration. Recall that the location and orientation of an MDS configuration are not defined ("identified"); an isometric transformation (that is, translation, reflection, or orthonormal rotation) of a configuration preserves interpoint Euclidean distances.

normalize(principal) performs a principal normalization, in which the configuration columns have zero mean and correspond to the principal components, with positive coefficient for the observation with lowest value of id(). normalize(principal) is the default.

normalize(classical) normalizes by a distance-preserving Procrustean transformation of the configuration toward the classical configuration in principal normalization; see [MV] procrustes. normalize(classical) is not valid if method(classical) is specified.

normalize(target(matname) [, copy]) normalizes by a distance-preserving Procrustean transformation toward matname; see [MV] procrustes. matname should be an n x p matrix, where n is the number of observations and p is the number of dimensions, and the rows of matname should be ordered with respect to id(). The rownames of matname should be set correctly but will be ignored if copy is also specified.

Note on normalize(classical) and normalize(target()): the Procrustes transformation comprises any combination of translation, reflection, and orthonormal rotation -- these transformations preserve distance. Dilation (uniform scaling) would stretch distances and is not applied. However, the output reports the dilation factor, and the reported Procrustes statistic is for the dilated configuration.
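A minimal sketch of normalize(target()) follows. The matrix T is a hypothetical 4 x 2 target made up for this example, with one row per object in the order of id(); copy is specified so that the row names of T are ignored rather than matched.

```stata
* Toy dissimilarities for four objects
clear
input proxim i1 i2
 7 1 2
10 1 3
12 1 4
 4 2 3
 6 2 4
 3 3 4
end

* Hypothetical target configuration: n = 4 rows (objects), p = 2 columns
matrix T = (0, 0 \ 7, 0 \ 10, 1 \ 12, 0)

* Rotate the solution toward T; copy ignores the row names of T
mdslong proxim, id(i1 i2) normalize(target(T), copy) noplot

* Normalized configuration coordinates
matrix list e(Y)
```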

s2d(standard|oneminus) specifies how similarities are converted into dissimilarities. By default, the command assumes dissimilarity data. Specifying s2d() indicates that your proximity data are similarities.

Dissimilarity data should have zeros on the diagonal (that is, an object is identical to itself) and nonnegative off-diagonal values. Dissimilarities need not satisfy the triangular inequality, D(i,j)^2 < D(i,h)^2 + D(h,j)^2. Similarity data should have ones on the diagonal (that is, an object is identical to itself) and have off-diagonal values between zero and one. In either case, proximities should be symmetric. See option force if your data violate these assumptions.

The available s2d() options, standard and oneminus, are defined as follows:

    standard    dissim(ij) = sqrt(sim(ii)+sim(jj)-2sim(ij))
                           = sqrt(2(1-sim(ij)))
    oneminus    dissim(ij) = 1-sim(ij)

s2d(standard) is the default.

s2d() should be specified only with measures in similarity form.
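To make the two conversions concrete, here is a small worked calculation for a single pair. The similarity value 0.84 is an arbitrary illustration, and unit self-similarities sim(ii) = sim(jj) = 1 are assumed, as required for similarity data.

```stata
* Illustrative similarity between two objects
scalar sim = 0.84

* standard: sqrt(sim(ii) + sim(jj) - 2*sim(ij)) with unit self-similarities
scalar d_standard = sqrt(1 + 1 - 2*scalar(sim))

* oneminus: 1 - sim(ij)
scalar d_oneminus = 1 - scalar(sim)

display scalar(d_standard)    // sqrt(0.32), about 0.566
display scalar(d_oneminus)    // 0.16
```

The two conversions give different dissimilarities for the same similarity, so the choice can change a metric solution.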

force corrects problems with the supplied proximity information. In the long format used by mdslong, multiple measurements on (i,j) may be available. Including both (i,j) and (j,i) would be treated as multiple measurements. This is an error, even if the measures are identical. Option force uses the mean of the measurements. force also resolves problems on the diagonal, that is, comparisons of objects with themselves; these should have zero dissimilarity or unit similarity. force does not resolve incomplete data, that is, pairs (i,j) for which no measurement is available. Out-of-range values are also not fixed.
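The following sketch illustrates the duplicate-measurement case described above. The data are the toy layout from id() plus one extra row, (2,1), which duplicates the pair (1,2); the values are illustrative only. capture noisily is used so that the expected error does not stop a do-file.

```stata
* Toy dissimilarities; the pair of objects 1 and 2 is measured twice
clear
input proxim i1 i2
 7 1 2
 9 2 1
10 1 3
12 1 4
 4 2 3
 6 2 4
 3 3 4
end

* Without force, the duplicate is treated as an error
capture noisily mdslong proxim, id(i1 i2) noplot
display "return code without force: " _rc

* With force, the mean of the two measurements is used for the pair
mdslong proxim, id(i1 i2) force noplot
```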

dimension(#) specifies the dimension of the approximating configuration. The default is dimension(2), and # should not exceed the number of positive eigenvalues of the centered distance matrix.

addconstant specifies that if the double-centered distance matrix is not positive semidefinite (psd), a constant should be added to the squared distances to make it psd and, hence, Euclidean. This option is allowed with classical MDS only.

    +-----------+
----+ Reporting +--------------------------------------------------------

neigen(#) specifies the number of eigenvalues to be included in the table. The default is neigen(10). Specifying neigen(0) suppresses the table. This option is allowed with classical MDS only.

config displays the table with the coordinates of the approximating configuration. This table may also be displayed using the postestimation command estat config; see [MV] mds postestimation.

noplot suppresses the graph of the approximating configuration. The graph can still be produced later via mdsconfig, which also allows the standard graphics options for fine-tuning the plot; see [MV] mds postestimation plots.
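A short sketch of the reporting options, using the toy layout from id() as illustrative data: the eigenvalue table is limited to two rows, the coordinate table is requested with config, and the plot is suppressed. The coordinate table is then redisplayed with estat config.

```stata
* Toy dissimilarities for four objects
clear
input proxim i1 i2
 7 1 2
10 1 3
12 1 4
 4 2 3
 6 2 4
 3 3 4
end

* Show at most two eigenvalues, display coordinates, skip the graph
mdslong proxim, id(i1 i2) neigen(2) config noplot

* Redisplay the configuration coordinates after estimation
estat config
```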

    +--------------+
----+ Minimization +-----------------------------------------------------

These options are available only with method(modern) or method(nonmetric):

initialize(initopt) specifies the initial values of the criterion minimization process.

initialize(classical), the default, uses the solution from classical metric scaling as initial values. With protect(), all but the first run start from random perturbations of the classical solution. These random perturbations are independent and normally distributed with standard deviation equal to the product of sdprotect(#) and the standard deviation of the dissimilarities.

initialize(random) starts an optimization process from a random starting configuration. These random configurations are generated from independent normal distributions with standard deviation equal to the product of sdprotect(#) and the standard deviation of the dissimilarities. The means of the configuration are irrelevant in MDS.

initialize(from(matname)[, copy]) sets the initial value to matname. matname should be an n x p matrix, where n is the number of observations and p is the number of dimensions, and the rows of matname should be ordered with respect to id(). The rownames of matname should be set correctly but will be ignored if copy is specified. With protect(), the second through last runs start from random perturbations of matname. These random perturbations are independent and normally distributed with standard deviation equal to the product of sdprotect(#) and the standard deviation of the dissimilarities.
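As a sketch of initialize(from()), the example below saves the classical configuration in a matrix and passes it as the starting configuration for modern MDS. The matrix name Y0 and the toy data are assumptions for illustration; copy is specified so that the row names of Y0 are ignored. This reproduces by hand what initialize(classical) does by default, so it mainly shows the mechanics.

```stata
* Toy dissimilarities for four objects
clear
input proxim i1 i2
 7 1 2
10 1 3
12 1 4
 4 2 3
 6 2 4
 3 3 4
end

* Classical solution, stored as a 4 x 2 matrix
mdslong proxim, id(i1 i2) noplot
matrix Y0 = e(Y)

* Modern MDS started from Y0; rows of Y0 follow the order of id()
mdslong proxim, id(i1 i2) method(modern) initialize(from(Y0), copy) nolog noplot
```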

tolerance(#) specifies the tolerance for the configuration matrix. When the relative change in the configuration from one iteration to the next is less than or equal to tolerance(), the tolerance() convergence criterion is satisfied. The default is tolerance(1e-4).

ltolerance(#) specifies the tolerance for the fit criterion. When the relative change in the fit criterion from one iteration to the next is less than or equal to ltolerance(), the ltolerance() convergence criterion is satisfied. The default is ltolerance(1e-8).

Both the tolerance() and ltolerance() criteria must be satisfied for convergence.

iterate(#) specifies the maximum number of iterations. The default is iterate(1000).

protect(#) requests that # optimizations be performed and that the best of the solutions be reported. The default is protect(1). See option initialize() on starting values of the runs. The output contains a table of the return code, the criterion value reached, and the seed of the random number used to generate the starting value. Specifying a large number, such as protect(50), provides reasonable insight into whether the solution found is a global minimum and not just a local minimum.

If any of the options log, trace, or gradient is also specified, iteration reports will be printed for each optimization run. Beware: this option will produce a lot of output.
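The minimization options can be combined as in the sketch below, which runs ten optimizations from random starting configurations with tighter tolerances than the defaults and then inspects the stored criterion value and convergence flag. The seed 12345, the tolerance values, and the toy data are arbitrary choices for illustration.

```stata
* Toy dissimilarities for four objects
clear
input proxim i1 i2
 7 1 2
10 1 3
12 1 4
 4 2 3
 6 2 4
 3 3 4
end

* Ten runs from random starts; the best solution is reported
mdslong proxim, id(i1 i2) method(modern) initialize(random(12345)) ///
    protect(10) tolerance(1e-6) ltolerance(1e-10) nolog noplot

* Loss criterion value and convergence flag of the reported solution
display e(critval)
display e(converged)
```

With only four objects the problem is very small, so this shows the syntax rather than a realistic search for a global minimum.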

nolog suppresses the iteration log, which shows the progress of the minimization process.

trace displays the configuration matrices in the iteration report. Beware: this option may produce a lot of output.

gradient displays the gradient matrices of the fit criterion in the iteration report. Beware: this option may produce a lot of output.

The following option is available with mdslong but is not shown in the dialog box:

sdprotect(#) sets a proportionality constant for the standard deviations of random configurations (init(random)) or random perturbations of given starting configurations (init(classical) or init(from())). The default is sdprotect(1).

Example

A famous example in the MDS literature is the data on the percentage of times that pairs of Morse code signals for two numbers (1,...,9,0) were declared the same by 598 subjects. We use the Morse data in long format. The entries are in the order 1,2,...,9,0.

. webuse morse_long

The proximity of (2,1) is entered, but not (1,2). Either one may be entered; it does not matter which. Proximities between the same objects, for example, (2,2), are not entered. First we generate a similarity measure between the objects.

. gen sim = freqsame/100

Classical MDS

. mdslong sim, id(digit1 digit2) s2d(standard)

Modern MDS

. mdslong sim, id(digit1 digit2) s2d(standard) method(modern)
(note: loss(stress) and transform(identity) are assumed)

Nonmetric MDS

. mdslong sim, id(digit1 digit2) s2d(standard) method(nonmetric)
(note: loss(stress) and transform(monotonic) are assumed)

Stored results

mdslong stores the following in e():

    Scalars
      e(N)                number of underlying observations
      e(p)                number of dimensions in the approximating
                            configuration
      e(np)               number of strictly positive eigenvalues
      e(addcons)          constant added to squared dissimilarities to force
                            positive semidefiniteness
      e(mardia1)          Mardia measure 1
      e(mardia2)          Mardia measure 2
      e(critval)          loss criterion value
      e(npos)             number of pairs with positive weights
      e(wsum)             sum of weights
      e(alpha)            parameter of transform(power)
      e(ic)               iteration count
      e(rc)               return code
      e(converged)        1 if converged, 0 otherwise

    Macros
      e(cmd)              mdslong
      e(cmdline)          command as typed
      e(method)           classical or modern MDS method
      e(method2)          nonmetric, if method(nonmetric)
      e(loss)             loss criterion
      e(losstitle)        description of loss criterion
      e(tfunction)        identity, power, or monotonic; transformation
                            function
      e(transftitle)      description of transformation
      e(id)               two ID variable names identifying compared object
                            pairs
      e(idtype)           int or str; type of id() variable
      e(duplicates)       1 if duplicates in id(), 0 otherwise
      e(labels)           labels for ID categories
      e(mxlen)            maximum length of category labels
      e(depvar)           dependent variable containing dissimilarities
      e(dtype)            similarity or dissimilarity; type of proximity data
      e(s2d)              standard or oneminus (when e(dtype) is similarity)
      e(wtype)            weight type
      e(wexp)             weight expression
      e(unique)           1 if eigenvalues are distinct, 0 otherwise
      e(init)             initialization method
      e(irngstate)        initial random-number state used for init(random)
      e(rngstate)         random-number state for solution
      e(norm)             normalization method
      e(targetmatrix)     name of target matrix for normalize(target)
      e(properties)       nob noV for modern or nonmetric MDS; nob noV eigen
                            for classical MDS
      e(estat_cmd)        program used to implement estat
      e(predict)          program used to implement predict
      e(marginsnotok)     predictions disallowed by margins

    Matrices
      e(D)                dissimilarity matrix
      e(Disparities)      disparity matrix for nonmetric MDS
      e(Y)                approximating configuration coordinates
      e(Ev)               eigenvalues
      e(W)                weight matrix
      e(idcoding)         coding for integer identifier variable
      e(norm_stats)       normalization statistics
      e(linearf)          two element vector defining the linear
                            transformation; distance equals first element
                            plus second element times dissimilarity

    Functions
      e(sample)           marks estimation sample
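As a brief sketch of how the stored results might be used after a classical fit, the example below displays two scalars and copies the configuration into a matrix. The toy data and the matrix name Y are assumptions for illustration.

```stata
* Toy dissimilarities for four objects
clear
input proxim i1 i2
 7 1 2
10 1 3
12 1 4
 4 2 3
 6 2 4
 3 3 4
end

mdslong proxim, id(i1 i2) noplot

* Scalars: dimensions and number of strictly positive eigenvalues
display "dimensions:           " e(p)
display "positive eigenvalues: " e(np)

* Matrices: copy the configuration coordinates and list the eigenvalues
matrix Y = e(Y)
matrix list Y
matrix list e(Ev)
```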

Reference

Sammon, J. W., Jr. 1969. A nonlinear mapping for data structure analysis. IEEE Transactions on Computers 18: 401-409.

