help mdslong dialog: mdslong
also see: mds postestimation
-------------------------------------------------------------------------------
Title
[MV] mdslong -- Multidimensional scaling of proximity data in long format
Syntax
mdslong depvar [if] [in] [weight] , id(var1 var2) [ options ]
options                   description
-------------------------------------------------------------------------
Model
* id(var1 var2)           identify comparison pairs (object1,object2)
  method(method)          method for performing MDS
  loss(loss)              loss function
  transform(tfunction)    permitted transformations of dissimilarities
  normalize(norm)         normalization method; default is
                            normalize(principal)
  s2d(standard)           convert similarity to dissimilarity:
                            dissim(ij) =
                            sqrt(sim(ii)+sim(jj)-2sim(ij)); the default
  s2d(oneminus)           convert similarity to dissimilarity:
                            dissim(ij) = 1-sim(ij)
  force                   correct problems in proximity information
  dimension(#)            configuration dimensions; default is
                            dimension(2)
  addconstant             make distance matrix positive semidefinite
                            (classical MDS only)
Reporting
  neigen(#)               maximum number of eigenvalues to display;
                            default is neigen(10) (classical MDS only)
  config                  display table with configuration coordinates
  noplot                  suppress configuration plot
Minimization
  initialize(initopt)     start with configuration given in initopt
  tolerance(#)            tolerance for configuration matrix; default
                            is tolerance(1e-4)
  ltolerance(#)           tolerance for loss criterion; default is
                            ltolerance(1e-8)
  iterate(#)              perform maximum of # iterations; default is
                            iterate(1000)
  protect(#)              perform # optimizations and report best
                            solution; default is protect(1)
  nolog                   suppress the iteration log
  trace                   display current configuration in iteration
                            log
  gradient                display current gradient matrix in iteration
                            log
+ sdprotect(#)            advanced; see description below
-------------------------------------------------------------------------
* id(var1 var2) is required.
+ sdprotect(#) does not appear in the dialog box.
by and statsby are allowed; see prefix.
aweights and fweights are allowed; see weights.
The maximum number of compared objects allowed is the maximum matrix
size; see [R] matsize.
See [MV] mds postestimation for features available after estimation.
method        description
-------------------------------------------------------------------------
classical     classical MDS; default if neither loss() nor transform()
                is specified
modern        modern MDS; default if loss() or transform() is
                specified, except when loss(stress) and
                transform(monotonic) are specified
nonmetric     nonmetric (modern) MDS; default when loss(stress) and
                transform(monotonic) are specified
-------------------------------------------------------------------------
loss          description
-------------------------------------------------------------------------
stress        stress criterion, normalized by distances; the default
nstress       stress criterion, normalized by disparities
sstress       squared stress criterion, normalized by distances
nsstress      squared stress criterion, normalized by disparities
strain        strain criterion (with transform(identity) is equivalent
                to classical MDS)
sammon        Sammon mapping
-------------------------------------------------------------------------
tfunction     description
-------------------------------------------------------------------------
identity      no transformation; disparity = dissimilarity; the
                default
power         power alpha: disparity = dissimilarity^alpha
monotonic     weakly monotonic increasing functions (nonmetric
                scaling); only with loss(stress)
-------------------------------------------------------------------------
norm                        description
-------------------------------------------------------------------------
principal                   principal orientation; location=0; the
                              default
classical                   Procrustes rotation toward classical solution
target(matname)[, copy]     Procrustes rotation toward matname; ignore
                              naming conflicts if copy is specified
-------------------------------------------------------------------------
initopt                     description
-------------------------------------------------------------------------
classical                   start with classical solution; the default
random[(#)]                 start at random configuration, setting seed
                              to #
from(matname)[, copy]       start from matname; ignore naming conflicts
                              if copy is specified
-------------------------------------------------------------------------
Menu
Statistics > Multivariate analysis > Multidimensional scaling (MDS) > MDS
of proximity-pair data
Description
mdslong performs multidimensional scaling (MDS) for two-way proximity
data in long format with an explicit measure of similarity or
dissimilarity between objects. mdslong performs classical MDS (Torgerson
1952) as well as modern metric and nonmetric MDS; see the method(),
loss(), and transform() options.
For MDS with two-way proximity data in a matrix, see [MV] mdsmat. If you
are looking for MDS on a dataset, based on dissimilarities between
observations over variables, see [MV] mds.
Computing the classical solution is straightforward, but with modern MDS
the minimization of the loss criteria over configurations is a
high-dimensional problem that is easily beset by convergence to local
minimums. mds, mdsmat, and mdslong provide options to control the
minimization process (1) by allowing the user to select the starting
configuration and (2) by selecting the best solution among multiple
minimization runs from random starting configurations.
Options
+-------+
----+ Model +------------------------------------------------------------
id(var1 var2) is required. The pair of variables var1 and var2 should
uniquely identify comparisons. var1 and var2 are string or numeric
variables that identify the objects to be compared. var1 and var2
should be of the same data type; if they are value labeled, they
should be labeled with the same value label. Using value-labeled
variables or string variables is generally helpful in identifying the
points in plots and tables.
Example data layout for mdslong proxim, id(i1 i2).
proxim    i1    i2
------------------
     7     1     2
    10     1     3
    12     1     4
     4     2     3
     6     2     4
     3     3     4
------------------
If you have multiple measurements per pair, we suggest that you
specify the mean of the measures as the proximity and the inverse of
the variance as the weight.
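A minimal sketch of this suggestion, assuming a hypothetical dataset in
memory with several rows per pair and variables named rating, i1, and i2
(these names are illustrative only). It also assumes that each pair is
always recorded in the same order and has at least two measurements;
otherwise the standard deviation, and hence the weight, is missing or
undefined. method(modern) is specified on the assumption that weights are
relevant to modern MDS only.
. collapse (mean) rating (sd) sd_rating = rating, by(i1 i2)
. generate double w = 1/sd_rating^2
. mdslong rating [aweight=w], id(i1 i2) method(modern)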
method(method) specifies the method for MDS.
method(classical) specifies classical metric scaling, also known as
"principal coordinates analysis" when used with Euclidean
proximities. Classical MDS obtains equivalent results to modern
MDS with loss(strain) and transform(identity) without weights.
The calculations for classical MDS are fast; consequently,
classical MDS is generally used to obtain starting values for
modern MDS. If the options loss() and transform() are not
specified, mdslong computes the classical solution. Likewise, if
method(classical) is specified, loss() and transform() are not
allowed.
method(modern) specifies modern scaling. If method(modern) is
specified but not loss() or transform(), then loss(stress) and
transform(identity) are assumed. All values of loss() and
transform() are valid with method(modern).
method(nonmetric) specifies nonmetric scaling, which is a type of
modern scaling. If method(nonmetric) is specified, loss(stress)
and transform(monotonic) are assumed. Other values of loss() and
transform() are not allowed.
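The defaults described above can be checked on the Morse data used in the
Example section. This is a sketch rather than output from an actual run;
it assumes that webuse can reach the example dataset and relies on the
e(method) and e(method2) macros listed under Saved results.
. webuse morse_long, clear
. generate sim = freqsame/100
Classical MDS (neither loss() nor transform() specified)
. mdslong sim, id(digit1 digit2) s2d(standard) noplot
. display "`e(method)'"
Modern MDS implied by loss()
. mdslong sim, id(digit1 digit2) s2d(standard) loss(sstress) noplot
. display "`e(method)'"
Nonmetric MDS implied by loss(stress) and transform(monotonic)
. mdslong sim, id(digit1 digit2) s2d(standard) loss(stress) transform(monotonic) noplot
. display "`e(method)' `e(method2)'"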
loss(loss) specifies the loss criterion.
loss(stress) specifies that the stress loss function be used,
normalized by the squared Euclidean distances. This criterion is
often called Kruskal's stress-1. Optimal configurations for
loss(stress) and for loss(nstress) are equivalent up to a scale
factor, but the iteration paths may differ. loss(stress) is the
default.
loss(nstress) specifies that the stress loss function be used,
normalized by the squared disparities, i.e., transformed
dissimilarities. Optimal configurations for loss(stress) and for
loss(nstress) are equivalent up to a scale factor, but the
iteration paths may differ.
loss(sstress) specifies that the squared stress loss function be
used, normalized by the fourth power of the Euclidean distances.
loss(nsstress) specifies that the squared stress criterion,
normalized by the fourth power of the disparities (transformed
dissimilarities) be used.
loss(strain) specifies the strain loss criterion. Classical scaling
is equivalent to loss(strain) and transform(identity) but is
computed by a faster noniterative algorithm. Specifying
loss(strain) still allows transformations.
loss(sammon) specifies the Sammon (1969) loss criterion.
transform(tfunction) specifies the class of allowed transformations of
the dissimilarities; transformed dissimilarities are called
disparities.
transform(identity) specifies that the only allowed transformation is
the identity; i.e., disparities are equal to dissimilarities.
transform(identity) is the default.
transform(power) specifies that disparities are related to the
dissimilarities by a power function,
disparity = dissimilarity^alpha, alpha>0
transform(monotonic) specifies that the disparities are a weakly
monotonic function of the dissimilarities. This is also known as
nonmetric MDS. Tied dissimilarities are handled by the primary
method; i.e., ties may be broken but are not necessarily broken.
transform(monotonic) is valid only with loss(stress).
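As an illustration of a nonidentity transformation, the following sketch
fits a power transformation to the Morse data from the Example section
and then displays the estimated exponent, which is stored in e(alpha)
according to the Saved results section. The choice of loss(stress) here
is only an example.
. webuse morse_long, clear
. generate sim = freqsame/100
. mdslong sim, id(digit1 digit2) s2d(standard) loss(stress) transform(power) noplot
. display e(alpha)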
normalize(norm) specifies a normalization method for the configuration.
Recall that the location and orientation of an MDS configuration are
not defined ("identified"); an isometric transformation (i.e.,
translation, reflection, or orthonormal rotation) of a configuration
preserves interpoint Euclidean distances.
normalize(principal) performs a principal normalization, in which the
configuration columns have zero mean and correspond to the
principal components, with positive coefficient for the
observation with lowest value of id(). normalize(principal) is
the default.
normalize(classical) normalizes by a distance-preserving Procrustean
transformation of the configuration toward the classical
configuration in principal normalization; see [MV] procrustes.
normalize(classical) is not valid if method(classical) is
specified.
normalize(target(matname) [, copy]) normalizes by a
distance-preserving Procrustean transformation toward matname;
see [MV] procrustes. matname should be an n x p matrix, where n
is the number of observations and p is the number of dimensions,
and the rows of matname should be ordered with respect to id().
The rownames of matname should be set correctly but will be
ignored if copy is also specified.
Note on normalize(classical) and normalize(target()): the Procrustes
transformation comprises any combination of translation, reflection,
and orthonormal rotation -- these transformations preserve distance.
Dilation (uniform scaling) would stretch distances and is not
applied. However, the output reports the dilation factor, and the
reported Procrustes statistic is for the dilated configuration.
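A sketch of normalize(target()), assuming the Morse data from the Example
section. The classical configuration is copied from e(Y) into a matrix
with the illustrative name Yc and then used as the rotation target for a
modern solution; with this target, the result should be comparable to
specifying normalize(classical).
. webuse morse_long, clear
. generate sim = freqsame/100
. mdslong sim, id(digit1 digit2) s2d(standard) noplot
. matrix Yc = e(Y)
. mdslong sim, id(digit1 digit2) s2d(standard) method(modern) normalize(target(Yc)) noplot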
s2d(standard|oneminus) specifies how similarities are converted into
dissimilarities. By default, the command assumes dissimilarity data.
Specifying s2d() indicates that your proximity data are similarities.
Dissimilarity data should have zeros on the diagonal (i.e., an object
is identical to itself) and nonnegative off-diagonal values.
Dissimilarities need not satisfy the triangle inequality, D(i,j) <=
D(i,h) + D(h,j). Similarity data should have ones on the
diagonal (i.e., an object is identical to itself) and have
off-diagonal values between zero and one. In either case,
proximities should be symmetric. See option force if your data
violate these assumptions.
The available s2d() options, standard and oneminus, are defined as
follows:
standard dissim(ij) = sqrt(sim(ii)+sim(jj)-2sim(ij)) =
sqrt(2(1-sim(ij)))
oneminus dissim(ij) = 1-sim(ij)
s2d(standard) is the default.
s2d() should be specified only with measures in similarity form.
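To make the two conversions concrete, take a similarity of 0.84 (an
arbitrary illustrative value). The standard conversion gives
sqrt(2(1-0.84)) = sqrt(0.32), approximately 0.566, whereas oneminus gives
1-0.84 = 0.16. The sketch below reproduces this arithmetic and then fits
the Morse data from the Example section with s2d(oneminus).
. display sqrt(2*(1 - 0.84))
. display 1 - 0.84
. webuse morse_long, clear
. generate sim = freqsame/100
. mdslong sim, id(digit1 digit2) s2d(oneminus) noplot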
force corrects problems with the supplied proximity information. In the
long format used by mdslong, multiple measurements on (i,j) may be
available. Including both (i,j) and (j,i) would be treated as
multiple measurements. This is an error, even if the measures are
identical. Option force uses the mean of the measurements. force
also resolves problems on the diagonal, i.e., comparisons of objects
with themselves; these should have zero dissimilarity or unit
similarity. force does not resolve incomplete data, i.e., pairs
(i,j) for which no measurement is available. Out-of-range values are
also not fixed.
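Before relying on force, it may help to check whether any pair is
measured more than once, counting (i,j) and (j,i) as the same pair. The
following sketch assumes numeric identifier variables, as in the Morse
data from the Example section; lo and hi are illustrative variable names.
. webuse morse_long, clear
. generate sim = freqsame/100
. generate lo = min(digit1, digit2)
. generate hi = max(digit1, digit2)
. duplicates report lo hi
. mdslong sim, id(digit1 digit2) s2d(standard) force noplot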
dimension(#) specifies the dimension of the approximating configuration.
The default # is 2 and should not exceed the number of positive
eigenvalues of the centered distance matrix.
addconstant specifies that if the double-centered distance matrix is not
positive semidefinite (psd), a constant should be added to the
squared distances to make it psd, and, hence, Euclidean. This option
is allowed with classical MDS only.
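A sketch combining dimension() and addconstant on the Morse data from the
Example section. It assumes that at least three eigenvalues are positive
for these data; e(np) and e(addcons), listed under Saved results, report
the number of strictly positive eigenvalues and the constant that was
added.
. webuse morse_long, clear
. generate sim = freqsame/100
. mdslong sim, id(digit1 digit2) s2d(standard) dimension(3) addconstant noplot
. display e(np)
. display e(addcons)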
+-----------+
----+ Reporting +--------------------------------------------------------
neigen(#) specifies the number of eigenvalues to be included in the
table. The default is neigen(10). Specifying neigen(0) suppresses
the table. This option is allowed with classical MDS only.
config displays the table with the coordinates of the approximating
configuration. This table may also be displayed using the
postestimation command estat config; see [MV] mds postestimation.
noplot suppresses the graph of the approximating configuration. The
graph can still be produced later via mdsconfig, which also allows
the standard graphics options for fine-tuning the plot; see [MV] mds
postestimation.
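The reporting options can be combined as in the following sketch, which
assumes the Morse data from the Example section. The first command shows
the coordinate table but no graph; the table and the graph are then
requested again with the postestimation commands mentioned above.
. webuse morse_long, clear
. generate sim = freqsame/100
. mdslong sim, id(digit1 digit2) s2d(standard) config noplot
. estat config
. mdsconfig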
+--------------+
----+ Minimization +-----------------------------------------------------
These options are available only with method(modern) or
method(nonmetric):
initialize(initopt) specifies the initial values of the criterion
minimization process.
initialize(classical), the default, uses the solution from classical
metric scaling as initial values. With protect(), all but the
first run start from random perturbations from the classical
solution. These random perturbations are independent and
normally distributed with standard error equal to the product of
sdprotect(#) and the standard deviation of the dissimilarities.
initialize(classical) is the default.
initialize(random) starts an optimization process from a random
starting configuration. These random configurations are
generated from independent normal distributions with standard
error equal to the product of sdprotect(#) and the standard
deviation of the dissimilarities. The means of the configuration
are irrelevant in MDS.
initialize(from(matname) [, copy]) sets the initial value to matname.
matname should be an n x p matrix, where n is the number of
observations and p is the number of dimensions, and the rows of
matname should be ordered with respect to id(). The rownames of
matname should be set correctly but will be ignored if copy is
specified. With protect(), the second through the last runs start
from random perturbations from matname. These random perturbations
are independent and normally distributed with standard error equal to
the product of sdprotect(#) and the standard deviation of the
dissimilarities.
tolerance(#) specifies the tolerance for the configuration matrix. When
the relative change in the configuration from one iteration to the
next is less than or equal to tolerance(), the tolerance()
convergence criterion is satisfied. The default is tolerance(1e-4).
ltolerance(#) specifies the tolerance for the fit criterion. When the
relative change in the fit criterion from one iteration to the next
is less than or equal to ltolerance(), the ltolerance() convergence
criterion is satisfied. The default is ltolerance(1e-8).
Both the tolerance() and ltolerance() criteria must be satisfied for
convergence.
iterate(#) specifies the maximum number of iterations. The default is
iterate(1000).
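A sketch of tightening the convergence settings for a modern solution,
assuming the Morse data from the Example section. The particular values
are arbitrary illustrations, not recommendations; e(ic) and e(converged)
are the iteration count and convergence flag listed under Saved results.
. webuse morse_long, clear
. generate sim = freqsame/100
. mdslong sim, id(digit1 digit2) s2d(standard) method(modern) tolerance(1e-6) ltolerance(1e-10) iterate(5000) noplot
. display e(ic)
. display e(converged)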
protect(#) requests that # optimizations be performed and that the best
of the solutions be reported. The default is protect(1). See option
initialize() on starting values of the runs. The output contains a
table of the return code, the criterion value reached, and the seed
of the random number used to generate the starting value. Specifying
a large number, such as protect(50), provides reasonable insight
into whether the solution found is a global minimum and not just a
local minimum.
If any of the options log, trace, or gradient is also specified,
iteration reports will be printed for each optimization run. Beware:
this option will produce a lot of output.
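A sketch of a protected run from random starting configurations, assuming
the Morse data from the Example section. The seed 12345 is arbitrary, and
nolog is added to keep the output manageable. e(critval) holds the loss
criterion value of the reported solution.
. webuse morse_long, clear
. generate sim = freqsame/100
. mdslong sim, id(digit1 digit2) s2d(standard) method(modern) initialize(random(12345)) protect(50) nolog noplot
. display e(critval)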
nolog suppresses the iteration log, which shows the progress of the
minimization process.
trace displays the configuration matrices in the iteration report.
Beware: this option may produce a lot of output.
gradient displays the gradient matrices of the fit criterion in the
iteration report. Beware: this option may produce a lot of output.
The following option is available with mdslong but is not shown in the
dialog box:
sdprotect(#) sets a proportionality constant for the standard deviations
of random configurations (init(random)) or random perturbations of
given starting configurations (init(classical) or init(from())). The
default is sdprotect(1).
Remarks
The purpose of multidimensional scaling (MDS) is to produce a
representation of a dissimilarity relation between a set of n objects by
Euclidean distances between a constructed configuration of points in a
low-dimensional Euclidean space, typically two-dimensional. If this
low-dimensional representation offers a good enough approximation, we may
plot the points in this low-dimensional space and interpret the Euclidean
distance between the points as the dissimilarity between the original
objects. Points mapped close together are similar; points mapped widely
apart are dissimilar.
depvar specifies proximity data in either dissimilarity or similarity
form. The comparison pairs are identified by two variables specified in
the required option id(). Exactly 1 observation with a nonmissing depvar
should be included for each pair (i,j). Pairs are unordered; you do not
include observations for both (i,j) and (j,i). Observations for
comparisons of objects with themselves (i,i) are optional. See option
force if your data violate these assumptions.
When you have multiple independent measures of the dissimilarities, you
may specify the mean of these dissimilarities as the combined measure and
specify 1/#measures or 1/variance(measures) as weights. The weights
should be irreducible; i.e., it is not possible to split the objects into
disjoint groups with all intergroup weights 0.
Example
A famous example in the MDS literature is the data on the percentage of
times that pairs of Morse code signals for two numbers (1,...,9,0) were
declared the same by 598 subjects. We use the Morse data in long format.
The entries are in the order 1,2,...,9,0.
. webuse morse_long
The proximity of (2,1) is entered, but not (1,2). Either one may be
entered; it does not matter which. Proximities between the same objects,
e.g., (2,2), are not entered. First we generate a similarity measure
between the objects.
. gen sim = freqsame/100
Classical MDS
. mdslong sim, id(digit1 digit2) s2d(standard)
Modern MDS
. mdslong sim, id(digit1 digit2) s2d(standard) method(modern)
(note: loss(stress) and transform(identity) are assumed)
Nonmetric MDS
. mdslong sim, id(digit1 digit2) s2d(standard) method(nonmetric)
(note: loss(stress) and transform(monotonic) are assumed)
Saved results
mdslong saves the following in e():
Scalars
e(N) number of underlying observations
e(p) number of dimensions in the approximating
configuration
e(np) number of strictly positive eigenvalues
e(addcons) constant added to squared dissimilarities to force
positive semidefiniteness
e(mardia1) Mardia measure 1
e(mardia2) Mardia measure 2
e(critval) loss criterion value
e(npos) number of pairs with positive weights
e(wsum) sum of weights
e(alpha) parameter of transform(power)
e(ic) iteration count
e(rc) return code
e(converged) 1 if converged, 0 otherwise
Macros
e(cmd) mdslong
e(cmdline) command as typed
e(method) classical or modern MDS method
e(method2) nonmetric if method(nonmetric)
e(loss) loss criterion
e(losstitle) description of loss criterion
e(tfunction) identity, power, or monotonic; transformation
function
e(transftitle) description of transformation
e(id) two ID variable names identifying compared object
pairs
e(idtype) int or str; type of id() variable(s)
e(duplicates) 1 if duplicates in id(), 0 otherwise
e(labels) labels for ID categories
e(mxlen) maximum length of category labels
e(depvar) dependent variable containing dissimilarities
e(dtype) similarity or dissimilarity; type of proximity data
e(s2d) standard or oneminus (when e(dtype) is similarity)
e(wtype) weight type
e(wexp) weight expression
e(unique) 1 if eigenvalues are distinct, 0 otherwise
e(init) initialization method
e(iseed) seed for init(random)
e(seed) seed for solution
e(norm) normalization method
e(targetmatrix) name of target matrix for normalize(target)
e(properties) nob noV for modern or nonmetric MDS; nob noV eigen
for classical MDS
e(estat_cmd) program used to implement estat
e(predict) program used to implement predict
e(marginsnotok) predictions disallowed by margins
Matrices
e(D) dissimilarity matrix
e(Disparities) disparity matrix for nonmetric MDS
e(Y) approximating configuration coordinates
e(Ev) eigenvalues
e(W) weight matrix
e(idcoding) coding for integer identifier variable
e(norm_stats) normalization statistics
e(linearf) two-element vector defining the linear
transformation; distance equals first element
plus second element times dissimilarity
Functions
e(sample) marks estimation sample
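The saved results can be inspected after estimation with standard Stata
commands, as in this sketch on the Morse data from the Example section.
Which entries are present depends on the method; for instance, the Mardia
measures are assumed here to be available after classical MDS.
. webuse morse_long, clear
. generate sim = freqsame/100
. mdslong sim, id(digit1 digit2) s2d(standard) noplot
. ereturn list
. matrix list e(Y)
. display e(mardia1)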
References
Sammon Jr., J. W. 1969. A nonlinear mapping for data structure analysis.
IEEE Transactions on Computers 18: 401-409.
Torgerson, W. S. 1952. Multidimensional scaling: I. Theory and method.
Psychometrika 17: 401-419.
Also see
Manual: [MV] mdslong
Help: [MV] mds postestimation;
[MV] mds, [MV] mdsmat; [MV] biplot, [MV] ca, [MV] factor, [MV]
pca