**[MV] mdslong** -- Multidimensional scaling of proximity data in long format

__Syntax__

**mdslong** *depvar* [*if*] [*in*] [*weight*]**,** **id(***var1 var2***)** [*options*]

*options* Description
-------------------------------------------------------------------------
Model
* **id(***var1 var2***)** identify comparison pairs (object1,object2)
__met__**hod(***method***)** method for performing MDS
**loss(***loss***)** loss function
__trans__**form(***tfunction***)** permitted transformations of dissimilarities
__norm__**alize(***norm***)** normalization method; default is
**normalize(principal)**
**s2d(**__st__**andard)** convert similarity to dissimilarity:
dissim(ij) = sqrt{sim(ii)+sim(jj)-2sim(ij)}; the default
**s2d(**__one__**minus)** convert similarity to dissimilarity:
dissim(ij) = 1-sim(ij)
**force** correct problems in proximity information
__dim__**ension(***#***)** configuration dimensions; default is
**dimension(2)**
__add__**constant** make distance matrix positive semidefinite
(classical MDS only)

Reporting
__neig__**en(***#***)** maximum number of eigenvalues to display;
default is **neigen(10)** (classical MDS only)
__con__**fig** display table with configuration coordinates
__nopl__**ot** suppress configuration plot

Minimization
__init__**ialize(***initopt***)** start with configuration given in *initopt*
__tol__**erance(***#***)** tolerance for configuration matrix; default
is **tolerance(1e-4)**
__ltol__**erance(***#***)** tolerance for loss criterion; default is
**ltolerance(1e-8)**
__iter__**ate(***#***)** perform maximum of *#* iterations; default is
**iterate(1000)**
__prot__**ect(***#***)** perform *#* optimizations and report best
solution; default is **protect(1)**
__nolo__**g** suppress the iteration log
__tr__**ace** display current configuration in iteration
log
__grad__**ient** display current gradient matrix in iteration
log

__sd__**protect(***#***)** advanced; see *Options* below
-------------------------------------------------------------------------
* **id(***var1 var2***)** is required.
**by** and **statsby** are allowed; see prefix.
**aweight**s and **fweight**s are allowed for methods **modern** and **nonmetric**; see
weights.
The maximum number of compared objects allowed is the maximum matrix
size; see **[R] matsize**.
**sdprotect(***#***)** does not appear in the dialog box.
See **[MV] mds postestimation** for features available after estimation.

*method* Description
-------------------------------------------------------------------------
__c__**lassical** classical MDS; default if neither **loss()** nor **transform()**
is specified
__m__**odern** modern MDS; default if **loss()** or **transform()** is
specified; except when **loss(stress)** and
**transform(monotonic)** are specified
__n__**onmetric** nonmetric (modern) MDS; default when **loss(stress)** and
**transform(monotonic)** are specified
-------------------------------------------------------------------------

*loss* Description
-------------------------------------------------------------------------
__stre__**ss** stress criterion, normalized by distances; the default
__nstr__**ess** stress criterion, normalized by disparities
__sstr__**ess** squared stress criterion, normalized by distances
__nsst__**ress** squared stress criterion, normalized by disparities
__stra__**in** strain criterion (with **transform(identity)** is equivalent
to classical MDS)
__sam__**mon** Sammon mapping
-------------------------------------------------------------------------

*tfunction* Description
-------------------------------------------------------------------------
__i__**dentity** no transformation; disparity = dissimilarity; the
default
__p__**ower** power alpha: disparity = dissimilarity^alpha
__m__**onotonic** weakly monotonic increasing functions (nonmetric
scaling); only with **loss(stress)**
-------------------------------------------------------------------------

*norm* Description
-------------------------------------------------------------------------
__p__**rincipal** principal orientation; location=0; the
default
__c__**lassical** Procrustes rotation toward classical solution
__t__**arget(***matname***)**[**, copy**] Procrustes rotation toward *matname*; ignore
naming conflicts if **copy** is specified
-------------------------------------------------------------------------

*initopt* Description
-------------------------------------------------------------------------
__c__**lassical** start with classical solution; the default
__r__**andom**[**(***#***)**] start at random configuration, setting seed
to *#*
__f__**rom(***matname***)**[**, copy**] start from *matname*; ignore naming conflicts
if **copy** is specified
-------------------------------------------------------------------------

__Menu__

**Statistics > Multivariate analysis > Multidimensional scaling (MDS) >** **MDS**
**of proximity-pair data**

__Description__

**mdslong** performs multidimensional scaling (MDS) for two-way proximity
data in long format with an explicit measure of similarity or
dissimilarity between objects. **mdslong** performs classical MDS as well as
modern metric and nonmetric MDS.

For MDS with two-way proximity data in a matrix, see **[MV] mdsmat**. If you
are looking for MDS on a dataset, based on dissimilarities between
observations over variables, see **[MV] mds**.

__Options__

+-------+
----+ Model +------------------------------------------------------------

**id(***var1 var2***)** is required. The pair of variables *var1* and *var2* should
uniquely identify comparisons. *var1* and *var2* are string or numeric
variables that identify the objects to be compared. *var1* and *var2*
should be of the same data type; if they are value labeled, they
should be labeled with the same value label. Using value-labeled
variables or string variables is generally helpful in identifying the
points in plots and tables.

Example data layout for **mdslong proxim, id(i1 i2)**.

**proxim i1 i2**
------------------
7 1 2
10 1 3
12 1 4
4 2 3
6 2 4
3 3 4
------------------

If you have multiple measurements per pair, we suggest that you
specify the mean of the measures as the proximity and the inverse of
the variance as the weight.
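
A minimal sketch of that suggestion, assuming a hypothetical dataset with three repeated ratings of each pair from the layout above. The variable names **rating**, **sd**, and **w** and all values are invented for illustration.

```stata
* Hypothetical repeated measurements: three ratings per pair (invented values)
clear
input rating i1 i2
 6 1 2
 7 1 2
 8 1 2
 8 1 3
10 1 3
12 1 3
11 1 4
12 1 4
13 1 4
 3 2 3
 4 2 3
 5 2 3
 4 2 4
 6 2 4
 8 2 4
 2 3 4
 3 3 4
 4 3 4
end

* One observation per pair: mean as the proximity, sd kept for the weight
collapse (mean) proxim=rating (sd) sd=rating, by(i1 i2)
generate double w = 1/sd^2

* Weights are allowed only with method(modern) or method(nonmetric)
mdslong proxim [aweight=w], id(i1 i2) method(modern) noplot
```

A pair whose measurements are all identical has zero variance and therefore a missing weight; such pairs need separate handling before the weighted fit.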

**method(***method***)** specifies the method for MDS.

**method(classical)** specifies classical metric scaling, also known as
"principal coordinates analysis" when used with Euclidean
proximities. Classical MDS yields results equivalent to modern
MDS with **loss(strain)** and **transform(identity)** without weights.
The calculations for classical MDS are fast; consequently,
classical MDS is generally used to obtain starting values for
modern MDS. If the options **loss()** and **transform()** are not
specified, **mdslong** computes the classical solution. Likewise, if
**method(classical)** is specified, **loss()** and **transform()** are
not allowed.

**method(modern)** specifies modern scaling. If **method(modern)** is
specified but not **loss()** or **transform()**, then **loss(stress)** and
**transform(identity)** are assumed. All values of **loss()** and
**transform()** are valid with **method(modern)**.

**method(nonmetric)** specifies nonmetric scaling, which is a type of
modern scaling. If **method(nonmetric)** is specified, **loss(stress)**
and **transform(monotonic)** are assumed. Other values of **loss()** and
**transform()** are not allowed.
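
The default rules for **method()** can be checked after estimation by looking at **e(method)** and **e(method2)**. The following sketch assumes the Morse data used in the Example section below; it shows one specification for each of the three methods.

```stata
webuse morse_long, clear
generate sim = freqsame/100

* Neither loss() nor transform() specified: classical MDS
mdslong sim, id(digit1 digit2) s2d(standard) noplot
display "`e(method)'"

* loss() specified without method(): modern MDS
mdslong sim, id(digit1 digit2) s2d(standard) loss(sammon) noplot
display "`e(method)' `e(loss)'"

* loss(stress) with transform(monotonic): nonmetric MDS
mdslong sim, id(digit1 digit2) s2d(standard) loss(stress) transform(monotonic) noplot
display "`e(method)' `e(method2)'"
```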

**loss(***loss***)** specifies the loss criterion.

**loss(stress)** specifies that the stress loss function be used,
normalized by the squared Euclidean distances. This criterion is
often called Kruskal's stress-1. Optimal configurations for
**loss(stress)** and for **loss(nstress)** are equivalent up to a scale
factor, but the iteration paths may differ. **loss(stress)** is the
default.

**loss(nstress)** specifies that the stress loss function be used,
normalized by the squared disparities, that is, transformed
dissimilarities. Optimal configurations for **loss(stress)** and for
**loss(nstress)** are equivalent up to a scale factor, but the
iteration paths may differ.

**loss(sstress)** specifies that the squared stress loss function be
used, normalized by the fourth power of the Euclidean distances.

**loss(nsstress)** specifies that the squared stress loss function be
used, normalized by the fourth power of the disparities
(transformed dissimilarities).

**loss(strain)** specifies the strain loss criterion. Classical scaling
is equivalent to **loss(strain)** and **transform(identity)** but is
computed by a faster noniterative algorithm. Specifying
**loss(strain)** still allows transformations.

**loss(sammon)** specifies the Sammon (1969) loss criterion.
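
The relationships among the loss criteria described above can be explored directly. This is a sketch under the assumption that the Morse data from the Example section are available; the matrix names **Ystrain** and **Yclass** are arbitrary. The two stress variants should report different criterion values, and the strain and classical configurations should agree up to numerical tolerance.

```stata
webuse morse_long, clear
generate sim = freqsame/100

* Stress normalized by distances versus by disparities
mdslong sim, id(digit1 digit2) s2d(standard) loss(stress) nolog noplot
display "stress  = " e(critval)
mdslong sim, id(digit1 digit2) s2d(standard) loss(nstress) nolog noplot
display "nstress = " e(critval)

* Strain with the identity transformation, fit iteratively
mdslong sim, id(digit1 digit2) s2d(standard) loss(strain) nolog noplot
matrix Ystrain = e(Y)

* Classical MDS, computed noniteratively
mdslong sim, id(digit1 digit2) s2d(standard) noplot
matrix Yclass = e(Y)

* Compare the two configurations by eye
matrix list Ystrain
matrix list Yclass
```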

**transform(***tfunction***)** specifies the class of allowed transformations of
the dissimilarities; transformed dissimilarities are called
disparities.

**transform(identity)** specifies that the only allowed transformation is
the identity; that is, disparities are equal to dissimilarities.
**transform(identity)** is the default.

**transform(power)** specifies that disparities are related to the
dissimilarities by a power function,

disparity = dissimilarity^alpha, alpha>0

**transform(monotonic)** specifies that the disparities are a weakly
monotonic function of the dissimilarities. This is also known as
nonmetric MDS. Tied dissimilarities are handled by the primary
method; that is, ties may be broken but are not necessarily
broken. **transform(monotonic)** is valid only with **loss(stress)**.
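
A short sketch of the two non-identity transformations, again assuming the Morse data from the Example section. The power parameter is reported in **e(alpha)** and the disparities in **e(Disparities)**, as listed under Stored results.

```stata
webuse morse_long, clear
generate sim = freqsame/100

* Power transformation: disparity = dissimilarity^alpha
mdslong sim, id(digit1 digit2) s2d(standard) transform(power) nolog noplot
display "alpha = " e(alpha)

* Monotonic transformation (nonmetric MDS); valid only with loss(stress)
mdslong sim, id(digit1 digit2) s2d(standard) transform(monotonic) loss(stress) nolog noplot
matrix list e(Disparities)
```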

**normalize(***norm***)** specifies a normalization method for the configuration.
Recall that the location and orientation of an MDS configuration are
not defined ("identified"); an isometric transformation (that is,
translation, reflection, or orthonormal rotation) of a configuration
preserves interpoint Euclidean distances.

**normalize(principal)** performs a principal normalization, in which the
configuration columns have zero mean and correspond to the
principal components, with positive coefficient for the
observation with lowest value of **id()**. **normalize(principal)** is
the default.

**normalize(classical)** normalizes by a distance-preserving Procrustean
transformation of the configuration toward the classical
configuration in principal normalization; see **[MV] procrustes**.
**normalize(classical)** is not valid if **method(classical)** is
specified.

**normalize(target(***matname***)** [**, copy**]**)** normalizes by a
distance-preserving Procrustean transformation toward *matname*;
see **[MV] procrustes**. *matname* should be an *n* x *p* matrix, where *n*
is the number of observations and *p* is the number of dimensions,
and the rows of *matname* should be ordered with respect to **id()**.
The rownames of *matname* should be set correctly but will be
ignored if **copy** is also specified.

Note on **normalize(classical)** and **normalize(target())**: the Procrustes
transformation comprises any combination of translation, reflection,
and orthonormal rotation -- these transformations preserve distance.
Dilation (uniform scaling) would stretch distances and is not
applied. However, the output reports the dilation factor, and the
reported Procrustes statistic is for the dilated configuration.
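
The Procrustes-based normalizations can be sketched as follows, assuming the Morse data from the Example section. The matrix name **T** is arbitrary; it holds the classical configuration, whose row names are set by **mdslong** itself, so **copy** is not needed here.

```stata
webuse morse_long, clear
generate sim = freqsame/100

* Classical solution in principal normalization, saved as a target
mdslong sim, id(digit1 digit2) s2d(standard) noplot
matrix T = e(Y)

* Modern MDS rotated toward the classical configuration
mdslong sim, id(digit1 digit2) s2d(standard) method(modern) normalize(classical) nolog noplot

* The same idea with an explicit n x p target matrix
mdslong sim, id(digit1 digit2) s2d(standard) method(modern) normalize(target(T)) nolog noplot
matrix list e(norm_stats)
```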

**s2d(standard**|**oneminus)** specifies how similarities are converted into
dissimilarities. By default, the command assumes dissimilarity data.
Specifying **s2d()** indicates that your proximity data are similarities.

Dissimilarity data should have zeros on the diagonal (that is, an
object is identical to itself) and nonnegative off-diagonal values.
Dissimilarities need not satisfy the triangular inequality, D(i,j)^2
__<__ D(i,h)^2 + D(h,j)^2. Similarity data should have ones on the
diagonal (that is, an object is identical to itself) and have
off-diagonal values between zero and one. In either case,
proximities should be symmetric. See option **force** if your data
violate these assumptions.

The available **s2d()** options, **standard** and **oneminus**, are defined as
follows:

**standard** dissim(ij) = sqrt{sim(ii)+sim(jj)-2sim(ij)} =
sqrt(2(1-sim(ij)))
**oneminus** dissim(ij) = 1-sim(ij)

**s2d(standard)** is the default.

**s2d()** should be specified only with measures in similarity form.
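
The two conversions can also be written out by hand, which may help when checking what **s2d()** does. The sketch below assumes the Morse data from the Example section and unit self-similarity, so that the **standard** formula reduces to sqrt{2(1-sim(ij))}; the variable names **d_standard** and **d_oneminus** are arbitrary.

```stata
webuse morse_long, clear
generate sim = freqsame/100

* The two conversions written out by hand
generate double d_standard = sqrt(2*(1 - sim))
generate double d_oneminus = 1 - sim

* Supplying similarities together with s2d() ...
mdslong sim, id(digit1 digit2) s2d(oneminus) noplot

* ... should correspond to supplying the converted dissimilarities directly
mdslong d_oneminus, id(digit1 digit2) noplot
```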

**force** corrects problems with the supplied proximity information. In the
long format used by **mdslong**, multiple measurements on (i,j) may be
available. Including both (i,j) and (j,i) would be treated as
multiple measurements. This is an error, even if the measures are
identical. Option **force** uses the mean of the measurements. **force**
also resolves problems on the diagonal, that is, comparisons of
objects with themselves; these should have zero dissimilarity or unit
similarity. **force** does not resolve incomplete data, that is, pairs
(i,j) for which no measurement is available. Out-of-range values are
also not fixed.
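
A sketch of the duplicate-pair case, using the example layout from **id()** plus one hypothetical extra row that repeats the pair (3,4) as (4,3) with an invented value. The local macro name **rc_noforce** is arbitrary. Without **force** the command should stop with an error; with **force** the two measurements should be averaged, which can be inspected in **e(D)**.

```stata
clear
input proxim i1 i2
 7 1 2
10 1 3
12 1 4
 4 2 3
 6 2 4
 3 3 4
 5 4 3
end

* Without force, the repeated pair is reported as an error
capture noisily mdslong proxim, id(i1 i2) noplot
local rc_noforce = _rc
display "return code without force: `rc_noforce'"

* With force, the mean of the duplicate measurements is used
mdslong proxim, id(i1 i2) force noplot
matrix list e(D)
```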

**dimension(***#***)** specifies the dimension of the approximating configuration.
The default is **dimension(2)**, and *#* should not exceed the number of
positive eigenvalues of the centered distance matrix.

**addconstant** specifies that if the double-centered distance matrix is not
positive semidefinite (psd), a constant should be added to the
squared distances to make it psd and, hence, Euclidean. This option
is allowed with classical MDS only.
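
For illustration, assuming the Morse data from the Example section, the following sketch requests a three-dimensional configuration and then refits the default two-dimensional one with **addconstant**. The stored results **e(p)**, **e(np)**, and **e(addcons)** report what was done.

```stata
webuse morse_long, clear
generate sim = freqsame/100

* Three-dimensional classical configuration
mdslong sim, id(digit1 digit2) s2d(standard) dimension(3) noplot
display "dimensions = " e(p) ", positive eigenvalues = " e(np)

* Add a constant if the double-centered matrix is not psd
mdslong sim, id(digit1 digit2) s2d(standard) addconstant noplot
display "constant added = " e(addcons)
```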

+-----------+
----+ Reporting +--------------------------------------------------------

**neigen(***#***)** specifies the number of eigenvalues to be included in the
table. The default is **neigen(10)**. Specifying **neigen(0)** suppresses
the table. This option is allowed with classical MDS only.

**config** displays the table with the coordinates of the approximating
configuration. This table may also be displayed using the
postestimation command **estat config**; see **[MV] mds postestimation**.

**noplot** suppresses the graph of the approximating configuration. The
graph can still be produced later via **mdsconfig**, which also allows
the standard graphics options for fine-tuning the plot; see **[MV] mds**
**postestimation plots**.
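
The reporting options combine as in this sketch, which assumes the Morse data from the Example section. The graph is suppressed at estimation time and drawn afterward.

```stata
webuse morse_long, clear
generate sim = freqsame/100

* Show five eigenvalues and the coordinate table, but no graph
mdslong sim, id(digit1 digit2) s2d(standard) neigen(5) config noplot

* Redisplay the coordinates and draw the configuration plot later
estat config
mdsconfig
```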

+--------------+
----+ Minimization +-----------------------------------------------------

These options are available only with **method(modern)** or
**method(nonmetric)**:

**initialize(***initopt***)** specifies the initial values of the criterion
minimization process.

**initialize(classical)**, the default, uses the solution from classical
metric scaling as initial values. With **protect()**, all but the
first run start from random perturbations from the classical
solution. These random perturbations are independent and
normally distributed with standard error equal to the product of
**sdprotect(***#***)** and the standard deviation of the dissimilarities.

**initialize(random)** starts an optimization process from a random
starting configuration. These random configurations are
generated from independent normal distributions with standard
error equal to the product of **sdprotect(***#***)** and the standard
deviation of the dissimilarities. The means of the configuration
are irrelevant in MDS.

**initialize(from(***matname***)**[**, copy**]**)** sets the initial value to *matname*.
*matname* should be an *n* x *p* matrix, where *n* is the number of
observations and *p* is the number of dimensions, and the rows of
*matname* should be ordered with respect to **id()**. The rownames of
*matname* should be set correctly but will be ignored if **copy** is
specified. With **protect()**, all but the first run start from
random perturbations from *matname*. These random perturbations
are independent and normally distributed with standard error equal
to the product of **sdprotect(***#***)** and the standard deviation of
the dissimilarities.
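
The starting-value choices can be sketched as follows, assuming the Morse data from the Example section. The seed **12345** and the matrix name **Y0** are arbitrary choices made for illustration.

```stata
webuse morse_long, clear
generate sim = freqsame/100

* Random start with a fixed seed, so the run can be reproduced
mdslong sim, id(digit1 digit2) s2d(standard) method(modern) initialize(random(12345)) nolog noplot

* Start from a stored configuration, here the classical solution
mdslong sim, id(digit1 digit2) s2d(standard) noplot
matrix Y0 = e(Y)
mdslong sim, id(digit1 digit2) s2d(standard) method(modern) initialize(from(Y0)) nolog noplot
```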

**tolerance(***#***)** specifies the tolerance for the configuration matrix. When
the relative change in the configuration from one iteration to the
next is less than or equal to **tolerance()**, the **tolerance()**
convergence criterion is satisfied. The default is **tolerance(1e-4)**.

**ltolerance(***#***)** specifies the tolerance for the fit criterion. When the
relative change in the fit criterion from one iteration to the next
is less than or equal to **ltolerance()**, the **ltolerance()** convergence
criterion is satisfied. The default is **ltolerance(1e-8)**.

Both the **tolerance()** and **ltolerance()** criteria must be satisfied for
convergence.

**iterate(***#***)** specifies the maximum number of iterations. The default is
**iterate(1000)**.
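
A sketch of tightening the convergence settings, assuming the Morse data from the Example section; the particular tolerance and iteration values are arbitrary. Whether the run converged and how many iterations it used are stored in **e(converged)** and **e(ic)**.

```stata
webuse morse_long, clear
generate sim = freqsame/100

* Stricter tolerances and a larger iteration limit than the defaults
mdslong sim, id(digit1 digit2) s2d(standard) method(modern) tolerance(1e-6) ltolerance(1e-10) iterate(2000) nolog noplot
display "converged = " e(converged) ", iterations = " e(ic)
```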

**protect(***#***)** requests that *#* optimizations be performed and that the best
of the solutions be reported. The default is **protect(1)**. See option
**initialize()** on starting values of the runs. The output contains a
table of the return code, the criterion value reached, and the
random-number seed used to generate the starting value. Specifying
a large number, such as **protect(50)**, provides reasonable insight
into whether the solution found is a global minimum and not just a
local minimum.

If any of the options **log**, **trace**, or **gradient** is also specified,
iteration reports will be printed for each optimization run. Beware:
this option will produce a lot of output.
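
A sketch of a protected run, assuming the Morse data from the Example section. **nolog** keeps the output to the summary table of runs and the final results.

```stata
webuse morse_long, clear
generate sim = freqsame/100

* Fifty optimizations; the best solution is reported
mdslong sim, id(digit1 digit2) s2d(standard) method(modern) protect(50) nolog noplot
display "best criterion value = " e(critval)
```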

**nolog** suppresses the iteration log, which shows the progress of the
minimization process.

**trace** displays the configuration matrices in the iteration report.
Beware: this option may produce a lot of output.

**gradient** displays the gradient matrices of the fit criterion in the
iteration report. Beware: this option may produce a lot of output.

The following option is available with **mdslong** but is not shown in the
dialog box:

**sdprotect(***#***)** sets a proportionality constant for the standard deviations
of random configurations (**init(random)**) or random perturbations of
given starting configurations (**init(classical)** or **init(from())**). The
default is **sdprotect(1)**.

__Example__

A famous example in the MDS literature is the data on the percentage of
times that pairs of Morse code signals for two numbers (1,...,9,0) were
declared the same by 598 subjects. We use the Morse data in long format.
The entries are in the order 1,2,...,9,0.

**. webuse morse_long**

The proximity of (2,1) is entered, but not (1,2). Either one may be
entered; it does not matter which. Proximities between the same objects,
for example, (2,2), are not entered. First we generate a similarity
measure between the objects.

**. gen sim = freqsame/100**

Classical MDS

**. mdslong sim, id(digit1 digit2) s2d(standard)**

Modern MDS

**. mdslong sim, id(digit1 digit2) s2d(standard) method(modern)**
(note: **loss(stress)** and **transform(identity)** are assumed)

Nonmetric MDS

**. mdslong sim, id(digit1 digit2) s2d(standard) method(nonmetric)**
(note: **loss(stress)** and **transform(monotonic)** are assumed)

__Stored results__

**mdslong** stores the following in **e()**:

Scalars
**e(N)** number of underlying observations
**e(p)** number of dimensions in the approximating
configuration
**e(np)** number of strictly positive eigenvalues
**e(addcons)** constant added to squared dissimilarities to force
positive semidefiniteness
**e(mardia1)** Mardia measure 1
**e(mardia2)** Mardia measure 2
**e(critval)** loss criterion value
**e(npos)** number of pairs with positive weights
**e(wsum)** sum of weights
**e(alpha)** parameter of **transform(power)**
**e(ic)** iteration count
**e(rc)** return code
**e(converged)** **1** if converged, **0** otherwise

Macros
**e(cmd)** **mdslong**
**e(cmdline)** command as typed
**e(method)** **classical** or **modern** MDS method
**e(method2)** **nonmetric**, if **method(nonmetric)**
**e(loss)** loss criterion
**e(losstitle)** description of loss criterion
**e(tfunction)** **identity**, **power**, or **monotonic**, transformation
function
**e(transftitle)** description of transformation
**e(id)** two ID variable names identifying compared object
pairs
**e(idtype)** **int** or **str**; type of **id()** variable
**e(duplicates)** **1** if duplicates in **id()**, **0** otherwise
**e(labels)** labels for ID categories
**e(mxlen)** maximum length of category labels
**e(depvar)** dependent variable containing dissimilarities
**e(dtype)** **similarity** or **dissimilarity**; type of proximity data
**e(s2d)** **standard** or **oneminus** (when **e(dtype)** is **similarity**)
**e(wtype)** weight type
**e(wexp)** weight expression
**e(unique)** **1** if eigenvalues are distinct, **0** otherwise
**e(init)** initialization method
**e(irngstate)** initial random-number state used for **init(random)**
**e(rngstate)** random-number state for solution
**e(norm)** normalization method
**e(targetmatrix)** name of target matrix for **normalize(target)**
**e(properties)** **nob noV** for modern or nonmetric MDS; **nob noV eigen**
for classical MDS
**e(estat_cmd)** program used to implement **estat**
**e(predict)** program used to implement **predict**
**e(marginsnotok)** predictions disallowed by **margins**

Matrices
**e(D)** dissimilarity matrix
**e(Disparities)** disparity matrix for nonmetric MDS
**e(Y)** approximating configuration coordinates
**e(Ev)** eigenvalues
**e(W)** weight matrix
**e(idcoding)** coding for integer identifier variable
**e(norm_stats)** normalization statistics
**e(linearf)** two-element vector defining the linear
transformation; distance equals first element
plus second element times dissimilarity

Functions
**e(sample)** marks estimation sample
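
The stored results can be inspected after any fit. A minimal sketch, assuming the Morse data from the Example section; the matrix name **Y** is arbitrary.

```stata
webuse morse_long, clear
generate sim = freqsame/100
mdslong sim, id(digit1 digit2) s2d(standard) method(modern) nolog noplot

* List everything in e(), then pick out individual results
ereturn list
display "loss = `e(loss)', criterion value = " e(critval)

* Copy the configuration coordinates into a regular matrix
matrix Y = e(Y)
matrix list Y
```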

__Reference__

Sammon, J. W., Jr. 1969. A nonlinear mapping for data structure
analysis. *IEEE Transactions on Computers* 18: 401-409.