
From: [email protected] (Jean Marie Linhart, StataCorp LP)
To: [email protected]
Subject: Re: st: Performing -mdslong- on a constant
Date: Tue, 29 Jan 2008 14:17:13 -0600
Zachary Neal <stata_at_uic at yahoo dot com> asked:

> This is a more general question about multidimensional scaling, but I
> can't tell if the answer may have something to do with the way stata
> executes the command.
>
> I am trying to figure out the meaning of the configuration that is
> yielded by -mdslong- when applied to a constant (i.e., all
> point-to-point distances are equal). Intuitively, I would expect a
> regular pattern that is independent of the number of points and
> magnitude of the constant - perhaps a circle, or a uniform
> distribution. But the number of points and magnitude of the constant
> seem to matter, with each combination yielding a unique configuration.
> Can anyone point me in the direction of a reference that might explain
> what is happening here?

Multidimensional scaling (MDS) is a data reduction/visualization technique. It allows you to visualize high-dimensional data in a more intuitive 2-dimensional (or more, but usually 2) Euclidean space in which the distances between points approximate the dissimilarities in the original space.

Zach is dealing with configurations in which all the original points are at equal dissimilarities from each other. In 2-dimensional Euclidean space, you can only do this with 3 points in an equilateral triangle. In 3-dimensional space, 4 points can be equidistant at the vertices of a regular tetrahedron. You can add one point in 4-space and get such a configuration with 5 points (a regular 4-simplex), and so on to 6 equidistant points in 5-space and beyond. If Zach is dealing with K > 3 points at equal dissimilarity, his configuration cannot be exactly represented in fewer than K-1 dimensions; in a lower-dimensional space, the approximating configuration is going to have some distances larger and some distances smaller than others.

If you do classical MDS with 3 points, you get an equilateral triangle, which is what you expect. The triangle can move around; the configuration is determined only up to rotation and reflection.
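[Editor's sketch, not part of the original post.] The eigenvalue structure behind this behavior is easy to check numerically. The Python snippet below (an illustration under assumed constants; the dissimilarity 0.5 and the point counts are arbitrary, and this is not Stata's code) double-centers a constant dissimilarity matrix and reads off its eigenvalues, which is the quantity classical MDS decomposes:

```python
import numpy as np

def classical_mds_eigs(n, d=0.5):
    """Eigenvalues of the double-centered matrix for n points at
    constant dissimilarity d (the matrix classical MDS decomposes)."""
    D = d * (np.ones((n, n)) - np.eye(n))   # constant dissimilarities
    J = np.eye(n) - np.ones((n, n)) / n     # centering matrix
    B = -0.5 * J @ (D ** 2) @ J             # double-centered matrix
    return np.sort(np.linalg.eigvalsh(B))[::-1]

# 3 points: two equal positive eigenvalues and one (numerically) zero,
# so the top-2-dimensional configuration is exact: an equilateral triangle.
print(classical_mds_eigs(3))

# 10 points: nine equal eigenvalues and one zero, so the first two
# dimensions carry only 2/9 (about 22%) of the total.
vals = classical_mds_eigs(10)
print(vals[:2].sum() / vals.sum())
```

With all off-diagonal entries equal, B reduces to a multiple of the centering matrix J, whose eigenvalues are 1 (multiplicity n-1) and 0; that is where both the degeneracy and the 22% figure come from.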
Let's see this happen. I'm going to start from the morse_long dataset. This is a 45-observation dataset of Morse codes for the digits 0-9; I will use the identifying information from the dataset and generate my own dissimilarity. To see the equilateral triangle, I will restrict to the first 3 observations and generate equal dissimilarity data:

. webuse morse_long
. gen dissim = .5
. mdslong dissim in 1/3, id(digit1 digit2)

If you assume a different constant value for the dissimilarity, you get another equilateral triangle; configurations are equivalent up to rotation and reflection. See:

. generate dissim2 = 5
. mdslong dissim2 in 1/3, id(digit1 digit2)

Now what happens if you use more points? In classical MDS, you are doing an eigen decomposition of a matrix and then pulling 2 dimensions from it to display. (Details are in the [MV] manual.) The dimensions correspond to the two largest eigenvalues. But if you have all dissimilarities equal, then k-1 of the eigenvalues are the same, with the last eigenvalue either very small or equal to zero. Even worse, it turns out that the eigenvectors aren't completely determined (there are many equivalent sets). These are the indeterminate components in the configuration. Not only is the solution indeterminate, but each of the components explains an equal amount of the dissimilarity; the approximation is poor. If Zach tries

. mdslong dissim, id(digit1 digit2)

and

. mdslong dissim2, id(digit1 digit2)

he gets two unintuitive configurations; note that the two dimensions explain only 22% of the dissimilarity.

Zach might prefer to turn to modern MDS in order to get configurations that are more intuitive. Modern MDS does not do an eigen decomposition; instead, it minimizes a -loss()- function (with or without a transformation, the -transform()- option). You've got several options for -loss()- and -transform()-, which affect the configuration. Let's take a quick look at modern MDS, using all of the data:
. mdslong dissim, id(digit1 digit2) loss(stress) init(random) protect(20)

Now Zach is probably seeing a result that he likes. This configuration is a set of points placed regularly around a circle, plus a point at the center. The points still are not equidistant, but the result seems intuitive. Options -loss(stress)-, -loss(nstress)-, and -loss(sammon)- give similar configurations on these data. Because of the equal eigenvalues from classical MDS, the default initialization was problematic; I used -init(random)- instead. I use -protect(20)- to protect against convergence to a local, rather than a global, minimum: this option performs multiple runs from different starting values and takes the solution with the smallest loss as the final answer.

Options -loss(sstress)- and -loss(nsstress)- tend to give points on a circle, though not as regularly spaced:

. mdslong dissim, id(digit1 digit2) loss(sstress) init(rand) protect(20)

Modern MDS with options -loss(strain)- and -transform(identity)- is equivalent to classical MDS and gives the indeterminate behavior Zach observed before.

I hope this offers some insight into what is going on with MDS.

--Jean Marie
[email protected]

*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
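[Editor's sketch, not part of the original post.] For readers outside Stata, the stress-minimization idea behind "modern" MDS can be illustrated with a bare-bones SMACOF loop in Python. This is a generic sketch under assumed constants (10 points, dissimilarity 0.5), not StataCorp's implementation, and -mdslong-'s actual optimizer and options differ:

```python
import numpy as np

def smacof(delta, n_components=2, n_iter=300, seed=0):
    """Minimize raw stress, sum over i<j of (delta_ij - dist_ij)^2,
    by repeated Guttman-transform updates (unweighted SMACOF)."""
    n = delta.shape[0]
    X = np.random.default_rng(seed).standard_normal((n, n_components))
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(dist > 0, delta / dist, 0.0)
        B = -ratio
        np.fill_diagonal(B, ratio.sum(axis=1))  # row sums on the diagonal
        X = B @ X / n                           # Guttman transform update
    dist = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    stress = ((delta - dist) ** 2)[np.triu_indices(n, 1)].sum()
    return X, stress

# 10 points at constant dissimilarity 0.5 (arbitrary illustrative values)
n, d = 10, 0.5
delta = d * (np.ones((n, n)) - np.eye(n))
X, stress = smacof(delta)
print(X.shape, stress)
```

Because the loop optimizes a loss over point positions rather than truncating a degenerate eigen decomposition, it settles into a low-stress layout instead of the indeterminate classical solution; like -protect()-, one would rerun it from several random starts in practice.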
