# st: Mahalanobis Distances/Syntax/Stata

 From "Anthony Petrosino" To statalist@hsphsun2.harvard.edu Subject st: Mahalanobis Distances/Syntax/Stata Date Tue, 20 Feb 2007 13:40:27 -0500 (EST)

```
Hello colleagues,

I'm new to Stata and to the list. We're doing a comparison group design
and using Mahalanobis Distances on two independent variables to do the
matching of units (schools).

A statistician wrote syntax for us in Stata but hasn't done the procedure
in a while and no longer remembers what each of the steps means. I've tried
to decipher the syntax below.

Can anyone on the list let me know if we're on the right track here? For
one, I think that step #4 below might be redundant with step #3.

In appreciation,
Anthony

_____________

*Compute Mahalanobis Distance
>
>(1) "matrix drop _all" (this clears all prior matrices from memory so
that each matrix below is created anew from the dataset)
>
>(2) "mkmat scix true, matrix(xvar)"-- creates a new matrix based on
SocioDemographic Composite Index (SCIX) and the true academic score and
calls that matrix "xvar"
>
>(3) "matrix accum cov = scix true, noc dev"
>
>"Matrix accum cov" forms a "cross-product matrix" (variables multiply
each other)
>
>The reason for adding "noc" (no constant) and "dev" (deviations) is the
following:
>
>*deviations, allowed only with matrix accum, causes the accumulation to
be performed in terms of deviations from the mean.  If noconstant is not
specified, the accumulation of X is done in terms of deviations, but the
added row and column of sums are not in deviation format (in which case
they would be zeros).  With noconstant specified, the resulting matrix
divided through by N-1, where N is the number of observations, is a
covariance matrix.*
>
>
>(4) "matrix cov = cov/(r(N)-1)"
>
>This replaces the matrix "cov" with the cross-product matrix from step 3
divided by N minus 1, where r(N) is the number of observations saved by
"matrix accum"; the result is the covariance matrix.
>
>
>(5)matrix factorx= (xvar) * (inv(cov)) * (xvar')
>
>This creates a new matrix called "factorx" that is the product of the
matrix "xvar", the inverse of the matrix "cov", and the transpose of the
matrix "xvar"
>
>(6) matrix factor= (vecdiag(factorx))'
>
>This creates a new matrix called "factor": vecdiag() returns the diagonal
of the square matrix "factorx" as a row vector, and the apostrophe
transposes it into a column vector
>
>
>(7) svmat factor, names(factor)
>
>svmat stores the contents of matrix "factor" in a new variable named
"factor1" (the names() stub followed by the column number)
>
>(8) sort dstlcl factor
>
>Sort on these two variables (note that dstlcl is already in dataset for
the type of geographic region the school serves, i.e., rural, urban,
etc.).

--
Anthony Petrosino, Ph.D.
Senior Research Associate,
Learning Innovations at WestEd &
Associate Director of Research, Regional Educational Laboratory, Northeast
and Islands (REL-NEI)
200 Unicorn Park Drive, 4th floor
Woburn, MA 01824 USA
781-481-1117

```
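For reference, the eight steps quoted in the message can be collected into one do-file. This is only a sketch of the posted syntax, not a vetted matching procedure: it assumes the variables scix, true, and dstlcl exist in the data in memory as the post describes, and the comments reflect the documented behavior of each command as I understand it. On the question raised about steps 3 and 4: r(N) is the number of observations that matrix accum saves, so step 4 rescales the cross-product matrix from step 3 into a covariance matrix rather than repeating it.

```stata
* Sketch only: assumes scix, true, and dstlcl are in the data in memory.

* (1) Remove any matrices left over from earlier work
matrix drop _all

* (2) Copy the two matching variables into an N x 2 matrix (raw values)
mkmat scix true, matrix(xvar)

* (3) 2 x 2 matrix of cross-products of deviations from the means
matrix accum cov = scix true, noconstant deviations

* (4) Divide by N-1 to obtain the covariance matrix;
*     r(N) is the observation count saved by matrix accum in step 3
matrix cov = cov/(r(N) - 1)

* (5) N x N matrix of quadratic forms; only its diagonal is used below.
*     Note: xvar holds raw values, not deviations from the means, so the
*     diagonal measures distance from the origin. Whether that is what the
*     original author intended is not stated in the post.
matrix factorx = xvar * inv(cov) * xvar'

* (6) Diagonal of factorx as an N x 1 column vector
matrix factor = vecdiag(factorx)'

* (7) Save the vector as a variable; svmat names it factor1
svmat factor, names(factor)

* (8) Sort by locale type, then by the computed value
sort dstlcl factor1
```

The posted `sort dstlcl factor` relies on Stata's variable-name abbreviation to resolve `factor` to `factor1`; spelling out the full name avoids depending on that setting.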