# Re: st: Extreme data points

| | |
|---|---|
| From | Austin Nichols |
| To | statalist@hsphsun2.harvard.edu |
| Subject | Re: st: Extreme data points |
| Date | Wed, 8 Jun 2011 12:41:50 -0400 |

```
Jorge Eduardo Pérez Pérez <perez.jorge@ur.edu.co>:
Thanks for those refs. I like the Mahalanobis transformation
illustrated for groupwise transformation at [MV] discrim lda (page
227), but applied to the whole dataset. Partial disclosure: I have
never identified outliers or excluded them using such a method, but
the graphical approach is most appealing since departures from
multivariate normality are likely to be immediately evident, and one
can label the supposed outliers as needed. All calculation is most
easily done in Mata, I think:

clear all
sysuse nlsw88
g w=ln(wage)
ren hours h
loc X w h
loc k: word count `X'
loc Z
foreach v of var `X' {
g double m_`v'=.
loc Z `Z' m_`v'
}
mata:X=st_data(.,tokens(st_local("X")))
mata:eigensystem(variance(X),v=.,l=.)
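* inverse square root of the covariance matrix is V*D*V-transpose,
* because eigensystem() returns the eigenvectors as the columns of v
mata:Z=(v*diag(l:^(-1/2))*v'*(X:-mean(X))')'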
mata:st_store(.,tokens(st_local("Z")),(Re(Z)))
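* Optional check, added here as a sketch (not in the original post):
* if the transformation worked, the covariance matrix of the new
* variables should be close to the identity matrix.
corr `Z', cov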
* z is the squared radius of the 99% disk, so the disk spans -sqrt(z) to sqrt(z)
loc z=invchi2(`k',.99)
loc r=sqrt(`z')
loc e function sqrt(`z'-x^2), ra(-`r' `r')
loc e `e' recast(area) fc(gs12) lc(gs12) ||
loc e `e' function -sqrt(`z'-x^2), ra(-`r' `r')
loc e `e' recast(area) fc(gs12) lc(gs12)
tw `e' || sc `Z', leg(off) ms(o) ti(Outliers off disk)
g d=0
qui foreach v of var `Z' {
replace d=d+(`v')^2
}
la var d "Squared Mahalanobis distance"
qchi d, df(`k') name(qqplot)
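* A possible follow-up, sketched here and not part of the original post:
* flag points outside the 99% disk and label them on the scatterplot.
* Assumes the same do-file run, so d, the m_ variables and locals z and Z
* still exist; flag and the graph name flagged are made-up names, and
* idcode is the nlsw88 identifier. Even under exact bivariate normality
* about 1% of points land outside the disk, so a flag is not proof of
* a bad value.
g byte flag=d>`z' if !mi(d)
tab flag
tw sc `Z' if flag==0, ms(o) || sc `Z' if flag==1, ms(X) mlabel(idcode) leg(off) name(flagged)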

2011/6/8 Jorge Eduardo Pérez Pérez <perez.jorge@ur.edu.co>:
> You might also want to take a look at multivariate outlier detection
> methods in Stata: -hadimvo- and -bacon-

```
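The two commands mentioned in the quoted message can be tried on the same data. What follows is a minimal sketch and not something posted in the thread: the variable names hadi_out, hadi_d and bacon_out are made up, the 1% level passed to -hadimvo- is an arbitrary choice, and the -bacon- call assumes the user-written SSC version with a generate() option, so it is worth checking -help bacon- after installing.

```stata
* Sketch only; names hadi_out, hadi_d, bacon_out are hypothetical
sysuse nlsw88, clear
g w=ln(wage)

* -hadimvo- ships with Stata: gen() takes an outlier flag and,
* optionally, a second variable for the distance; p() is the cutoff level
hadimvo w hours, gen(hadi_out hadi_d) p(.01)

* -bacon- is user-written; install it from SSC if it is not found
* (it may also need the moremata package; this is from memory, so
* check the help file before relying on the call below)
capture which bacon
if _rc ssc install bacon
bacon w hours, generate(bacon_out)

* compare the two outlier flags
tab hadi_out bacon_out, mi
```

Both commands flag observations automatically, which is convenient but gives up the visual check for departures from multivariate normality that the plots above provide.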