Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: Extreme data points


From   Austin Nichols <[email protected]>
To   [email protected]
Subject   Re: st: Extreme data points
Date   Wed, 8 Jun 2011 12:41:50 -0400

Jorge Eduardo Pérez Pérez <[email protected]>:
Thanks for those refs. I like the Mahalanobis transformation
illustrated as a groupwise transformation in [MV] discrim lda (page
227), but applied here to the whole dataset. Partial disclosure: I have
never identified or excluded outliers using such a method, but the
graphical approach is appealing. After the transformation the variables
are uncorrelated with unit variance, so departures from multivariate
normality are likely to be immediately evident, points falling outside
the 99% chi-squared disk are candidate outliers, and one can label them
as needed. The matrix calculations are most easily done in Mata, I think:

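clear all
sysuse nlsw88
g w=ln(wage)
ren hours h
* hours has a few missing values; keep complete cases so Mata sees no missings
drop if missing(w, h)
loc X w h
loc k: word count `X'
* create empty double variables m_w, m_h to hold the transformed data
loc Z
foreach v of var `X' {
 g double m_`v'=.
 loc Z `Z' m_`v'
 }
* whiten the centered data: Z = (X - mean(X)) * V * diag(l)^(-1/2) * V',
* where V holds the eigenvectors and l the eigenvalues of variance(X);
* symeigensystem() returns real results for this symmetric matrix
mata:X=st_data(.,tokens(st_local("X")))
mata:symeigensystem(variance(X),v=.,l=.)
mata:Z=(X:-mean(X))*v*diag(l:^(-1/2))*v'
mata:st_store(.,tokens(st_local("Z")),Z)
* 99% contour: a disk of radius sqrt(invchi2(k,.99)) in the whitened space
loc z=invchi2(`k',.99)
loc r=sqrt(`z')
loc e function sqrt(`z'-x^2), ra(-`r' `r')
loc e `e' recast(area) fc(gs12) lc(gs12) ||
loc e `e' function -sqrt(`z'-x^2), ra(-`r' `r')
loc e `e' recast(area) fc(gs12) lc(gs12)
tw `e' || sc `Z', leg(off) ms(o) ti(Outliers off disk)
* squared Mahalanobis distance = sum of squared whitened coordinates
g d=0
qui foreach v of var `Z' {
 replace d=d+(`v')^2
 }
la var d "Squared Mahalanobis distance"
* under multivariate normality, d is approximately chi2(k)
qchi d, df(`k') name(qqplot)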
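The algebra behind the code, written out as a sketch. It assumes the sample mean and sample variance matrix (variance(X) in Mata) stand in for the population values, so the chi-squared reference distribution is only approximate. Here V is the matrix of eigenvectors (v in the Mata code), Lambda the diagonal matrix of eigenvalues (l), z_i a row of Z, and d_i the variable d; the 99% threshold is the local `z', so the disk in the scatterplot has radius sqrt(`z').

```latex
\begin{aligned}
\Sigma &= V \Lambda V^{\top}, \qquad \Sigma^{-1/2} = V \Lambda^{-1/2} V^{\top} \\
z_i &= \Sigma^{-1/2} (x_i - \bar{x}) \\
d_i &= z_i^{\top} z_i = (x_i - \bar{x})^{\top} \Sigma^{-1} (x_i - \bar{x})
      \;\stackrel{\text{approx}}{\sim}\; \chi^2_k \\
\text{flag } i \text{ if } d_i &> \chi^2_{k,\,0.99}
      \iff \lVert z_i \rVert > \sqrt{\chi^2_{k,\,0.99}}
\end{aligned}
```

Because Sigma^{-1/2} is symmetric, the sum of squared whitened coordinates equals the usual quadratic form in Sigma^{-1}, which is why qchi against chi2(k) is the natural diagnostic plot for d.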


2011/6/8 Jorge Eduardo Pérez Pérez <[email protected]>:
> You might also want to take a look at multivariate outlier detection
> methods in Stata: -hadimvo- and -bacon-
