Re: st: Extreme data points

From	Austin Nichols <[email protected]>
To	[email protected]
Subject	Re: st: Extreme data points
Date	Wed, 8 Jun 2011 12:41:50 -0400

Jorge Eduardo Pérez Pérez <[email protected]>:
Thanks for those refs. I like the Mahalanobis transformation
illustrated for groupwise transformation at [MV] discrim lda (page
227), but applied to the whole dataset. Partial disclosure: I have
never identified outliers or excluded them using such a method, but
the graphical approach is most appealing since departures from
multivariate normality are likely to be immediately evident, and one
can label the supposed outliers as needed. All calculation is most
easily done in Mata, I think:

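clear all
sysuse nlsw88
* Two variables to screen: log wage and usual hours worked
g w=ln(wage)
ren hours h
loc X w h
loc k: word count `X'
* Create placeholders m_w and m_h for the standardized coordinates
loc Z
foreach v of var `X' {
 g double m_`v'=.
 loc Z `Z' m_`v'
 }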
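* Whiten X with the symmetric inverse square root of its covariance matrix,
* v*diag(l^(-1/2))*v', where v holds the eigenvectors in its columns
mata:X=st_data(.,tokens(st_local("X")))
mata:symeigensystem(variance(X),v=.,l=.)
mata:Z=(v*diag(l:^(-1/2))*v'*(X:-mean(X))')'
mata:st_store(.,tokens(st_local("Z")),Z)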
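* z is the 99% chi-squared cutoff for the squared distance, so the disk has radius sqrt(z)
loc z=invchi2(`k',.99)
loc r=sqrt(`z')
loc e function sqrt(max(0,`z'-x^2)), ra(-`r' `r')
loc e `e' recast(area) fc(gs12) lc(gs12) ||
loc e `e' function -sqrt(max(0,`z'-x^2)), ra(-`r' `r')
loc e `e' recast(area) fc(gs12) lc(gs12)
tw `e' || sc `Z', leg(off) ms(o) ti(Outliers off disk)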
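* Squared Mahalanobis distance: sum of the squared standardized coordinates
g d=0
qui foreach v of var `Z' {
 replace d=d+(`v')^2
 }
la var d "Squared Mahalanobis distance"
* Under multivariate normality, d is approximately chi-squared with k df
qchi d, df(`k') name(qqplot)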
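
To label the supposed outliers, something like the following sketch might work. It assumes the variables d, m_w, and m_h and the local k created above are still defined (that is, it runs in the same do-file). The variable name outlier is only illustrative, and idcode is the identifier that ships with nlsw88:

```stata
* Flag points whose squared distance d exceeds the 99% chi-squared cutoff
* (observations with missing d are left missing, not flagged)
g byte outlier = d > invchi2(`k', .99) if !mi(d)
count if outlier == 1
* Redraw the standardized scatter, labeling only the flagged points
sc m_w m_h if outlier == 0, ms(o) || sc m_w m_h if outlier == 1, ms(X) mlabel(idcode) leg(off)
* Inspect the flagged observations on the original scale
list idcode wage h d if outlier == 1, noobs
```

The flagged points are exactly those that fall off the disk in the first graph, since the disk boundary is where d equals the same cutoff.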


2011/6/8 Jorge Eduardo Pérez Pérez <[email protected]>:
> You might also want to take a look at multivariate outlier detection
> methods in Stata: -hadimvo- and -bacon-
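
For comparison, a minimal sketch of -hadimvo- on the same two variables, assuming w and h from the code above exist. The new variable names hadi_out and hadi_d are illustrative, and p(.01) is chosen only to mirror the 99% cutoff used above:

```stata
* generate() takes a 0/1 outlier flag and, optionally, a distance variable
hadimvo w h, gen(hadi_out hadi_d) p(.01)
tab hadi_out
```

-bacon- is community-contributed and would need to be installed separately, so it is not shown here.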

