Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Xixi Lin <winnielxx@gmail.com>
To: statalist <statalist@hsphsun2.harvard.edu>
Subject: st: How to detect outliers
Date: Mon, 11 Feb 2013 13:50:34 -0500
Hi,

I tried two ways to detect outliers: one treats observations with Cook's distance greater than 4/n as outliers; the other treats observations whose standardized residuals exceed 2 in magnitude as outliers. Here is my code:

    gen residual=.
    tempvar temp
    foreach z of numlist 2/120 {
        capture reg Y X1 X2 X3 X4 if Period==`z', noconstant
        if !_rc {
            predict temp,rstu
            replace residual=temp if Period==`z'
            drop temp
        }
    }

    //cook's distance
    gen di_bench=4/165979
    gen distance=.
    tempvar temp1
    foreach z of numlist 2/120 {
        capture reg Y X1 X2 X3 X4 if Period==`z', noconstant
        if !_rc {
            predict temp1,cook
            replace distance=temp1 if Period==`z'
            drop temp1
        }
    }

    //outlier numbers
    count if abs(residual) > 2    // 7922
    count if distance > di_bench  // 111879

My question is: did I make a mistake in the code? Why are the two results so different? One shows 7922 outliers, the other shows 111879. If I compare Cook's distance with 1 instead, the number of outliers is 133. Can anyone tell me which method I should choose? Or are there better ways to detect outliers?

Thanks a lot.

Best,
Xixi Lin
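
Two details of the code in this post may be worth checking. First, the cutoff 4/165979 appears to use the size of the whole data set, whereas each regression is fitted on a single Period, so the n in the 4/n rule of thumb would arguably be that period's sample size. Second, Stata treats a missing value as larger than any number, so `count if distance > di_bench` and `count if abs(residual) > 2` also count observations whose statistic is missing. The following is a minimal sketch under those assumptions, not a definitive fix; `cooksd_p` and `cutoff_p` are hypothetical variable names introduced here, and Y, X1-X4 and Period are taken from the post.

```stata
* Sketch only: compare each Cook's distance with 4/n for its own regression.
* cooksd_p and cutoff_p are hypothetical names, not part of the original code.
gen double cooksd_p = .
gen double cutoff_p = .
tempvar d

forvalues z = 2/120 {
    capture regress Y X1 X2 X3 X4 if Period == `z', noconstant
    if !_rc {
        * Remove the temporary variable left over from the previous period.
        capture drop `d'
        * Cook's distance for the observations used in this regression.
        predict double `d' if e(sample), cooksd
        replace cooksd_p = `d' if e(sample)
        * Period-specific cutoff: 4 divided by this regression's sample size.
        replace cutoff_p = 4 / e(N) if e(sample)
    }
}

* Exclude missing values explicitly, since missing > any number in Stata.
count if cooksd_p > cutoff_p & !missing(cooksd_p)
```

Restricting to `e(sample)` instead of `Period == z` keeps observations that were dropped from a regression (for example, because of missing covariates) out of the comparison. The same `!missing()` guard can be added to the residual count, as in `count if abs(residual) > 2 & !missing(residual)`.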