



st: How to detect outliers


From   Xixi Lin <winnielxx@gmail.com>
To   statalist <statalist@hsphsun2.harvard.edu>
Subject   st: How to detect outliers
Date   Mon, 11 Feb 2013 13:50:34 -0500

Hi,
I tried two ways to detect outliers: one is to treat observations with
a Cook’s distance greater than 4/n as outliers; the other is to treat
those with standardized residuals greater than 2 in magnitude as
outliers. Here is my code:

// residuals from predict's rstudent option, one regression per period
gen residual=.
tempvar temp
foreach z of numlist 2/120 {
    capture reg Y X1 X2 X3 X4 if Period==`z', noconstant
    if !_rc {
        predict `temp', rstudent
        replace residual=`temp' if Period==`z'
        drop `temp'
    }
}

//cook's distance
gen di_bench=4/165979
gen distance=.
tempvar temp1
foreach z of numlist 2/120 {
    capture reg Y X1 X2 X3 X4 if Period==`z', noconstant
    if !_rc {
        predict `temp1', cooksd
        replace distance=`temp1' if Period==`z'
        drop `temp1'
    }
}
//outlier numbers
count if abs(residual) > 2    // 7922
count if distance > di_bench     //111879

My question is: did I make a mistake in the code? Why are the two
results so different? One method flags 7922 outliers; the other flags
111879. If I compare Cook's distance with 1 instead, the number of
outliers is 133.
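
One thing that may account for part of the gap, offered as a sketch and not a diagnosis: di_bench is 4/165979, which (assuming 165979 is the pooled number of observations across all periods) is a much smaller cutoff than 4/n for any single period's regression, while the Cook's distances themselves come from the per-period fits. In addition, Stata treats a missing value as larger than any number, so both count commands above also count observations whose statistic is missing. The variant below uses each regression's own e(N) for the cutoff and excludes missing values; the variable names distance_p and cutoff_p are made up for illustration.

```stata
* Sketch only: per-period Cook's distance cutoff of 4/e(N).
* distance_p and cutoff_p are hypothetical names, not from the code above.
gen distance_p = .
gen cutoff_p   = .
forvalues z = 2/120 {
    capture regress Y X1 X2 X3 X4 if Period == `z', noconstant
    if !_rc {
        tempvar d
        * cooksd is computed for the estimation sample of the last regress
        predict `d' if e(sample), cooksd
        replace distance_p = `d' if Period == `z'
        * e(N) is the number of observations used in this period's fit
        replace cutoff_p = 4 / e(N) if Period == `z'
        drop `d'
    }
}

* Missing values compare as larger than any number, so exclude them explicitly
count if distance_p > cutoff_p & !missing(distance_p)
count if abs(residual) > 2 & !missing(residual)
```

Whether the two counts end up closer after this change depends on the data; the sketch only removes the pooled-n cutoff and the missing-value comparisons as possible sources of the difference.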

Can anyone tell me which method I should choose? Or is there a
better way to detect outliers? Thanks a lot.

Best,
Xixi Lin
