
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org, is already up and running.



Re: st: How to detect outliers


From   Steve Samuels <sjsamuels@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: How to detect outliers
Date   Mon, 11 Feb 2013 17:51:12 -0500

Identifying outliers on the basis of a least squares fit is a very bad
idea, however popular (Hampel et al., 1986). A far superior approach in
Stata is the robust regression package -mmregress- by Verardi and Croux
(-findit-). In providing a resistant fit, -mmregress- also identifies
outliers and high leverage points.
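-mmregress- is user-written, and its options are not reproduced here. As a rough stand-in that uses only official Stata, the sketch below uses -rreg-. Note that -rreg- is a Huber/biweight M-estimator and does NOT have the high breakdown point of the MM estimator recommended above. Observations that end up with low final weights are candidates for closer inspection, nothing more. The -auto- data, the variable name w, and the 0.5 weight cutoff are arbitrary choices made for this illustration.

```stata
* Illustrative only: official -rreg- (M-estimation), not -mmregress- (MM)
sysuse auto, clear

* genwt() stores the final robust weight of each observation in w
rreg price mpg weight, genwt(w)

* Low or missing weights flag observations the robust fit discounted;
* the 0.5 cutoff is an arbitrary choice for this sketch
list make price mpg weight w if w < .5 | missing(w)
```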


Verardi, V., and C. Croux. 2009. Robust regression in Stata. Stata
Journal 9, no. 3: 439-453.

Hampel, F., E. Ronchetti, P. Rousseeuw, and W. Stahel. 1986. Robust
Statistics: The Approach Based on Influence Functions (Wiley Series in
Probability and Mathematical Statistics). New York: John Wiley and Sons.


Steve

On Feb 11, 2013, at 2:37 PM, Xixi Lin wrote:

Hi Nick,

You are absolutely right! I messed up the number of observations: it
should be the number of observations in each period instead. After I
fixed that, the results from the two methods are pretty close.
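A minimal sketch of the per-period Cook's distance cutoff described above, taking n from e(N) of each regression rather than from the whole dataset. The data are simulated so the block runs on its own: the variable names Y, X1-X4, and Period are copied from the quoted code, but their values, the three periods, and the new variables cook and cook_flag are hypothetical.

```stata
* Hypothetical simulated data; names mirror the quoted code
clear
set seed 2013
set obs 600
generate Period = 2 + mod(_n, 3)     // periods 2, 3, 4 with 200 obs each
forvalues j = 1/4 {
    generate X`j' = rnormal()
}
generate Y = X1 + X2 - X3 + 0.5*X4 + rnormal()

generate cook = .
generate cook_flag = .
forvalues z = 2/4 {
    capture regress Y X1 X2 X3 X4 if Period == `z', noconstant
    if !_rc {
        tempvar d
        predict `d' if e(sample), cooksd
        replace cook = `d' if e(sample)
        * cutoff uses the n of THIS regression, e(N), not the dataset total
        replace cook_flag = (`d' > 4/e(N)) if e(sample)
        drop `d'
    }
}
count if cook_flag == 1
```

Because cook_flag is only filled in for estimation-sample rows, testing cook_flag == 1 avoids counting missing values as exceedances.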

Thanks again. You are so helpful! ^_^

Best,
Xixi Lin

On Mon, Feb 11, 2013 at 2:24 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> I wouldn't regard any kind of large residual as indicating outliers
> unequivocally. On the contrary, a really marked outlier is likely to
> pull the regression towards it, with the result of a small residual.
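A small constructed example of the masking effect described above: ten points lie exactly on y = x, and one point sits far out at x = 50, y = 0. The data are made up purely for illustration. Least squares tilts the line toward the far point, so its raw residual ends up smaller in magnitude than those of several well-behaved points, while its leverage and Cook's distance are extreme.

```stata
* Hypothetical toy data: ten points on y = x plus one high-leverage point
clear
set obs 11
generate x = _n
generate y = _n
replace x = 50 in 11
replace y = 0  in 11

regress y x
predict r, residuals      // raw residuals
predict lev, leverage     // diagonal of the hat matrix
predict d, cooksd         // Cook's distance

list x y r lev d
```

By hand, the fitted slope is about -0.074; the far point's residual is about -2 against about 5 for the point at x = 10, while its leverage is about 0.96.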
> 
> Your criterion here for Cook's distance is 4/n, but evidently you are
> fitting regressions separately for each period. The total dataset size
> of 165779 is not pertinent to regressions fitted individually. The
> relevant criterion is the number of observations used in each
> regression.
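A quick check of how much the choice of n matters, using the total of 165979 from the posted code and its 119 periods (2 through 120). Equal period sizes are an assumption made only for this arithmetic; the real periods may differ in size.

```stata
scalar cut_total  = 4/165979          // cutoff used in the posted code
scalar cut_period = 4/(165979/119)    // ASSUMES 119 equal-sized periods
display "4/n with total n:          " scalar(cut_total)
display "4/n with average period n: " scalar(cut_period)
display "ratio:                     " scalar(cut_period)/scalar(cut_total)
```

Under that assumption the cutoff with the total n is about 0.000024, roughly 119 times smaller than the per-period cutoff of about 0.0029, which is consistent with so many observations being flagged.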
> 
> I think you'd learn more from residual vs fitted plots, even all 119 of them.
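One way to produce a residual-versus-fitted plot per period is to call the official -rvfplot- after each regression and save each graph to a file. The data below are simulated and hypothetical (names copied from the quoted code), and only three periods are used; the file names rvf_period2.gph and so on are illustrative.

```stata
* Hypothetical simulated data; names mirror the quoted code
clear
set seed 2013
set obs 300
generate Period = 2 + mod(_n, 3)     // periods 2, 3, 4
forvalues j = 1/4 {
    generate X`j' = rnormal()
}
generate Y = X1 + X2 - X3 + 0.5*X4 + rnormal()

forvalues z = 2/4 {
    capture regress Y X1 X2 X3 X4 if Period == `z', noconstant
    if !_rc {
        * residuals against fitted values for this period's regression
        rvfplot, yline(0) title("Period `z'")
        graph save rvf_period`z', replace
    }
}
```

For the real data the loop would run over 2/120, as in the quoted code.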
> 
> Whether you would be better off with a different model depends on your
> research problem.
> 
> Nick
> 
> On Mon, Feb 11, 2013 at 6:50 PM, Xixi Lin <winnielxx@gmail.com> wrote:
>> Hi,
>> I tried two ways to detect outliers: one is to regard Cook’s Distance
>> greater than 4/n as outliers; the other is to regard those with
>> standardized residuals greater than 2 in magnitude as outliers. Here
>> is my code:
>> 
>> * studentized residuals, one regression per period
>> gen residual=.
>> tempvar temp
>> foreach z of numlist 2/120 {
>>     capture reg Y X1 X2 X3 X4 if Period==`z', noconstant
>>     if !_rc {
>>         predict `temp', rstudent
>>         replace residual=`temp' if Period==`z'
>>         drop `temp'
>>     }
>> }
>> 
>> //cook's distance
>> gen di_bench=4/165979
>> gen distance=.
>> tempvar temp1
>> foreach z of numlist 2/120 {
>>     capture reg Y X1 X2 X3 X4 if Period==`z', noconstant
>>     if !_rc {
>>         predict `temp1', cooksd
>>         replace distance=`temp1' if Period==`z'
>>         drop `temp1'
>>     }
>> }
>> //outlier numbers
>> * in Stata a missing value is larger than any number, so observations
>> * with a missing residual or distance would also pass these tests
>> count if abs(residual) > 2    // 7922
>> count if distance > di_bench     //111879
>> 
>> My question is: did I mess up the code? Why are the two results so
>> different? One shows 7922 outliers, the other shows 111879 outliers.
>> If I compare Cook's Distance with 1, then the number of outliers is 133.
>> 
>> Can anyone tell me which method I should choose? Or are there other,
>> better ways to detect outliers? Thanks a lot.
>> 
>> Best,
>> Xixi Lin
>> 
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/

© Copyright 1996–2014 StataCorp LP