
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org is already up and running.



Re: st: How to detect outliers


From   Xixi Lin <winnielxx@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: How to detect outliers
Date   Tue, 12 Feb 2013 13:22:27 -0500

Hi Steve,

I have a question about the robust regression: after running
-mmregress-, is it possible to predict residuals? My attempt fails:

xi: mmregress Y X1 X2 X3
predict r,residual
error message: option residual not allowed

My question is whether it is possible to test the residuals for
normality and heteroskedasticity after robust regression, or does
robust regression already correct for those?

Best,
Xixi Lin
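
A possible workaround for the -predict- error above, offered only as a sketch: compute the residuals by hand from the linear prediction. This assumes that -mmregress- stores its coefficients in e(b) in the usual way, so that the default -predict, xb- is available after it; that has not been checked against the package. The variable names yhat and r are placeholders.

```stata
* Assumption: -mmregress- posts e(b) like other estimation commands,
* so the default -predict, xb- works after it.
xi: mmregress Y X1 X2 X3
predict double yhat, xb     // linear prediction (placeholder name)
gen double r = Y - yhat     // raw residual, computed by hand
```

These are raw residuals from the robust fit, not standardized ones.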


On Mon, Feb 11, 2013 at 5:51 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
> Identifying outliers on the basis of a least squares fit is a very bad
> idea, however popular (Hampel et al., 1986). A far superior approach in
> Stata is the robust regression package -mmregress- by Verardi and Croux
> (-findit-). In providing a resistant fit, -mmregress- also identifies
> outliers and high leverage points.
>
>
> Verardi, V., and C. Croux. 2009. Robust regression in Stata. Stata
> Journal 9, no. 3: 439-453.
>
> Hampel, Frank, Elvezio Ronchetti, Peter Rousseeuw, and Werner Stahel.
> 1986. Robust Statistics: The Approach Based on Influence Functions
> (Wiley Series in Probability and Mathematical Statistics). New York:
> John Wiley and Sons.
>
>
> Steve
>
> On Feb 11, 2013, at 2:37 PM, Xixi Lin wrote:
>
> Hi Nick,
>
> You are absolutely right! I messed up the number of observations; it
> should be the number of observations in each period instead. After I
> fixed that, the results from the two methods are pretty close.
>
> Thanks again. You are so helpful! ^_^
>
> Best,
> Xixi Lin
>
> On Mon, Feb 11, 2013 at 2:24 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>> I wouldn't regard any kind of large residual as indicating outliers
>> unequivocally. On the contrary, a really marked outlier is likely to
>> pull the regression towards it, with the result of a small residual.
>>
>> Your criterion here for Cook is 4/n, but evidently you are fitting
>> regressions separately for each period. The total dataset size of
>> 165779 is not pertinent to regressions fitted individually. The
>> relevant criterion is the number of observations used in each
>> regression.
>>
>> I think you'd learn more from residual vs fitted plots, even all 119 of them.
>>
>> Whether you would be better off with a different model depends on your
>> research problem.
>>
>> Nick
>>
>> On Mon, Feb 11, 2013 at 6:50 PM, Xixi Lin <winnielxx@gmail.com> wrote:
>>> Hi,
>>> I tried two ways to detect outliers: one treats observations with
>>> Cook’s distance greater than 4/n as outliers; the other treats those
>>> with standardized residuals greater than 2 in magnitude as outliers.
>>> Here is my code:
>>>
>>> // studentized residuals, one regression per period
>>> gen residual=.
>>> foreach z of numlist 2/120 {
>>>      capture reg Y X1 X2 X3 X4 if Period==`z', noconstant
>>>      if !_rc {
>>>        predict temp,rstu
>>>        replace residual=temp if Period==`z'
>>>        drop temp
>>>      }
>>> }
>>>
>>> // Cook's distance, one regression per period
>>> gen di_bench=4/165979
>>> gen distance=.
>>> foreach z of numlist 2/120 {
>>>      capture reg Y X1 X2 X3 X4 if Period==`z', noconstant
>>>      if !_rc {
>>>        predict temp1,cook
>>>        replace distance=temp1 if Period==`z'
>>>        drop temp1
>>>      }
>>> }
>>> //outlier numbers
>>> count if abs(residual) > 2    // 7922
>>> count if distance > di_bench     //111879
>>>
>>> My question is: did I mess up the code? Why are the two results so
>>> different? One shows 7922 outliers, the other 111879. If I compare
>>> Cook's distance with 1, the number of outliers is 133.
>>>
>>> Can anyone tell me which method I should choose? Or are there any
>>> better ways to detect outliers? Thanks a lot.
>>>
>>> Best,
>>> Xixi Lin
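
The point made in the reply, that the 4/n cutoff should use the number of observations in each regression, might be coded roughly as follows. This is only a sketch using the variable names from the post (Y, X1-X4, Period); cook_d, cook_cut, and cook_flag are made-up names. It also guards against missing values: Stata treats missing as larger than any number, so a comparison such as distance > di_bench also counts observations whose distance is missing.

```stata
* Sketch only: per-period Cook's distance with a per-period 4/n cutoff.
* cook_d, cook_cut, and cook_flag are hypothetical variable names.
gen double cook_d   = .
gen double cook_cut = .
forvalues z = 2/120 {
    capture regress Y X1 X2 X3 X4 if Period == `z', noconstant
    if !_rc {
        tempvar d
        predict double `d' if e(sample), cooksd
        replace cook_d   = `d'    if e(sample)
        replace cook_cut = 4/e(N) if e(sample)   // n used in this regression
        drop `d'
    }
}
* Exclude missing distances explicitly before comparing with the cutoff.
gen byte cook_flag = cook_d > cook_cut if !missing(cook_d)
count if cook_flag == 1
```

The same guard applies to the residual count, for example count if abs(residual) > 2 & !missing(residual).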
>>>
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>> *   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index