Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org**, is already up and running.


From: Steve Samuels <sjsamuels@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: How to detect outliers
Date: Mon, 11 Feb 2013 17:51:12 -0500

Identifying outliers on the basis of a least squares fit is a very bad idea, however popular (Hampel et al., 1986). A far superior approach in Stata is the robust regression package -mmregress- by Verardi and Croux (-findit-). In providing a resistant fit, -mmregress- also identifies outliers and high leverage points.

Verardi, V., and C. Croux. 2009. Robust regression in Stata. Stata Journal 9, no. 3: 439-453.

Hampel, Frank, Elvezio Ronchetti, Peter Rousseeuw, and Werner Stahel. 1986. Robust Statistics: The Approach Based on Influence Functions (Wiley Series in Probability and Mathematical Statistics). New York: John Wiley and Sons.

Steve

On Feb 11, 2013, at 2:37 PM, Xixi Lin wrote:

Hi Nick,

You are absolutely right! I messed up the number of observations; it should be the number of observations in each period instead. After I fixed that, the results from the two methods are pretty close.

Thanks again. You are so helpful! ^_^

Best,
Xixi Lin

On Mon, Feb 11, 2013 at 2:24 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> I wouldn't regard any kind of large residual as indicating outliers
> unequivocally. On the contrary, a really marked outlier is likely to
> pull the regression towards it, with the result of a small residual.
>
> Your criterion here for Cook is 4/n, but evidently you are fitting
> regressions separately for each period. The total dataset size of
> 165779 is not pertinent to regressions fitted individually. The
> relevant criterion is the number of observations used in each
> regression.
>
> I think you'd learn more from residual vs fitted plots, even all 119 of them.
>
> Whether you would be better off with a different model depends on your
> research problem.
>
> Nick
>
> On Mon, Feb 11, 2013 at 6:50 PM, Xixi Lin <winnielxx@gmail.com> wrote:
>> Hi,
>> I tried two ways to detect outliers: one is to regard observations with
>> Cook’s Distance greater than 4/n as outliers; the other is to regard
>> those with standardized residuals greater than 2 in magnitude as
>> outliers. Here is my code:
>>
>> gen residual=.
>> tempvar temp
>> foreach z of numlist 2/120 {
>>     capture reg Y X1 X2 X3 X4 if Period==`z', noconstant
>>     if !_rc {
>>         predict temp,rstu
>>         replace residual=temp if Period==`z'
>>         drop temp
>>     }
>> }
>>
>> //cook's distance
>> gen di_bench=4/165979
>> gen distance=.
>> tempvar temp1
>> foreach z of numlist 2/120 {
>>     capture reg Y X1 X2 X3 X4 if Period==`z', noconstant
>>     if !_rc {
>>         predict temp1,cook
>>         replace distance=temp1 if Period==`z'
>>         drop temp1
>>     }
>> }
>> //outlier numbers
>> count if abs(residual) > 2 // 7922
>> count if distance > di_bench //111879
>>
>> My question is: did I mess up the code? Why are the two results so
>> different? One shows 7922 outliers, the other shows 111879 outliers.
>> If I compare Cook's Distance with 1, then the number of outliers is 133.
>>
>> Can anyone tell me which method I should choose? Or is there any
>> better way to detect outliers? Thanks a lot.
>>
>> Best,
>> Xixi Lin
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/
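Nick's point in the thread, that the 4/n cutoff should use the number of observations in each period's regression rather than the size of the whole dataset, could be sketched roughly as follows. This is only an illustration under assumptions, not the code anyone in the thread actually ran: Y, X1-X4, and Period come from the quoted code, while cooksd, cutoff, and flag are hypothetical variable names introduced here.

```stata
* Sketch: Cook's distance with a per-period 4/n cutoff.
* Y, X1-X4, Period are from the quoted code; cooksd, cutoff, flag are hypothetical.
gen double cooksd = .
gen double cutoff = .

levelsof Period, local(periods)
foreach z of local periods {
    capture regress Y X1 X2 X3 X4 if Period == `z', noconstant
    if !_rc {
        tempvar d
        * Cook's distance for the observations used in this regression only
        predict double `d' if e(sample), cooksd
        replace cooksd = `d' if e(sample)
        * e(N) is the number of observations in this period's fit
        replace cutoff = 4 / e(N) if e(sample)
        drop `d'
    }
}

* Missing values compare as larger than any number in Stata,
* so exclude them explicitly when flagging.
gen byte flag = cooksd > cutoff if !missing(cooksd)
count if flag == 1
```

Using `e(sample)` and `e(N)` ties both the prediction and the cutoff to the observations that actually entered each regression, which also covers periods where some rows are dropped for missing values. Whether 4/n is a sensible cutoff at all is a separate question that the replies above address.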

**Follow-Ups**:

- **Re: st: How to detect outliers** *From:* Xixi Lin <winnielxx@gmail.com>

**References**:

- **st: How to detect outliers** *From:* Xixi Lin <winnielxx@gmail.com>
- **Re: st: How to detect outliers** *From:* Nick Cox <njcoxstata@gmail.com>
- **Re: st: How to detect outliers** *From:* Xixi Lin <winnielxx@gmail.com>
