Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org**, is already up and running.


From: Xixi Lin <winnielxx@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: How to detect outliers
Date: Tue, 12 Feb 2013 13:22:27 -0500

Hi Steve,

I have a question about the robust regression: after running -mmregress-, is it possible to predict residuals? My attempt fails:

xi: mmregress Y X1 X2 X3
predict r,residual

error message: option residual not allowed

My question is: is it possible to test residual normality and heteroskedasticity after robust regression, or does robust regression already correct for those?

Best,
Xixi Lin

On Mon, Feb 11, 2013 at 5:51 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
> Identifying outliers on the basis of a least squares fit is a very bad
> idea, however popular (Hampel et al., 1986). A far superior approach in
> Stata is the robust regression package -mmregress- by Verardi and Croux
> (-findit-). In providing a resistant fit, -mmregress- also identifies
> outliers and high leverage points.
>
>
> Verardi, V., and C. Croux. 2009. Robust regression in Stata. Stata
> Journal 9, no. 3: 439-453.
>
> Hampel, Frank, Elvezio Ronchetti, Peter Rousseeuw, and Werner Stahel.
> 1986. Robust Statistics: The Approach Based on Influence Functions
> (Wiley Series in Probability and Mathematical Statistics). New York:
> John Wiley and Sons.
>
>
> Steve
>
> On Feb 11, 2013, at 2:37 PM, Xixi Lin wrote:
>
> Hi Nick,
>
> You are absolutely right! I messed up the obs numbers; it should be
> obs in each period instead. And after I fix that, the results from
> these two methods are pretty close.
>
> Thanks again. You are so helpful! ^_^
>
> Best,
> Xixi Lin
>
> On Mon, Feb 11, 2013 at 2:24 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>> I wouldn't regard any kind of large residual as indicating outliers
>> unequivocally. On the contrary, a really marked outlier is likely to
>> pull the regression towards it, with the result of a small residual.
>>
>> Your criterion here for Cook is 4/n, but evidently you are fitting
>> regressions separately for each period. The total dataset size of
>> 165779 is not pertinent to regressions fitted individually. The
>> relevant criterion is the number of observations used in each
>> regression.
>>
>> I think you'd learn more from residual vs fitted plots, even all 119 of them.
>>
>> Whether you would be better off with a different model depends on your
>> research problem.
>>
>> Nick
>>
>> On Mon, Feb 11, 2013 at 6:50 PM, Xixi Lin <winnielxx@gmail.com> wrote:
>>> Hi,
>>> I tried two ways to detect outliers: one is to regard Cook’s Distance
>>> greater than 4/n as outliers; the other is to regard those with
>>> standardized residuals greater than 2 in magnitude as outliers. Here
>>> is my code:
>>>
>>> gen residual=.
>>> tempvar temp
>>> foreach z of numlist 2/120 {
>>>     capture reg Y X1 X2 X3 X4 if Period==`z', noconstant
>>>     if !_rc {
>>>         predict temp,rstu
>>>         replace residual=temp if Period==`z'
>>>         drop temp
>>>     }
>>> }
>>>
>>> //cook's distance
>>> gen di_bench=4/165979
>>> gen distance=.
>>> tempvar temp1
>>> foreach z of numlist 2/120 {
>>>     capture reg Y X1 X2 X3 X4 if Period==`z', noconstant
>>>     if !_rc {
>>>         predict temp1,cook
>>>         replace distance=temp1 if Period==`z'
>>>         drop temp1
>>>     }
>>> }
>>> //outlier numbers
>>> count if abs(residual) > 2 // 7922
>>> count if distance > di_bench //111879
>>>
>>> My question is: did I mess up the code? Why are the two results so
>>> different? One shows 7922 outliers, the other shows 111879 outliers.
>>> If I compare Cook's Distance with 1, then the outlier number is 133.
>>>
>>> Can anyone tell me which method I should choose? Or are there any
>>> better ways to detect outliers? Thanks a lot.
>>>
>>> Best,
>>> Xixi Lin

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
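
Nick's point about the Cook's distance cutoff (use the number of observations in each regression, not the total dataset size) can be sketched as below. This is a minimal illustration under stated assumptions, not the poster's final code: the data are simulated stand-ins for Y, X1-X4, and Period, with only three periods of 200 observations each, and the coefficients and seed are arbitrary. The only change to the logic of the original loop is that the benchmark is 4/e(N), computed after each fit.

```stata
* Sketch only: simulated data stand in for the poster's Y, X1-X4, and Period.
clear
set seed 12345
set obs 600
gen Period = ceil(_n/200) + 1          // hypothetical periods 2, 3, 4
forvalues j = 1/4 {
    gen X`j' = rnormal()
}
gen Y = X1 + 0.5*X2 - X3 + 0.25*X4 + rnormal()

gen double distance = .
gen double di_bench = .
levelsof Period, local(periods)
foreach z of local periods {
    capture regress Y X1 X2 X3 X4 if Period == `z', noconstant
    if !_rc {
        tempvar d
        predict double `d' if e(sample), cooksd
        replace distance = `d' if Period == `z'
        * 4/n, where n is the sample size of THIS regression
        replace di_bench = 4/e(N) if Period == `z'
        drop `d'
    }
}

* Missing values compare as larger than any number, so exclude them explicitly
count if distance > di_bench & !missing(distance)
```

Because Stata treats a missing value as larger than any number, a count such as `count if distance > di_bench` also counts observations whose distance was never filled in (for example, periods where the regression failed). The `!missing()` condition guards against that; the same caution applies to `count if abs(residual) > 2`.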

**Follow-Ups**:
- **Re: st: How to detect outliers** *From:* Nick Cox <njcoxstata@gmail.com>

**References**:
- **st: How to detect outliers** *From:* Xixi Lin <winnielxx@gmail.com>
- **Re: st: How to detect outliers** *From:* Nick Cox <njcoxstata@gmail.com>
- **Re: st: How to detect outliers** *From:* Xixi Lin <winnielxx@gmail.com>
- **Re: st: How to detect outliers** *From:* Steve Samuels <sjsamuels@gmail.com>
