Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org**, is already up and running.


| Field | Value |
|---|---|
| From | Steve Samuels <sjsamuels@gmail.com> |
| To | statalist@hsphsun2.harvard.edu |
| Subject | Re: st: outliers |
| Date | Fri, 27 Aug 2010 09:09:28 -0400 |

Fabio: You are welcome. I would say, yes, that -mmregress- and -robreg- are always better than OLS at detecting and dealing with outliers. If these programs show no potential outliers, then you can go ahead and use OLS if you like.

-dfits- after OLS will not detect outliers that lie close to one another, because they mask one another. Also, there are always questionable outliers. -mmregress- or -robreg- will smoothly downweight these, whereas with OLS diagnostics you must make a "keep/reject" decision for each one.

I once had a student who tried to identify and eliminate outliers with OLS diagnostics. After each round, new ones cropped up, and she wound up eliminating nearly 20% of her observations! In fairness, she was using SAS, which had no robust regression option then (late 1990s). This wouldn't have happened with -mmregress- or -robreg-, and might not have happened if she'd had Stata and -rreg-.

Note that neither package has routines to downweight high-leverage points (extreme X's) that are not also outliers in Y, but they will help you identify them. See the article I referred to.

Steve

On Fri, Aug 27, 2010 at 7:43 AM, <fabio.zona@unibocconi.it> wrote:
> Dear Steve, thank you very much!
> This new message is for you and for all of Statalist:
>
> To check for outliers, I run:
>
> predict df, dfits
>
> I find three observations with df > 2*sqrt(k/n). (I did not count the negative df values, because the statistic for df does not include absolute values.)
>
> One of the three values is very large (2.19, vs. a threshold of 0.50). How would you assess this? Do I have to drop this observation? Would you run -mmregress-?
> More broadly: when would you suggest using -mmregress- instead of -regress- (even with the robust option)? Can we say that -mmregress- is always better than simple OLS? Or should it be used only when there are many outliers? And how many outliers would justify using -mmregress- instead of -regress-?
>
> Thanks a lot!
>
> ----- Original Message -----
> From: "Steve Samuels" <sjsamuels@gmail.com>
> To: statalist@hsphsun2.harvard.edu
> Sent: Monday, 23 August 2010 4:25:02 GMT +01:00 Amsterdam/Berlin/Bern/Rome/Stockholm/Vienna
> Subject: Re: st: outliers
>
> There are few rules about outliers, but the most important one is this: OLS is the worst way to detect them. Detection requires a robust regression program, and a good program will not "reject" all outliers but will automatically downweight them. For covariates, one wants to identify not outliers per se, but those with high leverage. But the decision about what to do with these is not automatic; sometimes they are the most important points and _must_ be kept.
>
> See: "Robust regression in Stata" by Vincenzo Verardi and Christophe Croux, The Stata Journal, Volume 9, Number 3, pp. 439-453. Also available at: https://lirias.kuleuven.be/bitstream/123456789/202142/1/KBI_0823.pdf
>
> See also Verardi and Croux's contributed programs -mmregress- (findit) and Ben Jann's -robreg- (findit). These are superior to Stata's long-time built-in command -rreg-.
>
> Steve
>
> Steven Samuels
> sjsamuels@gmail.com
> 18 Cantine's Island
> Saugerties NY 12477
> USA
> Voice: 845-246-0774
> Fax: 206-202-4783
>
> On Sun, Aug 22, 2010 at 4:04 PM, Fabio Zona <fabio.zona@unibocconi.it> wrote:
> > In an OLS model, can I limit the outlier analysis to the predictors only? Or do I also have to check for possible outliers in the control variables?
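For readers following Fabio's question, the DFITS screen he describes can be sketched outside Stata. The Python below is a minimal, hypothetical illustration, not Stata's code. It assumes a simple regression with a single predictor, so k = 2 (intercept and slope), and the function names are made up for this example. It computes DFITS from each observation's leverage and externally studentized residual. It then applies the usual rule of thumb, which compares the absolute value |DFITS| with 2*sqrt(k/n). Under that rule, large negative values are flagged as well.

```python
import math


def simple_ols_dfits(x, y):
    """DFITS for each observation of the OLS fit y = a + b*x (k = 2 parameters).

    Closed forms valid for a single predictor:
      leverage                  h_i = 1/n + (x_i - xbar)^2 / Sxx
      deleted residual variance s2_(i) = (SSE - e_i^2 / (1 - h_i)) / (n - k - 1)
      studentized residual      t_i = e_i / sqrt(s2_(i) * (1 - h_i))
      DFITS_i = t_i * sqrt(h_i / (1 - h_i))
    """
    n, k = len(x), 2
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in resid)
    dfits = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - xbar) ** 2 / sxx          # leverage of observation i
        s2_del = (sse - e ** 2 / (1 - h)) / (n - k - 1)  # variance without obs i
        t = e / math.sqrt(s2_del * (1 - h))         # externally studentized residual
        dfits.append(t * math.sqrt(h / (1 - h)))
    return dfits


def flag_large_dfits(dfits, k):
    """Return the cutoff 2*sqrt(k/n) and the indices whose |DFITS| exceeds it."""
    cutoff = 2 * math.sqrt(k / len(dfits))
    # abs(): a large negative DFITS is just as influential as a large positive one
    return cutoff, [i for i, d in enumerate(dfits) if abs(d) > cutoff]


if __name__ == "__main__":
    x = list(range(1, 21))
    # hypothetical data: a line with small alternating noise, one point shifted down by 15
    y = [2 * xi + (0.5 if i % 2 == 0 else -0.5) for i, xi in enumerate(x)]
    y[9] -= 15
    cutoff, flagged = flag_large_dfits(simple_ols_dfits(x, y), k=2)
    print(f"cutoff = {cutoff:.3f}, flagged = {flagged}")
```

With 20 points and one point shifted by 15 units, only that point exceeds the cutoff of about 0.63, whether the shift is up or down. The screen looks at one observation at a time, so it shares the masking weakness Steve describes.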
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
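Steve's point can be illustrated with iteratively reweighted least squares. Robust programs "smoothly downweight" questionable outliers instead of forcing a keep/reject decision. The Python below is a hypothetical, simplified sketch and is not the MM-estimator behind -mmregress- or -robreg-. Those programs start from a high-breakdown S-estimate. This sketch instead fits a Tukey-bisquare M-estimate of a straight line, started from OLS. It assumes a tuning constant c = 4.685 and a scale equal to the median absolute deviation about the median residual, divided by 0.6745. An OLS start like this can itself be pulled off course by high-leverage points. The function names are made up for this example.

```python
import statistics


def wls_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return yw - b * xw, b


def bisquare_line(x, y, c=4.685, max_iter=50, tol=1e-8):
    """Tukey-bisquare M-estimate of a line via iteratively reweighted LS.

    Starts from OLS (all weights 1). Each round scales the residuals by a
    robust spread and gives every observation a weight in [0, 1]; residuals
    beyond c * scale get weight 0. Returns (a, b, w), where w are the
    weights used for the final fit.
    """
    w = [1.0] * len(x)
    a, b = wls_line(x, y, w)
    for _ in range(max_iter):
        r = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        med = statistics.median(r)
        scale = statistics.median([abs(ri - med) for ri in r]) / 0.6745
        if scale == 0:  # degenerate spread (at least half the residuals identical)
            break
        w = [(1 - (ri / (c * scale)) ** 2) ** 2 if abs(ri) < c * scale else 0.0
             for ri in r]
        a_new, b_new = wls_line(x, y, w)
        converged = abs(a_new - a) < tol and abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b, w


if __name__ == "__main__":
    x = list(range(1, 21))
    # hypothetical data: a line with small alternating noise plus one questionable point
    y = [2 * xi + (0.5 if i % 2 == 0 else -0.5) for i, xi in enumerate(x)]
    y[9] += 15
    a_ols, b_ols = wls_line(x, y, [1.0] * len(x))
    a_rob, b_rob, w = bisquare_line(x, y)
    print(f"OLS:    a = {a_ols:.3f}, b = {b_ols:.3f}")
    print(f"robust: a = {a_rob:.3f}, b = {b_rob:.3f}")
    print("weights:", [round(wi, 2) for wi in w])
```

Every observation ends with a weight between 0 and 1, not a keep/reject flag. In this example the shifted point ends with weight 0 and the others keep weights near 1, so the robust intercept stays near the underlying line. The OLS intercept, by contrast, is pulled up toward the outlier.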

**References**: **Re: st: outliers**, *From:* fabio.zona@unibocconi.it
