
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org is already up and running.



Re: st: outliers


From   Steve Samuels <sjsamuels@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: outliers
Date   Fri, 27 Aug 2010 09:09:28 -0400

Fabio:


You are welcome. I would say yes: -mmregress- and -robreg- are
always better than OLS at detecting and dealing with outliers. If
these programs show no potential outliers, then you can go ahead and
use OLS if you like. -dfits- after OLS will not detect outliers that
lie close to one another; they will mask one another. Also, there are
always questionable outliers; -mmregress- or -robreg- will smoothly
downweight these. With OLS diagnostics you must make a "keep/reject"
decision for each. I once had a student who attempted to identify and
eliminate outliers with OLS diagnostics, but after each round new
ones cropped up; she wound up eliminating nearly 20% of her
observations! In fairness, she was using SAS, which had no robust
regression option then (late 1990s). This wouldn't have happened with
-mmregress- or -robreg-, and might not have happened if she'd had
Stata and -rreg-. Note that neither package has routines to
downweight high-leverage points (extreme X's) that are not also
outliers in Y, but they will assist you in identifying them. See the
article I referred to.
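
To make the contrast concrete, here is a minimal sketch using only built-in commands. The auto dataset, the model, and the 0.5 weight cutoff are illustrative assumptions, not anything from Fabio's data. The DFITS rule is applied to the absolute value, so that large negative values are flagged as well, and k is assumed to count the constant. -mmregress- and -robreg- are user-written and their syntax depends on the installed version, so they are not shown.

```stata
* Sketch only: dataset, model, and the 0.5 weight cutoff are assumptions
sysuse auto, clear

* OLS fit, then DFITS for each observation
regress price weight mpg
predict df, dfits

* Cutoff 2*sqrt(k/n); k is assumed here to include the constant
scalar k = e(df_m) + 1
scalar cutoff = 2*sqrt(k/e(N))

* Compare the absolute value so large negative DFITS are flagged too
generate byte flag_dfits = abs(df) > cutoff if !missing(df)
list make price df if flag_dfits == 1

* Built-in robust regression; genwt() saves the final weights
rreg price weight mpg, genwt(w)

* Observations that -rreg- downweighted heavily (0.5 is arbitrary)
list make price w if w < 0.5
```

Comparing the two lists shows which points OLS diagnostics flag and which ones -rreg- downweights; they need not agree, which is the masking problem described above.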

Steve

On Fri, Aug 27, 2010 at 7:43 AM, <fabio.zona@unibocconi.it> wrote:
> Dear Steve thank you very much!
> This new message is for you and all the statalist:
>
> To check for outliers, I run:
>
> predict df, dfits
>
> I discover that I have three observations which have df > [2 x sqrt(k/n)]
> (I did not count the df values that are negative, because the statistic for df does not include absolute values).
>
> One of the three values is very large (i.e., 2.19, vs. a threshold of 0.50). How would you consider this condition? Do I have to drop this last observation? Would you run mmregress?
> More broadly: when would you suggest using mmregress instead of regress (also with the robust option)? Can we say that mmregress is always better than simple OLS? Or can it be used only in the presence of a large number of outliers? And for how many outliers would you suggest mmregress instead of regress?
>
> Thanks a lot!
>
>
>
>
> ----- Original message -----
> From: "Steve Samuels" <sjsamuels@gmail.com>
> To: statalist@hsphsun2.harvard.edu
> Sent: Monday, 23 August 2010 4:25:02 GMT +01:00 Amsterdam/Berlin/Bern/Rome/Stockholm/Vienna
> Subject: Re: st: outliers
>
> There are few rules about outliers, but the most important one is: OLS
> is the worst way to detect them. Detection requires a robust
> regression program; and a good program will not "reject" all outliers,
> but will automatically downweight them.  For covariates, one wants to
> identify not outliers per se, but those with high leverage.  But the
> decision about what to do with these is not automatic; sometimes they
> are the most important points and _must_ be kept.
>
> See: "Robust regression in Stata" by Vincenzo Verardi and Christophe
> Croux, The Stata Journal
> Volume 9 Number 3: pp. 439-453. Also available at:
> https://lirias.kuleuven.be/bitstream/123456789/202142/1/KBI_0823.pdf
>
> See also Verardi and Croux's contributed programs -mmregress- (findit)
> and Ben Jann's -robreg- (findit). These are superior to Stata's
> long-time built-in command -rreg-.
>
> Steve
>
> Steven Samuels
> sjsamuels@gmail.com
> 18 Cantine's Island
> Saugerties NY 12477
> USA
> Voice: 845-246-0774
> Fax:    206-202-4783
>
> On Sun, Aug 22, 2010 at 4:04 PM, Fabio Zona <fabio.zona@unibocconi.it> wrote:
>
>> In an OLS model, can I limit the outlier analysis to the predictors only? Or do I also have to check the control variables for possible outliers?
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>






© Copyright 1996–2014 StataCorp LP