

**From:** Nick Cox <njcoxstata@gmail.com>

**To:** statalist@hsphsun2.harvard.edu

**Subject:** Re: st: Elimination of outliers

**Date:** Mon, 6 Jun 2011 21:59:00 +0100

Thanks for the clarification. On your last question, I think that usually makes no physical sense for environmental data, where I have most experience. I am straining to imagine that it is anything other than horribly ad hoc in any application.

On dummies for outliers: better than dropping them; good if there is some independent rationale.

One definition of an outlier is that it surprises the analyst, and the best outcome is to think of a model in which the surprise disappears. Working on a logarithmic scale is, so far as I can see, the best trick, if not the oldest. (Thucydides recorded the use of the mode as a robust estimator, although not quite in those words, about 2400 years ago.)

Nick

On Mon, Jun 6, 2011 at 9:35 PM, Austin Nichols <austinnichols@gmail.com> wrote:

> Nick--
>
> The simulation is contrived to illustrate one and only one point:
> trimming data based on values of X that are suspect is fine, but
> trimming data based on values of y that are suspect is dangerous at
> best and nearly always ill-advised. This is a point I have made many
> times on the list, sometimes in the context of replying to folks who
> want to take the log of zero. Note I have made no mention of model
> residuals; that is a different kind of outlier detection with its own
> issues. The poster asked about trimming data based on the variables'
> values alone, and my point was that this is not a bad idea a priori as
> long as you only do it to RHS (explanatory) variables and not LHS
> (outcome) variables. I think Jeff and Richard are thinking in terms
> of model outliers, perhaps in terms of leverage or such. Your Amazon
> example could fall in any of these categories, but including an Amazon
> dummy is no different in practice from dropping the Amazon data point,
> right? Or did you have in mind allowing for nonlinearities? It makes
> sense in many cases to fit a best linear approximation to a subset of
> the data and then to look at the outlying data with a less linear
> model, no?
>
> On Mon, Jun 6, 2011 at 4:24 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>> I don't think what happens in contrived simulations hits the main
>> methodological issue at all. As a geographer, some of the time, an
>> outlier to me is something like the Amazon which is big and different
>> and something that needs to be accommodated in the model. That can be
>> done in many ways other than by discarding outliers. Once throwing
>> away awkward data is regarded as legitimate, when do you stop?
>> (Independent evidence that an outlier is untrustworthy, as in lab
>> records of experiments, is a different thing, although even there
>> there are well-known stories of discarding as a matter of prior
>> prejudice.)
>>
>> To make the question as stark as possible, and to suppress large areas
>> of grey (gray): There are people who fit the data to the model and
>> people who fit models to the data. It may sound like the same thing,
>> but the attitude that one is so confident that the model is right that
>> you are happy to discard the most inconvenient data is not at all the
>> same as the attitude that the data can tell you something about the
>> inadequacies of the current model.
>>
>> Nick
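
The point in the quoted message about trimming on X versus trimming on y can be illustrated with a small sketch. This is not the simulation discussed in the thread, which is not reproduced here; it is a hypothetical setup in Python (not Stata), assuming a linear model y = 1 + 2x + e with standard normal x and e, and a 5% trim in each tail. The function names, sample size, and seed are all illustrative assumptions.

```python
import random


def ols_slope(xs, ys):
    """Slope from a simple least-squares regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def central_range(values, tail=0.05):
    """Approximate cutoffs that leave `tail` of the data in each tail."""
    ordered = sorted(values)
    n = len(ordered)
    return ordered[int(tail * n)], ordered[int((1 - tail) * n) - 1]


def simulate(n=20000, seed=12345):
    """Compare slopes from full data, data trimmed on x, and data trimmed on y."""
    rng = random.Random(seed)
    # Assumed data-generating process: true slope is 2.
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [1 + 2 * x + rng.gauss(0, 1) for x in xs]

    x_lo, x_hi = central_range(xs)
    y_lo, y_hi = central_range(ys)

    # Trimming on the explanatory variable only.
    keep_x = [(x, y) for x, y in zip(xs, ys) if x_lo <= x <= x_hi]
    # Trimming on the outcome variable only.
    keep_y = [(x, y) for x, y in zip(xs, ys) if y_lo <= y <= y_hi]

    return {
        "full": ols_slope(xs, ys),
        "trim_x": ols_slope(*zip(*keep_x)),
        "trim_y": ols_slope(*zip(*keep_y)),
    }


if __name__ == "__main__":
    for name, slope in simulate().items():
        print(f"{name}: slope = {slope:.3f}")
```

Under these assumptions, the slope after trimming on x should stay close to the true value of 2, while the slope after trimming on y should be pulled toward zero, because selecting on the outcome also selects on the error term. The sketch says nothing about whether trimming is a good idea in a given application, which is the issue raised in the reply above.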

