Statalist The Stata Listserver


Re: st: RE: RE: Re: RE: Re: RE: RE: IQR

From   Richard Goldstein <>
Subject   Re: st: RE: RE: Re: RE: Re: RE: RE: IQR
Date   Thu, 07 Jun 2007 15:29:13 -0400

I would add one point to Nick's laundry list -- an outlier
is a surprising result and it is often surprising because
we have used a particular model -- thinking about why
we obtained the surprise can sometimes lead to a different
model without any outliers.


Nick Cox wrote:
Sure, there is a -winsor- ado which I wrote on SSC and, according to Kit Baum's reports, it is quite heavily used. I have never used it myself, bar in development.
I cannot recall the details, but perhaps someone wrote into Statalist reporting that it seemed that
Stata did not support Winsorizing and that was a black mark against Stata. To which the best reply was a program, being concrete evidence that you can easily do Winsorizing in Stata and here is one way to do it.
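The -winsor- ado itself is Stata; purely as an illustration of the technique it implements, here is a minimal Python sketch of count-based Winsorizing (the function name and the data are invented for this example):

```python
def winsorize(values, k=1):
    """Winsorize by count: replace the k smallest values with the
    (k+1)-th smallest, and the k largest with the (k+1)-th largest,
    so extreme values are pulled in rather than dropped."""
    ordered = sorted(values)
    lo = ordered[k]        # (k+1)-th smallest value
    hi = ordered[-k - 1]   # (k+1)-th largest value
    return [min(max(v, lo), hi) for v in values]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # 100 is the obvious outlier
print(winsorize(data, k=1))  # → [2, 2, 3, 4, 5, 6, 7, 8, 9, 9]
```

Note that every observation stays in the dataset; only the tail values change, which is what distinguishes Winsorizing (point 6 below) from trimming or dropping.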
But let us look at the wider picture. There is no one way to deal with outliers. There are many ways to deal with outliers, including
1. Going out "into the field" and doing the measurement again.
2. Testing whether they are genuine. Most of the tests look pretty contrived to me, but you might find one that you can believe fits your situation. Irrational faith that a test is appropriate is always needed to apply a test that is then presented as quintessentially rational.
3. Throwing them out as a matter of judgement, i.e. in Stata terms -drop-ping them from the data.
4. Throwing them out using some more-or-less automated (usually not "objective") rule.
5. Ignoring them, along the lines of either 3 or 4. This could be formal (e.g. trimming) or just leaving them in the dataset, but omitting them from analyses
as too hot to handle.
6. Pulling them in using some kind of adjustment, e.g. Winsorizing.
7. Downplaying them by using some other robust estimation method.
8. Downplaying them by working on a transformed scale.
9. Downplaying them by using a non-identical link function.
10. Accommodating them by fitting some appropriate
fat-, long-, or heavy-tailed distribution, without
or with predictors.
11. Sidestepping the issue by using some non-parametric
(e.g. rank-based) procedure.
12. Getting a handle on the implied uncertainty using bootstrapping, jackknifing or permutation-based methods.
13. Editing to replace an outlier with some more
likely value, based on deterministic logic. "An 18-
year-old grandmother is unlikely, but the person in question was born in 1926, so presumably is
really 81."
14. Editing to replace an impossible or implausible outlier using some imputation method that is currently
acceptable not-quite-white magic.
15. Analysing with and without, and seeing how much difference the outlier(s) make(s), statistically, scientifically or practically.
16. Something Bayesian. My prior ignorance of quite what forbids me from giving any details.
Naturally, these categories intergrade in some cases, and I can believe I have forgotten
or am not aware of yet other approaches.
What is quite striking to me -- as with many other areas of statistical science -- is how much preferred solutions vary by investigator and discipline, despite the broad similarity
of the problems that outliers pose.
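To make point 15 concrete, a small Python sketch (with invented data) comparing a summary with and without one suspect value; it also shows in passing why a rank-based summary, as in point 11, can shrug the outlier off:

```python
def mean(xs):
    """Arithmetic mean of a non-empty sequence."""
    return sum(xs) / len(xs)

def median(xs):
    """Median of a non-empty sequence (average of middle two if even length)."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

data = [3.1, 2.9, 3.0, 3.2, 2.8, 9.7]          # 9.7 is the suspect value
without = [x for x in data if x != 9.7]         # same data, outlier dropped

# The mean shifts by more than a full unit; the median barely moves.
print(mean(data), mean(without))
print(median(data), median(without))
```

If the with/without results differ substantially -- statistically, scientifically or practically -- the outlier deserves the closer scrutiny the earlier points describe.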
