# Re: st: RE getting rid of the outliners

From: "Michael Blasnik"
Subject: Re: st: RE getting rid of the outliners
Date: Mon, 01 May 2006 08:35:41 -0400

Ronnie Babigumira <rb.glists@gmail.com> wrote:

```
<snip> That said, I have a follow-up question for you
```

Using the fences created by

```stata
local u = r(p75) + (3/2) * (r(p75) - r(p25))
local l = r(p25) - (3/2) * (r(p75) - r(p25))
```

would capture "mild" outliers. So my question is: how does this sit with the discussion in, for example, Hamilton, Statistics with Stata, which distinguishes between mild and severe outliers and points out that it is the severe outliers that create problems for many statistical techniques?

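
For readers who want to see the mild/severe distinction in practice, here is a minimal sketch. It assumes the usual convention of inner fences at 1.5 IQR and outer fences at 3 IQR beyond the quartiles; the auto dataset, the variable price, and the names outlier, li, ui, lo and uo are only illustrative choices, not anything from the original question.

```stata
* Hypothetical example data: the auto dataset shipped with Stata
sysuse auto, clear

* summarize, detail leaves the quartiles in r(p25) and r(p75)
quietly summarize price, detail
local iqr = r(p75) - r(p25)

* Inner fences (1.5 IQR) and outer fences (3 IQR) beyond the quartiles
local li = r(p25) - 1.5 * `iqr'
local ui = r(p75) + 1.5 * `iqr'
local lo = r(p25) - 3 * `iqr'
local uo = r(p75) + 3 * `iqr'

* 0 = inside the inner fences, 1 = mild, 2 = severe; missing stays missing
generate byte outlier = 0 if !missing(price)
replace outlier = 1 if (price < `li' | price > `ui') & !missing(price)
replace outlier = 2 if (price < `lo' | price > `uo') & !missing(price)
tabulate outlier
```

The !missing() guards matter because Stata treats missing values as larger than any number, so a bare price > `ui' would flag them as outliers.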
I too have thought that the standard box plot fences flag too many values as outliers. Maybe it's because I often work with fairly large N, or because I work with messy real world data, but I find so many values outside the fences that the criterion has no meaning. Based on the standard definition (fences 1.5 IQR beyond the quartiles), you should expect about 7 "outliers" in a sample of 1,000 when the sample is perfectly Gaussian. In my experience, 5%-10% outliers are far more common with real data.
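
The expected count under a Gaussian can be checked directly from the normal distribution. This is a minimal sketch, assuming fences 1.5 IQR beyond the quartiles and ignoring sampling variation in the estimated quartiles; the local names q3, iqr, fence and per1000 are just illustrative.

```stata
* Third quartile of a standard normal; by symmetry the IQR is twice that
local q3  = invnormal(0.75)
local iqr = 2 * `q3'

* Upper fence 1.5 IQR above the third quartile (the lower fence mirrors it)
local fence = `q3' + 1.5 * `iqr'

* Two-sided tail probability, expressed per 1,000 observations
local per1000 = 1000 * 2 * normal(-`fence')

* The fence sits near z = 2.7, giving roughly 7 expected "outliers" per 1,000
display "fence at z = " %5.3f `fence' ", expected per 1,000: " %4.2f `per1000'
```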

When I want to investigate outliers, in addition to using graphs and model diagnostics (e.g., DFBETAs), I often define "fences" at 3 IQR above and below the median. That threshold, which should result in about 0.05 outliers per 1,000 Gaussian observations, tends to give me a more manageable list of "severe" outliers to investigate.
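
A rough sketch of that approach, for anyone who wants to try it: fences at 3 IQR either side of the median, then a list of whatever falls outside. The auto dataset, the variable price, and the names severe, lo and hi are hypothetical stand-ins for your own data.

```stata
* Hypothetical example data: the auto dataset shipped with Stata
sysuse auto, clear

* summarize, detail leaves the median in r(p50) and quartiles in r(p25), r(p75)
quietly summarize price, detail
local iqr = r(p75) - r(p25)
local lo  = r(p50) - 3 * `iqr'
local hi  = r(p50) + 3 * `iqr'

* Flag values beyond the fences; missing values are left unflagged
generate byte severe = (price < `lo' | price > `hi') if !missing(price)
list make price if severe == 1

* Gaussian check: two-sided tail beyond 3 IQR from the median,
* per 1,000 observations (comes out at about 0.05)
display 1000 * 2 * normal(-3 * 2 * invnormal(0.75))
```

Centering the fences on the median rather than the quartiles makes them symmetric even when the quartiles are not, which is a design choice rather than the textbook box plot rule.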

Michael Blasnik
michael.blasnik@verizon.net