
Re: st: RE getting rid of the outliners

From	"Michael Blasnik" <[email protected]>
To	<[email protected]>
Subject	Re: st: RE getting rid of the outliners
Date	Mon, 01 May 2006 08:35:41 -0400

Ronnie Babigumira <[email protected]> wrote:
<snip> That said, I have a follow-up question for you.

Using the fences created by

local u = r(p75) + (3/2) * (r(p75) - r(p25))
local l = r(p25) - (3/2) * (r(p75) - r(p25))

would capture "mild" outliers. So my question is: how does this sit with the discussion in, for example, Hamilton's Statistics with Stata, which distinguishes between mild and severe outliers and points out that it is the severe outliers that create problems for many statistical techniques?

I too have thought that the standard box plot fences flag too many values as outliers. Maybe it's because I often work with fairly large N, or because I work with messy real-world data, but I find so many values outside the fences that the criterion has no meaning. Based on the standard definition (fences 1.5 IQR beyond the quartiles), you should expect about 7 "outliers" in a sample of 1,000 even when the sample is perfectly Gaussian. In my experience, outlier rates of 5%-10% are common with real data.
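
The Gaussian expectation can be checked directly from normal quantiles. The lines below are a minimal sketch, assuming large-sample (population) quartiles rather than quartiles estimated from a finite sample; the macro names q3, iqr, hi, and share are illustrative only.

```stata
* Large-sample share of Gaussian values outside the standard box plot
* fences (quartile -/+ 1.5 * IQR), in standard deviation units
* upper quartile of a standard normal
local q3 = invnormal(0.75)
* the IQR is twice the upper quartile because the normal is symmetric
local iqr = 2 * `q3'
* upper fence; by symmetry the lower fence is the negative of this
local hi = `q3' + 1.5 * `iqr'
* probability in both tails beyond the fences
local share = 2 * (1 - normal(`hi'))
display "upper fence (in SDs): " %6.3f `hi'
display "expected per 1,000:   " %6.2f 1000*`share'
```

Changing the multiplier 1.5, or measuring from the median instead of the quartiles, shows how sensitive the expected count is to the fence definition.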

When I want to investigate outliers, in addition to using graphs and model diagnostics (e.g., DFBETAs), I often define "fences" at 3 IQR above and below the median. That threshold, which should flag roughly 0.05 outliers per 1,000 Gaussian observations, tends to give me a more manageable list of "severe" outliers to investigate.
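
For anyone who wants to try the wider fences, here is a rough sketch. It assumes the auto dataset shipped with Stata and its variable price purely for illustration; the macro names and the indicator variable severe are made up for this example and should be adapted to your own data.

```stata
* Flag values more than 3 IQR above or below the median
sysuse auto, clear
quietly summarize price, detail
local iqr = r(p75) - r(p25)
local lo = r(p50) - 3 * `iqr'
local hi = r(p50) + 3 * `iqr'
* indicator is 1 outside the fences, 0 inside, missing if price is missing
generate byte severe = (price < `lo' | price > `hi') if !missing(price)
list make price if severe == 1
count if severe == 1
* large-sample share expected beyond these fences under a Gaussian, per 1,000
display 1000*2*(1 - normal(3*2*invnormal(0.75)))
```

Keeping the flag as a variable rather than dropping observations makes it easy to inspect the cases and to rerun models with and without them.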

Michael Blasnik
[email protected]
