Statalist The Stata Listserver


Re: st: RE getting rid of the outliners

From   Ronnie Babigumira <>
Subject   Re: st: RE getting rid of the outliners
Date   Mon, 01 May 2006 12:31:29 +0200

Thanks, Maarten. My approach is close to yours; however, in the early stages of data cleaning I like to look at severe outliers and cross-check them against the raw data, to be sure they are not data-entry errors. After cleaning, I deal with outliers much as you do.

That said, my main concern was that there will always be mild outliers, and these can be ignored. If Vora wants to drop any outliers, I think "severe" outliers are the better candidates. But that is for Vora to decide, I suppose.

Thanks anyhow (I didn't know about -adjacent-).
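The claim that mild outliers will always be present can be made concrete under one loud assumption: that the data are exactly normal, which real data rarely are. Under that assumption the share of observations expected beyond the 1.5-IQR inner fences and beyond 3-IQR outer fences follows directly from the normal quantiles. The sketch is in Python rather than Stata only so that it runs standalone; `fence_exceedance` is a hypothetical helper name, and the 3-IQR outer fence is the usual Tukey convention rather than something stated in this thread.

```python
from statistics import NormalDist


def fence_exceedance(k):
    """Probability that a normally distributed value falls outside fences set
    k interquartile ranges beyond the population quartiles (two-sided)."""
    z = NormalDist()  # standard normal; the result does not depend on mu or sigma
    q1, q3 = z.inv_cdf(0.25), z.inv_cdf(0.75)
    iqr = q3 - q1
    upper_fence = q3 + k * iqr
    # The normal is symmetric, so double the upper-tail probability.
    return 2 * (1 - z.cdf(upper_fence))


if __name__ == "__main__":
    n = 1000  # hypothetical sample size
    for k, label in [(1.5, "beyond inner fences (1.5 IQR)"),
                     (3.0, "beyond outer fences (3 IQR)")]:
        p = fence_exceedance(k)
        print(f"{label}: p = {p:.2e}, expected count out of {n}: {n * p:.3f}")
```

Under normality about 0.7% of observations (roughly 7 in 1,000) lie beyond the inner fences even when nothing is wrong with the data, while only about 2 in a million lie beyond the outer fences. This is one way to see why mild outliers are routine and severe ones deserve a closer look.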


Maarten Buis wrote:


I had the same problem of sending to Vora instead of to Statalist (so Vora received multiple copies of my email before I found out what the problem was; sorry about that).

In my not overly humble opinion, determining outliers this way is nothing more than applying rules of thumb, and it is bad practice to let your analysis be influenced by the blind application of a single rule of thumb. I am a regression man, so when I look for outliers I look at scatter plots, various plots involving residuals, Cook's distances, and leverages. I then try to identify points that worry me and find out why they are special. Then I decide what I am going to do about them, and in many cases the answer is nothing.
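For the regression checks described here, leverage and Cook's distance have closed forms in a one-predictor model: h_i = 1/n + (x_i - xbar)^2 / Sxx and D_i = e_i^2 * h_i / (p * s^2 * (1 - h_i)^2), with p = 2 parameters, e_i the residuals and s^2 the residual variance. In Stata both are available after -regress- via -predict newvar, leverage- and -predict newvar, cooksd-. The Python sketch below is a hypothetical standalone illustration of the same formulas on made-up data; it only reports the numbers and leaves the judgment about each point to the analyst.

```python
def leverage_and_cooks(x, y):
    """Fit y = a + b*x by ordinary least squares and return, for each
    observation, its leverage h_i and Cook's distance D_i."""
    n = len(x)
    p = 2  # number of estimated parameters: intercept and slope
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in resid) / (n - p)  # residual variance (MSE)
    # Leverage: diagonal of the hat matrix for a one-predictor model.
    lev = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    # Cook's distance combines residual size and leverage.
    cooks = [e ** 2 / (p * s2) * h / (1 - h) ** 2 for e, h in zip(resid, lev)]
    return lev, cooks


if __name__ == "__main__":
    # Hypothetical data: the last point is far out in x and off the trend.
    x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
    y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 25.0]
    lev, cooks = leverage_and_cooks(x, y)
    for xi, yi, h, d in zip(x, y, lev, cooks):
        print(f"x={xi:>3} y={yi:>5}  leverage={h:.3f}  Cook's D={d:.3f}")
```

In the made-up data the point at x = 20 has both the largest leverage and the largest Cook's distance. That marks it as worth investigating, not as something to delete automatically.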

Maarten L. Buis
Department of Social Research Methodology Vrije Universiteit Amsterdam Boelelaan 1081 1081 HV Amsterdam The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z214
+31 20 5986715

-----Original Message-----
From: []On Behalf Of Ronnie Babigumira
Sent: Monday 1 May 2006 11:24
Subject: Re: st: RE getting rid of the outliners

Maarten, I had written earlier suggesting -lv- (output below) or -iqr- (I just checked, and for some reason my response went to Vora N and not to the list); however, your response is truer to the original posting.

That said, I have a follow-up question for you.

The fences created by

local u = r(p75) + (3/2) * (r(p75) - r(p25))
local l = r(p25) - (3/2) * (r(p75) - r(p25))

would flag "mild" outliers. So my question is: how does this sit with the discussion in, for example, Hamilton's Statistics with Stata, which distinguishes between mild and severe outliers and points out that it is severe outliers that create problems for many statistical techniques?
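The mild/severe distinction can be sketched as follows. With Q1 and Q3 the quartiles (r(p25) and r(p75) after -summarize, detail-) and IQR = Q3 - Q1, the locals l and u above are the inner fences at 1.5 IQR; the outer fences are assumed here to sit at 3 IQR, the usual Tukey convention. Values between an inner and an outer fence are labeled mild, values beyond an outer fence severe. The sketch is in Python only so it can be checked standalone. The helper names are hypothetical, and `stata_percentile` is an assumed reimplementation of the percentile rule used by -summarize, detail- (average the two neighbouring order statistics when n*p/100 is an integer, otherwise take the next one up), so verify it against Stata before relying on exact agreement.

```python
import math


def stata_percentile(values, p):
    """p-th percentile under an assumed rule for -summarize, detail- with unit
    weights: with P = n*p/100, average x_(P) and x_(P+1) when P is an integer,
    otherwise take x_(ceil(P)). x_(i) is the i-th smallest value (1-based)."""
    x = sorted(values)
    n = len(x)
    P = n * p / 100
    if P == int(P):
        k = int(P)
        if k == 0:
            return x[0]
        if k >= n:
            return x[-1]
        return (x[k - 1] + x[k]) / 2
    return x[math.ceil(P) - 1]


def classify_outliers(values):
    """Return inner fences, outer fences and a per-value label:
    'severe' beyond an outer fence, 'mild' between inner and outer, else 'none'."""
    q1 = stata_percentile(values, 25)
    q3 = stata_percentile(values, 75)
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)  # the locals l and u in the post
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)  # assumed 3-IQR outer fences
    labels = []
    for v in values:
        if v < outer[0] or v > outer[1]:
            labels.append("severe")
        elif v < inner[0] or v > inner[1]:
            labels.append("mild")
        else:
            labels.append("none")
    return inner, outer, labels


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30]  # hypothetical values
    inner, outer, labels = classify_outliers(data)
    print("inner fences:", inner)
    print("outer fences:", outer)
    for v, lab in zip(data, labels):
        if lab != "none":
            print(v, lab)
```

On the hypothetical data the inner fences are (-5.5, 18.5) and the outer fences (-14.5, 27.5), so 20 is labeled mild and 30 severe. Keeping the labels per observation, rather than dropping rows, leaves the severe cases available for cross-checking against the raw data.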

* For searches and help try:

© Copyright 1996–2017 StataCorp LLC