Statalist The Stata Listserver



Re: st: RE getting rid of the outliners


From   Ronnie Babigumira <rb.glists@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE getting rid of the outliners
Date   Mon, 01 May 2006 11:24:04 +0200

Maarten, I had written in earlier suggesting -lv- (output below) or -iqr-. (I just checked and, for some reason, my response went to Vora N and not to the list.) However, your response is truer to the original posting.

That said, I have a follow-up question for you.

Using the fences created by

local u = r(p75) + (3/2) * (r(p75) - r(p25))
local l = r(p25) - (3/2) * (r(p75) - r(p25))

would also flag "mild" outliers. So my question is: how does this sit with the discussion in, for example, Hamilton, Statistics with Stata, which distinguishes between mild and severe outliers and points out that it is the severe outliers that create problems for many statistical techniques?

Many thanks

Ronnie

. lv mpg

#     74                  Mileage (mpg)
                 ---------------------------------
M     37.5       |          20           |     spread   pseudosigma
F     19         |   18    21.5    25    |        7      5.216359
E     10         |   15    21.5    28    |       13      5.771728
D     5.5        |   14    22.25   30.5  |     16.5      5.576303
C     3          |   14    24.5    35    |       21      5.831039
B     2          |   12    23.5    35    |       23      5.732448
A     1.5        |   12    25      38    |       26      6.040635
      1          |   12    26.5    41    |       29       6.16562
                 |                       |
                 |                       |    # below    # above
inner fence      |   7.5           35.5  |       0          1
outer fence      |   -3            46    |       0          0
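The mild/severe distinction raised above can be sketched with the same -summarize- results. This is only an illustration under stated assumptions: it takes "mild" to mean beyond the inner fences (1.5 x IQR) but inside the outer fences (3 x IQR), and "severe" to mean beyond the outer fences, which is how the -lv- output above lays them out. The variable name outtype and the macro names are made up for the example. Note also that -summarize, detail- quartiles need not equal the fourths that -lv- uses, although for mpg in the auto data they appear to coincide.

```stata
sysuse auto, clear
quietly summarize mpg, detail
local iqr = r(p75) - r(p25)

* inner fences (1.5 x IQR) and outer fences (3 x IQR)
local li = r(p25) - 1.5 * `iqr'
local ui = r(p75) + 1.5 * `iqr'
local lo = r(p25) - 3 * `iqr'
local uo = r(p75) + 3 * `iqr'

* outtype (hypothetical name): 0 = not an outlier, 1 = mild, 2 = severe
generate byte outtype = 0 if !missing(mpg)
replace outtype = 1 if (mpg < `li' | mpg > `ui') & !missing(mpg)
replace outtype = 2 if (mpg < `lo' | mpg > `uo') & !missing(mpg)
tabulate outtype
```

Keeping the two kinds of outlier in one variable makes it easy to inspect or exclude only the severe ones (outtype == 2) rather than everything outside the inner fences.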



Maarten Buis wrote:

-findit adjacent value- brings up Nick's module
-adjacent-, which you can install. It only displays
the adjacent values; it does not store them, so you
cannot use them to drop outliers. That could be an
oversight on Nick's part, but I would not be
surprised if it was deliberate, to prevent people
from mechanically dropping outliers.

Below I show how to create a new variable that
is one when mpg is an outlier and zero when it is
not, and how that variable can be used without
dropping cases. For details have a look at:
http://www.stata.com/support/faqs/data/trueorfalse.html


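*----------------begin example-----------------
sysuse auto, clear
sum mpg, detail
/* upper and lower fences: quartiles +/- 1.5 times the IQR */
local u = r(p75) + (3/2) * (r(p75) - r(p25))
local l = r(p25) - (3/2) * (r(p75) - r(p25))
/* out is 1 for outliers, 0 otherwise; the if qualifier keeps
   missing values of mpg from being flagged as outliers */
gen out = (mpg<`l' | mpg>`u') if !missing(mpg)
hist mpg          /*histogram including outliers*/
hist mpg if !out  /*histogram excluding outliers*/
*---------------end example---------------------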
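As a further sketch of using the indicator without dropping anything, the flagged cases can be listed and an analysis run with and without them. The regression of price on mpg is only an arbitrary illustration chosen here, not something suggested in this thread.

```stata
sysuse auto, clear
quietly summarize mpg, detail
local u = r(p75) + (3/2) * (r(p75) - r(p25))
local l = r(p25) - (3/2) * (r(p75) - r(p25))
generate byte out = (mpg < `l' | mpg > `u') if !missing(mpg)

* look at the flagged cases before deciding anything
list make mpg if out == 1

* compare results with and without the flagged cases
regress price mpg
regress price mpg if out == 0
```

Comparing the two fits shows how much the flagged observations matter, while the data set itself stays intact.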

HTH,
Maarten

-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z214

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------

-----Original Message-----
From: vora n [mailto:vora_stata@hotmail.com]
Sent: zondag 30 april 2006 2:47
To: statalist@hsphsun2.harvard.edu
Subject: st: getting rid of the outliners

Is there any Stata command that can drop
the observations that are outliers?

Let's say I graph the box-and-whisker plot

graph box y

and then the graph will show the outliers.
Is there any built-in command that can identify
these outliers and drop them from my data?

Or is there any command that reports the upper
adjacent value and the lower adjacent value
so that I can drop the outliers manually?

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



