RE: st: How to find extreme values

From   "Nakelse, Tebila (AfricaRice)" <>
To   "" <>
Subject   RE: st: How to find extreme values
Date   Tue, 27 Mar 2012 08:19:37 +0000

Hi Sandy Y. Zhu,
Below is an example of identifying and correcting extreme values.

*** correction of  the variable price

sysuse auto, clear

/* box plot to visualize the extreme values */
graph box price

/* we can distinguish 8 extreme values */

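*** Quartiles and interquartile range of price

egen Q1_price = pctile(price), p(25)
egen Q3_price = pctile(price), p(75)
egen IC_price = iqr(price)

*** Identification of extreme values
* touse is 1 for an extreme value and 0 otherwise (missing prices get 0).
* The fences here sit 1.25*IQR beyond the quartiles; graph box uses 1.5*IQR,
* so the two need not flag exactly the same points.

gen byte touse = (price < Q1_price - 1.25*IC_price | price > Q3_price + 1.25*IC_price) & !missing(price)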

tab touse

*** Correction of the price: extreme values are set to the nearest fence
gen pricec = price
replace pricec = Q1_price - 1.25*IC_price if price < Q1_price - 1.25*IC_price & touse==1
replace pricec = Q3_price + 1.25*IC_price if price > Q3_price + 1.25*IC_price & touse==1

/* box plot of the corrected price, to check whether any extreme values remain */
*graph box pricec
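
As a variation on the code above, here is a minimal sketch that keeps the fences in scalars instead of constant egen variables, so no extra columns are added to the dataset. The names k, iqr_price, lower_fence, upper_fence and extreme are made up for illustration, and the 1.25 multiplier is simply carried over from the code above; 1.5 is the more common choice.

```stata
* Sketch only: scalar and variable names are illustrative assumptions.
sysuse auto, clear

* summarize, detail leaves the quartiles in r(p25) and r(p75)
quietly summarize price, detail
scalar k           = 1.25
scalar iqr_price   = r(p75) - r(p25)
scalar lower_fence = r(p25) - k*iqr_price
scalar upper_fence = r(p75) + k*iqr_price

* Flag observations outside the fences; missing prices are never flagged
gen byte extreme = (price < lower_fence | price > upper_fence) & !missing(price)
tab extreme

* Capped copy of price; the original variable is left untouched
gen pricec = price
replace pricec = lower_fence if price < lower_fence & extreme
replace pricec = upper_fence if price > upper_fence & extreme
```

Because the fences are scalars, changing k in one place changes both the flag and the capped values.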

Hope this helps,


-----Original Message-----
From: [] On Behalf Of Maarten Buis
Sent: Tuesday, March 27, 2012 8:27 AM
Subject: Re: st: How to find extreme values

On Tue, Mar 27, 2012 at 5:24 AM, Barth Riley <> wrote:
> To remove outliers, you could:
> preserve
> replace var = . if abs(var) >= 1000000 (or some other value)
> [perform analyses]
> restore
> preserve and restore are added if you want to make a temporary change
> to these values

If I were to exclude such observations I would probably do something like:

gen byte touse = abs(var) <= 1e6
reg y var x if touse

-reg- could be any command, the key is the -if touse- part. The variable touse will contain 0s and 1s such that those non-extreme values get 1 (true) and the extreme values get 0 (false), see:
<>. The reason why I prefer this is that it does not destroy any information in my dataset.
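
A minimal sketch of this approach on the auto data that ships with Stata; the 10000 cutoff and the choice of mpg, price and weight are assumptions made only for illustration:

```stata
* Sketch only: the cutoff and the variables are illustrative assumptions.
sysuse auto, clear

* 1 = keep, 0 = treat as extreme; a missing price also gets 0,
* because Stata treats missing as larger than any number
gen byte touse = abs(price) <= 10000

* Any estimation command can be restricted with -if touse-
regress mpg price weight if touse

* The data are intact: every observation is still in memory
count
```

Dropping the -if touse- qualifier reruns the same model on all observations, which makes it easy to compare results with and without the extreme values.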

Hope this helps,

Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
*   For searches and help try:

