Re: st: Elimination of outliers
From: Nick Cox <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: Elimination of outliers
Date: Mon, 6 Jun 2011 13:39:20 +0100

In general, eliminating outliers is a very bad idea. More measured
tactics include transforming your response or predictors, using a
non-identity link function in a generalized linear model, or using
some flavour of robust regression.
Nick
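
The tactics named in the reply could be sketched in Stata roughly as follows. This is a minimal illustration, not the poster's or the replier's actual code: the thread never says which variable is the response, so treating lt as the response and at as the predictor is an assumption, as is the choice of the gamma family for the generalized linear model.

```stata
* Illustrative only: the thread does not say which variable is the
* response. Here lt is treated as the response and at as the predictor.

* 1. Transform response and predictor.
*    Logarithms are defined only for strictly positive values.
generate double ln_at = ln(at) if at > 0
generate double ln_lt = ln(lt) if lt > 0
regress ln_lt ln_at

* 2. Generalized linear model with a non-identity (log) link.
*    The gamma family is an assumption, not something from the thread.
glm lt at, family(gamma) link(log)

* 3. Two flavours of robust regression.
*    rreg downweights observations with large residuals;
*    qreg fits a median (quantile) regression by default.
rreg lt at
qreg lt at
```

Each of these keeps all 150000 observations in the data and limits the influence of extreme values through the model, rather than by deleting them.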
On 6 Jun 2011, at 12:46, "Achmed Aldai" <[email protected]> wrote:
Hi
I am currently working on a do-file in which I want to eliminate
outliers, namely the observations with the highest and the lowest
values of certain variables, for example at and lt. I have 150000
observations in total, and I want to delete the 25 highest and the
25 lowest observations. It might be better to do this in relative
terms, though, so that I don't take the highest and lowest 25 but
the lower and upper 1% of the corresponding variables.
gvkey           at           lt
1001            1120         231
1001            1230         312
1210            57           32
1210            67           25
1354            789          560
1368            650          500
1481            1230         900
2930            21           30
3201            234          213
3201            256          220
3210            267          320
4510            4335         3214
I hope this is clear.
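
For reference, the two selection rules described in the question (the lowest and highest 1%, or the 25 lowest and 25 highest values) could be sketched in Stata as below. This is a hedged sketch only: the indicator names tail1_at, tail1_lt, rank_at and tail25_at are hypothetical, and the code flags observations rather than deleting them, so that their influence can be inspected before anything is dropped.

```stata
* Flag, rather than drop, values outside the 1st and 99th percentiles.
* _pctile leaves the requested percentiles in r(r1) and r(r2).
foreach v of varlist at lt {
    quietly _pctile `v', percentiles(1 99)
    generate byte tail1_`v' = (`v' < r(r1) | `v' > r(r2)) if !missing(`v')
}

* Flag the 25 lowest and 25 highest values of at.
* The unique option breaks ties arbitrarily so that ranks run 1, 2, ..., N.
egen double rank_at = rank(at), unique
quietly count if !missing(at)
generate byte tail25_at = (rank_at <= 25 | rank_at > r(N) - 25) if !missing(at)

* Count the observations flagged by the 1% rule on either variable.
count if tail1_at == 1 | tail1_lt == 1
```

With indicators like these, any analysis can be run with and without the flagged observations (for example by adding `if tail1_at == 0 & tail1_lt == 0`) to see how much the results depend on them.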