



Re: st: Elimination of outliers


From   Nick Cox <njcoxstata@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Elimination of outliers
Date   Mon, 6 Jun 2011 13:39:20 +0100

In general, a very bad idea. More measured tactics include transforming your response or predictors, using a non-identity link function in a generalized linear model, or some flavour of robust regression.
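
As a rough illustration of the first suggestion (not code from this thread, and in Python rather than Stata), the sketch below log-transforms the `at` values from the sample data in the quoted question and measures how far the largest value sits from the mean, in standard deviations, before and after. The natural log and the helper name `max_distance_in_sd` are assumptions made for the example; whether a log scale suits the real data is a separate judgment.

```python
import math
import statistics

# `at` values copied from the sample data in the quoted question
at = [1120, 1230, 57, 67, 789, 650, 1230, 21, 234, 256, 267, 4335]

def max_distance_in_sd(values):
    """Distance of the largest value from the mean, in sample standard deviations."""
    return (max(values) - statistics.mean(values)) / statistics.stdev(values)

# Natural log is an assumed choice; it requires strictly positive values
log_at = [math.log(v) for v in at]

raw_z = max_distance_in_sd(at)
log_z = max_distance_in_sd(log_at)
print(f"raw scale: {raw_z:.2f} SD, log scale: {log_z:.2f} SD")
```

On this small sample the largest value is much less extreme on the log scale, and no observation has to be discarded to get there.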

Nick

On 6 Jun 2011, at 12:46, "Achmed Aldai" <Hauptseminar@gmx.de> wrote:

Hi

I am currently working on a do-file in which I want to eliminate outliers, meaning the observations with the highest and lowest values of certain variables, for example at and lt. I have 150000 observations in total, and I want to delete the 25 highest and the 25 lowest of them. It might be better, though, to do this in relative terms, so that instead of the highest and lowest 25 I drop the lower and upper 1% of the corresponding variables.

gvkey           at           lt
1001            1120         231
1001            1230         312
1210            57           32
1210            67           25
1354            789          560
1368            650          500
1481            1230         900
2930            21           30
3201            234          213
3201            256          220
3210            267          320
4510            4335         3214

I hope this became clear.
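
The two rules described above can be sketched as follows. This is a minimal illustration in Python rather than a Stata do-file, and it flags observations instead of deleting them so that the choice stays reversible. The function names are hypothetical, and the nearest-rank percentile definition is an assumption; Stata's own percentile conventions may differ at the margins.

```python
def flag_k_extremes(values, k):
    """Mark the k smallest and k largest values (ties broken by position)."""
    flagged = set()
    if k > 0:
        order = sorted(range(len(values)), key=lambda i: values[i])
        flagged = set(order[:k]) | set(order[-k:])
    return [i in flagged for i in range(len(values))]

def nearest_rank_percentile(values, p):
    """Nearest-rank percentile for an integer p in 1..100 (assumed definition)."""
    ordered = sorted(values)
    rank = max(1, -(-p * len(ordered) // 100))  # integer ceiling of p*n/100
    return ordered[rank - 1]

def flag_percentile_tails(values, lower=1, upper=99):
    """Mark values strictly below the lower or strictly above the upper percentile."""
    lo = nearest_rank_percentile(values, lower)
    hi = nearest_rank_percentile(values, upper)
    return [v < lo or v > hi for v in values]

# `at` values copied from the sample data above
at = [1120, 1230, 57, 67, 789, 650, 1230, 21, 234, 256, 267, 4335]

# Absolute rule: flag the single lowest and single highest value
flags = flag_k_extremes(at, 1)
kept = [v for v, f in zip(at, flags) if not f]

# Relative rule on a larger made-up series (the integers 1..200)
tail_flags = flag_percentile_tails(list(range(1, 201)))
```

The two rules are not interchangeable at the scale mentioned in the question: with 150000 observations, a 1% cut in each tail touches roughly 1500 observations per tail rather than 25.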
