Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Nick Cox <n.j.cox@durham.ac.uk> |
To | "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu> |
Subject | RE: RE: st: Elimination of outliers |
Date | Mon, 6 Jun 2011 15:48:43 +0100 |
I said I was not going to do this, but Austin Nichols gave you a gun.

Nick
n.j.cox@durham.ac.uk

Achmed Aldai

Hi Nick,

can you please tell me how to eliminate the top and bottom 2% of each
variable? In my regression so far I am not getting the proper results,
and I want to find out whether this is what causes the problem.

Thank you!

-------- Original Message --------
> Date: Mon, 6 Jun 2011 15:17:32 +0100
> From: Nick Cox <n.j.cox@durham.ac.uk>
> To: 'statalist@hsphsun2.harvard.edu' <statalist@hsphsun2.harvard.edu>
> Subject: RE: st: Elimination of outliers

> 1. Transformation means using a transformed scale (e.g. logarithms) for
> one or more of your variables.
>
> 2. A non-identity link function in a generalized linear model means what
> it says: the help for -glm- is the place to start and points to other
> documentation.
>
> Otherwise, I assert that elimination of outliers is a very bad idea
> _unless_ you know from independent evidence that they arise from serious and
> irremediable problems of measurement, in which case chopping the tails of the
> distribution is _not_ the way to do it. In most fields I know, the outliers
> that stick out are genuine and important (the Amazon in hydrology, the USA or
> China wherever it is in economics, and so on, and so on), and leaving them
> out is in my view lousy science and lousy statistics.
>
> If you disagree, well, we disagree, but I am not going to tell you how to
> do this in Stata.
>
> Nick
> n.j.cox@durham.ac.uk
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Achmed Aldai
> Sent: 06 June 2011 15:07
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: Elimination of outliers
>
> Hi
>
> Sorry, I cannot really understand why it is a bad idea. I want to eliminate
> the outliers because I think they cause a bias in my results.
>
> How can I transform my predictors, and what do you mean by that?
>
> What is a non-identity link function?
>
> Thank you
>
> Felix
> -------- Original Message --------
> > Date: Mon, 6 Jun 2011 13:39:20 +0100
> > From: Nick Cox <njcoxstata@gmail.com>
> > To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
> > Subject: Re: st: Elimination of outliers
> >
> > In general, a very bad idea. Consider transforming your response or
> > predictors or using a non-identity link function in a generalized
> > linear model or some flavour of robust regression as more measured
> > tactics.
> >
> > Nick
> >
> > On 6 Jun 2011, at 12:46, "Achmed Aldai" <Hauptseminar@gmx.de> wrote:
> >
> > > Hi
> > >
> > > I am currently working on a do-file where I want to eliminate
> > > outliers, i.e. the observations with the highest and the lowest values
> > > of certain variables, here e.g. at and lt. In total I have
> > > 150000 observations, and out of these I want to delete
> > > 25 observations at each of the upper and lower boundaries. But it might
> > > also be better to do it in relative terms, meaning that I don't take the
> > > highest and lowest 25 but the lower and upper 1% of the
> > > corresponding variables.
> > >
> > > gvkey     at     lt
> > >  1001   1120    231
> > >  1001   1230    312
> > >  1210     57     32
> > >  1210     67     25
> > >  1354    789    560
> > >  1368    650    500
> > >  1481   1230    900
> > >  2930     21     30
> > >  3201    234    213
> > >  3201    256    220
> > >  3210    267    320
> > >  4510   4335   3214

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
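Nick's first suggestion, using a transformed scale such as logarithms, can be made concrete with the twelve `at` values posted in the original question. The sketch below is an illustration only, written in Python (standard library) rather than Stata. It assumes `at` is the single predictor in a simple regression with an intercept, and the helper name `leverages` is hypothetical. For each observation it computes the leverage (hat value), which measures how strongly that point can pull the fitted line, once on the raw scale and once on the log scale.

```python
import math

# `at` values copied from the data listing in the original post.
# Treating `at` as the single predictor is an assumption for illustration.
AT = [1120, 1230, 57, 67, 789, 650, 1230, 21, 234, 256, 267, 4335]


def leverages(x):
    """Hypothetical helper: hat values for a simple linear regression
    with an intercept, h_i = 1/n + (x_i - mean)^2 / sum_j (x_j - mean)^2."""
    n = len(x)
    mean = sum(x) / n
    sxx = sum((v - mean) ** 2 for v in x)
    return [1 / n + (v - mean) ** 2 / sxx for v in x]


# Leverage with the predictor on its original scale ...
raw = leverages(AT)
# ... and with the predictor on a log scale (natural logarithm).
logged = leverages([math.log(v) for v in AT])

for at, h_raw, h_log in zip(AT, raw, logged):
    print(f"at={at:5d}  leverage raw={h_raw:.3f}  log={h_log:.3f}")
```

Run as written, it should report a leverage of about 0.86 for the largest firm (at = 4335) on the raw scale and about 0.32 on the log scale. On the log scale the smallest firm (at = 21) becomes the most leveraged point, at about 0.39, so the transformation spreads influence more evenly without dropping any observation. In Stata, the same hat values are available after -regress- via -predict newvar, leverage-.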