
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.



RE: RE: st: Elimination of outliers


From   "Jeff" <jbw-appraiser@earthlink.net>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: RE: st: Elimination of outliers
Date   Mon, 6 Jun 2011 08:42:57 -0700

"Outliers and influential observations should not routinely be deleted
or automatically down-weighted because they are not necessarily bad
observations.  On the contrary, if they are correct, they may be the
most informative points in the data.  For example, they may indicate
that the data did not come from a normal population or that the model is
not linear."  "Regression Analysis By Example", 3rd Edition, Chatterjee,
Hadi, Price.

Jeffrey B. Wolpin
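
The suggestion made further down the thread, working on a transformed scale instead of deleting points, can be illustrated with the 12 sample values of at posted in the original question. This is only a rough sketch in Python rather than Stata; the helper name max_distance_in_sd is made up for the illustration, and the choice of the natural logarithm is an assumption.

```python
import math
import statistics

# Values of "at" from the sample rows posted in the original question.
at = [1120, 1230, 57, 67, 789, 650, 1230, 21, 234, 256, 267, 4335]


def max_distance_in_sd(values):
    """Distance of the largest value from the mean, in standard deviations.

    Hypothetical helper, used here only to compare the two scales.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return (max(values) - mean) / sd


# Compare how far the largest observation sits from the rest
# on the raw scale and on the log scale.
raw_gap = max_distance_in_sd(at)
log_gap = max_distance_in_sd([math.log(v) for v in at])

print(f"raw scale: largest value is {raw_gap:.2f} SD above the mean")
print(f"log scale: largest value is {log_gap:.2f} SD above the mean")
```

On the log scale the largest value sits closer to the bulk of the data, so it has less leverage on a fit, and no observation has been thrown away.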




-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Achmed Aldai
Sent: Monday, June 06, 2011 7:38 AM
To: statalist@hsphsun2.harvard.edu
Subject: Re: RE: st: Elimination of outliers

Hi Nick,

Can you please tell me how to eliminate the top and bottom 2% of each
variable? So far my regression is not giving the proper results, and I
want to find out whether the outliers are causing the problem.

Thank you!
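
One way to check whether extreme values drive a result, without deleting anything, is to flag the observations outside the 2nd and 98th percentiles and then compare analyses with and without the flagged rows. The sketch below uses Python rather than Stata and only the 12 sample values of at from the original question. The "inclusive" interpolation rule is an assumption; other percentile definitions give different cutoffs.

```python
import statistics

# Values of "at" from the sample rows posted in the original question.
at = [1120, 1230, 57, 67, 789, 650, 1230, 21, 234, 256, 267, 4335]

# n=50 splits the data into 50 equal-probability groups, so the first and
# last cut points are the 2nd and 98th percentiles.
cuts = statistics.quantiles(at, n=50, method="inclusive")
low, high = cuts[0], cuts[-1]

# Flag rather than drop: the full data set stays intact for comparison.
flagged = [v for v in at if v < low or v > high]
unflagged = [v for v in at if low <= v <= high]

print(f"cutoffs: {low:.2f} and {high:.2f}")
print(f"flagged values: {flagged}")
print(f"median with all values: {statistics.median(at)}")
print(f"median without flagged values: {statistics.median(unflagged)}")
```

Keeping a flag instead of dropping rows makes it easy to report both sets of results side by side.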
-------- Original Message --------
> Date: Mon, 6 Jun 2011 15:17:32 +0100
> From: Nick Cox <n.j.cox@durham.ac.uk>
> To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
> Subject: RE: st: Elimination of outliers

> 1. Transformation means using a transformed scale (e.g. logarithms) for
> one or more of your variables.
>
> 2. A non-identity link function in a generalized linear model means what
> it says: the help for -glm- is the place to start and points to other
> documentation.
>
> Otherwise, I assert that elimination of outliers is a very bad idea
> _unless_ you know from independent evidence that they arise from serious and
> irremediable problems of measurement, in which case chopping the tails of the
> distribution is _not_ the way to do it. In most fields I know, the outliers
> that stick out are genuine and important (the Amazon in hydrology, USA or
> China wherever it is in economics, and so on, and so on) and leaving them
> out is in my view lousy science and lousy statistics.
>
> If you disagree, well, we disagree, but I am not going to tell you how to
> do this in Stata.
> 
> Nick 
> n.j.cox@durham.ac.uk 
> 
> 
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Achmed Aldai
> Sent: 06 June 2011 15:07
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: Elimination of outliers
> 
> Hi
> 
> Sorry, I cannot really understand why it is a bad idea. I want to eliminate
> the outliers because I think they cause a bias in my results.
> 
> How can I transform my predictors and what do you mean by that?
> 
> What is a non-identity link function?
> 
> Thank you
> 
> Felix
> -------- Original Message --------
> > Date: Mon, 6 Jun 2011 13:39:20 +0100
> > From: Nick Cox <njcoxstata@gmail.com>
> > To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
> > Subject: Re: st: Elimination of outliers
> 
> > In general, a very bad idea. Consider transforming your response or
> > predictors or using a non-identity link function in a generalized
> > linear model or some flavour of robust regression as more measured
> > tactics.
> > 
> > Nick
> > 
> > On 6 Jun 2011, at 12:46, "Achmed Aldai" <Hauptseminar@gmx.de> wrote:
> > 
> > > Hi
> > >
> > > I am currently working on a do-file in which I want to eliminate
> > > outliers, i.e. the observations with the highest and the lowest
> > > values of certain variables, here e.g. at and lt. In total I have
> > > 150000 observations, and out of these I want to delete 25
> > > observations at each of the upper and lower ends. But it might
> > > be better to do it in relative terms, meaning that I don't take the
> > > highest and lowest 25 but the lower and upper 1% of the
> > > corresponding variables.
> > >
> > > gvkey           at           lt
> > > 1001            1120         231
> > > 1001            1230         312
> > > 1210            57           32
> > > 1210            67           25
> > > 1354            789          560
> > > 1368            650          500
> > > 1481            1230         900
> > > 2930            21           30
> > > 3201            234          213
> > > 3201            256          220
> > > 3210            267          320
> > > 4510            4335         3214
> > >
> > > I hope this became clear.
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index