st: Treatment of outliers

From	"Allan Reese (Cefas)" <[email protected]>
To	<[email protected]>
Subject	st: Treatment of outliers
Date	Tue, 7 Jun 2011 11:00:06 +0100

The exchanges that followed a request to *trim* variables (technically
distinct from identifying and removing outliers) prompt me to post a
comment I bottled up when Peter Diggle's paper was read at the RSS.
As it's geostatistics, Nick may have a view.

http://www.math.ntnu.no/~hrue/r-inla.org/case-studies/Diggle09/DiggleSept09.pdf
(an odd ref, but one that Google found and it works today) has the
title "Geostatistical inference under preferential sampling".  Since
the premise was that data were collected with prejudice, and the point
of the data and the modelling was to identify locations with high Pb
contamination, it seemed to me very odd that the paper includes a
throwaway comment "The measured lead concentrations included two gross
outliers in 2000, each of which we replaced by the average of the
remaining values from that year's survey."

In principle, I agree with Nick (gosh, that's a phrase gone out of
fashion) that outliers in real data need very careful consideration.
One of the major problems in the use of statistical methods is that
people apply textbook methods without noting the assumptions underlying
the data generation. (So, doctor, can we assume all your patients are
independent, identical and exchangeable from a single normal
distribution?)

A simple test of the robustness of a model is to compare the fit with
and without the suspected outliers.  If the fit is substantially the
same either way, you can use the results.  If including the outliers
substantially changes the model, you are forced to make a
(non-probabilistic) judgment on the source of the data.
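
The with/without comparison can be sketched in a few lines.  This is
only an illustration, not anything taken from the paper or the thread:
the data are made up, the model is a plain least-squares line, and the
names ols_fit and compare_fits are hypothetical helpers.

```python
from statistics import mean

def ols_fit(x, y):
    """Return (intercept, slope) of a least-squares line through (x, y)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def compare_fits(x, y, suspect):
    """Fit once with all points, once without the indices listed in `suspect`."""
    full = ols_fit(x, y)
    keep = [i for i in range(len(x)) if i not in suspect]
    reduced = ols_fit([x[i] for i in keep], [y[i] for i in keep])
    return full, reduced

# Made-up data (an assumption for illustration): y = 2x + 1 exactly,
# except for one gross value planted at index 9.
x = list(range(10))
y = [2 * xi + 1 for xi in x]
y[9] = 100

full, reduced = compare_fits(x, y, suspect={9})
print("all points:       intercept=%.2f slope=%.2f" % full)
print("without suspects: intercept=%.2f slope=%.2f" % reduced)
```

If the two sets of coefficients are close, the suspected points hardly
matter.  If they differ a lot, as they do here, the code cannot say
which fit to believe; that remains a judgment about where the data
came from.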

I also note the original posting mentioned, "I have 150000 observations
and out of these observations I want to delete 25 observations from the
upper and lower boundaries."  
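
For contrast with outlier removal, trimming by count can be sketched
as below.  The quoted request is ambiguous, so this assumes it means
25 observations from each tail, which would be 50 of 150000, about
0.03% of the data.  The function name trim_extremes and the data are
made up for illustration.

```python
def trim_extremes(values, k):
    """Drop the k smallest and k largest values; return the rest in sorted order."""
    if k < 0 or 2 * k >= len(values):
        raise ValueError("k must be non-negative and leave at least one value")
    ordered = sorted(values)
    return ordered[k:len(ordered) - k]

# Made-up data: the integers 1..100, trimmed by 5 at each end.
data = list(range(1, 101))
trimmed = trim_extremes(data, 5)
print(len(trimmed), trimmed[0], trimmed[-1])
```

Trimming like this removes a fixed number of extreme values whether or
not any of them is suspect, which is why it is a different operation
from identifying outliers.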

Allan 

R Allan Reese
Senior statistician, Cefas
The Nothe, Weymouth DT4 8UB 
Tel: +44 (0)1305 20 6614 -direct
Fax: +44 (0)1305 20 6601 
www.cefas.defra.gov.uk 



