Statalist



RE: st: RE: Outlier: Detection


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: RE: Outlier: Detection
Date   Wed, 20 Feb 2008 12:04:18 -0000

My own prejudice here is that a single (and, to many, simple) English word
"outliers" is in fact a label for a class of quite different problems, with
quite different solutions. In a strong sense, everyone is right in this
debate. I agree with the strong scepticism expressed by Maarten and
others about procedures in this territory, many of which appear to be
oversold or overused. I also agree with Sergiy that some problems with
very large datasets make automated detection highly desirable, whatever
the difficulties. 

The original question was about two programs for Grubbs' test. The most
important warning here is probably to be sure to check out Grubbs'
original paper. The test codifies a highly specific model of what is
going on, one that may be a long way from what you can commit to. My
recollection is that the test is based on everything being normal
(Gaussian) except the outliers. If that is your model, you are in
principle much better off looking at a normal probability plot. If that
is not your model, or you are not sure what model you have of the (data
+ outliers) generating process, using Grubbs' test blindly is quite
likely a very bad idea. 
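
As a minimal sketch of that advice, assuming the auto dataset shipped with
Stata and its variable price stand in for your own data (neither was part
of the original question), a normal probability plot and the bare
Grubbs-type statistic, the largest absolute deviation from the mean in
standard deviation units, could be obtained something like this:

```stata
* Illustration only: auto.dta and price are stand-ins for your own data
sysuse auto, clear

* Normal probability (quantile-normal) plot: departures from the
* reference line show up as curvature or as stray points in the tails
qnorm price

* Grubbs-type statistic: largest absolute deviation from the mean,
* in units of the sample standard deviation
quietly summarize price
display "G = " max(r(max) - r(mean), r(mean) - r(min)) / r(sd)
```

The statistic by itself says nothing about whether a normal reference
distribution is sensible. The plot at least lets you look at that
assumption before leaning on any test built on it.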

The program -grubbs- on SSC makes less use of temporary names than is
customary. There could be complications in operation or side-effects for
your dataset: their probability is small, but not zero. 
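
For readers unfamiliar with the convention, here is a hedged sketch of
what using temporary names looks like. It is not the code of -grubbs-;
the program name mydev and the variable it computes are hypothetical and
chosen only for illustration.

```stata
* Hypothetical fragment, not the code of -grubbs-
program define mydev
    version 9
    syntax varname(numeric) [if] [in]
    marksample touse

    * tempvar supplies a name that cannot clash with any variable
    * already in the dataset; the variable is dropped automatically
    * when the program ends
    tempvar dev
    quietly summarize `varlist' if `touse'
    quietly generate double `dev' = abs(`varlist' - r(mean)) / r(sd) if `touse'

    quietly summarize `dev' if `touse'
    display as text "largest standardized deviation = " as result r(max)
end
```

After -sysuse auto, clear-, typing -mydev price- would run it. A program
that instead creates working variables under fixed names can fail if a
variable of that name already exists, or can leave extra variables
behind, which is presumably the kind of side-effect at issue.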

Nick 
n.j.cox@durham.ac.uk 




