Statalist



Re: st: RE: Outlier: Detection


From   "Sergiy Radyakin" <serjradyakin@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: Outlier: Detection
Date   Tue, 19 Feb 2008 17:17:42 -0500

On 2/19/08, Maarten buis <maartenbuis@yahoo.co.uk> wrote:
> --- badri.prasad@hrsdc-rhdsc.gc.ca wrote:
> > I am trying to run a Stata program to detect outliers in my data
> > set. I found 2 Grubbs programs written in Stata. The programs are here:
> >
> > Program # 1.
> > _______________________________ program begins _______________________________
> > **************************************
> > * This is grubbs.ado beta version
> > * Date: Jan, 20,2007
> > * Version: 1.1
> > *
> > * Questions, comments and bug reports :
> > * couderc@univ-paris1.fr
> <snip>
>
> An email address is included in the code, so that person seems best
> qualified to answer your question.

The question was: which of the two programs is better suited for
detecting outliers. I think it is safe to presume that each author
will favor his or her creation. It's similar to asking StataCorp
which is better, Stata or SPSS :)

>
> In general I am very sceptical about automatic data dullifiers. There
> are two reasons why an observation can be an outlier:
>
> 1) something went wrong during data collection / recording / recoding,
> in which case you should be able to find it without any automated
> procedure. Graphical methods are often a great way of finding those.

Yes, I am also sceptical. But currently I am working with a "dataset"
where each (of many) "datafile" contains up to 4000 variables. Even
the simplest question, like "How were the missings encoded?", is no
longer trivial. Looking at 4000 graphs would simply be infeasible. On
the other hand, nobody insists on removing the suspicious
observations. Most of the time we just want to tag them for a more
careful review. Is that OK?
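To make the "tag, don't remove" idea concrete, here is a minimal sketch in Python. It is not either of the Grubbs programs quoted above; it swaps in a different, robust rule (the modified z-score, 0.6745 * (x - median) / MAD, with MAD the median absolute deviation) and uses a cutoff of 3.5, a common convention taken here as an assumption. The function names `flag_suspicious` and `flag_dataset`, and the example data, are hypothetical. Each variable is screened separately and only an indicator is returned, so nothing is dropped.

```python
from statistics import median


def flag_suspicious(values, cutoff=3.5):
    """Return one boolean per value, True where the value looks unusual.

    Uses the modified z-score 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation. Missing values (None) are never flagged, and
    nothing is removed: the caller decides what to do with flagged rows.
    """
    present = [x for x in values if x is not None]
    if len(present) < 3:
        # Too few observations to say anything useful.
        return [False] * len(values)
    med = median(present)
    mad = median(abs(x - med) for x in present)
    if mad == 0:
        # At least half the values equal the median; the score is undefined.
        return [False] * len(values)
    return [
        x is not None and abs(0.6745 * (x - med) / mad) > cutoff
        for x in values
    ]


def flag_dataset(columns, cutoff=3.5):
    """Apply flag_suspicious to every variable in a {name: values} dict."""
    return {name: flag_suspicious(col, cutoff) for name, col in columns.items()}


# Hypothetical example: one extreme income value, ages all unremarkable.
data = {
    "income": [2100, 2300, 1900, 2200, 2050, 98000, None],
    "age": [34, 41, 29, 38, 45, 52, 47],
}
flags = flag_dataset(data)
print(flags["income"])  # only the 98000 entry is True
```

Because the median and MAD are barely moved by a single extreme value, one wild entry cannot mask itself the way it can with a mean and standard deviation. The flags are a review list, not a verdict: as noted in this thread, a flagged value may be an error or a genuinely exceptional observation.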

>
> 2) an observation is truly exceptional, in which case it contains
> valuable information. Actually, this information is much, much more
> valuable than those dull observations somewhere in the middle. So you
> definitely do not want to ignore it.

Again, we don't want to ignore it. Sometimes we just want to ask
whether there is anything special about the data we are looking at.
Consider credit card fraud. You probably do want your credit card
provider to notify you about a diamond ring purchase made at 3am,
though it might be a common transaction for some exceptional
econometricians out there...

Best regards,
   Sergiy Radyakin

>
> Automated procedures would mix up 1) and 2), so it is much better to
> use your own knowledge and judgement.
>
> Hope this helps,
> Maarten
>
> -----------------------------------------
> Maarten L. Buis
> Department of Social Research Methodology
> Vrije Universiteit Amsterdam
> Boelelaan 1081
> 1081 HV Amsterdam
> The Netherlands
>
> visiting address:
> Buitenveldertselaan 3 (Metropolitan), room Z434
>
> +31 20 5986715
>
> http://home.fsw.vu.nl/m.buis/
> -----------------------------------------
>
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>



© Copyright 1996–2014 StataCorp LP