
Re: st: RE: Outlier: Detection

From	"Sergiy Radyakin" <[email protected]>
To	[email protected]
Subject	Re: st: RE: Outlier: Detection
Date	Tue, 19 Feb 2008 17:17:42 -0500

On 2/19/08, Maarten buis <[email protected]> wrote:
> --- [email protected] wrote:
> > I am trying to run a Stata program to detect outliers in my data
> > set. I found 2 Grubbs programs written in Stata. Programs are here:
> >
> > Program # 1.
> > _______________________________ program begins _______________________________
> > **************************************
> > * This is grubbs.ado beta version
> > * Date: Jan, 20,2007
> > * Version: 1.1
> > *
> > * Questions, comments and bug reports :
> > * [email protected]
> <snip>
>
> An email address is included in the code, so that seems the person best
> qualified to answer your question.

The question was which of the two programs is better suited for
detecting outliers. I think it is safe to presume that each author
will favor his or her own creation. It's similar to asking StataCorp
which is better, Stata or SPSS :)

>
> In general I am very sceptical about automatic data dullifiers. There
> are two reasons why some observation is an outlier:
>
> 1) something went wrong during data collection / recording / recoding
> in which case you should be able to find it without any automated
> procedure. Often graphical methods are great ways of finding those.

Yes, I am also sceptical. But currently I am working with a "dataset"
where each of many "datafiles" contains up to 4000 variables. Even
the simplest question, like "How were the missing values encoded?", is
no longer trivial. Looking at 4000 graphs would simply be infeasible. On
the other hand, nobody insists on removing the suspicious
observations. Most of the time we just want to tag them for more
careful review. Is that OK?
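
A minimal sketch of what "tag, don't remove" could look like across many
variables. It assumes Tukey's 1.5*IQR fences as the flagging rule, which is
not the Grubbs test from the programs quoted above, and the variable name
n_flags is made up for illustration. Nothing is dropped; each observation
just gets a count of how many of its values fall outside the fences.

```stata
* Sketch only: count, per observation, how many numeric values lie
* outside the 1.5*IQR fences. The rule and the name n_flags are assumptions.
ds, has(type numeric)
local numvars `r(varlist)'

generate int n_flags = 0

foreach v of local numvars {
    quietly summarize `v', detail
    local lo = r(p25) - 1.5*(r(p75) - r(p25))
    local hi = r(p75) + 1.5*(r(p75) - r(p25))
    * Missing values are never flagged
    quietly replace n_flags = n_flags + 1 ///
        if (`v' < `lo' | `v' > `hi') & !missing(`v')
}

* Review the tagged observations; nothing has been deleted
list if n_flags > 0
```

With thousands of variables a single counter keeps the file from doubling in
width; a separate flag per variable would say which values triggered the tag,
at the cost of as many new variables.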

>
> 2) an observation is truly exceptional, in which case it contains
> valuable information. Actually this information is much much more
> valuable than those dull observations somewhere in the middle. So, you
> definitely do not want to ignore it.

Again, we don't want to ignore it. Sometimes we just want to ask
whether there is anything special about the data we are looking at.
Consider credit card fraud. You probably do want your CC provider to
notify you about diamond ring purchases that happened at 3 am, though
it might be a common transaction for some exceptional econometricians
out there...

Best regards,
   Sergiy Radyakin

>
> Automated procedures would mix up 1) and 2), so it is much better to
> use your own knowledge and judgement.
>
> Hope this helps,
> Maarten
>
> -----------------------------------------
> Maarten L. Buis
> Department of Social Research Methodology
> Vrije Universiteit Amsterdam
> Boelelaan 1081
> 1081 HV Amsterdam
> The Netherlands
>
> visiting address:
> Buitenveldertselaan 3 (Metropolitan), room Z434
>
> +31 20 5986715
>
> http://home.fsw.vu.nl/m.buis/
> -----------------------------------------
>
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
