Statalist


Re: st: RE: Outlier: Detection


From   Robert A Yaffee <bob.yaffee@nyu.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: Outlier: Detection
Date   Wed, 20 Feb 2008 03:57:49 -0500

Maarten, Sergiy,
     Maarten is correct that outliers can reflect either the important impact of an event or
a typo.  Nonetheless, they are important to identify, so that you can check the history
and the data.  Intervention analysis is based on exactly this: properly
identifying the different kinds of outliers and distinguishing them from one another,
as well as differentiating them from level shifts, ramp effects, and temporary changes.
Outliers that are not properly controlled for inflate the residual standard error and
thereby bias your test statistics toward nonsignificance.  Noisy data may contain
a lot of outliers.  One needs to ascertain the influence of these outliers on the
parameter estimation and significance testing.
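
To make the standard-error point concrete, here is a minimal Python sketch (not from the original post). It fits a simple regression on a trend to a made-up series, once clean and once with a single additive outlier. The data, the function name slope_t_stat, and the size of the outlier are all illustrative assumptions.

```python
import math

def slope_t_stat(x, y):
    """Return (slope, standard error, t statistic) for simple OLS of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares and the usual standard error of the slope
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se, slope / se

# Hypothetical data: linear trend plus small deterministic "noise"
x = list(range(20))
clean = [1.0 + 0.3 * xi + 0.5 * math.sin(3 * xi) for xi in x]

# Same series with one additive outlier in the middle
contaminated = list(clean)
contaminated[10] += 15.0

_, se_clean, t_clean = slope_t_stat(x, clean)
_, se_bad, t_bad = slope_t_stat(x, contaminated)

print("clean:        se = %.4f, t = %.2f" % (se_clean, t_clean))
print("contaminated: se = %.4f, t = %.2f" % (se_bad, t_bad))
```

In this toy case the slope estimate barely moves, but its standard error grows and the t statistic shrinks, which is the sense in which an uncontrolled outlier pushes a test toward nonsignificance.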
     Daniel Pena (Outliers, Influential Observations and Missing Data, in A First
Course on Time Series, 2001) writes that there are several types of outliers to be concerned about,
as have Tsay (1988) and others before him.  Additive outliers tend to be more pernicious
than innovational outliers.  Outliers may extend into an extended pulse or
into an outlier patch (Pena, op. cit., 159), which, if it occurs at the end of a series, may be confused with a level
shift (Balke, N., Detecting level shifts in time series, JBES, 1993).  When there are multiple outliers, some
outliers may mask others (Pena, op. cit., 154)
or smear normal values if the outliers are on either side of the normal value.  Dave Reilly and others
have written on periodic pulses and how they will bias the forecasting if not properly controlled
for at the end of a series.
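
For reference, the outlier types mentioned above are usually written as follows in this literature. The notation is an assumption added here, not taken from the post: X_t is the outlier-free ARIMA series, Y_t the observed series, B the backshift operator, a_t white noise, omega the size of the effect, and I_t^{(T)} an indicator equal to 1 at t = T and 0 otherwise.

```latex
% Outlier-free process and its psi-weights (assumed notation)
\[
\phi(B)(1-B)^d X_t = \theta(B)\, a_t, \qquad
\psi(B) = \frac{\theta(B)}{\phi(B)(1-B)^d}
\]
% Observed series under each type of disturbance at time T
\[
\begin{aligned}
\text{Additive outlier (AO):} \quad & Y_t = X_t + \omega\, I_t^{(T)} \\
\text{Innovational outlier (IO):} \quad & Y_t = X_t + \omega\, \psi(B)\, I_t^{(T)} \\
\text{Level shift (LS):} \quad & Y_t = X_t + \frac{\omega}{1-B}\, I_t^{(T)} \\
\text{Temporary change (TC):} \quad & Y_t = X_t + \frac{\omega}{1-\delta B}\, I_t^{(T)}, \quad 0 < \delta < 1
\end{aligned}
\]
```

An AO affects only the observation at T, while an IO enters through the innovation and propagates through the psi-weights of the model. A run of consecutive AOs is what makes up an outlier patch.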
     Missing data will bias your ACF and PACF, so if your outliers represent typos, you should identify
them and impute the missing data by a reasonable means if at all possible.
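
As one example of a "reasonable means", here is a minimal Python sketch (not from the original post) that fills interior gaps by linear interpolation between the nearest known neighbours. The function name interpolate_missing and the use of None for a missing value are assumptions for illustration. For a series with strong autocorrelation or seasonality, a model-based imputation would likely be preferable.

```python
def interpolate_missing(series):
    """Fill interior None gaps by linear interpolation between known neighbours.

    Leading and trailing gaps are left as None, because there is no
    neighbour on one side to interpolate from.
    """
    filled = list(series)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is None:
            start = i - 1          # last known value before the gap
            end = i
            while end < n and filled[end] is None:
                end += 1           # first known value after the gap
            if start >= 0 and end < n:
                step = (filled[end] - filled[start]) / (end - start)
                for k in range(i, end):
                    filled[k] = filled[start] + step * (k - start)
            i = end
        else:
            i += 1
    return filled

# Hypothetical series in which typos were identified and set to missing
print(interpolate_missing([1.0, None, 3.0, None, None, 6.0]))
```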
     Chen and Liu (Joint estimation of model parameters and outlier effects in time series, JASA, 1993) have noted
that as you model the outliers, more may appear as the standard errors are reduced.  This is an iterative process.  It
should not be done in one pass.  You may want to do a final simultaneous test with the same standard error to see
whether any lose their significance and can be dropped.
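
The iterative idea can be sketched in Python as below. This is a deliberately simplified stand-in, not Chen and Liu's procedure: it uses a plain mean and standard deviation in place of ARIMA residuals, and the data, the threshold of 3, and the function names are all assumptions for illustration. Excluding a flagged point from the estimates plays the role of modelling it with its own dummy.

```python
import math
import statistics

def iterative_outlier_scan(series, threshold=3.0, max_passes=10):
    """Flag outliers pass by pass, re-estimating location and scale each time.

    Returns a list of passes; each pass is the list of indices newly flagged.
    """
    flagged = []
    passes = []
    for _ in range(max_passes):
        kept = [v for i, v in enumerate(series) if i not in flagged]
        center = statistics.mean(kept)
        scale = statistics.stdev(kept)
        new = [i for i, v in enumerate(series)
               if i not in flagged and abs(v - center) / scale > threshold]
        if not new:
            break
        flagged.extend(new)
        passes.append(new)
    return passes

def final_joint_check(series, flagged, threshold=3.0):
    """Re-test all flagged points against one common standard error."""
    kept = [v for i, v in enumerate(series) if i not in flagged]
    center = statistics.mean(kept)
    scale = statistics.stdev(kept)
    return sorted(i for i in flagged
                  if abs(series[i] - center) / scale > threshold)

# Hypothetical series: level 10 with small deterministic wiggle,
# one large outlier and one moderate outlier that the large one masks
series = [10.0 + 0.5 * math.sin(i) for i in range(30)]
series[5] = 60.0
series[20] = 16.0

passes = iterative_outlier_scan(series)
all_flagged = [i for p in passes for i in p]
confirmed = final_joint_check(series, all_flagged)

print("found per pass:", passes)
print("confirmed in the joint check:", confirmed)
```

In this toy series the first pass sees only the large outlier, because it inflates the standard deviation enough to hide the moderate one. The second pass, with the reduced standard deviation, picks up the moderate one, and the final joint check keeps both.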
      - Regards,
                Bob Yaffee

  When data are somewhat noisy, perhaps there are other processes
that are causing the innovations.

----- Original Message -----
From: Sergiy Radyakin <serjradyakin@gmail.com>
Date: Tuesday, February 19, 2008 5:20 pm
Subject: Re: st: RE: Outlier: Detection
To: statalist@hsphsun2.harvard.edu


> On 2/19/08, Maarten buis <maartenbuis@yahoo.co.uk> wrote:
> > --- badri.prasad@hrsdc-rhdsc.gc.ca wrote:
> > > I am trying to run a Stata program to detect outliers in my data
> > > set. I found 2 Grubbs programs written in Stata. Programs are here:
> > >
> > > Program # 1.
> > > _______________________________program begins_____________________________________________
> > > **************************************
> > > * This is grubbs.ado beta version
> > > * Date: Jan, 20,2007
> > > * Version: 1.1
> > > *
> > > * Questions, comments and bug reports :
> > > * couderc@univ-paris1.fr
> > <snip>
> >
> > An email address is included in the code, so that seems the person best
> > qualified to answer your question.
> 
> The question was which of the two programs is better suited for
> detecting outliers. I think it is safe to presume that each author
> will favor his or her creation. It's similar to asking StataCorp
> which is better, Stata or SPSS :)
> 
> >
> > In general I am very sceptical about automatic data dullifiers. There
> > are two reasons why some observation is an outlier:
> >
> > 1) something went wrong during data collection / recording / recoding
> > in which case you should be able to find it without any automated
> > procedure. Often graphical methods are great ways of finding those.
> 
> Yes, I am also sceptical. But currently I am working with a "dataset",
> where each (of many) "datafile" contains up to 4000 variables. Even
> the simplest question, like "How were the missing values encoded?", is not
> trivial anymore. Looking at 4000 graphs would be simply infeasible. On
> the other hand, nobody insists on removing the suspicious
> observations. Most of the time we just want to tag them, to review
> more carefully. Is that OK?
> 
> >
> > 2) an observation is truly exceptional, in which case it contains
> > valuable information. Actually this information is much much more
> > valuable than those dull observations somewhere in the middle. So, you
> > definitely do not want to ignore it.
> 
> Again, we don't want to ignore it. Sometimes we just want to ask
> whether there is anything special about the data we are looking at.
> Consider credit card fraud. You probably do want your CC provider to
> notify you about diamond ring purchases that happened at 3am, though
> it might be a common transaction for some exceptional econometricians
> out there...
> 
> Best regards,
>    Sergiy Radyakin
> 
> >
> > Automated procedures would mix up 1) and 2), so it is much better to
> > use your own knowledge and judgement.
> >
> > Hope this helps,
> > Maarten
> >
> > -----------------------------------------
> > Maarten L. Buis
> > Department of Social Research Methodology
> > Vrije Universiteit Amsterdam
> > Boelelaan 1081
> > 1081 HV Amsterdam
> > The Netherlands
> >
> > visiting address:
> > Buitenveldertselaan 3 (Metropolitan), room Z434
> >
> > +31 20 5986715
> >
> > http://home.fsw.vu.nl/m.buis/
> > -----------------------------------------
> >
> >
> >
> > *
> > *   For searches and help try:
> > *   http://www.stata.com/support/faqs/res/findit.html
> > *   http://www.stata.com/support/statalist/faq
> > *   http://www.ats.ucla.edu/stat/stata/
> >



© Copyright 1996–2014 StataCorp LP