Statalist The Stata Listserver



Re: st: Detecting Outliers


From   Robert A Yaffee <bob.yaffee@nyu.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Detecting Outliers
Date   Wed, 03 May 2006 11:20:26 -0400

The critical values will depend on the confidence intervals
which will in turn depend upon the sample size.
  You can set these critical values to 3, 3.5, or 4 times the
standard errors.  Observations with values beyond these 
critical values can be deemed outliers.  Their leverage
should be examined to determine whether they are problematic
or not.
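
  As a rough illustration of the "k times" rule above, here is a
minimal sketch in Python (not Stata, and not taken from any of the
cited works).  It assumes the simplest possible setup: each panel's own
mean and sample standard deviation of the raw values stand in for the
fitted values and standard errors of a proper model.  The names
flag_outliers and panels are hypothetical.

```python
from statistics import mean, stdev

def flag_outliers(panels, k=3.0):
    """Map each panel id to the indices of values lying more than
    k sample standard deviations from that panel's own mean.

    Assumption: raw values are compared with the panel mean; a real
    analysis would more likely use residuals from a fitted model.
    """
    flagged = {}
    for pid, values in panels.items():
        if len(values) < 3:
            # Too few observations to estimate spread sensibly.
            flagged[pid] = []
            continue
        m = mean(values)
        s = stdev(values)
        # With s == 0 every deviation is 0, so nothing is flagged.
        flagged[pid] = [i for i, v in enumerate(values)
                        if abs(v - m) > k * s]
    return flagged
```

Because the mean and standard deviation are computed within each
panel, the cutoff in measurement units differs from panel to panel even
when k is held fixed.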
  They may be modeled or downweighted according to their 
theoretical importance.  How much autocorrelation persists
in the series will be a factor.  Additive outliers are generally
more problematic than innovational ones.
  Errors can be removed.  Other additive outliers may be modeled by dummy
variables, as may outlier patches and periodic pulses.
  The modeling of the outliers will shrink the confidence intervals and
make other borderline cases more salient.  Detection and identification
should be conducted in an iterative process.  Incorporation will be done
with different levels of standard errors as the process proceeds.  When
all relevant outliers are modeled, a final run should be conducted with
the same standard errors to see which ones should be retained.
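
  The iterative idea can be sketched as follows.  This is a deliberately
simplified Python illustration, not the procedure of any of the authors
cited: "modeling" an outlier is approximated by leaving it out when the
mean and standard deviation are re-estimated, which is roughly what a
dummy variable for that observation would do in a mean-only model.  The
function name, the threshold sequence, and the example series are all
assumptions made for the example.

```python
from statistics import mean, stdev

def iterative_outliers(values, thresholds=(4.0, 3.5, 3.0), final_k=3.0):
    """Flag outliers one at a time, re-estimating after each one.

    Passes use successively lower thresholds; a final run applies one
    common threshold to decide which flagged points are retained.
    """
    flagged = set()
    for k in thresholds:
        while True:
            clean = [v for i, v in enumerate(values) if i not in flagged]
            if len(clean) < 3:
                break
            m, s = mean(clean), stdev(clean)
            # Most extreme observation not yet flagged.
            idx, dev = max(
                ((i, abs(v - m)) for i, v in enumerate(values)
                 if i not in flagged),
                key=lambda pair: pair[1],
            )
            if dev <= k * s:
                break
            flagged.add(idx)
    # Final run: same threshold for every flagged observation.
    clean = [v for i, v in enumerate(values) if i not in flagged]
    if len(clean) < 2:
        return sorted(flagged)
    m, s = mean(clean), stdev(clean)
    return sorted(i for i in flagged if abs(values[i] - m) > final_k * s)

# Hypothetical series: a gross outlier (60) and a milder one (18).
series = [10, 11, 9, 10, 12, 60, 10, 11, 9, 10,
          10, 11, 18, 10, 12, 8, 10, 11, 9, 10]
print(iterative_outliers(series))
```

In this made-up series the value 18 lies well under one standard
deviation from the mean while 60 is still inflating the estimates; once
60 is set aside, the spread shrinks and 18 crosses the 3.5 threshold.
Both are then retained in the final run.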
   You can find these procedures developed in the 1986, 1988, 1993 works
of Tsay, Chen and Liu, and Balke on the subject of outliers in time
series processes.
   Regards,
      Bob Yaffee

     

Robert A. Yaffee, Ph.D.
Research Professor
Shirley M. Ehrenkranz
School of Social Work
New York University

home address:
Apt 19-W
2100 Linwood Ave.
Fort Lee, NJ
07024-3171
Phone: 201-242-3824
Fax: 201-242-3825
yaffee@nyu.edu

----- Original Message -----
From: Raphael Fraser <raphael.fraser@gmail.com>
Date: Wednesday, May 3, 2006 10:55 am
Subject: Re: st: Detecting Outliers

> One can safely assume the critical values will be the same within 
> each panel.
> 
> On 5/3/06, Robert A Yaffee <bob.yaffee@nyu.edu> wrote:
> > Will the critical values for outliers be the same within each panel
> > or will they differ from panel to panel (perhaps depending upon
> > the sample size of each panel)?
> >
> > ----- Original Message -----
> > From: Raphael Fraser <raphael.fraser@gmail.com>
> > Date: Wednesday, May 3, 2006 9:14 am
> > Subject: Re: st: Detecting Outliers
> >
> > > The data I am referring to is panel data. The purpose of the analysis
> > > is to detect possible errors. I have on average 50 observations on 100
> > > subjects.
> > >
> > > On 5/3/06, Robert A Yaffee <bob.yaffee@nyu.edu> wrote:
> > > > There are many types of outliers, depending upon
> > > > whether you have time series or panel data.
> > > >    In time series, there are additive outliers, innovational
> > > > outliers, outlier patches, for example.   Some have worse
> > > > effects than others.   Adjacent outliers may smear or
> > > > mask others.  They may have good or bad leverage.
> > > >    One should have the choice of detecting, modeling, or
> > > > replacing them depending upon their theoretical significance.
> > > >    What kind of analysis is being done here?
> > > >       RY
> > > >
> > > > ----- Original Message -----
> > > > From: n j cox <n.j.cox@durham.ac.uk>
> > > > Date: Tuesday, May 2, 2006 9:37 am
> > > > Subject: Re: st: Detecting Outliers
> > > >
> > > > > The short answer is Yes, many of them.
> > > > > A longer answer is more difficult to do well
> > > > > given such little information.
> > > > >
> > > > > We have just had a thread on an overlapping
> > > > > question. Look for "outliners" [sic] in
> > > > > the archives.
> > > > >
> > > > > You don't quite say so, but these sound like
> > > > > panel data. For concreteness, I guess 500
> > > > > patients and 10 observations on each, one
> > > > > for each year. My guesses have some
> > > > > influence on my suggestions.
> > > > >
> > > > > What is an outlier in this context? Presumably
> > > > > a patient who differs from many others; or
> > > > > an observation that differs from the rest
> > > > > of the patient's history. Both could make
> > > > > sense, e.g. in the case of anorexic/bulimic
> > > > > patients, or patients who had a really bad
> > > > > year, say a fight with cancer or being
> > > > > caught up in "Lost".
> > > > >
> > > > > First off, if a patient's height varies more than
> > > > > trivially over 10 years, either there is something
> > > > > going on, say growth for young people or some aging
> > > > > > effect, or there is an error in the data.
> > > > >
> > > > > Weight fluctuations would seem rather different
> > > > > and everyone knows reasons for various kinds
> > > > > of weight change even in adulthood. It would
> > > > > seem a bit more difficult to pick up
> > > > > on errors (meaning mistakes).
> > > > >
> > > > > There are lots of things you can do. You
> > > > > could set up a loop to plot the time series
> > > > > for each patient. For 500 patients that would
> > > > > be a little tedious, but it is a direct
> > > > > approach.
> > > > >
> > > > > You could try reductions, e.g.
> > > > >
> > > > > last height - first height
> > > > > last weight - first weight
> > > > > mean height over period
> > > > > mean weight over period
> > > > > some measure of variability of each
> > > > >
> > > > > and look for outliers on pairwise plots
> > > > > of each. A scatterplot matrix often
> > > > > shows errors even in data that have
> > > > > supposedly been cleaned. Often
> > > > > the cleaning is univariate, but a
> > > > > weird data value can show up like
> > > > > a run in fabric.
> > > > >
> > > > > My prejudice is that no testing or
> > > > > measuring approach beats graphics
> > > > > for finding outliers.
> > > > >
> > > > > Nick
> > > > > n.j.cox@durham.ac.uk
> > > > >
> > > > >
> > > > > Raphael Fraser
> > > > >
> > > > > > I have 10 years data (5000 observations) on patients' heights and
> > > > > > weights. Is there any ado-file that could assist in locating possible
> > > > > > outliers?
> > > > > *
> > > > > *   For searches and help try:
> > > > > *   http://www.stata.com/support/faqs/res/findit.html
> > > > > *   http://www.stata.com/support/statalist/faq
> > > > > *   http://www.ats.ucla.edu/stat/stata/
> > > > >


