Re: st: Detecting Outliers
There are many types of outliers, depending upon
whether you have time series or panel data.
In time series, for example, there are additive outliers,
innovational outliers, and outlier patches. Some have worse
effects than others. Adjacent outliers may smear or
mask others, and outliers may have good or bad leverage.
One should be able to choose whether to detect, model, or
replace them, depending on their theoretical significance.
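A minimal Python sketch of why the type matters, assuming a simple AR(1) model y[t] = phi*y[t-1] + e[t]. The values of phi, the shock size omega, the timing t, and the innovations are hypothetical, chosen only for illustration. An additive outlier (AO) changes one observed value only. An innovational outlier (IO) enters through the error term, so its effect decays as omega*phi^k over the following k periods.

```python
# Sketch: additive vs innovational outliers in an AR(1) series
#   y[t] = phi * y[t-1] + e[t]
# phi, omega, t and the innovations are hypothetical illustration values.

def ar1(innovations, phi):
    """Build an AR(1) series from innovations, with y[-1] taken as 0."""
    y = []
    prev = 0.0
    for e in innovations:
        prev = phi * prev + e
        y.append(prev)
    return y

def additive_outlier(series, t, omega):
    """AO: a one-off shock added to the observed value at time t only."""
    out = list(series)
    out[t] += omega
    return out

def innovational_outlier(innovations, phi, t, omega):
    """IO: the shock enters the innovation at time t and propagates."""
    shocked = list(innovations)
    shocked[t] += omega
    return ar1(shocked, phi)

phi, omega, t = 0.8, 10.0, 3
e = [0.5, -0.2, 0.1, 0.0, 0.3, -0.4, 0.2, 0.1]
clean = ar1(e, phi)
ao = additive_outlier(clean, t, omega)
io = innovational_outlier(e, phi, t, omega)

# Effect of each outlier = contaminated series minus clean series
ao_effect = [round(a - c, 4) for a, c in zip(ao, clean)]
io_effect = [round(i - c, 4) for i, c in zip(io, clean)]
print("AO effect:", ao_effect)
print("IO effect:", io_effect)
```

With these values the AO effect is 10 at t = 3 and zero elsewhere, while the IO effect runs 10, 8, 6.4, 5.12, 4.096 from t = 3 onward, showing how an IO spreads its influence over later observations.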
What kind of analysis is being done here?
Robert A. Yaffee, Ph.D.
Shirley M. Ehrenkranz
School of Social Work
New York University
2100 Linwood Ave.
Fort Lee, NJ
----- Original Message -----
From: n j cox <firstname.lastname@example.org>
Date: Tuesday, May 2, 2006 9:37 am
Subject: Re: st: Detecting Outliers
> The short answer is Yes, many of them.
> A longer answer is more difficult to do well
> given such little information.
> We have just had a thread on an overlapping
> question. Look for "outliners" [sic] in
> the archives.
> You don't quite say so, but these sound like
> panel data. For concreteness, I guess 500
> patients and 10 observations on each, one
> for each year. My guesses have some
> influence on my suggestions.
> What is an outlier in this context? Presumably
> a patient who differs from many others; or
> an observation that differs from the rest
> of the patient's history. Both could make
> sense, e.g. in the case of anorexic/bulimic
> patients, or patients who had a really bad
> year, say a fight with cancer or being
> caught up in "Lost".
> First off, if a patient's height varies more than
> trivially over 10 years, either something is
> going on, say growth in young people or some aging
> effect, or there is an error in the data.
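One way to automate this first check is sketched below in Python, assuming records of the form (patient_id, year, height_cm). The function name height_flags and the 2 cm tolerance are hypothetical choices, not a clinical standard. Each observation is compared with the patient's own median height, so a single mistyped value stands out. A flag only means "look here", since growth or aging would be flagged too.

```python
# Sketch: flag observations where a patient's height departs from that
# patient's own median. height_flags and tol_cm=2.0 are hypothetical.
from collections import defaultdict
from statistics import median

def height_flags(records, tol_cm=2.0):
    """records: iterable of (patient_id, year, height_cm).
    Returns {patient_id: [years whose height is more than tol_cm
    from the patient's median height]}, omitting unflagged patients."""
    by_patient = defaultdict(list)
    for pid, year, height in records:
        by_patient[pid].append((year, height))
    flags = {}
    for pid, obs in by_patient.items():
        mid = median(h for _, h in obs)  # median resists a single bad value
        odd = sorted(year for year, h in obs if abs(h - mid) > tol_cm)
        if odd:
            flags[pid] = odd
    return flags

# Made-up example: patient 1 has 170 mistyped as 107 in 2002
records = [
    (1, 2000, 170.0), (1, 2001, 170.5), (1, 2002, 107.0), (1, 2003, 170.2),
    (2, 2000, 160.0), (2, 2001, 160.3), (2, 2002, 159.8),
]
print(height_flags(records))
```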
> Weight fluctuations seem rather different:
> everyone knows reasons for various kinds
> of weight change even in adulthood, so
> errors (meaning mistakes) would be harder
> to pick up.
> There are lots of things you can do. You
> could set up a loop to plot the time series
> for each patient. For 500 patients that would
> be a little tedious, but it is direct.
> You could try reductions, e.g.
> last height - first height
> last weight - first weight
> mean height over period
> mean weight over period
> some measure of variability of each
> and look for outliers on pairwise plots
> of these reductions. A scatterplot matrix often
> shows errors even in data that have
> supposedly been cleaned. Often
> the cleaning is univariate, but a
> weird data value can show up like
> a run in fabric.
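The reductions listed above could be computed along these lines. This is a minimal Python sketch, assuming records of the form (patient_id, year, height, weight); the function name reductions and the summary keys are hypothetical, and the standard deviation stands in for "some measure of variability".

```python
# Sketch: one summary row per patient, ready for a scatterplot matrix.
# reductions() and its keys are hypothetical names for illustration.
from statistics import mean, stdev

def reductions(records):
    """records: iterable of (patient_id, year, height, weight)."""
    by_patient = {}
    for pid, year, h, w in records:
        by_patient.setdefault(pid, []).append((year, h, w))
    rows = {}
    for pid, obs in by_patient.items():
        obs.sort()  # chronological, so "first" and "last" are by year
        hs = [h for _, h, _ in obs]
        ws = [w for _, _, w in obs]
        rows[pid] = {
            "dheight": hs[-1] - hs[0],   # last height - first height
            "dweight": ws[-1] - ws[0],   # last weight - first weight
            "mheight": mean(hs),
            "mweight": mean(ws),
            # sample SD needs at least two observations
            "sdheight": stdev(hs) if len(hs) > 1 else 0.0,
            "sdweight": stdev(ws) if len(ws) > 1 else 0.0,
        }
    return rows

records = [
    (1, 2001, 170.0, 70.0), (1, 2000, 170.0, 68.0), (1, 2002, 170.0, 72.0),
    (2, 2000, 160.0, 55.0), (2, 2001, 161.0, 55.0),
]
for pid, row in reductions(records).items():
    print(pid, row)
```

The summaries are what you would then plot against each other and inspect by eye, for example with Stata's graph matrix or pandas.plotting.scatter_matrix.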
> My prejudice is that no testing or
> measuring approach beats graphics
> for finding outliers.
> Raphael Fraser
> I have 10 years of data (5000 observations) on patients' heights and
> weights. Is there any ado-file that could assist in locating possible
> outliers?
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/