Statalist The Stata Listserver

Re: st: Detecting Outliers

From   n j cox
Subject   Re: st: Detecting Outliers
Date   Tue, 02 May 2006 14:37:09 +0100

The short answer is Yes, many of them.
A longer answer is more difficult to do well
given so little information.

We have just had a thread on an overlapping
question. Look for "outliners" [sic] in
the archives.

You don't quite say so, but these sound like
panel data. For concreteness, I guess 500
patients and 10 observations on each, one
for each year. My guesses have some
influence on my suggestions.

What is an outlier in this context? Presumably
a patient who differs from many others; or
an observation that differs from the rest
of the patient's history. Both could make
sense, e.g. in the case of anorexic/bulimic
patients, or patients who had a really bad
year, say a fight with cancer or being
caught up in "Lost".

First off, if a patient's height varies more than
trivially over 10 years, either there is something
going on, say growth for young people or some aging
effect, or there is an error in the data.
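
The height check above can be sketched as follows, here in Python
rather than Stata. The record layout (patient id, year, height in cm),
the function name and the 2 cm tolerance are all assumptions made for
illustration; what counts as "more than trivially" depends on the
patients' ages and how heights were measured.

```python
from collections import defaultdict

def flag_height_range(records, tolerance_cm=2.0):
    """Return {patient_id: height range} for patients whose range exceeds the tolerance.

    records: iterable of (patient_id, year, height_cm) tuples (assumed layout).
    tolerance_cm: hypothetical cut-off; choose it to suit the data.
    """
    heights = defaultdict(list)
    for patient_id, _year, height_cm in records:
        if height_cm is not None:  # skip missing measurements
            heights[patient_id].append(height_cm)

    flagged = {}
    for patient_id, values in heights.items():
        spread = max(values) - min(values)  # range of heights over the period
        if spread > tolerance_cm:
            flagged[patient_id] = spread
    return flagged

# Made-up example data: patient 2 has a likely keying error (71 for 171).
example = [
    (1, 1996, 170.0), (1, 1997, 170.5), (1, 1998, 170.0),
    (2, 1996, 171.0), (2, 1997, 71.0), (2, 1998, 171.5),
]
print(flag_height_range(example))
```

A flagged patient is only a candidate for inspection: the range
cannot tell genuine growth or shrinkage from a typing mistake.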

Weight fluctuations would seem rather different
and everyone knows reasons for various kinds
of weight change even in adulthood. It would
seem a bit more difficult to pick up
on errors (meaning mistakes).

There are lots of things you can do. You
could set up a loop to plot the time series
for each patient. For 500 patients that would
be a little tedious, but it is a direct approach.

You could try reductions, e.g.

last height - first height
last weight - first weight
mean height over period
mean weight over period
some measure of variability of each

and look for outliers on pairwise plots
of each. A scatterplot matrix often
shows errors even in data that have
supposedly been cleaned. Often
the cleaning is univariate, but a
weird data value can show up like
a run in fabric.
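
A minimal sketch of those reductions, again in Python and again
under assumptions: records are taken to be (patient id, year, height,
weight) tuples with no missing values, the function and key names are
made up, and the standard deviation stands in for "some measure of
variability", which the list above leaves open.

```python
from collections import defaultdict
from statistics import mean, stdev

def patient_summaries(records):
    """Reduce each patient's series to a few summary numbers.

    records: iterable of (patient_id, year, height, weight) tuples (assumed layout).
    Returns {patient_id: dict of summaries}.
    """
    series = defaultdict(list)
    for patient_id, year, height, weight in records:
        series[patient_id].append((year, height, weight))

    summaries = {}
    for patient_id, rows in series.items():
        rows.sort(key=lambda row: row[0])  # order by year so first and last are well defined
        heights = [row[1] for row in rows]
        weights = [row[2] for row in rows]
        summaries[patient_id] = {
            "height_change": heights[-1] - heights[0],
            "weight_change": weights[-1] - weights[0],
            "mean_height": mean(heights),
            "mean_weight": mean(weights),
            # SD is one possible variability measure; it needs at least 2 values
            "sd_height": stdev(heights) if len(heights) > 1 else None,
            "sd_weight": stdev(weights) if len(weights) > 1 else None,
        }
    return summaries

# Made-up example: one patient, years deliberately out of order.
example = [
    (1, 1997, 170.0, 72.0),
    (1, 1996, 170.0, 70.0),
    (1, 1998, 171.0, 74.0),
]
for patient_id, summary in patient_summaries(example).items():
    print(patient_id, summary)
```

Each patient then contributes one row of summaries, and those rows
are what would go into the pairwise plots or a scatterplot matrix.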

My prejudice is that no testing or
measuring approach beats graphics
for finding outliers.


Raphael Fraser

I have 10 years data (5000 observations) on patients heights and
weights. Is there any ado-file that could assist in locating possible