Statalist The Stata Listserver



Re: st: Detecting Outliers


From   Ronnie Babigumira <rb.glists@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Detecting Outliers
Date   Tue, 02 May 2006 15:53:20 +0200

Raphael, I totally missed the time dimension. Nick has given it more thought and has offered a better answer. Please ignore my "solution".

Ronnie

n j cox wrote:

The short answer is Yes, many of them.
A longer answer is more difficult to do well
given so little information.

We have just had a thread on an overlapping
question. Look for "outliners" [sic] in
the archives.

You don't quite say so, but these sound like
panel data. For concreteness, I guess 500
patients and 10 observations on each, one
for each year. My guesses have some
influence on my suggestions.

What is an outlier in this context? Presumably
a patient who differs from many others; or
an observation that differs from the rest
of the patient's history. Both could make
sense, e.g. in the case of anorexic/bulimic
patients, or patients who had a really bad
year, say a fight with cancer or being
caught up in "Lost".

First off, if a patient's height varies more than
trivially over 10 years, either there is something
going on, say growth for young people or some aging
effect, or there is an error in the data.
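
A minimal sketch of such a height check in Stata. The
variable names id and height are assumptions (the
original post does not name its variables), and the
cutoff of 5 is purely illustrative and depends on the
units of height.

```stata
* Hypothetical variable names: id, height
* Range of recorded height within each patient
bysort id: egen double hmin = min(height)
bysort id: egen double hmax = max(height)
generate double hrange = hmax - hmin

* Tag one observation per patient so each patient counts once
egen byte one_h = tag(id)

* Look at the distribution of ranges before picking any cutoff
summarize hrange if one_h, detail

* 5 is an arbitrary illustrative threshold, not a recommendation;
* !missing() is needed because missing counts as larger than any number
list id hmin hmax hrange if one_h & hrange > 5 & !missing(hrange)
```

Patients listed this way are only candidates for a
closer look: growth in young patients would also
produce a large range.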

Weight fluctuations would seem rather different
and everyone knows reasons for various kinds
of weight change even in adulthood. It would
seem a bit more difficult to pick up
on errors (meaning mistakes).

There are lots of things you can do. You
could set up a loop to plot the time series
for each patient. For 500 patients that would
be a little tedious, but it is a direct
approach.
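
One way such a loop might look, as a sketch only. It
assumes a numeric patient identifier id, a time
variable year and a measurement weight, none of which
are named in the original post; the file names are
likewise hypothetical.

```stata
* Hypothetical variable names: id (numeric), year, weight
levelsof id, local(patients)

foreach p of local patients {
    * sort draws the line in time order even if the data are not sorted
    line weight year if id == `p', sort title("Patient `p'")

    * One file per patient, overwriting any earlier copy
    graph export "weight_`p'.png", replace
}
```

Exporting each graph means the plots can be paged
through afterwards rather than watched as they are
drawn. A string identifier would need the comparison
written with quotes instead.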

You could try reductions, e.g.

last height - first height
last weight - first weight
mean height over period
mean weight over period
some measure of variability of each

and look for outliers on pairwise plots
of each. A scatterplot matrix often
shows errors even in data that have
supposedly been cleaned. Often
the cleaning is univariate, but a
weird data value can show up like
a run in fabric.
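
The reductions above can be sketched in Stata as
follows. The variable names id, year, height and
weight are assumptions, and the standard deviation is
just one possible measure of variability.

```stata
* Hypothetical variable names: id, year, height, weight
* Last minus first, with observations in time order within patient;
* the result is missing if the first or last value is missing
bysort id (year): generate double h_change = height[_N] - height[1]
bysort id (year): generate double w_change = weight[_N] - weight[1]

* Mean and standard deviation over the period, per patient
bysort id: egen double h_mean = mean(height)
bysort id: egen double w_mean = mean(weight)
bysort id: egen double h_sd = sd(height)
bysort id: egen double w_sd = sd(weight)

* Plot one point per patient, not one per observation
egen byte one_p = tag(id)
graph matrix h_change w_change h_mean w_mean h_sd w_sd if one_p, half
```

The half option shows only the lower triangle of the
matrix, which keeps each panel larger when there are
six variables.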

My prejudice is that no testing or
measuring approach beats graphics
for finding outliers.

Nick
n.j.cox@durham.ac.uk


Raphael Fraser wrote:

I have 10 years' data (5000 observations) on patients' heights and
weights. Is there any ado-file that could assist in locating possible
outliers?
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



