Statalist The Stata Listserver



Re: st: Detecting Outliers


From   "Raphael Fraser" <raphael.fraser@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Detecting Outliers
Date   Tue, 2 May 2006 09:12:32 -0500

Well (Nick, not Ronnie), you presumed correctly. These are panel data, and I
should have made this absolutely clear. A graphical approach is indeed in order.

On 5/2/06, Raphael Fraser <raphael.fraser@gmail.com> wrote:
Well Ronnie, you presumed correctly. These are panel data, and I should have made
this absolutely clear. A graphical approach is indeed in order.


On 5/2/06, Ronnie Babigumira <rb.glists@gmail.com> wrote:
> Raphael, I totally missed the time dimension. Nick has given it more thought and has offered a better answer. Please
> ignore my "solution".
>
> Ronnie
>
> n j cox wrote:
> > The short answer is Yes, many of them.
> > A longer answer is more difficult to do well
> > given so little information.
> >
> > We have just had a thread on an overlapping
> > question. Look for "outliners" [sic] in
> > the archives.
> >
> > You don't quite say so, but these sound like
> > panel data. For concreteness, I guess 500
> > patients and 10 observations on each, one
> > for each year. My guesses have some
> > influence on my suggestions.
> >
> > What is an outlier in this context? Presumably
> > a patient who differs from many others; or
> > an observation that differs from the rest
> > of the patient's history. Both could make
> > sense, e.g. in the case of anorexic/bulimic
> > patients, or patients who had a really bad
> > year, say a fight with cancer or being
> > caught up in "Lost".
> >
> > First off, if a patient's height varies more than
> > trivially over 10 years, either there is something
> > going on, say growth for young people or some aging
> > effect, or there is an error in the data.
> >
> > Weight fluctuations would seem rather different
> > and everyone knows reasons for various kinds
> > of weight change even in adulthood. It would
> > seem a bit more difficult to pick up
> > on errors (meaning mistakes).
> >
> > There are lots of things you can do. You
> > could set up a loop to plot the time series
> > for each patient. For 500 patients that would
> > be a little tedious, but it is a direct
> > approach.
> >
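A minimal sketch of such a loop in Stata, assuming the data are in long form with hypothetical variable names id (a numeric patient identifier), year, and weight; none of these names come from the thread. It writes one PNG file per patient rather than keeping 500 graphs in memory.

```stata
* Assumed (hypothetical) variables: id = numeric patient identifier,
* year = year of observation, weight = body weight
levelsof id, local(patients)

foreach p of local patients {
    * one connected time-series plot for this patient
    twoway connected weight year if id == `p', sort ///
        title("Patient `p'") ytitle("Weight") xtitle("Year")
    * save to disk; the file name pattern is an arbitrary choice
    graph export "weight_`p'.png", replace
}
```

Swapping height for weight gives the corresponding height series.
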
> > You could try reductions, e.g.
> >
> > last height - first height
> > last weight - first weight
> > mean height over period
> > mean weight over period
> > some measure of variability of each
> >
> > and look for outliers on pairwise plots
> > of each. A scatterplot matrix often
> > shows errors even in data that have
> > supposedly been cleaned. Often
> > the cleaning is univariate, but a
> > weird data value can show up like
> > a run in fabric.
> >
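The reductions listed above could be computed along the following lines. This is a sketch under the same assumed variable names (id, year, height, weight); the new variable names are arbitrary, and the standard deviation is only one possible measure of variability.

```stata
* Assumed (hypothetical) variables: id, year, height, weight

* last minus first value within each patient, ordered by year
bysort id (year): gen dheight = height[_N] - height[1]
bysort id (year): gen dweight = weight[_N] - weight[1]

* per-patient means and standard deviations
egen mheight  = mean(height), by(id)
egen mweight  = mean(weight), by(id)
egen sdheight = sd(height),   by(id)
egen sdweight = sd(weight),   by(id)

* flag one observation per patient so each patient is plotted once
egen byte tag = tag(id)

* scatterplot matrix of the patient-level summaries
graph matrix dheight dweight mheight mweight sdheight sdweight if tag, half
```

A patient whose first or last value is missing gets a missing difference, so it may be worth checking for missing values before reading the plot.
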
> > My prejudice is that no testing or
> > measuring approach beats graphics
> > for finding outliers.
> >
> > Nick
> > n.j.cox@durham.ac.uk
> >
> >
> > Raphael Fraser
> >
> > I have 10 years of data (5000 observations) on patients' heights and
> > weights. Is there any ado-file that could assist in locating possible
> > outliers?
> > *
> > *   For searches and help try:
> > *   http://www.stata.com/support/faqs/res/findit.html
> > *   http://www.stata.com/support/statalist/faq
> > *   http://www.ats.ucla.edu/stat/stata/
> >
>




© Copyright 1996–2014 StataCorp LP