Statalist The Stata Listserver


Re: st: Detecting Outliers


From: "Raphael Fraser" <raphael.fraser@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Detecting Outliers
Date: Tue, 2 May 2006 09:10:35 -0500

Well, Ronnie, you presumed correctly. These are panel data, and I should
have made that absolutely clear. A graphical approach is indeed in order.


On 5/2/06, Ronnie Babigumira <rb.glists@gmail.com> wrote:
Raphael, I totally missed the time dimension. Nick has given it more thought and has offered a better answer. Please
ignore my "solution".

Ronnie

n j cox wrote:
> The short answer is Yes, many of them.
> A longer answer is more difficult to do well
> given so little information.
>
> We have just had a thread on an overlapping
> question. Look for "outliners" [sic] in
> the archives.
>
> You don't quite say so, but these sound like
> panel data. For concreteness, I guess 500
> patients and 10 observations on each, one
> for each year. My guesses have some
> influence on my suggestions.
>
> What is an outlier in this context? Presumably
> a patient who differs from many others; or
> an observation that differs from the rest
> of the patient's history. Both could make
> sense, e.g. in the case of anorexic/bulimic
> patients, or patients who had a really bad
> year, say a fight with cancer or being
> caught up in "Lost".
>
> First off, if a patient's height varies more than
> trivially over 10 years, either there is something
> going on, say growth for young people or some aging
> effect, or there is an error in the data.
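
One way to act on the height point, as a minimal sketch: compute each patient's height range over the period and list the patients whose range exceeds some tolerance. The variable names id, year and height, the helper names (hmin, hmax, hrange, tagh), and the tolerance of 2 are assumptions for illustration; none of them is stated in the thread.

```stata
* Assumed layout: one row per patient-year, variables id, year, height
bysort id: egen double hmin = min(height)
bysort id: egen double hmax = max(height)
generate double hrange = hmax - hmin

* Tag one row per patient so each patient appears once
egen byte tagh = tag(id)

* Look at the whole distribution first, then at the suspicious patients
summarize hrange if tagh, detail

* Hypothetical tolerance, in whatever units height is recorded in
local tol 2
list id hmin hmax hrange if tagh & hrange > `tol' & !missing(hrange)
```

Whether a flagged patient reflects growth, aging, or a data-entry mistake still has to be judged case by case; the listing only narrows down where to look.
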
>
> Weight fluctuations would seem rather different
> and everyone knows reasons for various kinds
> of weight change even in adulthood. It would
> seem a bit more difficult to pick up
> on errors (meaning mistakes).
>
> There are lots of things you can do. You
> could set up a loop to plot the time series
> for each patient. For 500 patients that would
> be a little tedious, but it is a direct
> approach.
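
A sketch of the loop described above, assuming variables id, year and weight (names not given in the thread) and a numeric id. It draws one connected time series plot per patient and exports each to a file so the plots can be paged through later; the file-name pattern is made up, and PNG export may not be available in every Stata setup.

```stata
* Assumed layout: one row per patient-year, variables id, year, weight
sort id year
levelsof id, local(patients)

foreach p of local patients {
    * One time series plot of weight for this patient
    twoway connected weight year if id == `p', ///
        title("Patient `p'") ytitle("Weight") xtitle("Year")
    * Hypothetical file name; assumes id is numeric
    graph export "weight_`p'.png", replace
}
```

The same loop works for height by swapping the variable name. If the data have been declared as a panel, xtline is an alternative worth trying on subsets of patients, since 500 panels on one graph would be unreadable.
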
>
> You could try reductions, e.g.
>
> last height - first height
> last weight - first weight
> mean height over period
> mean weight over period
> some measure of variability of each
>
> and look for outliers on pairwise plots
> of each. A scatterplot matrix often
> shows errors even in data that have
> supposedly been cleaned. Often
> the cleaning is univariate, but a
> weird data value can show up like
> a run in fabric.
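
The reductions listed above could be computed along these lines. This is a sketch under assumptions: the variable names id, year, height and weight are guesses, the new variable names are invented here, and the standard deviation is just one possible "measure of variability".

```stata
* Assumed layout: one row per patient-year, variables id, year, height, weight
* Last minus first value within each patient, in year order
bysort id (year): generate double dheight = height[_N] - height[1]
bysort id (year): generate double dweight = weight[_N] - weight[1]

* Mean and standard deviation over the period, per patient
bysort id: egen double mheight = mean(height)
bysort id: egen double mweight = mean(weight)
bysort id: egen double sdheight = sd(height)
bysort id: egen double sdweight = sd(weight)

* Keep one row per patient for plotting
egen byte tagp = tag(id)

* Scatterplot matrix of the per-patient summaries
graph matrix dheight dweight mheight mweight sdheight sdweight if tagp, ///
    half msymbol(oh)
```

Note that the first-to-last differences come out missing whenever the first or last year's value is missing. Points that sit well away from the main cloud in any panel are the candidates to list and inspect by id.
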
>
> My prejudice is that no testing or
> measuring approach beats graphics
> for finding outliers.
>
> Nick
> n.j.cox@durham.ac.uk
>
>
> Raphael Fraser
>
> I have 10 years of data (5000 observations) on patients' heights and
> weights. Is there any ado-file that could assist in locating possible
> outliers?
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>

© Copyright 1996–2014 StataCorp LP