Will the critical values for outliers be the same within each panel
or will they differ from panel to panel (perhaps depending upon
the sample size of each panel)?
> The data I am referring to are panel data. The purpose of the analysis
> is to detect possible errors. I have on average 50 observations on 100
> subjects.
>
> > There are many types of outliers, depending upon
> > whether you have time series or panel data.
> > In time series, there are additive outliers, innovational
> > outliers, outlier patches, for example. Some have worse
> > effects than others. Adjacent outliers may smear or
> > mask others. They may have good or bad leverage.
> > One should have the choice of detecting, modeling, or
> > replacing them depending upon their theoretical significance.
> > What kind of analysis is being done here?
> > RY
> >
> >
> >
> > > The short answer is Yes, many of them.
> > > A longer answer is more difficult to do well
> > > given so little information.
> > >
> > > We have just had a thread on an overlapping
> > > question. Look for "outliners" [sic] in
> > > the archives.
> > >
> > > You don't quite say so, but these sound like
> > > panel data. For concreteness, I guess 500
> > > patients and 10 observations on each, one
> > > for each year. My guesses have some
> > > influence on my suggestions.
> > >
> > > What is an outlier in this context? Presumably
> > > a patient who differs from many others; or
> > > an observation that differs from the rest
> > > of the patient's history. Both could make
> > > sense, e.g. in the case of anorexic/bulimic
> > > patients, or patients who had a really bad
> > > year, say a fight with cancer or being
> > > caught up in "Lost".
> > >
> > > First off, if a patient's height varies more than
> > > trivially over 10 years, either there is something
> > > going on, say growth for young people or some aging
> > > effect, or there is an error in the data.
> > >
> > > Weight fluctuations would seem rather different
> > > and everyone knows reasons for various kinds
> > > of weight change even in adulthood. It would
> > > seem a bit more difficult to pick up
> > > on errors (meaning mistakes).
> > >
> > > There are lots of things you can do. You
> > > could set up a loop to plot the time series
> > > for each patient. For 500 patients that would
> > > be a little tedious, but it is a direct
> > > approach.
> > >
> > > You could try reductions, e.g.
> > >
> > > last height - first height
> > > last weight - first weight
> > > mean height over period
> > > mean weight over period
> > > some measure of variability of each
> > >
> > > and look for outliers on pairwise plots
> > > of each. A scatterplot matrix often
> > > shows errors even in data that have
> > > supposedly been cleaned. Often
> > > the cleaning is univariate, but a
> > > weird data value can show up like
> > > a run in fabric.
> > >
> > > My prejudice is that no testing or
> > > measuring approach beats graphics
> > > for finding outliers.
> > >
> > > Nick
> > > n.j.cox@durham.ac.uk
> > >
> > >
> > > Raphael Fraser
> > >
> > > I have 10 years of data (5000 observations) on patients' heights and
> > > weights. Is there any ado-file that could assist in locating
> > > possible outliers?
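
The reductions Nick lists in his reply (last minus first, mean over the period, some measure of variability) can be sketched roughly as follows. This is only an illustration in plain Python, not an ado-file and not anyone's actual method from this thread. The record layout (id, year, height, weight), the toy values, the function name reduce_by_patient, and the choice of the population standard deviation as the "measure of variability" are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical long-format records: (patient id, year, height in cm, weight in kg).
# These values are invented for illustration only.
records = [
    (1, 2001, 170.0, 70.0),
    (1, 2002, 170.0, 72.0),
    (1, 2003, 171.0, 71.0),
    (2, 2001, 160.0, 55.0),
    (2, 2002, 106.0, 56.0),  # looks like transposed digits: a plausible entry error
    (2, 2003, 160.0, 57.0),
]


def reduce_by_patient(rows):
    """Collapse each patient's history to a few summary numbers."""
    by_id = defaultdict(list)
    for pid, year, height, weight in rows:
        by_id[pid].append((year, height, weight))

    summaries = {}
    for pid, obs in by_id.items():
        obs.sort()  # order by year so "first" and "last" are well defined
        heights = [h for _, h, _ in obs]
        weights = [w for _, _, w in obs]
        summaries[pid] = {
            "d_height": heights[-1] - heights[0],  # last height - first height
            "d_weight": weights[-1] - weights[0],  # last weight - first weight
            "mean_height": mean(heights),
            "mean_weight": mean(weights),
            # Assumed variability measure: population standard deviation.
            "sd_height": pstdev(heights),
            "sd_weight": pstdev(weights),
        }
    return summaries


summaries = reduce_by_patient(records)
for pid, s in sorted(summaries.items()):
    print(pid, s)
```

In this toy example patient 2's first and last heights agree, so the difference alone would miss the odd middle value, while the variability measure picks it up. That is one reason to look at several reductions together, for example on pairwise plots, rather than relying on any single one.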
