
Re: st: Decision on trimming the data


From   Ronán Conroy <rconroy@rcsi.ie>
To   "statalist hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Decision on trimming the data
Date   Wed, 23 Jun 2004 11:21:09 +0100

on 22/06/2004 14:03, Rijo John at rijo@igidr.ac.in wrote:

> I have a data set with quite a few outliers. Suppose I am trimming my
> dependent variable 1% each from top and bottom using the 1st and 99th
> percentiles. And I have the regression estimates before and after
> trimming. Let us also suppose that some of the variables that were
> significant before trimming turned out to be insignificant after trimming
> and/or vice versa.
> 
> Is there a standard way by which one can decide what percentage
> of the data should be trimmed? Is a Chow test for the equality of
> coefficients enough for this? I mean, trim up to the point where the
> changes in coefficients become insignificant? Or is there any other
> standard way to do this?

That's a tough one.

I tend not to trim observations. These extreme values are trying to tell you
something. Perhaps they are just saying that the method of measurement
breaks down from time to time, but they may be saying that there are
circumstances that give rise to atypical values. One dataset I worked with
had nutritional measurements and included a body builder and a woman with
anorexia. Both of these gave rise to strange values. So strategy one is to
try to explain why there are outliers.

Next move is to make sure that the influence of the outliers is not changing
the substantive conclusions of your analysis. For this, I tend to run -rreg-
in parallel with -regress-; the coefficients won't be the same, but where a
conclusion is different between the two, then it's a sign that the outliers
are driving the conclusion. I tend to regard -rreg- as closer to a
nonparametric method (yes, it estimates parameters; no, I don't understand
them) but it is useful because it can parallel a standard regression
analysis.
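
A minimal sketch of that side-by-side comparison follows. It uses the auto
dataset that ships with Stata purely as a stand-in; the variables price, mpg,
weight and foreign are placeholders for your own model, not anything from the
original question.

```stata
* Illustrative only: auto.dta and these variables stand in for your own data
sysuse auto, clear

* Ordinary least squares fit
regress price mpg weight foreign
estimates store ols

* Robust regression, which downweights observations with large residuals
rreg price mpg weight foreign
estimates store robust

* Coefficients and significance stars for both fits, side by side
estimates table ols robust, b(%9.3f) star
```

Where a coefficient's sign or significance differs between the two columns,
that is the place to look for outliers driving the result.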

Another strategy might be to group the dependent variable into ordered
categories and use -ologit-, which should also give you similar conclusions.
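
One way this grouping might look, again with the auto dataset as a stand-in.
The choice of five groups cut at the quintiles, and the variable name price5,
are assumptions made for the example; any sensible ordered grouping would do.

```stata
* Illustrative only: auto.dta and these variables stand in for your own data
sysuse auto, clear

* Cut the outcome into five ordered groups at its quintiles
xtile price5 = price, nquantiles(5)

* Ordered logit on the grouped outcome; extreme values now only
* determine which group an observation falls into
ologit price5 mpg weight foreign
```

The coefficients are on a different scale from -regress-, so compare signs and
significance rather than magnitudes.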



Ronan M Conroy (rconroy@rcsi.ie)
Lecturer in Biostatistics
Royal College of Surgeons
Dublin 2, Ireland
+353 1 402 2431 (fax 2764)

--------------------
Just say no to drug reps
http://www.nofreelunch.org/
