
From: Ronán Conroy <rconroy@rcsi.ie>
To: "statalist hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: Decision on trimming the data
Date: Wed, 23 Jun 2004 11:21:09 +0100

on 22/06/2004 14:03, Rijo John at rijo@igidr.ac.in wrote:

> I have a data set with quite a few outliers. Suppose I am trimming my
> dependent variable 1% each from top and bottom using 1st and 99th
> percentiles. And I have the regression estimates before and after
> trimming. Let us also suppose that some of the variables that were
> significant before trimming turned out to be insignificant after trimming
> and/or vice versa.
>
> Is there a standard way by which one can decide how much percentage
> of data should be trimmed? Is a chow test for the equality of coefficients
> enough for this? I mean trim up to the point where the changes in
> coefficients become insignificant? Or is there any other standard way to
> do this?

That's a tough one. I tend not to trim observations. These extreme values are trying to tell you something. Perhaps they are just saying that the method of measurement breaks down from time to time, but they may be saying that there are circumstances that give rise to atypical values. One dataset I worked with had nutritional measurements and included a body builder and a woman with anorexia. Both of these gave rise to strange values.

So strategy one is to try to explain why there are outliers.

The next move is to make sure that the influence of the outliers is not changing the substantive conclusions of your analysis. For this, I tend to run -rreg- in parallel with -regress-; the coefficients won't be the same, but where a conclusion differs between the two, it's a sign that the outliers are driving the conclusion. I tend to regard -rreg- as closer to a nonparametric method (yes, it estimates parameters; no, I don't understand them) but it is useful because it can parallel a standard regression analysis.

Another strategy might be to group the data and use -ologit-, which should also give you similar conclusions.
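As a rough sketch of the parallel checks described above, something like the following could be run. The auto dataset, the variable names, and the choice of quartiles for grouping are assumptions made purely for illustration; they are not the data or the model from this thread, and grouping the outcome is only one reading of "group the data".

```stata
* Illustration only: dataset, variables, and quartile grouping are
* assumptions, not taken from the original question.
sysuse auto, clear

* Standard least-squares fit
regress price weight mpg foreign
estimates store ols

* Robust regression on the same model, which downweights extreme observations
rreg price weight mpg foreign
estimates store robust

* Coefficients and standard errors side by side
estimates table ols robust, b(%9.3f) se(%9.3f)

* Alternative check: group the outcome into quartiles and fit an ordered logit
xtile price_q = price, nquantiles(4)
ologit price_q weight mpg foreign
```

The point of the comparison is not whether the coefficients match, but whether the same predictors lead to the same substantive conclusion in each fit.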
Ronan M Conroy (rconroy@rcsi.ie)
Lecturer in Biostatistics
Royal College of Surgeons
Dublin 2, Ireland
+353 1 402 2431 (fax 2764)
--------------------
Just say no to drug reps
http://www.nofreelunch.org/

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

