

From: Nick Cox <njcoxstata@gmail.com>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: How to detect outliers

Date: Mon, 11 Feb 2013 19:24:40 +0000

I wouldn't regard any kind of large residual as indicating outliers unequivocally. On the contrary, a really marked outlier is likely to pull the regression towards it, with the result of a small residual.

Your criterion here for Cook is 4/n, but evidently you are fitting regressions separately for each period. The total dataset size of 165979 is not pertinent to regressions fitted individually. The relevant n is the number of observations used in each regression.

I think you'd learn more from residual vs fitted plots, even all 119 of them.

Whether you would be better off with a different model depends on your research problem.

Nick

On Mon, Feb 11, 2013 at 6:50 PM, Xixi Lin <winnielxx@gmail.com> wrote:
> Hi,
> I tried two ways to detect outliers: one is to regard Cook's Distance
> greater than 4/n as outliers; the other is to regard those with
> standardized residuals greater than 2 in magnitude as outliers. Here
> is my code:
>
> gen residual=.
> tempvar temp
> foreach z of numlist 2/120 {
>     capture reg Y X1 X2 X3 X4 if Period==`z', noconstant
>     if !_rc {
>         predict temp,rstu
>         replace residual=temp if Period==`z'
>         drop temp
>     }
> }
>
> //cook's distance
> gen di_bench=4/165979
> gen distance=.
> tempvar temp1
> foreach z of numlist 2/120 {
>     capture reg Y X1 X2 X3 X4 if Period==`z', noconstant
>     if !_rc {
>         predict temp1,cook
>         replace distance=temp1 if Period==`z'
>         drop temp1
>     }
> }
> //outlier numbers
> count if abs(residual) > 2 // 7922
> count if distance > di_bench //111879
>
> My question is: did I mess up the code? Why are the two results so
> different? One shows 7922 outliers, the other shows 111879 outliers.
> If I compare Cook's Distance with 1, then the outlier number is 133.
>
> Can anyone tell me which method I should choose? Or is there any
> better way to detect outliers? Thanks a lot.
>
> Best,
> Xixi Lin
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
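The point about the Cook's distance cutoff can be sketched in Stata roughly as follows. This is only an illustration under assumptions, not code from the thread: it reuses `Y`, `X1`-`X4` and `Period` from the quoted code, while `cook_d`, `cook_cut` and the graph names `rvf2`, `rvf3`, ... are hypothetical names introduced here. The cutoff is computed from `e(N)`, the number of observations used in each regression, rather than from the size of the whole dataset.

```stata
* Sketch only: per-period regressions with a per-regression 4/n cutoff.
* cook_d, cook_cut and the rvf* graph names are hypothetical.
gen double cook_d   = .
gen double cook_cut = .

forvalues z = 2/120 {
    capture regress Y X1 X2 X3 X4 if Period == `z', noconstant
    if !_rc {
        tempvar d
        * Cook's distance is computed for the estimation sample only
        predict double `d' if e(sample), cooksd
        replace cook_d   = `d'      if e(sample)
        * e(N) is the number of observations used in this regression
        replace cook_cut = 4 / e(N) if e(sample)
        drop `d'

        * residual-versus-fitted plot for this period, stored without drawing
        rvfplot, name(rvf`z', replace) nodraw title("Period `z'")
    }
}

* Stata treats missing as larger than any number, so exclude
* missing values explicitly when counting
count if cook_d > cook_cut & !missing(cook_d)
```

Because missing values compare as larger than any number in Stata, a count such as `count if distance > di_bench` also includes every observation whose `distance` is missing, which may account for part of a very large count. Individual stored plots can be shown afterwards with `graph display`, for example `graph display rvf2`.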

**Follow-Ups**: **Re: st: How to detect outliers** *From:* Xixi Lin <winnielxx@gmail.com>

**References**: **st: How to detect outliers** *From:* Xixi Lin <winnielxx@gmail.com>
