
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Re: How to delete studentized residuals with absolute values greater than or equal to two after conducting areg procedure?


From   Steve Samuels <[email protected]>
To   [email protected]
Subject   Re: st: Re: How to delete studentized residuals with absolute values greater than or equal to two after conducting areg procedure?
Date   Thu, 27 Jun 2013 17:34:01 -0400

I highly recommend the very robust mmregress package, by Verardi and
Croux (net describe st0173_1, from(http://www.stata-journal.com/software/sj10-2)),
as the best, indeed, the only way in Stata to reliably identify outliers
and high leverage points and to simultaneously fit models that
down-weight or eliminate the influence of such points. Neither -qreg-
nor -rreg- can downweight or identify high leverage points. 

Note that diagnostics based on OLS, including studentized residuals, are
very sensitive to outliers. They consider changes related to the
deletion of one observation at a time. Extreme points pull the fitted
regression surface towards themselves. If there are two
outlying/high-leverage observations in the same location, each will
"mask" the other. -mmregress- is not subject to such masking.
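
A minimal sketch of how one might install and run the package follows. The variable names y, x1, and x2 are hypothetical placeholders, and the -outlier- option is given as recalled from the package's help file, so please confirm the exact options with -help mmregress- after installing.

```stata
* Install the package from the Stata Journal archive (needed only once)
net install st0173_1, from(http://www.stata-journal.com/software/sj10-2)

* Check the documented syntax and options before relying on them
help mmregress

* MM-regression of a hypothetical outcome y on hypothetical regressors x1 and x2.
* Assumption: -outlier- requests the robust outlier diagnostics described in
* Verardi and Croux (2009); verify this against the help file.
mmregress y x1 x2, outlier
```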

For a well-written introduction to these topics, look at Hampel et al. (1986).

References:

Verardi, V., and C. Croux. 2009. Robust regression in Stata. Stata
Journal 9, no. 3: 439-453. 


Hampel, Frank, Elvezio Ronchetti, Peter Rousseeuw, and Werner Stahel. 1986. Robust Statistics: The Approach Based on Influence Functions (Wiley Series in Probability and Mathematical Statistics). New York: John Wiley and Sons.

Steve
[email protected]


 
On Jun 27, 2013, at 10:36 AM, George_Huang wrote:

Dear David,

Your explanation helps a lot. Do you mean that I should pay attention not only to the residuals but also to leverage in order to identify potentially unusual or influential observations? If so, Cook's D, DFITS, and lvr2plot may be better tools for detecting “outliers”. Right?

You are right. I have panel data from 2006 to 2011, so my coauthor would like me to run the regressions with firm or industry fixed effects. However, those regression diagnostics are not available after areg. My coauthor also suggested that I run median regressions (qreg) and robust regressions (rreg). He mentioned that these regressions do not allow controlling for firm fixed effects, but that they can be reported in the robustness tests section to show that outliers do not affect our analysis.


Thanks and Best,

George


-----Original Message----- From: David Hoaglin
Sent: Thursday, June 27, 2013 8:31 PM
To: [email protected]
Subject: Re: st: Re: How to delete studentized residuals with absolute values greater than or equal to two after conducting areg procedure?

Dear George,

Assessing "the robustness of the analysis results" usually involves
much more than rerunning the model after removing observations that
the model does not fit well.  Your coauthor should explain the
justification for removing those "outliers."

Whenever possible, one should investigate observations that have large
residuals.  The definition of "studentized residual" is important
here.  Much of the literature on regression diagnostics defines the
studentized residual for observation i as the difference between the
observed value of y for observation i and the value of y predicted for
observation i by the regression model without observation i, divided
by a suitable estimate of the standard deviation of that difference.
Some people use the term "jackknife residual."  The reasoning is that
an observation that is influential may not have a large residual,
because it has distorted the fit.  Sometimes two or more observations
are jointly influential, so that their individual studentized
residuals are not large.  If one can detect such behavior (not always
an easy task), one then removes the whole group of observations (and
tries to understand what is responsible for their behavior).  All this
is part of careful analysis; nothing is automatic.
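
In symbols, the definition above can be sketched as follows. The notation is introduced here for illustration and is not from the original message: e_i is the ordinary residual, h_ii is the leverage (the i-th diagonal element of the hat matrix), and the subscript (-i) marks quantities computed from the regression fitted without observation i.

```latex
t_i
  = \frac{y_i - \hat{y}_{i(-i)}}
         {s_{(-i)} \sqrt{1 + x_i^{\top} \left( X_{(-i)}^{\top} X_{(-i)} \right)^{-1} x_i}}
  = \frac{e_i}{s_{(-i)} \sqrt{1 - h_{ii}}}
```

The second form shows why a high-leverage observation can have a small ordinary residual e_i and still a large studentized residual: dividing by the square root of 1 - h_ii inflates the residual as h_ii approaches 1.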

Earlier you mentioned -reg-, from which you can get the information
you need (in postestimation).  I have seldom used -areg-, but I am not
surprised that it does not give the same detailed information about
individual observations.  It appears that you have some type of panel
data, so the diagnostic process may be more complicated.  You may want
to tell us more about your data.
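
One possible workaround, sketched under assumptions, is to refit the model with -regress- and explicit firm indicators, which reproduces the slope estimates of -areg- with firm absorbed and makes the usual postestimation diagnostics available. The names y, x1, x2, and firm are hypothetical, and with a very large number of firms this approach may be slow or may run into matrix-size limits.

```stata
* Hypothetical names: y (outcome), x1 x2 (regressors), firm (panel identifier).
* Use the default VCE: these diagnostics are not available after vce(robust).
regress y x1 x2 i.firm

predict double rstu, rstudent    // studentized (jackknifed) residuals
predict double lev,  leverage    // diagonal elements of the hat matrix
predict double cook, cooksd      // Cook's distance
predict double dfit, dfits       // DFITS

* Inspect the poorly fitted observations rather than dropping them silently
list firm y x1 x2 rstu lev cook dfit if abs(rstu) >= 2 & !missing(rstu)

* Plot leverage against the squared normalized residual
lvr2plot
```

A firm with a single observation is fitted exactly by its own indicator, so its leverage equals 1 and its studentized residual is missing; the -!missing()- condition keeps such cases out of the listing.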

I hope this discussion helps.

David Hoaglin

On Thu, Jun 27, 2013 at 2:27 AM, George_Huang <[email protected]> wrote:
> Dear David and Peter,
> 
> Thanks to both of you for your suggestions.  I want to delete observations
> whose studentized residuals have an absolute value greater than or equal to
> two, because I want to test the robustness of the analysis results.  This
> was suggested by my coauthor. However, I would be more comfortable deleting
> observations whose studentized residuals have an absolute value of 3 or
> more, as you mentioned. I cannot find a postestimation option for
> studentized residuals after running areg. If you have further suggestions,
> please let me know.
> 
> Thanks a lot,
> 
> George
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/ 

