Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: "Tobias Pfaff" <tobias.pfaff@uni-muenster.de>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: Regression diagnostics with panel data (-xtreg-)
Date: Tue, 23 Aug 2011 16:00:05 +0200

Dear Statalisters,

I have encountered a few difficulties with regression diagnostics after a fixed effects regression with panel data (-xtreg, fe-). Previous threads on Statalist give hints, but in some cases ambiguity remains. Below, I follow the splendid structure of UCLA's Stata Web Book on regression diagnostics (http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter2/statareg2.htm). Open questions are marked with "Q:".

I would very much appreciate your input! I am not a professional in this field, and apart from the questions there might be some inaccuracies in what is written below.

My panel data set is unbalanced and covers 26 years, 40,000 individuals and 315,000 observations. We cluster standard errors by region*year (-xtreg ..., fe vce(cluster region_svyyear)-). We use Stata SE 11.2.

I already looked at -findit test panel- and searched Statalist for "diagnostics xtreg", "diagnostics panel" and other search terms.

Kind regards,
Tobias

Center for Interdisciplinary Economics
University of Muenster, Germany

**************************************

1. UNUSUAL AND INFLUENTIAL DATA

-predict- after -regress- allows you to calculate standardized residuals, leverage, Cook's D and DFITS, which can be used to identify outliers and influential data. It seems that none of them can be calculated after -xtreg-.

Q: How can you identify influential observations after a (fixed effects) panel regression? Is it OK to use -regress- with the same equation and do the diagnostics with DFITS etc. (as suggested here: http://www.stata.com/statalist/archive/2006-05/msg00075.html)?

Remedy if assumption is violated: Exclude observations above (or below) thresholds/cut-off points and check if results change.

2. NORMALITY OF RESIDUALS

The overall error component e can be predicted after -xtreg- (-predict res, e-).

Q: I guess that it is e that should be normally distributed and not the fixed error component u?

I would then check for normal distribution of the overall error component with graphs (-kdensity res, normal-, -pnorm res-, -qnorm res-) and formal tests (-iqr res-, -jb res-, -sktest res-).

Remedy if assumption is violated: Transform variables. Or use bootstrapping, which does not assume a normal distribution, to calculate standard errors and t-values, and check if results change.

3. HOMOSCEDASTICITY

Plot residuals vs. fitted values and check for patterns. See also: http://www.stata.com/support/faqs/stat/panel.html.

Remedy if assumption is violated: Use robust standard errors, either with -xtreg, vce(robust)- or -xtreg, vce(cluster ...)-.

4. MULTICOLLINEARITY

Calculating variance inflation factors (VIF) seems to be the standard approach to check for multicollinearity. Again, -estat vif- is only available after -regress-, but not after -xtreg-.

It has been suggested to compute case- and time-specific dummies, run -regress- with all dummies as an equivalent for -xtreg, fe- and then compute VIFs (http://www.stata.com/statalist/archive/2005-08/msg00018.html). However, in our panel with several thousand individuals it doesn't seem appropriate to run -regress- with thousands of dummies.

Another thread suggests that multicollinearity does not depend on the dependent variable or the link function (http://www.stata.com/statalist/archive/2003-12/msg00333.html). Thus, you could for example use -collin- to calculate VIFs even before using -xtreg- or any other regression command.

Remedy if assumption is violated: Leave out variables causing multicollinearity.

5. LINEARITY

I would think that a check for linearity is independent of the regression method used. If so, then you could test for neglected nonlinearities with the RESET using -estat ovtest- (or -ivreset- with more options) after -regress-. And -nlcheck- after -xtreg- might give you more information on linearity or non-linearity for individual regressors.

Graphically, you can always check scatter plots of the dependent variable and regressors for linearity. Another graphical method suggested in the UCLA Web Book is an augmented component-plus-residual plot (-acprplot-) after -regress-.

Q: However, I would think that any (graphical) analysis based on residuals (such as -acprplot- or -rvpplot-) is sensitive to whether -regress- or -xtreg, fe- is used. Correct?

Remedy if assumption is violated: Transform variables.

6. MODEL SPECIFICATION

I would think that the question of whether there is an omitted variable or an irrelevant variable in the model is often more a theoretical one than an issue which should be tested graphically or formally. But I might be wrong?

The UCLA Web Book suggests -linktest-, which works after -regress- but not after -xtreg-.

Q: Is there a graphical or any other formal test for omitted or irrelevant variables after -xtreg-?

Remedy if assumption is violated: Exclude or include variables.

7. INDEPENDENCE

Several commands can be used for testing autocorrelation of the error term with panel data: -xtserial-, -xttest1-, and -pantest2- (see also: http://www.stata.com/support/faqs/stat/panel.html).

Remedy if assumption is violated: Use -xtregar, fe- to fit a fixed effects model with a first-order autoregressive error term.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
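
A minimal sketch of how some of the checks in sections 1 to 4 might be run, offered as an illustration rather than as an answer to the open questions. It assumes the shipped example panel -webuse nlswork- as a stand-in for the data described in the post, and the variable names (ln_wage, age, tenure, hours, idcode, year) come from that example dataset, not from the original analysis. The within transformation via -xtdata, fe- is a swapped-in way to avoid thousands of dummies: -regress- on the transformed data should reproduce the -xtreg, fe- coefficients (its standard errors are not adjusted for the absorbed effects, so it is used here only for diagnostics).

```stata
* Stand-in data (assumption): NLS women panel shipped with Stata
webuse nlswork, clear
xtset idcode year

* Fixed effects regression (clustering omitted to keep the sketch simple)
xtreg ln_wage age tenure hours, fe

* Section 2: idiosyncratic error e and normality checks
predict res, e
kdensity res, normal
qnorm res
sktest res

* Section 3: residuals vs. linear prediction
predict xbhat, xb
scatter res xbhat

* Sections 1 and 4: within-transform the estimation sample, then use
* -regress- postestimation tools that are unavailable after -xtreg-
preserve
keep if e(sample)
xtdata ln_wage age tenure hours, fe clear
regress ln_wage age tenure hours
estat vif
predict lev, leverage
predict dfit, dfits
summarize lev dfit, detail
restore
```

Whether leverage and DFITS computed on within-transformed data are the right influence measures for a panel model is exactly the kind of question raised in section 1, so the output is best read as exploratory.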
