
st: Regression diagnostics with panel data (-xtreg-)


From   "Tobias Pfaff" <tobias.pfaff@uni-muenster.de>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: Regression diagnostics with panel data (-xtreg-)
Date   Tue, 23 Aug 2011 16:00:05 +0200

Dear Statalisters,

I have run into a few difficulties with regression diagnostics after a fixed
effects regression with panel data (-xtreg, fe-).
Previous threads on Statalist give hints, but in some cases ambiguity
remains. Below, I follow the splendid structure of UCLA's Stata Web
Book on regression diagnostics
(http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter2/statareg2.htm).
Open questions are marked with "Q:".

I would very much appreciate your input!
I am not a professional in this field, and apart from the questions there
might be some inaccuracies in what is written below.

My panel data set is unbalanced and covers 26 years, 40,000 individuals and
315,000 observations. We cluster standard errors by region*year (-xtreg
..., fe vce(cluster region_svyyear)-). We use Stata SE 11.2.

I already looked at -findit test panel- and searched Statalist for
"diagnostics xtreg", "diagnostics panel" and other search terms.

Kind regards,
Tobias

Center for Interdisciplinary Economics
University of Muenster, Germany

**************************************


1. UNUSUAL AND INFLUENTIAL DATA

After -regress-, -predict- can calculate standardized residuals, leverage,
Cook's D and DFITS, which can be used to identify outliers and influential
data. It seems that none of them can be calculated after -xtreg-.

Q: How can you identify influential observations after a (fixed effects)
panel regression? Is it OK to use -regress- with the same equation and do
the diagnostics with DFITS etc. (as suggested here:
http://www.stata.com/statalist/archive/2006-05/msg00075.html)?

Remedy if assumption is violated: Exclude observations above (or below)
thresholds/cut-off points and check if results change.
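
A minimal sketch of one way to explore this, not a confirmed answer to the
question above: -xtdata, fe- converts the data to deviations from individual
means, so -regress- on the transformed data gives the same coefficients as
-xtreg, fe- and the usual -predict- diagnostics become available. The
variable names y, x1, x2, id and year are hypothetical, and whether these
diagnostics are appropriate for the within-transformed model remains an
assumption.

```stata
* Hypothetical names: y x1 x2 (model), id year (panel structure)
preserve
xtset id year
* Replace the listed variables by their within-transformed versions
xtdata y x1 x2, fe clear
* Plain -regress- (no robust VCE) so that -predict- offers the diagnostics
regress y x1 x2
predict double cook, cooksd
predict double dfit, dfits
summarize cook dfit, detail
restore
```

The standard errors from this -regress- are not those of -xtreg, fe-, since
the degrees of freedom ignore the absorbed individual effects; only the
diagnostics are of interest here.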


2. NORMALITY OF RESIDUALS

The overall error component e can be predicted after -xtreg- (-predict res,
e-).

Q: I guess that it is e that should be normally distributed and not the
fixed error component u?

I would then check for normal distribution of the overall error component
with graphs (-kdensity res, normal-, -pnorm res-, -qnorm res-) and formal
tests (-iqr res-, -jb res-, -sktest res-).

Remedy if assumption is violated: Transform variables. Alternatively, use
bootstrapping, which does not rely on normally distributed errors, to obtain
standard errors and t-values, and check whether the results change.
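
The checks and the bootstrap remedy described above could look roughly like
this. It is a sketch with hypothetical variable names (y, x1, x2, id, year);
the number of replications and the seed are arbitrary choices, and only
official commands are shown (-iqr- and -jb- are user-written).

```stata
* Hypothetical names: y x1 x2 (model), id year (panel structure)
xtset id year
xtreg y x1 x2, fe
* Overall error component e_it
predict double res_e, e
kdensity res_e, normal
pnorm res_e
qnorm res_e
sktest res_e
* Bootstrap standard errors as a robustness check (reps and seed arbitrary)
xtreg y x1 x2, fe vce(bootstrap, reps(200) seed(12345))
```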


3. HOMOSCEDASTICITY

Plot residuals vs. fitted values and check for patterns. See also:
http://www.stata.com/support/faqs/stat/panel.html.

Remedy if assumption is violated: Use robust standard errors, either with
-xtreg, vce(robust)- or -xtreg, vce(cluster ...)-.
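
Since -rvfplot- belongs to the -regress- postestimation tools, the plot can
be built by hand after -xtreg, fe-. A minimal sketch, again with hypothetical
variable names (y, x1, x2, id, year):

```stata
* Hypothetical names: y x1 x2 (model), id year (panel structure)
xtset id year
xtreg y x1 x2, fe
* Linear prediction and overall error component
predict double xb_hat, xb
predict double e_hat, e
* Residuals vs. fitted values; look for fanning or other patterns
scatter e_hat xb_hat, yline(0) msize(tiny)
* Refit with robust standard errors if heteroscedasticity is suspected
xtreg y x1 x2, fe vce(robust)
```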


4. MULTICOLLINEARITY

Calculating variance inflation factors (VIF) seems to be the standard
approach to check for multicollinearity. Again, -estat vif- is only
available after -regress-, but not after -xtreg-.

It has been suggested to compute case- and time-specific dummies, run
-regress- with all dummies as an equivalent for -xtreg, fe- and then compute
VIFs (http://www.stata.com/statalist/archive/2005-08/msg00018.html).
However, in our panel with several thousand individuals it doesn't seem
appropriate to do -regress- with thousands of dummies.

Another thread suggests that multicollinearity does not depend on the
dependent variable or the link function
(http://www.stata.com/statalist/archive/2003-12/msg00333.html). Thus, you
could for example use -collin- to calculate VIFs even before using -xtreg-
or any other regression command.

Remedy if assumption is violated: Leave out variables causing
multicollinearity.
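
Because the VIF of a regressor x_j is 1/(1 - R_j^2), where R_j^2 comes from
regressing x_j on the other regressors, it can be computed with official
commands alone and without the dependent variable. A sketch, assuming three
hypothetical regressors x1, x2 and x3:

```stata
* Hypothetical regressors: x1 x2 x3
local xvars x1 x2 x3
foreach v of local xvars {
    * All regressors except the current one
    local others : list xvars - v
    quietly regress `v' `others'
    display "VIF for `v' = " %6.2f 1/(1 - e(r2))
}
```

These are VIFs for the untransformed regressors; they do not account for the
individual fixed effects.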


5. LINEARITY

I would think that a check for linearity is independent of the regression
method used. If so, then you could test for neglected nonlinearities with
the RESET using -estat ovtest- (or -ivreset- with more options) after
-regress-. And -nlcheck- after -xtreg- might give you more information on
linearity or non-linearity for individual regressors. Graphically, you can
always check scatter plots of the dependent variable and regressors for
linearity. Another graphical method suggested in the UCLA Web Book is an
augmented component-plus-residual plot (-acprplot-) after -regress-.

Q: However, I would think that any (graphical) analysis based on residuals
(such as -acprplot- or -rvpplot-) is sensitive to whether -regress- or
-xtreg, fe- is used. Correct?
 
Remedy if assumption is violated: Transform variables.
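
The -regress-based checks mentioned above can be sketched as follows, with
hypothetical variable names y, x1 and x2. Note that this pooled regression
ignores the individual fixed effects, which is the concern raised in the
question above.

```stata
* Hypothetical names: y x1 x2
regress y x1 x2
* Ramsey RESET for neglected nonlinearities
estat ovtest
* Augmented component-plus-residual plot for x1 with a lowess smoother
acprplot x1, lowess
```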


6. MODEL SPECIFICATION

I would think that the question of whether the model contains an omitted or
irrelevant variable is often more a theoretical one than an issue that
should be tested graphically or formally. But I might be wrong.
The UCLA Web Book suggests -linktest- which works after -regress- but not
after -xtreg-.

Q: Is there a graphical or any other formal test for omitted or irrelevant
variables after -xtreg-?

Remedy if assumption is violated: Exclude or include variables.


7. INDEPENDENCE

Several commands can be used for testing autocorrelation of the error term
with panel data: -xtserial-, -xttest1-, and -pantest2- (see also:
http://www.stata.com/support/faqs/stat/panel.html).

Remedy if assumption is violated: Use -xtregar, fe- to fit fixed effects
model with first-order autoregressive error term.
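
A minimal sketch of the test-then-remedy sequence, assuming hypothetical
variable names (y, x1, x2, id, year). -xtserial- is user-written and has to
be installed first; -xtregar- is an official command.

```stata
* Hypothetical names: y x1 x2 (model), id year (panel structure)
xtset id year
* User-written; locate and install it with -findit xtserial-
xtserial y x1 x2
* Fixed effects model with AR(1) disturbances
xtregar y x1 x2, fe
```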


