 Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# RE: re:st: Negative R-squared in IV estimation

From: "Lim, Elizabeth"
To: "statalist@hsphsun2.harvard.edu"
Subject: RE: re:st: Negative R-squared in IV estimation
Date: Mon, 24 Oct 2011 22:20:32 +0000

Thank you so much to Kit and Yuval for helping me out on the questions below! Your responses were helpful to me, and I am grateful for the opportunity to learn from you!

Best,
Elizabeth

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Christopher Baum
Sent: Friday, October 21, 2011 3:05 PM
To: statalist@hsphsun2.harvard.edu
Subject: re:st: Negative R-squared in IV estimation

Elizabeth said:

I'm hoping someone might be able to shed some light on the following issues that I've been struggling with:-

(1) As Wooldridge (2006) mentioned in his textbook, "Unlike in the case of OLS, the R-squared from IV estimation can be negative because SSR for IV can actually be larger than SST. Although it does not really hurt to report the R-squared for IV estimation, it is not very useful, either" (p. 521).  How do I deal with negative R-squared in 2SLS?  If it's not "useful" to report R-squared, especially negative R-squareds, what model statistics should I report?

(2) Wooldridge (2006) further explained that "R-squareds cannot be used in the usual way to compute F tests of joint restrictions" (p. 521). If I want to report model F values in lieu of R-squareds, how do I compute F values based on R-squared values?  What formula do I use?

(3) My understanding is that Model F values in OLS should increase with the addition of more variables in the model, but I'm not sure if the same interpretation applies in 2SLS models.  If the Model F value in 2SLS models *decreases* after adding interaction terms, what would this suggest?  Is there any cause for concern?

(4) Suppose I run a 2SLS, and all the coefficients and standard errors for all the variables in the 2SLS model are less than 1, but the coefficient estimates and standard errors on the interaction terms are large (by large, I mean in excess of 1). Is this an indication of some statistical or econometrics problem?  What might have caused the large values in the estimates and standard errors of the interaction term?  What can I do to check whether I've run the 2SLS analysis correctly?

(1) R^2 can be negative in any IV estimation because the least squares solution minimizes the sum of squared residuals computed from the instrumented (fitted) regressors rather than from the original regressors, while the reported residuals are computed from the original regressors. (That is why doing 2SLS 'by hand' is dangerous: you end up with the wrong residuals in the second stage.) R^2 is therefore a largely meaningless statistic in an IV context. I would report the Root MSE (std. error of regression) and the p-value from the overid ("Sargan" or "Hansen") statistic.
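A minimal sketch of point (1), using four made-up observations and plain Python rather than Stata output. The variable names (`y`, `x`, `z`) and the numbers are assumptions chosen only so the arithmetic is easy to follow: one endogenous regressor, one instrument, and a constant.

```python
# Toy data (invented for illustration): outcome y, endogenous regressor x, instrument z.
z = [-1.0, -1.0, 1.0, 1.0]
x = [-2.0, 0.0, 0.0, 2.0]
y = [1.0, -3.0, 3.0, -1.0]

def mean(v):
    return sum(v) / len(v)

def cross(a, b):
    """Sum of cross-products of deviations from the means."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

def r_squared(actual, fitted):
    """1 - SSR/SST, which is not bounded below by zero outside OLS."""
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(actual, fitted))
    return 1.0 - ssr / cross(actual, actual)

# OLS of y on x (with a constant).
b_ols = cross(x, y) / cross(x, x)
a_ols = mean(y) - b_ols * mean(x)
r2_ols = r_squared(y, [a_ols + b_ols * xi for xi in x])

# Simple IV estimator; residuals are computed with the ORIGINAL regressor x.
b_iv = cross(z, y) / cross(z, x)
a_iv = mean(y) - b_iv * mean(x)
r2_iv = r_squared(y, [a_iv + b_iv * xi for xi in x])

# 2SLS "by hand": first stage x on z, then y on the fitted values x_hat.
pi1 = cross(z, x) / cross(z, z)
pi0 = mean(x) - pi1 * mean(z)
x_hat = [pi0 + pi1 * zi for zi in z]
b_hand = cross(x_hat, y) / cross(x_hat, x_hat)
a_hand = mean(y) - b_hand * mean(x_hat)
# These second-stage residuals use x_hat, so the implied R^2 is the wrong one.
r2_hand = r_squared(y, [a_hand + b_hand * xh for xh in x_hat])

print("OLS slope:", b_ols, "R^2:", round(r2_ols, 3))
print("IV slope:", b_iv, "R^2:", round(r2_iv, 3))
print("By-hand slope:", b_hand, "second-stage R^2:", round(r2_hand, 3))
```

With these numbers the by-hand slope matches the IV slope, but the R^2 from the by-hand second stage is positive while the R^2 based on the correct residuals is negative. Nothing is wrong with the IV estimate itself; only the residuals differ.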

(2) Don't try. Stata will compute the appropriate F-stats using the test command. The formula that lets you express F in terms of R^2, and vice versa, is only appropriate in the special case of OLS with a constant term and i.i.d. errors. As you can see when you use robust SE, the ANOVA table (from which the F is computed in the standard case) is no longer provided.
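For reference, the special-case formula mentioned in (2) can be sketched as follows. This is an illustration in plain Python, not Stata's implementation; the function names and the example values of `n`, `k`, and R^2 are made up. Here `k` counts slope coefficients, excluding the constant.

```python
def f_from_r2(r2, n, k):
    """Model F implied by R^2 for OLS with a constant and i.i.d. errors."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

def r2_from_f(f, n, k):
    """Inverse mapping, valid only under the same OLS assumptions."""
    return k * f / (k * f + (n - k - 1))

# An OLS-style R^2 maps to a sensible F and back again.
f_ols = f_from_r2(0.30, n=100, k=3)
r2_back = r2_from_f(f_ols, n=100, k=3)

# Plugging in a negative IV R^2 yields a negative "F", which cannot be a
# valid F statistic: the formula simply does not apply to IV.
f_iv = f_from_r2(-0.80, n=100, k=3)

print(round(f_ols, 3), round(r2_back, 3), round(f_iv, 3))
```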

(3) I don't see the intuition here. In that standard F-stat case, the numerator is the mean squares due to regression. Every time you add a regressor, the SSE (explained SS) is likely to rise (and cannot fall), but you are dividing it by a larger number of regressors. At the same time, the SSR (residual SS) must fall (or at least not rise), but the number of df you are dividing by falls by 1. Thus it is not obvious what should be happening to the F stat as you look at a 'longer' model. In my mind the only thing the F-stat value is good for is giving you a value large enough to reject its null. So even in OLS, don't worry about its value; worry about its significance. Same goes for 2SLS.
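A small numeric sketch of the argument in (3), with invented sums of squares (total SS of 100, n = 50) rather than results from any real model. The helper `model_f` is hypothetical and uses the standard OLS definition, with `k` slope coefficients plus a constant.

```python
def model_f(ess, rss, n, k):
    """(explained SS / k) divided by (residual SS / (n - k - 1))."""
    return (ess / k) / (rss / (n - k - 1))

n = 50

# Short model: 2 regressors explain 40 of the 100 units of total SS.
f_short = model_f(ess=40.0, rss=60.0, n=n, k=2)

# Add a third regressor that barely helps: explained SS rises only to 41.
f_long_weak = model_f(ess=41.0, rss=59.0, n=n, k=3)

# Add a third regressor that helps a lot: explained SS rises to 60.
f_long_strong = model_f(ess=60.0, rss=40.0, n=n, k=3)

print(round(f_short, 2), round(f_long_weak, 2), round(f_long_strong, 2))
```

In both longer models the explained SS rises and the residual SS falls, yet the F statistic falls in the first case and rises in the second, so its direction of change alone says little.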

(4) The magnitudes of coefficients in linear regression (including 2SLS) are a function of the magnitudes of their respective regressors. Those which are greater than 1 could be made smaller than 1 by just rescaling the regressor. Thus, their magnitudes have no natural meaning; they are just estimates of \partial y / \partial x for the particular y and x.
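The scaling point in (4) can be checked with a short sketch. The data are made up, and the helper `ols_slope_and_se` is a hypothetical one-regressor OLS routine written for this illustration; the same logic carries over to 2SLS.

```python
import math

def ols_slope_and_se(x, y):
    """Slope and its conventional standard error from OLS of y on x with a constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return b, se

# Regressor measured in thousands, then the same regressor measured in units.
x_thousands = [1.0, 2.0, 3.0, 4.0]
x_units = [1000.0 * xi for xi in x_thousands]
y = [2.1, 3.9, 6.2, 7.8]

b_big, se_big = ols_slope_and_se(x_thousands, y)
b_small, se_small = ols_slope_and_se(x_units, y)

print("x in thousands: b =", b_big, "se =", se_big)
print("x in units:     b =", b_small, "se =", se_small)
print("t-ratios:", b_big / se_big, b_small / se_small)
```

The coefficient and its standard error shrink by the same factor of 1000, so the t-ratio and any inference are unchanged.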

To check whether you have run the 2SLS correctly, along the lines of Baum, Schaffer and Stillman (Stata Journal, 2003 and 2007), check the overid stat (presuming that your model is overidentified), use the appropriate VCE if you don't believe in i.i.d. errors (and who does?!?), etc.

Kit

Kit Baum   |   Boston College Economics & DIW Berlin   |   http://ideas.repec.org/e/pba1.html
An Introduction to Stata Programming  |   http://www.stata-press.com/books/isp.html
An Introduction to Modern Econometrics Using Stata  |   http://www.stata-press.com/books/imeus.html

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
