

From: vwiggins@stata.com (Vince Wiggins, StataCorp)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: SUREG with if command.
Date: Tue, 25 Sep 2012 21:49:59 -0500

David Ashcraft <ashcraftd@rocketmail.com> asked how -sureg- handles incomplete cases. The bottom line is that -sureg- (seemingly unrelated regression, SUREG) uses casewise deletion: an observation is used only if it has nonmissing values for every variable in every equation of the system. -sureg- is a generalized least squares (GLS) estimator, and casewise deletion is an efficient method for such estimators. Read on if you are interested in more details and in a way to include all observations using a maximum likelihood estimator (MLE).

In a series of follow-up posts, Mark Schaffer <M.E.Schaffer@hw.ac.uk> provided an example using the auto dataset that clearly illustrates what -sureg- does. I am going to piece together Mark's posts in a way that makes the explanation easier.

> I think David is right, and in this case -sureg- is not doing the
> best it can.
>
> Here's an example with the toy auto dataset. There are 69 obs for
> rep78 and 74 obs for everything else. In the following example,
>
> . sureg (mpg rep78) (trunk turn)
>
> Seemingly unrelated regression
> ----------------------------------------------------------------------
> Equation          Obs  Parms        RMSE    "R-sq"       chi2        P
> ----------------------------------------------------------------------
> mpg                69      1    5.333491    0.1613      15.86   0.0001
> trunk              69      1     3.48627    0.3462      26.22   0.0000
> ----------------------------------------------------------------------
> ...
>
> -sureg- could be using all 74 observations for the trunk equation,
> but it's using only 69.

Mark goes on to say,

> It is indeed possible in principle to use the additional obs.
> -sureg- and -reg3- are estimating the error components for a 2-eqn
> model, so the estimated covariance matrix is 2x2:
>
> . qui reg3 (mpg rep78) (trunk turn), ols
>
> . mat list e(Sigma)
>
> symmetric e(Sigma)[2,2]
>              mpg       trunk
>   mpg  29.274269
> trunk -5.0326348   12.233795
>
> The above uses OLS applied to 69 obs for both equations.
> If we use -regress- to estimate the mpg eqn, where only 69 obs are
> available, we get the same error variance:
>
> . qui reg mpg rep78
>
> . di e(rmse)^2
> 29.274269
>
> But -regress- applied to the trunk equation on its own uses all 74
> obs, and so the error variance is different (and, since it uses more
> obs, preferable):
>
> . qui reg trunk turn
>
> . di e(rmse)^2
> 11.848587
>
> In principle -sureg-/-reg3- should use the additional 5 obs when
> estimating the trunk equation. Not to do so is throwing away
> information.

The fly in the ointment with this approach is that it does not tell us how to estimate the covariance between mpg and trunk (the off-diagonal element of e(Sigma) above). Do we compute it from only the observations where both equations have residuals? The problem with that is that the resulting e(Sigma) need not be positive definite, and the standard GLS estimator for SUREG requires that e(Sigma) be invertible.

This problem was studied many years ago in the more general setting of estimating any multivariate covariance or correlation matrix. (I'm away from the office and the references are not close to hand.) Alternative estimators of the covariance term were also considered. The surprising result is that, for a GLS estimator that relies on an estimated e(Sigma), casewise deletion is just as efficient as any method that tries to include more data. The only way to do better is to estimate the system by maximum likelihood.

Mark goes on to note,

> It's even clearer with -reg3- (IIRC, -sureg- is implemented using
> -reg3-). If you use -reg3- with the ols option,
>
> . reg3 (mpg rep78) (trunk turn), ols
>
> Multivariate regression
> ----------------------------------------------------------------------
> Equation          Obs  Parms        RMSE    "R-sq"     F-Stat        P
> ----------------------------------------------------------------------
> mpg                69      1     5.41057    0.1619      12.94   0.0005
> trunk              69      1    3.497684    0.3610      37.84   0.0000
> ----------------------------------------------------------------------
> ...
>
> -reg3- again uses only 69 obs for the trunk equation even though there
> are 74 available and it's being asked to do OLS only.

We went to some effort to be sure that -reg3- and -sureg- use the same observations regardless of which estimator is requested, and therefore regardless of whether e(Sigma) needs to be estimated. We consider that a feature.

What are we to do if we want to use all of the observations? Estimate the system by maximum likelihood with -sem-.

. sem (mpg <- rep78) (trunk <- turn), cov(e.mpg*e.trunk)

will estimate the system by maximum likelihood, but it still uses casewise deletion. Note that we needed to explicitly request that the covariance of the residuals be estimated, option -cov(e.mpg*e.trunk)-; otherwise the equations would have been assumed to be conditionally independent.

To use all the data, we need to specify the option for maximum likelihood with missing values, -method(mlmv)-:

. sem (mpg <- rep78) (trunk <- turn), cov(e.mpg*e.trunk) method(mlmv)

-- Vince
vwiggins@stata.com

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
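The post describes -sureg- as a GLS estimator that uses casewise deletion. The following Python sketch shows what that means for a two-step feasible GLS. It drops every row with a missing value in any equation, fits each equation by OLS on that common sample, and estimates Sigma as E'E/N from the residuals. It then runs GLS on the stacked system with covariance Sigma kron I_N. This is a minimal illustration only. It assumes NumPy, NaN as the missing-value marker, and a divisor of N for Sigma. The names casewise_complete and sur_fgls are hypothetical, and the code is not StataCorp's implementation.

```python
import numpy as np


def casewise_complete(columns):
    """Mask of rows with no NaN in any column (casewise/listwise deletion)."""
    return ~np.isnan(np.column_stack(columns)).any(axis=1)


def sur_fgls(equations):
    """Two-step feasible GLS for a seemingly unrelated regression (sketch).

    equations: list of (y, X) pairs; y has shape (n,), X has shape (n, k_i)
    and should already contain a constant column if one is wanted.
    Returns (list of coefficient vectors, estimated Sigma, rows used).
    """
    # Casewise deletion: a row is kept only if every y and every X is present.
    columns = [c for y, X in equations for c in (y, X)]
    keep = casewise_complete(columns)
    ys = [y[keep] for y, _ in equations]
    Xs = [X[keep] for _, X in equations]
    n, m = int(keep.sum()), len(equations)

    # Step 1: OLS equation by equation on the common sample; keep residuals.
    resid = np.column_stack(
        [y - X @ np.linalg.lstsq(X, y, rcond=None)[0] for y, X in zip(ys, Xs)]
    )
    sigma = resid.T @ resid / n  # m x m residual covariance, divisor N (assumed)

    # Step 2: GLS on the stacked system with Omega = Sigma kron I_n.
    ks = [X.shape[1] for X in Xs]
    X_big = np.zeros((m * n, sum(ks)))  # block-diagonal regressor matrix
    col = 0
    for i, X in enumerate(Xs):
        X_big[i * n:(i + 1) * n, col:col + ks[i]] = X
        col += ks[i]
    y_big = np.concatenate(ys)
    omega_inv = np.kron(np.linalg.inv(sigma), np.eye(n))
    xtw = X_big.T @ omega_inv
    beta = np.linalg.solve(xtw @ X_big, xtw @ y_big)
    return np.split(beta, np.cumsum(ks)[:-1]), sigma, n
```

With identical regressors in every equation, the GLS step reproduces equation-by-equation OLS on the common sample (a standard SUR result), which makes a convenient sanity check. Even an equation with no missing values loses the rows dropped because of the other equation. This is the 69-versus-74 behavior Mark points out.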

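The post notes that filling e(Sigma) from whatever data each element can use need not give a positive-definite matrix. The following Python sketch uses made-up residuals to show this. It first builds a 2x2 Sigma the "use everything available" way: each variance comes from its own equation's sample, and the covariance comes from the overlapping rows. It then recomputes all three elements from the casewise-complete rows. A matrix of the form E'E/N built from one common set of rows is always positive semidefinite, but mixing samples loses that guarantee. The data and the helper names moment and is_positive_definite_2x2 are hypothetical. Residuals are treated as mean zero, so each element is an average cross-product.

```python
def moment(a, b):
    """Average cross-product sum(a_i * b_i) / n over rows where both are present.

    None marks a missing value. Residuals are treated as mean zero, so this
    is the (co)variance in the E'E/N form.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return sum(x * y for x, y in pairs) / len(pairs)


def is_positive_definite_2x2(s11, s12, s22):
    """A symmetric 2x2 matrix is positive definite iff s11 > 0 and det > 0."""
    return s11 > 0 and s11 * s22 - s12 ** 2 > 0


# Hypothetical residuals: equation 1 is missing its last two observations.
e1 = [2.0, -2.0, 2.0, -2.0, None, None]
e2 = [2.0, -2.0, 1.0, -1.0, 0.0, 0.0]

# "Use everything available": each element comes from its own sample.
s11 = moment(e1, e1)  # 4 rows
s22 = moment(e2, e2)  # 6 rows
s12 = moment(e1, e2)  # 4 overlapping rows
pairwise_pd = is_positive_definite_2x2(s11, s12, s22)  # False

# Casewise deletion: all three elements come from the same complete rows.
rows = [i for i, (x, y) in enumerate(zip(e1, e2)) if x is not None and y is not None]
c1 = [e1[i] for i in rows]
c2 = [e2[i] for i in rows]
casewise_pd = is_positive_definite_2x2(moment(c1, c1), moment(c1, c2), moment(c2, c2))  # True
```

Here the extra near-zero residuals shrink the second variance while the covariance still reflects the four overlapping rows. The determinant therefore goes negative, and the matrix cannot serve as a covariance matrix for GLS.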