In the spotlight: Correlated random-effects models: The best of both worlds
Life is full of difficult decisions, and the decisions to be made when modeling panel data are no exception. You can either control for endogeneity with a fixed-effects (FE) model and forgo studying the effects of time-invariant regressors or estimate effects of time-invariant regressors with a random-effects (RE) model but face potential bias. What to do? Enter correlated random-effects (CRE) models.
CRE models estimate the effects of both time-varying and time-invariant regressors while controlling for the endogeneity that arises when regressors are correlated with unobserved panel-level characteristics. In these models, you control for endogeneity by including the panel means of the time-varying regressors as additional controls. As it turns out, when you fit this extended regression using the RE estimator, you obtain the same coefficients for the time-varying regressors as if you had used an FE estimator! We get the benefits of an FE model without losing information about the time-invariant features of our model.
This powerful result is known as the Mundlak equivalence and has been around for quite some time. Its use in applied research, however, was limited because Mundlak (1978) showed the equivalence only for balanced panels.
This changed with Wooldridge (2019), who extended the Mundlak equivalence to unbalanced panels. By doing so, he also obtained a specification test for unbalanced panels to help choose between RE, FE, and CRE models. This test, unlike Hausman’s, is fully robust to clustered data or heteroskedasticity.
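For readers who prefer to see the specification written out, here is a minimal sketch of the CRE (Mundlak) regression described above. The notation is an assumption chosen for illustration and does not appear in the original text: x_it holds the time-varying regressors, z_i the time-invariant ones, c_i the unobserved panel-level effect, and T_i the number of periods observed for panel i.

```latex
\[
\begin{aligned}
y_{it} &= \mathbf{x}_{it}\boldsymbol{\beta} + \mathbf{z}_{i}\boldsymbol{\delta} + c_i + \varepsilon_{it},
\qquad
c_i = \alpha + \bar{\mathbf{x}}_{i}\boldsymbol{\gamma} + a_i,
\qquad
\bar{\mathbf{x}}_{i} = \frac{1}{T_i}\sum_{t=1}^{T_i}\mathbf{x}_{it}, \\
y_{it} &= \alpha + \mathbf{x}_{it}\boldsymbol{\beta} + \mathbf{z}_{i}\boldsymbol{\delta}
          + \bar{\mathbf{x}}_{i}\boldsymbol{\gamma} + a_i + \varepsilon_{it}.
\end{aligned}
\]
```

In this notation, the Mundlak equivalence says that the RE estimate of β from the second equation coincides with the FE estimate, and the specification test can be read as a test of H0: γ = 0.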
Let me show you how to easily estimate CRE models and run a fully robust specification test using xtreg, cre. The Mundlak specification test is also available after CRE, RE, or FE estimation with the new postestimation command estat mundlak.
CRE in action
Let's say that we would like to study the effect on wages of time-varying regressors, age and tenure, and a time-invariant regressor, race. An FE model will omit any variable that remains constant across time and thus cannot fully answer our research question. An RE model may yield inconsistent estimates because of the possible correlation between individual time-invariant heterogeneity and the regressors age and tenure.
We can use a CRE model to circumvent both problems. Let's see it in action.
    . webuse nlswork
    (National Longitudinal Survey of Young Women, 14-24 years old in 1968)

    . xtreg ln_wage tenure age i.race, cre vce(cluster idcode)
    note: 2.race omitted from xt_means because of collinearity.
    note: 3.race omitted from xt_means because of collinearity.

    Correlated random-effects regression            Number of obs     =     28,101
    Group variable: idcode                          Number of groups  =      4,699

    R-squared:                                      Obs per group:
         Within  = 0.1296                                         min =          1
         Between = 0.2346                                         avg =        6.0
         Overall = 0.1890                                         max =         15

                                                    Wald chi2(4)      =    1685.18
    corr(xit_vars*b, xt_means*γ) = 0.5474           Prob > chi2       =     0.0000

                                  (Std. err. adjusted for 4,699 clusters in idcode)
    ------------------------------------------------------------------------------
                 |               Robust
         ln_wage | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    xit_vars     |
          tenure |   .0211313   .0012113    17.44   0.000     .0187572    .0235055
             age |   .0121949   .0007414    16.45   0.000     .0107417     .013648
                 |
            race |
          Black  |  -.1312068   .0117856   -11.13   0.000    -.1543061   -.1081075
          Other  |   .1059379   .0593177     1.79   0.074    -.0103225    .2221984
                 |
           _cons |     1.2159   .0306965    39.61   0.000     1.155736    1.276064
    -------------+----------------------------------------------------------------
    xt_means     |
          tenure |   .0376991    .002281    16.53   0.000     .0332283    .0421698
             age |  -.0011984   .0013313    -0.90   0.368    -.0038077    .0014109
                 |
            race |
          Black  |          0  (omitted)
          Other  |          0  (omitted)
    -------------+----------------------------------------------------------------
         sigma_u |  .33334407
         sigma_e |  .29808194
             rho |  .55567161   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
CRE models include the panel means as additional regressors to control for potential endogeneity and correct bias. We see this in the output above: the first block shows the coefficients and related statistics for the variables in the model (xit_vars), and the second block shows the coefficients on their respective panel means (xt_means). Notice that the time-invariant regressor, race, is omitted from this latter group: the panel mean of a variable that never changes within a panel is the variable itself, so it is perfectly collinear with the original regressor.
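xtreg, cre also reports the Mundlak specification test at the bottom of its output. Unlike a Hausman test, this test is fully robust: it remains valid under heteroskedasticity and with cluster-robust standard errors like the ones we requested here. In our example, the test yields chi2(2) = 331.51 with Prob > chi2 = 0.0000, so we reject the null hypothesis that the covariates are uncorrelated with the unobserved panel-level effects. That is strong evidence against the RE model and in favor of the fitted CRE model.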
To verify the Mundlak equivalence, let’s check that this procedure gives us the same coefficients as the corresponding FE model for the time-varying regressors.
    . xtreg ln_wage tenure age i.race, fe vce(cluster idcode)
    note: 2.race omitted because of collinearity.
    note: 3.race omitted because of collinearity.

    Fixed-effects (within) regression               Number of obs     =     28,101
    Group variable: idcode                          Number of groups  =      4,699

    R-squared:                                      Obs per group:
         Within  = 0.1296                                         min =          1
         Between = 0.1916                                         avg =        6.0
         Overall = 0.1456                                         max =         15

                                                    F(2, 4698)        =     766.79
    corr(u_i, Xb) = 0.1302                          Prob > F          =     0.0000

                                  (Std. err. adjusted for 4,699 clusters in idcode)
    ------------------------------------------------------------------------------
                 |               Robust
         ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          tenure |   .0211313   .0012112    17.45   0.000     .0187568    .0235059
             age |   .0121949   .0007414    16.45   0.000     .0107414    .0136483
                 |
            race |
          Black  |          0  (omitted)
          Other  |          0  (omitted)
                 |
           _cons |   1.256467   .0194187    64.70   0.000     1.218397    1.294537
    -------------+----------------------------------------------------------------
         sigma_u |  .39034493
         sigma_e |  .29808194
             rho |  .63165531   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
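The coefficients on tenure (.0211313) and age (.0121949) are identical in the two sets of results, and the cluster-robust standard errors agree up to a negligible difference. The FE model, however, drops race, whereas the CRE model reports its coefficients. With the CRE model, we get the benefits of an FE model but do not lose information about time-invariant features of our model. It's the best of both worlds!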
To perform the Mundlak specification test after an FE or RE model, you can use the estat mundlak postestimation command.
    . estat mundlak

    Mundlak specification test
    H0: Covariates are uncorrelated with unobserved panel-level effects

        chi2(2) = 331.51
    Prob > chi2 =  0.0000

    Notes: Fixed effects and correlated random effects are consistent under H0 and Ha.
           Random effects are efficient under H0.
The test statistic and p-value match those reported with the CRE model. We again reject the null hypothesis, which is strong evidence in favor of the FE or CRE model; which of the two to choose depends on whether we are also interested in estimating the effects of time-invariant regressors.
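To make the mechanics concrete, the CRE fit can also be sketched by hand: compute the panel means of the time-varying regressors and add them to an ordinary RE regression. This is an illustration of the idea under stated assumptions, not a description of how xtreg, cre is implemented. The variable names mean_tenure and mean_age are hypothetical, and the means are computed over complete cases only so that they match the estimation sample in this unbalanced panel.

```stata
* Hand-rolled CRE sketch; mean_tenure and mean_age are hypothetical names
webuse nlswork, clear

* Keep complete cases so the panel means are taken over the estimation sample
keep if !missing(ln_wage, tenure, age, race)

* Panel (within-person) means of the time-varying regressors
bysort idcode: egen double mean_tenure = mean(tenure)
bysort idcode: egen double mean_age    = mean(age)

* RE estimator applied to the regression augmented with the panel means
xtreg ln_wage tenure age i.race mean_tenure mean_age, re vce(cluster idcode)

* Cluster-robust Wald test that the panel-mean coefficients are jointly zero
test mean_tenure mean_age
```

If the equivalence holds as described, the coefficients on tenure and age should line up with the FE estimates shown earlier. The final Wald test is in the spirit of the Mundlak specification test; it is not guaranteed to reproduce the estat mundlak output digit for digit.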
Parting words
StataNow™ added CRE models for panel data with xtreg’s new estimation option cre. These models give you the best of both worlds: they allow you to control for endogeneity and study the effects of time-invariant regressors at the same time. Moreover, you get a fully robust Mundlak specification test for free. To learn more, see [XT] xtreg and [XT] xtreg postestimation.
References
Mundlak, Y. 1978. On the pooling of time series and cross section data. Econometrica 46: 69–85.
Wooldridge, J. M. 2019. Correlated random effects models with unbalanced panels. Journal of Econometrics 211: 137–150.
— Eduardo García Echeverri
Senior Econometrician