In the spotlight: Correlated random-effects models: The best of both worlds

Life is full of difficult decisions, and the decisions to be made when modeling panel data are no exception. You can either control for endogeneity with a fixed-effects (FE) model and forgo studying the effects of time-invariant regressors or estimate effects of time-invariant regressors with a random-effects (RE) model but face potential bias. What to do? Enter correlated random-effects (CRE) models.

CRE models estimate both time-varying and time-invariant effects while controlling for endogeneity that arises when regressors are correlated with unobserved panel-level characteristics. In these models, you control for endogeneity by including the panel means of time-varying regressors as additional controls. As it turns out, when you fit this extended regression using the RE estimator, you obtain the same coefficients for time-varying regressors as if you had used an FE estimator! We get the benefits of an FE model without losing information about the time-invariant features of our model.
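In symbols, the extended regression described above can be sketched as follows. The notation is an assumption made here for illustration, not taken from the Stata documentation: x_it collects the time-varying regressors, z_i the time-invariant ones, u_i is the unobserved panel-level effect, and the panel mean is taken over the T_i periods in which panel i is observed.

```latex
y_{it} = \alpha + x_{it}\beta + z_i\delta + \bar{x}_i\gamma + u_i + \epsilon_{it},
\qquad
\bar{x}_i = \frac{1}{T_i}\sum_{t \in \mathcal{T}_i} x_{it},
\qquad
T_i = |\mathcal{T}_i|
```

Under this sketch, fitting the equation with the RE estimator reproduces the FE estimate of \beta, while \gamma = 0 corresponds to the plain RE model.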

This powerful result, known as the Mundlak equivalence, dates back to Mundlak (1978). Its use in applied research was limited, however, because the original result established the equivalence only for balanced panels.

This changed with Wooldridge (2019), who extended the Mundlak equivalence to unbalanced panels. In doing so, he also obtained a specification test for unbalanced panels to help choose between RE, FE, and CRE models. This test, unlike Hausman’s, is fully robust to heteroskedasticity and within-cluster correlation.

Let me show you how to easily estimate CRE models and run a fully robust specification test using xtreg, cre. The Mundlak specification test is also available after CRE, RE, or FE estimation with the new postestimation command estat mundlak.

CRE in action

Let's say that we would like to study the effect on wages of time-varying regressors, age and tenure, and a time-invariant regressor, race. An FE model will omit any variable that remains constant across time and thus cannot fully answer our research question. An RE model may yield inconsistent estimates because of the possible correlation between individual time-invariant heterogeneity and the regressors age and tenure.

We can use a CRE model to circumvent both problems. Let's see it in action.

. webuse nlswork
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)

. xtreg ln_wage tenure age i.race, cre vce(cluster idcode)
note: 2.race omitted from xt_means because of collinearity.
note: 3.race omitted from xt_means because of collinearity.

Correlated random-effects regression            Number of obs     =     28,101
Group variable: idcode                          Number of groups  =      4,699

R-squared:                                      Obs per group:
     Within  = 0.1296                                         min =          1
     Between = 0.2346                                         avg =        6.0
     Overall = 0.1890                                         max =         15

                                                Wald chi2(4)      =    1685.18
corr(xit_vars*b, xt_means*γ) = 0.5474           Prob > chi2       =     0.0000

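                             (Std. err. adjusted for 4,699 clusters in idcode)
------------------------------------------------------------------------------
                             Robust
     ln_wage   Coefficient  std. err.      z    P>|z|     [95% conf. interval]
------------------------------------------------------------------------------
xit_vars
      tenure     .0211313   .0012113    17.44   0.000     .0187572    .0235055
         age     .0121949   .0007414    16.45   0.000     .0107417     .013648

        race
      Black     -.1312068   .0117856   -11.13   0.000    -.1543061   -.1081075
      Other      .1059379   .0593177     1.79   0.074    -.0103225    .2221984

       _cons       1.2159   .0306965    39.61   0.000     1.155736    1.276064
------------------------------------------------------------------------------
xt_means
      tenure     .0376991    .002281    16.53   0.000     .0332283    .0421698
         age    -.0011984   .0013313    -0.90   0.368    -.0038077    .0014109

        race
      Black             0  (omitted)
      Other             0  (omitted)
------------------------------------------------------------------------------
     sigma_u    .33334407
     sigma_e    .29808194
         rho    .55567161   (fraction of variance due to u_i)
------------------------------------------------------------------------------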

Mundlak test (xt_means = 0): chi2(2) = 331.5144            Prob > chi2 = 0.0000

CRE models include the panel means of the regressors as additional controls to account for potential endogeneity. We see this in the output above: the first block shows the coefficients and related statistics for the variables in the model (xit_vars), and the second block shows the same for their panel means (xt_means). Notice that the time-invariant regressor, race, is omitted from the second block: the panel mean of a time-invariant variable equals the variable itself, so it is perfectly collinear with the original regressor, as the notes at the top of the output indicate.
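To make the role of the panel means concrete, here is a minimal sketch of building them by hand and adding them to an ordinary RE regression. It is an illustration of the idea, not a description of how xtreg, cre is implemented. The variable names insample, tenure_bar, and age_bar are hypothetical, and the means are computed over complete cases only, on the assumption that this matches the estimation sample.

```stata
* Sketch: build the panel means by hand and fit an RE model that includes them.
* insample, tenure_bar, and age_bar are hypothetical names chosen for this example.
webuse nlswork, clear

* Flag complete cases so the means use only observations that enter the model
generate byte insample = !missing(ln_wage, tenure, age, race)

* Panel means of the time-varying regressors over the flagged observations
bysort idcode: egen double tenure_bar = mean(tenure) if insample
bysort idcode: egen double age_bar    = mean(age)    if insample

* RE regression augmented with the panel means
xtreg ln_wage tenure age i.race tenure_bar age_bar if insample, re vce(cluster idcode)

* Joint Wald test that the coefficients on the panel means are zero
test tenure_bar age_bar
```

If the assumption about the sample holds, the coefficients on tenure and age from this regression should agree with the xit_vars block up to numerical precision, and the joint test plays the role of the Mundlak test. The cre option saves these steps and labels the output for you.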

At the bottom of the output, we see results from the Mundlak specification test, which tests whether the coefficients on the panel means are jointly zero. Unlike a Hausman test, this test remains valid when cluster-robust standard errors are used, as we have done here. In our example, the test rejects the null hypothesis (chi2(2) = 331.51, p < 0.0001), providing strong evidence against the RE model and in favor of the fitted CRE model.

To verify the Mundlak equivalence, let’s check that this procedure gives us the same coefficients as the corresponding FE model for the time-varying regressors.

. xtreg ln_wage tenure age i.race, fe vce(cluster idcode)
note: 2.race omitted because of collinearity.
note: 3.race omitted because of collinearity.

Fixed-effects (within) regression               Number of obs     =     28,101
Group variable: idcode                          Number of groups  =      4,699

R-squared:                                      Obs per group:
     Within  = 0.1296                                         min =          1
     Between = 0.1916                                         avg =        6.0
     Overall = 0.1456                                         max =         15

                                                F(2, 4698)        =     766.79
corr(u_i, Xb) = 0.1302                          Prob > F          =     0.0000

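                             (Std. err. adjusted for 4,699 clusters in idcode)
------------------------------------------------------------------------------
                             Robust
     ln_wage   Coefficient  std. err.      t    P>|t|     [95% conf. interval]
------------------------------------------------------------------------------
      tenure     .0211313   .0012112    17.45   0.000     .0187568    .0235059
         age     .0121949   .0007414    16.45   0.000     .0107414    .0136483

        race
      Black             0  (omitted)
      Other             0  (omitted)

       _cons     1.256467   .0194187    64.70   0.000     1.218397    1.294537
------------------------------------------------------------------------------
     sigma_u    .39034493
     sigma_e    .29808194
         rho    .63165531   (fraction of variance due to u_i)
------------------------------------------------------------------------------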

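The coefficients on tenure (.0211313) and age (.0121949) are identical in the two outputs, just as the Mundlak equivalence predicts. With the CRE model, we get the benefits of an FE model but do not lose information about time-invariant features of our model. It's the best of both worlds!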

To perform the Mundlak specification test after fitting an FE or RE model, you can use the estat mundlak postestimation command. Here we run it after the FE model we just fit.

. estat mundlak

Mundlak specification test
H0: Covariates are uncorrelated with unobserved panel-level effects

    chi2(2) = 331.51
Prob > chi2 = 0.0000

Notes: Fixed effects and correlated random effects are
       consistent under H0 and Ha.
       Random effects are efficient under H0.

The test statistic matches the one reported with the CRE model and again provides strong evidence against the RE model. Whether to prefer the FE or the CRE model then depends on whether we also want to estimate the effects of time-invariant regressors.
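Because the test is also available after RE estimation, the same check can be requested when the RE model is the starting point. A minimal sketch, assuming a StataNow release that includes estat mundlak:

```stata
* Sketch: request the Mundlak specification test after an RE fit.
* Assumes a StataNow release that provides estat mundlak.
webuse nlswork, clear
xtreg ln_wage tenure age i.race, re vce(cluster idcode)
estat mundlak
```

A rejection here would suggest moving from the RE model to an FE or CRE specification.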

Parting words

StataNow™ added CRE models for panel data with xtreg’s new estimation option cre. These models give you the best of both worlds: they allow you to control for endogeneity and study the effects of time-invariant regressors at the same time. Moreover, you get a fully robust Mundlak specification test for free. To learn more, see [XT] xtreg and [XT] xtreg postestimation.

References

Mundlak, Y. 1978. On the pooling of time series and cross section data. Econometrica 46: 69–85.

Wooldridge, J. M. 2019. Correlated random effects models with unbalanced panels. Journal of Econometrics 211: 137–150.

— Eduardo García Echeverri
Senior Econometrician
