
Is there more information on xtreg?

Title   Fixed-, between-, and random-effects and xtreg
Author James Hardin, StataCorp
Date October 1996; revisions April 2011

The xtreg commands fit regression models to longitudinal (panel) data. Suppose that we have data following the model

    yit  =  alpha  +  XB  + ui  + eit

for units i = 1, ..., n, each measured at times t = 1, ..., Ti. We write Ti instead of T to allow for unbalanced data, in which units are observed for different numbers of periods.
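To make the notation concrete, here is a minimal Python/NumPy sketch that simulates data from this model. It is an illustration, not part of Stata. The function name simulate_panel, the parameter values, and the normal distributions for ui and eit are all assumptions. Unbalanced data are handled by giving each unit its own Ti.

```python
import numpy as np

def simulate_panel(T_i, alpha, beta, sd_u, sd_e, seed=0):
    """Simulate y_it = alpha + x_it*B + u_i + e_it, where unit i has T_i[i] periods.

    Hypothetical helper for illustration. Returns (ids, X, y) with one row per
    observation (unit i, time t).
    """
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    T_i = np.asarray(T_i)
    ids = np.repeat(np.arange(T_i.size), T_i)        # unit index of each observation
    u = rng.normal(0.0, sd_u, size=T_i.size)         # one effect u_i per unit (assumed normal)
    X = rng.normal(size=(ids.size, beta.size))       # predictors x1, ..., xk
    e = rng.normal(0.0, sd_e, size=ids.size)         # idiosyncratic error e_it
    y = alpha + X @ beta + u[ids] + e                # u_i is shared by all of unit i's rows
    return ids, X, y

B = [1.0, 0.5, -0.5, 2.0]   # illustrative slopes, not the estimates in this FAQ
# Balanced: 100 units, each observed 5 times (500 observations), like the example data.
ids, X, y = simulate_panel([5] * 100, alpha=0.3, beta=B, sd_u=1.0, sd_e=1.0)
# Unbalanced: three units observed 3, 5, and 4 times.
ids_u, X_u, y_u = simulate_panel([3, 5, 4], alpha=0.3, beta=B, sd_u=1.0, sd_e=1.0)
```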

For expository purposes, let’s say that we have data of the following form:

    y      Response variable
    x1     1st predictor
    x2     2nd predictor
    x3     3rd predictor
    x4     4th predictor
    id     Categorical variable denoting unit (has 100 different values)

The simplest model that we could fit ignores the fact that we have repeated measurements on each unit and simply pools all the observations:

 . regress y x1 x2 x3 x4
 
   Source |       SS       df       MS                  Number of obs =     500
 ---------+------------------------------               F(  4,   495) =   20.05
    Model |  152.436199     4  38.1090499               Prob > F      =  0.0000
 Residual |  941.073278   495  1.90115814               R-squared     =  0.1394
 ---------+------------------------------               Adj R-squared =  0.1324
    Total |  1093.50948   499  2.19140176               Root MSE      =  1.3788
 
 ------------------------------------------------------------------------------
        y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
 ---------+--------------------------------------------------------------------
       x1 |   .6628054   .2155956      3.074   0.002         .23921    1.086401
       x2 |   .3696673   .1752119      2.110   0.035       .0254167     .713918
       x3 |   .8149278   .2021136      4.032   0.000       .4178216    1.212034
       x4 |   .9284717   .1455638      6.378   0.000       .6424726    1.214471
    _cons |   .2567164   .0622353      4.125   0.000       .1344385    .3789943
 ------------------------------------------------------------------------------

Fixed-effects model

However, we would like to use the information that certain sets of observations came from the same unit. In a fixed-effects model, we are interested in the coefficients B and alpha, and we treat the ui as fixed (nonrandom) quantities. We model

    yit = (alpha+ui) + XB + eit

which allows each unit its own intercept, alpha + ui, while constraining the slopes B to be the same across units.
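The commands that follow (xtreg, fe; areg; and regress with dummies) all rely on one equivalence. Subtracting each unit's means from y and X (the "within" transformation) and running OLS gives the same slopes as OLS with a separate intercept, that is, a dummy, for each unit. A rough Python/NumPy sketch on simulated data illustrates this. The function names and parameter values are hypothetical, and this is not Stata's code.

```python
import numpy as np

def within_slopes(y, X, ids):
    """Fixed-effects (within) slopes: OLS of unit-demeaned y on unit-demeaned X."""
    y, X, ids = np.asarray(y, float), np.asarray(X, float), np.asarray(ids)
    y_dm, X_dm = y.copy(), X.copy()
    for g in np.unique(ids):
        m = ids == g
        y_dm[m] -= y[m].mean()              # remove unit g's mean of y
        X_dm[m] -= X[m].mean(axis=0)        # remove unit g's mean of each predictor
    return np.linalg.lstsq(X_dm, y_dm, rcond=None)[0]

def lsdv_slopes(y, X, ids):
    """Slopes from OLS of y on a constant, X, and dummies for every unit but the last."""
    y, X, ids = np.asarray(y, float), np.asarray(X, float), np.asarray(ids)
    units = np.unique(ids)
    D = (ids[:, None] == units[None, :-1]).astype(float)   # one 0/1 column per unit
    Z = np.column_stack([np.ones(len(ids)), X, D])
    return np.linalg.lstsq(Z, y, rcond=None)[0][1:1 + X.shape[1]]

# Simulated data in which X is correlated with u_i, which the fixed-effects model allows.
rng = np.random.default_rng(1)
n, T = 20, 5
ids = np.repeat(np.arange(n), T)
u = rng.normal(size=n)
X = rng.normal(size=(n * T, 2)) + u[ids][:, None]
y = 1.0 + X @ np.array([0.5, -0.3]) + u[ids] + rng.normal(size=n * T)

print(within_slopes(y, X, ids))
print(lsdv_slopes(y, X, ids))
```

Both lines print the same slopes, up to floating-point rounding.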

With xtreg, this model is fit by specifying

 . xtreg y x1 x2 x3 x4, i(id) fe
 
                                              Fixed-effects (within) regression
 sd(u_id)                     =  1.096278               Number of obs =     500
 sd(e_id_t)                   =  .9693631                           n =     100
 sd(e_id_t + u_id)            =  1.463383                           T =       5
 
 corr(u_id, Xb)               =    0.1221               R-sq within   =  0.1299
                                                             between  =  0.1581
                                                             overall  =  0.1238
 
                                                        F(  4,   396) =   14.79
                                                             Prob > F =  0.0000
 
 ------------------------------------------------------------------------------
        y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
 ---------+--------------------------------------------------------------------
       x1 |   .6964505   .1665487      4.182   0.000       .3690204    1.023881
       x2 |    .509578   .1358135      3.752   0.000       .2425723    .7765837
       x3 |   .4204604   .1681315      2.501   0.013       .0899186    .7510023
       x4 |   .5181118   .1178406      4.397   0.000       .2864405    .7497832
    _cons |   .2869807   .0439059      6.536   0.000       .2006628    .3732986
 ------------------------------------------------------------------------------
       id |           F(99,396) =      6.116   0.000           (100 categories)

and can also be obtained using areg by specifying

 . areg y x1 x2 x3 x4, absorb(id)
  
   Source |       SS       df       MS                  Number of obs =     500
 ---------+------------------------------               F(103,   396) =    7.45
    Model | 721.4021756   103 7.003904617               Prob > F      =  0.0000
 Residual | 372.1073022   396 .9396649046               R-squared     =  0.6597
 ---------+------------------------------               Adj R-squared =  0.5712
    Total | 1093.509478   499 2.191401759               Root MSE      = .969363
 
 ------------------------------------------------------------------------------
        y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
 ---------+--------------------------------------------------------------------
       x1 |   .6964505   .1665487      4.182   0.000       .3690204    1.023881
       x2 |    .509578   .1358135      3.752   0.000       .2425723    .7765837
       x3 |   .4204604   .1681315      2.501   0.013       .0899186    .7510023
       x4 |   .5181118   .1178406      4.397   0.000       .2864405    .7497832
    _cons |   .2869807   .0439059      6.536   0.000       .2006628    .3732986
 ------------------------------------------------------------------------------
       id |           F(99,396) =      6.116   0.000            (100 categories)

In the output of areg, the overall F test, F(103, 396), is for the model including the 99 dummy variables, even though we absorbed them and do not see their estimated coefficients. In the fixed-effects output of xtreg, the overall F test, F(4, 396), covers only the terms in which we are interested (the ones listed in the output) and excludes the absorbed terms.
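These statistics can be checked by hand from the printed sums of squares. The short Python computation below uses numbers copied from the output above; the variable names are ours. It reproduces areg's overall F(103, 396) and the F(99, 396) test on id. It also shows that areg's Root MSE is the sd(e_id_t) line of the xtreg, fe output, and that sd(e_id_t + u_id) combines sd(u_id) and sd(e_id_t) as the square root of the sum of their squares.

```python
import math

# Residual sums of squares and degrees of freedom copied from the output above.
rss_pooled, df_pooled = 941.073278, 495    # regress y x1 x2 x3 x4 (no unit effects)
rss_fe, df_fe = 372.107303, 396            # areg, or regress with dum1-dum99
mss_fe, dfm_fe = 721.402175, 103           # areg model SS: 4 slopes + 99 dummies

# F test that all 99 unit dummies are zero: restricted (pooled) vs. unrestricted fit.
q = df_pooled - df_fe                      # 99 restrictions
f_id = ((rss_pooled - rss_fe) / q) / (rss_fe / df_fe)

# areg's overall F counts the absorbed dummies as part of the model.
f_areg = (mss_fe / dfm_fe) / (rss_fe / df_fe)

# Residual SD (areg's Root MSE) and its combination with sd(u_id) from xtreg, fe.
sd_e = math.sqrt(rss_fe / df_fe)
sd_u = 1.096278
sd_total = math.sqrt(sd_u**2 + sd_e**2)

print(f"F({q},{df_fe}) for id = {f_id:.3f}")        # the output shows 6.116
print(f"areg F({dfm_fe},{df_fe}) = {f_areg:.2f}")   # the output shows 7.45
print(f"sd(e) = {sd_e:.6f}, sd(e+u) = {sd_total:.6f}")
```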

Finally, you could also generate the dummy variables yourself and run the regression using regress:

 . quietly tab id, gen(dum)

 . regress y x1 x2 x3 x4 dum1-dum99
  
   Source |       SS       df       MS                  Number of obs =     500
 ---------+------------------------------               F(103,   396) =    7.45
    Model |  721.402175   103  7.00390461               Prob > F      =  0.0000
 Residual |  372.107303   396  .939664907               R-squared     =  0.6597
 ---------+------------------------------               Adj R-squared =  0.5712
    Total |  1093.50948   499  2.19140176               Root MSE      =  .96936
 
 ------------------------------------------------------------------------------
        y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
 ---------+--------------------------------------------------------------------
       x1 |   .6964505   .1665487      4.182   0.000       .3690204    1.023881
       x2 |    .509578   .1358135      3.752   0.000       .2425723    .7765837
       x3 |   .4204604   .1681315      2.501   0.013       .0899186    .7510023
       x4 |   .5181118   .1178406      4.397   0.000       .2864405    .7497832
     dum1 |   -.973649   .6147949     -1.584   0.114      -2.182319    .2350209
     dum2 |   .9924268    .616741      1.609   0.108      -.2200691    2.204923
 ...
    dum98 |  -3.691905   .6175698     -5.978   0.000       -4.90603    -2.47778
    dum99 |   .7967596    .616487      1.292   0.197      -.4152368    2.008756
    _cons |   .0237916   .4362536      0.055   0.957       -.833871    .8814541
 ------------------------------------------------------------------------------

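The F test reported on the id line of the xtreg and areg output, F(99, 396) = 6.116, is the test that the coefficients on all of these dummy variables are zero. After regress, you can obtain the same test by typing testparm dum1-dum99.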

Note: If we were to include all n dummies (here dum1-dum100), regress would drop one of them because, taken together, they are collinear with the constant.
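The collinearity behind this note is easy to see numerically. In every row, exactly one of dum1-dum100 equals 1, so the dummies sum to the constant column. A Python/NumPy sketch follows, using n = 100 and T = 5 as in the example. The x's are left out for brevity, on the assumption that they are not themselves collinear with the dummies.

```python
import numpy as np

n, T = 100, 5
ids = np.repeat(np.arange(n), T)                               # unit of each observation
D_all = (ids[:, None] == np.arange(n)[None, :]).astype(float)  # dum1-dum100
const = np.ones((n * T, 1))                                    # the regression constant

Z_all = np.hstack([const, D_all])            # 101 columns: constant + all 100 dummies
Z_drop = np.hstack([const, D_all[:, :-1]])   # 100 columns: constant + dum1-dum99

print(np.allclose(D_all.sum(axis=1), 1.0))               # True
print(np.linalg.matrix_rank(Z_all), Z_all.shape[1])      # 100 101 (rank-deficient)
print(np.linalg.matrix_rank(Z_drop), Z_drop.shape[1])    # 100 100 (full rank)
```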

Between-effects model

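In the between-effects model, we model the mean response of each unit, where the means of y and of the predictors are calculated over time within each unit. Because this regression uses one observation per unit (here, 100 rather than 500), you should have many units to use this model.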
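The following rough Python/NumPy sketch shows what the between estimator computes, using simulated data; the function name is hypothetical. It averages y and the predictors within each unit and then regresses the unit means on a constant and the mean predictors, with one row per unit.

```python
import numpy as np

def between_estimator(y, X, ids):
    """OLS of unit means of y on a constant and unit means of X (one row per unit)."""
    y, X, ids = np.asarray(y, float), np.asarray(X, float), np.asarray(ids)
    units = np.unique(ids)
    y_bar = np.array([y[ids == g].mean() for g in units])
    X_bar = np.array([X[ids == g].mean(axis=0) for g in units])
    Z = np.column_stack([np.ones(len(units)), X_bar])
    return np.linalg.lstsq(Z, y_bar, rcond=None)[0]   # [_cons, b1, ..., bk]

rng = np.random.default_rng(2)
n, T = 50, 4
ids = np.repeat(np.arange(n), T)
X = rng.normal(size=(n * T, 2))
y = 0.5 + X @ np.array([1.0, -2.0]) + rng.normal(size=n)[ids] + rng.normal(size=n * T)

b_be = between_estimator(y, X, ids)
print(b_be)
```

With balanced data, regressing the unit means repeated T times gives the same coefficients. Duplicating every row multiplies both X'X and X'y by T, so the solution is unchanged.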

With xtreg, you may specify this model as

 . xtreg y x1 x2 x3 x4, i(id) be
 
                                                          Between-id regression
                                                        Number of obs =     500
                                                                    n =     100
                                                                    T =       5
 
                                                        R-sq within   =  0.0627
                                                             between  =  0.2444
                                                             overall  =  0.1145
 
 sd(u_id + e_id) =  1.02915                             F(  4,    95) =    7.68
    where  e_id  = avg(e_id_t)                               Prob > F =  0.0000
 
 ------------------------------------------------------------------------------
        y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
 ---------+--------------------------------------------------------------------
       x1 |   .6094896   .8923401      0.683   0.496      -1.162029    2.381009
       x2 |  -.4360071   .7018847     -0.621   0.536      -1.829424    .9574101
       x3 |    1.37073   .6650347      2.061   0.042       .0504691    2.690991
       x4 |   2.101814   .5036762      4.173   0.000        1.10189    3.101738
    _cons |   .1742767   .1066317      1.634   0.105      -.0374139    .3859674
 ------------------------------------------------------------------------------

or you can fit this model by computing the unit means yourself and regressing them:

 . collapse (mean) y x1 x2 x3 x4, by(id)
 . reg y x1 x2 x3 x4
 
   Source |       SS       df       MS                  Number of obs =     100
 ---------+------------------------------               F(  4,    95) =    7.68
    Model |  32.5465229     4  8.13663074               Prob > F      =  0.0000
 Residual |  100.619272    95  1.05915023               R-squared     =  0.2444
 ---------+------------------------------               Adj R-squared =  0.2126
    Total |  133.165795    99  1.34510904               Root MSE      =  1.0292
 
 ------------------------------------------------------------------------------
        y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
 ---------+--------------------------------------------------------------------
       x1 |   .6094896   .8923401      0.683   0.496      -1.162029    2.381009
       x2 |  -.4360071   .7018847     -0.621   0.536      -1.829424    .9574101
       x3 |    1.37073   .6650347      2.061   0.042       .0504691    2.690991
       x4 |   2.101814   .5036762      4.173   0.000        1.10189    3.101738
    _cons |   .1742767   .1066317      1.634   0.105      -.0374139    .3859674
 ------------------------------------------------------------------------------

Random-effects model

In the random-effects model, the estimates are a (matrix-)weighted average of the fixed-effects (within) and between estimates. People generally want to use the random-effects model because they wish to estimate the effects of variables that are constant within unit. The fixed-effects model cannot estimate those effects, because such variables are collinear with the unit-specific intercepts. The GLS random-effects estimator, xtreg, re, and the ML random-effects estimator, xtreg, mle, both require that we treat the ui terms as random variables and assume that there is no correlation between ui and X.
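One standard way to see this weighting is the quasi-demeaning form of the GLS estimator. For a balanced panel with T periods, we subtract a fraction theta = 1 - sqrt(sigma_e^2 / (T*sigma_u^2 + sigma_e^2)) of each unit's means and then run OLS. Setting theta = 0 gives pooled OLS (regress), which itself mixes within and between variation. Setting theta = 1 gives the within (fixed-effects) estimator. Below is a Python/NumPy sketch in which the variance components are treated as known; this is an assumption, since xtreg, re estimates them from the data. The function names are hypothetical.

```python
import numpy as np

def re_theta(sd_u, sd_e, T):
    """Quasi-demeaning weight for a balanced panel with known variance components."""
    return 1.0 - np.sqrt(sd_e**2 / (T * sd_u**2 + sd_e**2))

def quasi_demeaned_slopes(y, X, ids, theta):
    """OLS of (y_it - theta*ybar_i) on (1 - theta) and (x_it - theta*xbar_i); returns slopes."""
    y, X, ids = np.asarray(y, float), np.asarray(X, float), np.asarray(ids)
    y_s, X_s = y.copy(), X.copy()
    for g in np.unique(ids):
        m = ids == g
        y_s[m] -= theta * y[m].mean()
        X_s[m] -= theta * X[m].mean(axis=0)
    Z = np.column_stack([np.full(len(ids), 1.0 - theta), X_s])
    # lstsq also copes with theta = 1, where the constant column is all zeros
    return np.linalg.lstsq(Z, y_s, rcond=None)[0][1:]

rng = np.random.default_rng(3)
n, T = 40, 5
ids = np.repeat(np.arange(n), T)
X = rng.normal(size=(n * T, 2))
y = 1.0 + X @ np.array([0.8, -0.4]) + rng.normal(scale=1.2, size=n)[ids] + rng.normal(size=n * T)

theta = re_theta(sd_u=1.2, sd_e=1.0, T=T)       # about 0.65 for these assumed values
print(quasi_demeaned_slopes(y, X, ids, 0.0))    # pooled OLS
print(quasi_demeaned_slopes(y, X, ids, theta))  # random-effects GLS, known variances
print(quasi_demeaned_slopes(y, X, ids, 1.0))    # within (fixed-effects) estimator
```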

The ML random-effects estimator, xtreg, mle, also requires that the ui follow a normal distribution. This distributional assumption should not be taken lightly, because it may not hold for your data.

You may be wondering what treating the ui as random means in practice, given that the dummy variables are not included in the output of the random-effects models. Estimates of the ui are, however, available afterward by using predict with the u option (for example, predict uhat, u). You can compare these values with the estimates that you get from the fixed-effects model in which you include the dummy variables.
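The fixed-effects side of that comparison can be sketched in Python/NumPy on simulated data. Unit effects are recovered from the within slopes b as ui = ybar_i - xbar_i*b - cons, centered here so that they average to zero; this centering is an assumed convention for the sketch. These effects differ from the dummy coefficients only by the effect of the omitted baseline unit. The sketch does not reproduce what predict returns after the random-effects models.

```python
import numpy as np

rng = np.random.default_rng(4)
n, T = 10, 5
ids = np.repeat(np.arange(n), T)
X = rng.normal(size=(n * T, 2))
y = 2.0 + X @ np.array([1.0, 0.5]) + rng.normal(size=n)[ids] + rng.normal(scale=0.5, size=n * T)

# Within (fixed-effects) slopes from unit-demeaned data.
y_bar = np.array([y[ids == g].mean() for g in range(n)])
X_bar = np.array([X[ids == g].mean(axis=0) for g in range(n)])
b = np.linalg.lstsq(X - X_bar[ids], y - y_bar[ids], rcond=None)[0]

# Unit effects centered to average zero (assumed convention for this sketch).
cons = y.mean() - X.mean(axis=0) @ b
u_hat = y_bar - X_bar @ b - cons

# LSDV: constant, X, and dummies for all units but the last, which becomes the baseline.
D = (ids[:, None] == np.arange(n - 1)[None, :]).astype(float)
coef = np.linalg.lstsq(np.column_stack([np.ones(n * T), X, D]), y, rcond=None)[0]
dummy_coefs = coef[3:]

print(np.round(u_hat[:-1] - u_hat[-1], 6))   # effects relative to the baseline unit
print(np.round(dummy_coefs, 6))              # the same numbers, from the dummies
```

This is also why the _cons from regress with dum1-dum99 differs from the _cons reported by xtreg, fe and areg: with the dummies, the constant is the intercept of the omitted unit.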

In the between-effects model, we work with each unit's averages over time. Why, then, isn't the ui term equal to zero, given that it is a mean of the residuals?

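In the between-effects model, the mean of the ui terms across units is zero because the regression includes a constant, but the individual terms are not necessarily zero. Each unit's residual, ui + avg(eit), measures how far that unit's mean response lies from the value predicted by the between regression.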
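A small Python/NumPy check on simulated data illustrates the point; the data-generating values are arbitrary assumptions. The residuals of the between regression, one per unit, average to zero, while the individual residuals do not vanish.

```python
import numpy as np

rng = np.random.default_rng(5)
n, T = 30, 5
ids = np.repeat(np.arange(n), T)
X = rng.normal(size=(n * T, 2))
y = 1.0 + X @ np.array([0.5, 1.5]) + rng.normal(size=n)[ids] + rng.normal(size=n * T)

# Between regression: one row per unit, using unit means.
y_bar = np.array([y[ids == g].mean() for g in range(n)])
X_bar = np.array([X[ids == g].mean(axis=0) for g in range(n)])
Z = np.column_stack([np.ones(n), X_bar])
coef = np.linalg.lstsq(Z, y_bar, rcond=None)[0]

# Each residual estimates u_i + avg_t(e_it) for one unit.
resid = y_bar - Z @ coef
print(resid.mean())   # zero up to rounding, because the model has a constant
print(resid[:5])      # individual unit residuals are generally not zero
```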

What is the xttest0 command testing?

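It is the Breusch and Pagan Lagrangian multiplier test of sd(ui) = 0, where sd(ui) is the standard deviation of the ui terms; xttest0 is run after xtreg, re. If this null hypothesis is true, there are no unit-specific effects, and so there is no within-unit correlation.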
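For a balanced panel, the textbook form of the statistic is computed from pooled OLS residuals eit:

LM = nT / (2(T - 1)) * [ sum_i (sum_t eit)^2 / sum_i sum_t eit^2 - 1 ]^2

Under the null hypothesis, LM is approximately chi-squared with 1 degree of freedom. The Python/NumPy sketch below implements that textbook form; the function name is hypothetical. Stata's xttest0 may differ in details, for example in how it handles unbalanced panels, so treat this as an illustration only.

```python
import numpy as np

def bp_lm_balanced(resid):
    """Breusch-Pagan LM statistic for sd(u_i) = 0 in a balanced panel.

    resid: (n, T) array of pooled OLS residuals, one row per unit.
    """
    e = np.asarray(resid, dtype=float)
    n, T = e.shape
    ratio = (e.sum(axis=1) ** 2).sum() / (e ** 2).sum()
    return n * T / (2.0 * (T - 1)) * (ratio - 1.0) ** 2

rng = np.random.default_rng(6)
n, T = 100, 5
ids = np.repeat(np.arange(n), T)          # rows are grouped by unit, so reshape(n, T) works
X = rng.normal(size=(n * T, 2))
Z = np.column_stack([np.ones(n * T), X])

def pooled_resid(y):
    """Residuals from regress-style pooled OLS, reshaped to one row per unit."""
    coef = np.linalg.lstsq(Z, y, rcond=None)[0]
    return (y - Z @ coef).reshape(n, T)

y_no_u = X @ np.array([1.0, -1.0]) + rng.normal(size=n * T)
y_with_u = y_no_u + rng.normal(scale=1.0, size=n)[ids]   # add unit effects u_i

lm_no = bp_lm_balanced(pooled_resid(y_no_u))
lm_with = bp_lm_balanced(pooled_resid(y_with_u))
print(lm_no, lm_with)   # the second is far above the 5% chi2(1) critical value, 3.84
```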
