
Title: Fixed-, between-, and random-effects and xtreg
Author: James Hardin, StataCorp
Date: October 1996; revisions April 2011

The xtreg commands fit regression models to longitudinal (panel) data. The underlying model is

    yit  =  alpha  +  XB  + ui  + eit


for units i=1,...,n, each measured at times t=1,...,Ti (we write Ti instead of T to allow for unbalanced data).

For expository purposes, let’s say that we have data of the following form:

    y      Response variable
    x1     1st predictor
    x2     2nd predictor
    x3     3rd predictor
    x4     4th predictor
    id     Categorical variable denoting unit (has 100 different values)
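
To experiment with the commands in this FAQ, you could simulate data of this form. The following is a minimal sketch and is not the dataset behind the output shown here; the seed, the coefficient values, the distributions, and the time variable t are all assumptions chosen for illustration.

```stata
* Hypothetical data: 100 units observed 5 times each (not the FAQ's dataset)
clear
set seed 12345
set obs 100
generate id = _n
* Unit effect u_i, constant within id
generate double u = rnormal()
* Expand to 5 observations per unit and index them within unit
expand 5
bysort id: generate t = _n
* Predictors x1-x4
forvalues j = 1/4 {
    generate double x`j' = runiform()
}
* Response: common intercept and slopes, plus u_i and an idiosyncratic error
generate double y = 0.25 + 0.7*x1 + 0.5*x2 + 0.4*x3 + 0.5*x4 + u + rnormal()
* Declare the panel variable (and the assumed time variable)
xtset id t
```

Because u is drawn independently of x1-x4 in this sketch, the simulated data satisfy the random-effects assumption discussed later; results will differ from the numbers printed in this FAQ.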


The simplest model that we could run is to ignore the fact that we have repeated measures on units and simply ask for

 . regress y x1 x2 x3 x4

Source |       SS       df       MS                  Number of obs =     500
---------+------------------------------               F(  4,   495) =   20.05
Model |  152.436199     4  38.1090499               Prob > F      =  0.0000
Residual |  941.073278   495  1.90115814               R-squared     =  0.1394
Total |  1093.50948   499  2.19140176               Root MSE      =  1.3788

------------------------------------------------------------------------------
y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
x1 |   .6628054   .2155956      3.074   0.002         .23921    1.086401
x2 |   .3696673   .1752119      2.110   0.035       .0254167     .713918
x3 |   .8149278   .2021136      4.032   0.000       .4178216    1.212034
x4 |   .9284717   .1455638      6.378   0.000       .6424726    1.214471
_cons |   .2567164   .0622353      4.125   0.000       .1344385    .3789943
------------------------------------------------------------------------------


### Fixed-effects model

However, we would like to use the information that certain sets of observations came from certain units. In a fixed-effects model, we are interested in the coefficients B and alpha and we assume that the ui are fixed quantities. We model

    yit = (alpha+ui) + XB + eit


allowing for different intercepts for our units, but constraining the slopes to be the same across units.
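
The term "within" in the output below can be made concrete with a short derivation. As a sketch in the notation of this FAQ (the bars, denoting averages over t within unit i, are notation introduced here): averaging the model over time for each unit and subtracting removes the unit-specific intercept.

```latex
\begin{align*}
\bar{y}_i &= (\alpha + u_i) + \bar{X}_i B + \bar{e}_i
  && \text{(average over } t = 1,\dots,T_i\text{)} \\
y_{it} - \bar{y}_i &= (X_{it} - \bar{X}_i)\,B + (e_{it} - \bar{e}_i)
  && \text{(subtract from the model)}
\end{align*}
```

The slopes B can therefore be estimated from deviations from unit means alone, which is why predictors that are constant within unit drop out of the fixed-effects model.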

In xtreg notation, this model is fit specifying

 . xtreg y x1 x2 x3 x4, i(id) fe

Fixed-effects (within) regression
sd(u_id)                     =  1.096278               Number of obs =     500
sd(e_id_t)                   =  .9693631                           n =     100
sd(e_id_t + u_id)            =  1.463383                           T =       5

corr(u_id, Xb)               =    0.1221               R-sq within   =  0.1299
                                                       R-sq between  =  0.1581
                                                       R-sq overall  =  0.1238

                                                       F(  4,   396) =   14.79
                                                            Prob > F =  0.0000

------------------------------------------------------------------------------
y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
x1 |   .6964505   .1665487      4.182   0.000       .3690204    1.023881
x2 |    .509578   .1358135      3.752   0.000       .2425723    .7765837
x3 |   .4204604   .1681315      2.501   0.013       .0899186    .7510023
x4 |   .5181118   .1178406      4.397   0.000       .2864405    .7497832
_cons |   .2869807   .0439059      6.536   0.000       .2006628    .3732986
------------------------------------------------------------------------------
id |           F(99,396) =      6.116   0.000           (100 categories)


and can also be obtained using areg by specifying

 . areg y x1 x2 x3 x4, absorb(id)

Source |       SS       df       MS                  Number of obs =     500
---------+------------------------------               F(103,   396) =    7.45
Model | 721.4021756   103 7.003904617               Prob > F      =  0.0000
Residual | 372.1073022   396 .9396649046               R-squared     =  0.6597
Total | 1093.509478   499 2.191401759               Root MSE      = .969363

------------------------------------------------------------------------------
y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
x1 |   .6964505   .1665487      4.182   0.000       .3690204    1.023881
x2 |    .509578   .1358135      3.752   0.000       .2425723    .7765837
x3 |   .4204604   .1681315      2.501   0.013       .0899186    .7510023
x4 |   .5181118   .1178406      4.397   0.000       .2864405    .7497832
_cons |   .2869807   .0439059      6.536   0.000       .2006628    .3732986
------------------------------------------------------------------------------
id |           F(99,396) =      6.116   0.000            (100 categories)


In the output of areg, the overall F test is for the model including the dummy variables (even though we absorbed them and do not see the estimated coefficients). In the fixed-effects model of xtreg, we present the overall F test only of those terms in which we are interested (the ones that are listed in the output) instead of showing the F test including the absorbed terms.

Finally, you could also generate the dummy variables yourself and run the regression using regress:

 . quietly tab id, gen(dum)

. regress y x1 x2 x3 x4 dum1-dum99

Source |       SS       df       MS                  Number of obs =     500
---------+------------------------------               F(103,   396) =    7.45
Model |  721.402175   103  7.00390461               Prob > F      =  0.0000
Residual |  372.107303   396  .939664907               R-squared     =  0.6597
Total |  1093.50948   499  2.19140176               Root MSE      =  .96936

------------------------------------------------------------------------------
y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
x1 |   .6964505   .1665487      4.182   0.000       .3690204    1.023881
x2 |    .509578   .1358135      3.752   0.000       .2425723    .7765837
x3 |   .4204604   .1681315      2.501   0.013       .0899186    .7510023
x4 |   .5181118   .1178406      4.397   0.000       .2864405    .7497832
dum1 |   -.973649   .6147949     -1.584   0.114      -2.182319    .2350209
dum2 |   .9924268    .616741      1.609   0.108      -.2200691    2.204923
...
dum98 |  -3.691905   .6175698     -5.978   0.000       -4.90603    -2.47778
dum99 |   .7967596    .616487      1.292   0.197      -.4152368    2.008756
_cons |   .0237916   .4362536      0.055   0.957       -.833871    .8814541
------------------------------------------------------------------------------


In the xtreg and areg output, the F test reported on the id line is the test that the coefficients on all the dummy variables are zero.
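
The two F tests discussed above can be requested explicitly after the dummy-variable regression. This is a sketch that assumes the dummies dum1-dum100 created by tab id, gen(dum) are in memory; with the data used in this FAQ, the first test should correspond to the F(99,396) on the id line and the second to the overall F(4,396) that xtreg, fe reports.

```stata
* Assumes dum1-dum100 were created by: quietly tab id, gen(dum)
quietly regress y x1 x2 x3 x4 dum1-dum99

* Joint test that the coefficients on all 99 included dummies are zero
testparm dum1-dum99

* Joint test of the four predictors only, holding the dummies in the model
testparm x1 x2 x3 x4
```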

Note:   Had we specified all n = 100 dummies (dum1-dum100), regress would drop one of them because, together with the constant, they are perfectly collinear.
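
A fourth way to reproduce the fixed-effects coefficients, alongside xtreg, areg, and the dummy-variable regression, is to subtract unit means by hand. This is only a sketch: the variable names m_* and w_* are hypothetical, and although the coefficients should match, the standard errors from this regress call will be too small because regress does not subtract the 100 estimated unit means from the residual degrees of freedom.

```stata
* Deviations from unit means for the response and each predictor
foreach v of varlist y x1 x2 x3 x4 {
    egen double m_`v' = mean(`v'), by(id)
    generate double w_`v' = `v' - m_`v'
}

* Demeaned variables have mean zero within unit, so no constant is needed
regress w_y w_x1 w_x2 w_x3 w_x4, noconstant
```

In practice, xtreg, fe or areg is preferable because both apply the degrees-of-freedom correction automatically.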

### Between-effects model

In the between-effects model, we model the mean response, where the means are calculated over time for each unit. Because each unit contributes only one observation to this regression, you should have many units to use this model.
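
Written out in the notation of this FAQ (with bars denoting averages over t within unit i, a notation introduced here), the between-effects regression is a sketch of the following form:

```latex
\[
\bar{y}_i = \alpha + \bar{X}_i B + (u_i + \bar{e}_i), \qquad i = 1,\dots,n
\]
```

The error term of this regression is u_i plus the averaged e_it, which is what the output below labels sd(u_id + e_id) with e_id = avg(e_id_t).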

In xtreg notation, you may specify this model as

 . xtreg y x1 x2 x3 x4, i(id) be

Between-id regression
                                                       Number of obs =     500
                                                                   n =     100
                                                                   T =       5

                                                       R-sq within   =  0.0627
                                                       R-sq between  =  0.2444
                                                       R-sq overall  =  0.1145

sd(u_id + e_id) =  1.02915                             F(  4,    95) =    7.68
where  e_id  = avg(e_id_t)                                  Prob > F =  0.0000

------------------------------------------------------------------------------
y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
x1 |   .6094896   .8923401      0.683   0.496      -1.162029    2.381009
x2 |  -.4360071   .7018847     -0.621   0.536      -1.829424    .9574101
x3 |    1.37073   .6650347      2.061   0.042       .0504691    2.690991
x4 |   2.101814   .5036762      4.173   0.000        1.10189    3.101738
_cons |   .1742767   .1066317      1.634   0.105      -.0374139    .3859674
------------------------------------------------------------------------------


or you can fit this model by calculating the values yourself using

 . collapse (mean) y x1 x2 x3 x4, by(id)
. reg y x1 x2 x3 x4

Source |       SS       df       MS                  Number of obs =     100
---------+------------------------------               F(  4,    95) =    7.68
Model |  32.5465229     4  8.13663074               Prob > F      =  0.0000
Residual |  100.619272    95  1.05915023               R-squared     =  0.2444
Total |  133.165795    99  1.34510904               Root MSE      =  1.0292

------------------------------------------------------------------------------
y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
x1 |   .6094896   .8923401      0.683   0.496      -1.162029    2.381009
x2 |  -.4360071   .7018847     -0.621   0.536      -1.829424    .9574101
x3 |    1.37073   .6650347      2.061   0.042       .0504691    2.690991
x4 |   2.101814   .5036762      4.173   0.000        1.10189    3.101738
_cons |   .1742767   .1066317      1.634   0.105      -.0374139    .3859674
------------------------------------------------------------------------------
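
Because collapse replaces the data in memory with the unit means, you may want to wrap the manual calculation so that the original panel is recovered afterward. A minimal sketch, assuming the panel data described above are in memory:

```stata
* Keep a copy of the panel so that it can be restored afterward
preserve

* Replace the data with one observation of means per unit
collapse (mean) y x1 x2 x3 x4, by(id)

* Between-effects regression on the unit means
regress y x1 x2 x3 x4

* Bring back the original observation-level data
restore
```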


### Random-effects model

In the random-effects model, the estimates are a weighted average of the fixed-effects (within) and between-effects estimates. People generally want to use the random-effects model because they wish to estimate the effects of variables that are constant within unit, which the fixed-effects model cannot estimate. The GLS random-effects model xtreg, re and the ML random-effects model xtreg, mle both require that we treat the ui terms as random variables and assume that there is no correlation between ui and X.

The ML random-effects method xtreg, mle also requires that ui follow the normal distribution. This distributional assumption should not be made lightly, because it may not hold for your data.
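
The weighting between the within and between information can be seen in the transformation usually given for the GLS random-effects estimator. The following is a sketch in the notation of this FAQ; the symbols theta_i, sigma_u, and sigma_e (the standard deviations of ui and eit) are introduced here for illustration.

```latex
\begin{align*}
y_{it} - \theta_i \bar{y}_i
  &= (1-\theta_i)\,\alpha + (X_{it} - \theta_i \bar{X}_i)\,B
     + \bigl\{(1-\theta_i)\,u_i + (e_{it} - \theta_i \bar{e}_i)\bigr\}, \\
\theta_i &= 1 - \sqrt{\frac{\sigma_e^2}{T_i\,\sigma_u^2 + \sigma_e^2}}
\end{align*}
```

When sigma_u is 0, theta_i is 0 and the transformed regression is the ordinary regression that ignores the panel structure; as T_i sigma_u^2 grows relative to sigma_e^2, theta_i approaches 1 and the regression approaches the fixed-effects (within) regression.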

You may be wondering what happens to the ui terms, given that no dummy variables appear in the output of the random-effects models. Estimates of the ui are, however, available after estimation by using predict with the u option. You can compare these values with the estimates that you get from the fixed-effects model where you include all the dummy variables.
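
One way to carry out that comparison is sketched below. It assumes the panel data described above are in memory and declares the panel variable once with xtset rather than repeating i(id); the names u_re, u_fe, and one_per_id are hypothetical. After xtreg, fe, predict with the u option returns the estimated unit effects, which play the role of the dummy-variable coefficients up to how the constant is normalized.

```stata
* Declare the panel variable once
xtset id

* Predicted unit effects from the GLS random-effects model
quietly xtreg y x1 x2 x3 x4, re
predict double u_re, u

* Estimated unit effects from the fixed-effects model
quietly xtreg y x1 x2 x3 x4, fe
predict double u_fe, u

* The effects are constant within unit, so compare one observation per unit
egen byte one_per_id = tag(id)
correlate u_re u_fe if one_per_id
```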

### For the between-effects model, we are talking about the averages of the units over time. Why isn’t the ui term equal to zero, given that it is a mean of the residual?

In the between-effects model, the mean of the ui terms is zero, but the individual terms are not necessarily zero.

### What is the xttest0 command testing?

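xttest0, run after xtreg, re, performs the Breusch and Pagan Lagrange multiplier test of sd(ui) = 0, where sd(ui) is the standard deviation of the ui terms. If this hypothesis is true, then there is no within-unit correlation, and the random-effects model reduces to ordinary regression.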
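
As a sketch of how the test is typically run, assuming the panel data described above are in memory: fit the GLS random-effects model and then call xttest0. The last line is illustrative arithmetic only; it plugs the sd(u_id) and sd(e_id_t) values printed in the fixed-effects output of this FAQ into the usual formula for the within-unit correlation, sd(ui)^2 / (sd(ui)^2 + sd(eit)^2), which comes out at about 0.56.

```stata
* Declare the panel variable, then fit the GLS random-effects model
xtset id
xtreg y x1 x2 x3 x4, re

* Breusch and Pagan Lagrange multiplier test of Var(u) = 0
xttest0

* Within-unit correlation implied by the fixed-effects output shown earlier
display 1.096278^2 / (1.096278^2 + .9693631^2)
```

A small p-value from xttest0 is evidence against sd(ui) = 0, that is, evidence that observations within a unit are correlated.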