**Title:** Fixed-, between-, and random-effects and xtreg

**Author:** James Hardin, StataCorp

**Date:** October 1996; revisions April 2011

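The **xtreg** commands fit models for longitudinal or panel data. Suppose your data follow the model

$$y_{it} = \alpha + X_{it}B + u_i + e_{it}$$

for units $i = 1, \ldots, n$, each measured at times $t = 1, \ldots, T_i$, where $u_i$ is a unit-specific effect and $e_{it}$ is the observation-level error. We write $T_i$ instead of $T$ to allow for unbalanced data.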
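To make the roles of the terms concrete, here is a minimal Python/NumPy sketch that simulates data from this model. Everything in it (the function name `simulate_panel`, the parameter values, the normal draws) is a hypothetical choice for illustration only. The point is that $u_i$ is drawn once per unit and shared by all of that unit's rows, $e_{it}$ varies from row to row, and $T_i$ may differ across units.

```python
import numpy as np


def simulate_panel(n=100, t_min=3, t_max=7, beta=(0.5, 1.0), alpha=0.25,
                   sd_u=1.0, sd_e=1.0, seed=0):
    """Draw y_it = alpha + X_it @ beta + u_i + e_it for an unbalanced panel.

    All defaults are illustrative. Each unit i gets its own T_i in
    [t_min, t_max]; u_i is drawn once per unit and shared by its rows.
    """
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    T = rng.integers(t_min, t_max + 1, size=n)   # T_i for each unit
    ids = np.repeat(np.arange(n), T)              # unit index of each row
    u = rng.normal(0.0, sd_u, size=n)             # one u_i per unit
    X = rng.normal(size=(ids.size, beta.size))    # predictors, one row per (i, t)
    e = rng.normal(0.0, sd_e, size=ids.size)      # e_it differs row by row
    y = alpha + X @ beta + u[ids] + e
    return y, X, ids, u


y, X, ids, u = simulate_panel()
print(len(y), "observations on", ids.max() + 1, "units")
```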

For expository purposes, let’s say that we have data of the following form:

| Variable | Description |
|----------|-------------|
| y | Response variable |
| x1 | 1st predictor |
| x2 | 2nd predictor |
| x3 | 3rd predictor |
| x4 | 4th predictor |
| id | Categorical variable denoting unit (has 100 different values) |

The simplest model that we could fit ignores the fact that we have repeated measurements on each unit and simply asks for a pooled regression:

```
. regress y x1 x2 x3 x4

  Source |       SS       df       MS                  Number of obs =     500
---------+------------------------------               F(  4,   495) =   20.05
   Model |  152.436199     4  38.1090499               Prob > F      =  0.0000
Residual |  941.073278   495  1.90115814               R-squared     =  0.1394
---------+------------------------------               Adj R-squared =  0.1324
   Total |  1093.50948   499  2.19140176               Root MSE      =  1.3788

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
      x1 |   .6628054   .2155956      3.074   0.002         .23921    1.086401
      x2 |   .3696673   .1752119      2.110   0.035       .0254167     .713918
      x3 |   .8149278   .2021136      4.032   0.000       .4178216    1.212034
      x4 |   .9284717   .1455638      6.378   0.000       .6424726    1.214471
   _cons |   .2567164   .0622353      4.125   0.000       .1344385    .3789943
------------------------------------------------------------------------------
```

However, we would like to use the information that certain sets of observations came from certain units. In a fixed-effects model, we are interested in the coefficients $B$ and $\alpha$, and we assume that the $u_i$ are fixed quantities. We model

$$y_{it} = (\alpha + u_i) + X_{it}B + e_{it}$$

allowing for different intercepts for our units but constraining the slopes to be the same across units.
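A common way to see why the fixed-effects slopes do not depend on the $u_i$ is the within transformation: subtracting each unit's mean from $y_{it}$ and $X_{it}$ removes $\alpha + u_i$ entirely. The NumPy sketch below, on hypothetical simulated data in which the regressors are deliberately correlated with $u_i$, checks that regressing demeaned $y$ on demeaned $X$ gives the same slopes as a regression with one dummy per unit (a consequence of the Frisch-Waugh-Lovell theorem). It illustrates the estimator, not Stata's internal code; the helper name `demean_by_unit` is made up.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, k = 50, 5, 2                                 # hypothetical balanced panel
ids = np.repeat(np.arange(n), T)
u = rng.normal(size=n)
X = rng.normal(size=(n * T, k)) + u[ids][:, None]  # X correlated with u_i
y = 0.3 + X @ np.array([0.7, 0.5]) + u[ids] + rng.normal(size=n * T)


def demean_by_unit(v, ids):
    """Subtract each unit's mean from v (1-D or 2-D, rows indexed by ids)."""
    counts = np.bincount(ids)
    v2 = v.reshape(len(ids), -1)
    means = np.column_stack(
        [np.bincount(ids, weights=v2[:, j]) / counts for j in range(v2.shape[1])]
    )
    return (v2 - means[ids]).reshape(v.shape)


# Within estimator: OLS on unit-demeaned data (alpha + u_i drops out).
b_within, *_ = np.linalg.lstsq(demean_by_unit(X, ids), demean_by_unit(y, ids), rcond=None)

# LSDV: regressors plus one dummy per unit, with no separate constant.
D = (ids[:, None] == np.arange(n)).astype(float)
coef, *_ = np.linalg.lstsq(np.column_stack([X, D]), y, rcond=None)
b_lsdv = coef[:k]

print(b_within)
print(b_lsdv)
```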

In **xtreg** notation, this model is fit by specifying

```
. xtreg y x1 x2 x3 x4, i(id) fe

Fixed-effects (within) regression
    sd(u_id)          =  1.096278             Number of obs      =        500
    sd(e_id_t)        =  .9693631             n                  =        100
    sd(e_id_t + u_id) =  1.463383             T                  =          5

    corr(u_id, Xb)    =    0.1221             R-sq within        =     0.1299
                                                   between       =     0.1581
                                                   overall       =     0.1238

                                              F(  4,   396)      =      14.79
                                              Prob > F           =     0.0000
------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
      x1 |   .6964505   .1665487      4.182   0.000       .3690204    1.023881
      x2 |    .509578   .1358135      3.752   0.000       .2425723    .7765837
      x3 |   .4204604   .1681315      2.501   0.013       .0899186    .7510023
      x4 |   .5181118   .1178406      4.397   0.000       .2864405    .7497832
   _cons |   .2869807   .0439059      6.536   0.000       .2006628    .3732986
------------------------------------------------------------------------------
      id |   F(99,396) =     6.116    0.000             (100 categories)
```

The same estimates can also be obtained with **areg** by specifying

```
. areg y x1 x2 x3 x4, absorb(id)

  Source |       SS       df       MS                  Number of obs =     500
---------+------------------------------               F(103,   396) =    7.45
   Model |  721.4021756  103  7.003904617              Prob > F      =  0.0000
Residual |  372.1073022  396  .9396649046              R-squared     =  0.6597
---------+------------------------------               Adj R-squared =  0.5712
   Total |  1093.509478  499  2.191401759              Root MSE      = .969363

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
      x1 |   .6964505   .1665487      4.182   0.000       .3690204    1.023881
      x2 |    .509578   .1358135      3.752   0.000       .2425723    .7765837
      x3 |   .4204604   .1681315      2.501   0.013       .0899186    .7510023
      x4 |   .5181118   .1178406      4.397   0.000       .2864405    .7497832
   _cons |   .2869807   .0439059      6.536   0.000       .2006628    .3732986
------------------------------------------------------------------------------
      id |   F(99,396) =     6.116    0.000             (100 categories)
```

The coefficients and standard errors are identical in the two outputs; only the overall *F* test differs. In the output of **areg**, the overall *F* test, *F*(103, 396) = 7.45, is for the model including the 99 unit dummy variables, even though they were absorbed and their estimated coefficients are not displayed. In the fixed-effects output of **xtreg**, the overall *F* test covers only the terms of interest, the ones listed in the output: *F*(4, 396) = 14.79 for x1 through x4.
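The **areg** summary statistics follow directly from its sums of squares. The small Python check below uses only the numbers printed in the **areg** output above and shows where *F*(103, 396) comes from: the model degrees of freedom are 4 slopes plus 99 unit dummies, and the residual degrees of freedom are 500 observations minus 4 slopes minus 100 unit intercepts.

```python
# Sums of squares from the areg output (model includes the absorbed dummies).
ss_model, df_model = 721.4021756, 103   # 4 regressors + 99 unit dummies
ss_resid, df_resid = 372.1073022, 396   # 500 obs - 4 slopes - 100 unit intercepts
n_obs = 500
ss_total = ss_model + ss_resid

F = (ss_model / df_model) / (ss_resid / df_resid)
r2 = ss_model / ss_total
adj_r2 = 1 - (1 - r2) * (n_obs - 1) / df_resid
root_mse = (ss_resid / df_resid) ** 0.5

print(f"F(103, 396) = {F:.2f}, R2 = {r2:.4f}, "
      f"adj R2 = {adj_r2:.4f}, Root MSE = {root_mse:.6f}")
```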

Finally, you could also generate the dummy variables yourself and run the
regression using
**regress**:

```
. quietly tab id, gen(dum)
. regress y x1 x2 x3 x4 dum1-dum99

  Source |       SS       df       MS                  Number of obs =     500
---------+------------------------------               F(103,   396) =    7.45
   Model |  721.402175   103  7.00390461               Prob > F      =  0.0000
Residual |  372.107303   396  .939664907               R-squared     =  0.6597
---------+------------------------------               Adj R-squared =  0.5712
   Total |  1093.50948   499  2.19140176               Root MSE      =  .96936

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
      x1 |   .6964505   .1665487      4.182   0.000       .3690204    1.023881
      x2 |    .509578   .1358135      3.752   0.000       .2425723    .7765837
      x3 |   .4204604   .1681315      2.501   0.013       .0899186    .7510023
      x4 |   .5181118   .1178406      4.397   0.000       .2864405    .7497832
    dum1 |   -.973649   .6147949     -1.584   0.114      -2.182319    .2350209
    dum2 |   .9924268    .616741      1.609   0.108      -.2200691    2.204923
     ...
   dum98 |  -3.691905   .6175698     -5.978   0.000       -4.90603    -2.47778
   dum99 |   .7967596    .616487      1.292   0.197      -.4152368    2.008756
   _cons |   .0237916   .4362536      0.055   0.957       -.833871    .8814541
------------------------------------------------------------------------------
```

The *F*(99, 396) test reported for **id** in the **xtreg, fe** and **areg** output is the test that the coefficients on all of these dummy variables are jointly zero.
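That *F*(99, 396) statistic can be recovered from two residual sums of squares already shown: the pooled **regress** without unit effects (495 residual df) and the dummy-variable regression (396 residual df). The Python arithmetic below applies the usual nested-model *F* formula to those printed values; it is a consistency check on the reported 6.116, not a description of how **xtreg** computes it internally.

```python
# Residual sums of squares from the pooled and dummy-variable regressions.
rss_pooled, df_pooled = 941.073278, 495   # regress y x1-x4 (common intercept)
rss_lsdv, df_lsdv = 372.1073022, 396      # regress y x1-x4 dum1-dum99

q = df_pooled - df_lsdv                   # 99 restrictions: all dummies = 0
F_id = ((rss_pooled - rss_lsdv) / q) / (rss_lsdv / df_lsdv)
print(f"F({q},{df_lsdv}) = {F_id:.3f}")
```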

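**Note:** If we were to include dummies for all units 1 through n, **regress** would drop one of them because of collinearity: the n dummies sum to 1 for every observation and so duplicate the constant. With dum1-dum99, the unit whose dummy is omitted (dum100) serves as the base, and _cons is that unit's intercept.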
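The collinearity is easy to confirm numerically. In this NumPy sketch with a hypothetical balanced layout of 100 units and 5 periods, the design matrix with a constant and all 100 dummies has 101 columns but rank 100; dropping one dummy restores full column rank.

```python
import numpy as np

n, T = 100, 5
ids = np.repeat(np.arange(n), T)
D = (ids[:, None] == np.arange(n)).astype(float)   # all n unit dummies
const = np.ones((n * T, 1))

full = np.column_stack([const, D])              # constant + dummies for units 1..n
reduced = np.column_stack([const, D[:, :-1]])   # drop the last unit's dummy

print(np.linalg.matrix_rank(full), full.shape[1])        # rank is one short
print(np.linalg.matrix_rank(reduced), reduced.shape[1])  # full column rank
```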

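In the between-effects model, we model the mean response of each unit. Averaging the model over the $T_i$ observations on unit $i$ gives

$$\bar{y}_i = \alpha + \bar{X}_i B + u_i + \bar{e}_i$$

and we regress $\bar{y}_i$ on $\bar{X}_i$, so each unit contributes a single observation. Because the effective sample size is the number of units (here 100, leaving 95 residual degrees of freedom), you should have many units to use this model.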

In **xtreg** notation, you may specify this model as

```
. xtreg y x1 x2 x3 x4, i(id) be

Between-id regression                             Number of obs  =        500
                                                  n              =        100
                                                  T              =          5

R-sq within  = 0.0627
     between = 0.2444
     overall = 0.1145

sd(u_id + e_id) = 1.02915                         F(  4,    95)  =       7.68
   where e_id = avg(e_id_t)                       Prob > F       =     0.0000
------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
      x1 |   .6094896   .8923401      0.683   0.496      -1.162029    2.381009
      x2 |  -.4360071   .7018847     -0.621   0.536      -1.829424    .9574101
      x3 |    1.37073   .6650347      2.061   0.042       .0504691    2.690991
      x4 |   2.101814   .5036762      4.173   0.000        1.10189    3.101738
   _cons |   .1742767   .1066317      1.634   0.105      -.0374139    .3859674
------------------------------------------------------------------------------
```

or you can fit it yourself by collapsing the data to unit means and running an ordinary regression on them:

```
. collapse (mean) y x1 x2 x3 x4, by(id)
. reg y x1 x2 x3 x4

  Source |       SS       df       MS                  Number of obs =     100
---------+------------------------------               F(  4,    95) =    7.68
   Model |  32.5465229     4  8.13663074               Prob > F      =  0.0000
Residual |  100.619272    95  1.05915023               R-squared     =  0.2444
---------+------------------------------               Adj R-squared =  0.2126
   Total |  133.165795    99  1.34510904               Root MSE      =  1.0292

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
      x1 |   .6094896   .8923401      0.683   0.496      -1.162029    2.381009
      x2 |  -.4360071   .7018847     -0.621   0.536      -1.829424    .9574101
      x3 |    1.37073   .6650347      2.061   0.042       .0504691    2.690991
      x4 |   2.101814   .5036762      4.173   0.000        1.10189    3.101738
   _cons |   .1742767   .1066317      1.634   0.105      -.0374139    .3859674
------------------------------------------------------------------------------
```
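The collapse-then-regress approach amounts to ordinary least squares on the unit means. This NumPy sketch, on hypothetical simulated data with unbalanced $T_i$, computes the means with `np.bincount` (playing the role of **collapse**) and fits the between regression as an unweighted OLS on those 100 means, one row per unit.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 2
T = rng.integers(3, 8, size=n)                   # hypothetical unbalanced T_i
ids = np.repeat(np.arange(n), T)
X = rng.normal(size=(ids.size, k))
u = rng.normal(size=n)
y = 0.2 + X @ np.array([1.0, -0.5]) + u[ids] + rng.normal(size=ids.size)

# Unit means, one row per unit (the analogue of collapse ..., by(id)).
counts = np.bincount(ids)
ybar = np.bincount(ids, weights=y) / counts
Xbar = np.column_stack([np.bincount(ids, weights=X[:, j]) / counts for j in range(k)])

# Between estimator: unweighted OLS of ybar on a constant and Xbar.
Z = np.column_stack([np.ones(n), Xbar])
b_between, *_ = np.linalg.lstsq(Z, ybar, rcond=None)
print(b_between)   # [constant, slope on x1, slope on x2]
```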

In the random-effects model, the estimates are a (matrix-)weighted average of the within (fixed-effects) and between estimates. People often choose the random-effects model because they want to estimate the effects of variables that are constant within unit; the fixed-effects model cannot estimate these, because such variables are perfectly collinear with the unit intercepts. The GLS random-effects estimator **xtreg, re** and the ML random-effects estimator **xtreg, mle** both require that we treat the $u_i$ as random variables and assume that there is no correlation between $u_i$ and **X**.
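One standard way to write the GLS random-effects estimator for a balanced panel is as OLS on quasi-demeaned data: subtract $\theta$ times the unit mean from $y_{it}$ and $X_{it}$, where

$$\theta = 1 - \sqrt{\frac{\sigma_e^2}{T\sigma_u^2 + \sigma_e^2}}.$$

With $\theta = 0$ this reduces to the pooled regression, and as $\theta \to 1$ it approaches the within (fixed-effects) estimator. The NumPy sketch below treats $\sigma_u$ and $\sigma_e$ as known inputs, whereas in practice they must be estimated; the function names `re_theta` and `re_gls` are hypothetical.

```python
import numpy as np


def re_theta(sd_u, sd_e, T):
    """Quasi-demeaning weight for a balanced panel with T observations per unit."""
    return 1.0 - np.sqrt(sd_e**2 / (T * sd_u**2 + sd_e**2))


def re_gls(y, X, ids, sd_u, sd_e):
    """Random-effects GLS via quasi-demeaning, with sd_u and sd_e taken as known.

    Returns (coefficients, theta); coefficients are [constant, slopes...].
    Assumes a balanced panel and theta < 1.
    """
    counts = np.bincount(ids)
    T = counts[0]
    if not np.all(counts == T):
        raise ValueError("this sketch assumes a balanced panel")
    theta = re_theta(sd_u, sd_e, T)
    ybar = (np.bincount(ids, weights=y) / counts)[ids]
    Xbar = np.column_stack(
        [np.bincount(ids, weights=X[:, j]) / counts for j in range(X.shape[1])]
    )[ids]
    # Partial demeaning; the constant column becomes (1 - theta).
    Z = np.column_stack([np.full(len(y), 1.0 - theta), X - theta * Xbar])
    coef, *_ = np.linalg.lstsq(Z, y - theta * ybar, rcond=None)
    return coef, theta


print(re_theta(sd_u=1.0, sd_e=1.0, T=5))   # about 0.59: between pooled OLS and within
```

Passing `sd_u=0` gives $\theta = 0$, so `re_gls` returns the pooled OLS coefficients; larger `sd_u` relative to `sd_e` pushes the estimate toward the within estimator.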

The ML random-effects estimator **xtreg, mle** additionally requires that the $u_i$ follow a normal distribution. This is a substantive assumption that may not hold for your data, so it should not be made lightly.

You may wonder what treating the $u_i$ as random means in practice, given that no unit-specific terms (dummy variables) appear in the output of the random-effects models. Estimates of the $u_i$ are nonetheless available after estimation by using **predict** with the **u** option. You can compare these values with the estimates that you get from the fixed-effects model in which all of the dummy variables are included.

In the between-effects model, the $u_i$ terms have mean zero, but the individual $u_i$ are not necessarily zero.

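Testing for unit effects therefore amounts to testing $\mathrm{sd}(u_i) = 0$, where $\mathrm{sd}(u_i)$ is the standard deviation of the $u_i$ terms. If this hypothesis is true, every $u_i$ equals its mean of zero, and there is no within-unit correlation.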
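Under the random-effects assumptions ($u_i$ and $e_{it}$ uncorrelated, with constant variances $\sigma_u^2$ and $\sigma_e^2$), two observations on the same unit have correlation $\rho = \sigma_u^2/(\sigma_u^2 + \sigma_e^2)$, which is zero exactly when $\sigma_u = 0$. The Python arithmetic below plugs in the sd(u_id) and sd(e_id_t) values from the fixed-effects output above purely as example numbers; it also appears to reproduce that output's sd(e_id_t + u_id) line as the square root of the summed variances.

```python
import math

# Values printed in the xtreg, fe output, used here only as example inputs.
sd_u = 1.096278     # sd(u_id)
sd_e = 0.9693631    # sd(e_id_t)

sd_total = math.sqrt(sd_u**2 + sd_e**2)   # compare sd(e_id_t + u_id) = 1.463383
rho = sd_u**2 / (sd_u**2 + sd_e**2)       # within-unit correlation
rho_if_no_effects = 0.0**2 / (0.0**2 + sd_e**2)   # sd(u_i) = 0 gives rho = 0

print(f"sd(e+u) = {sd_total:.6f}, rho = {rho:.4f}, rho with sd(u)=0: {rho_if_no_effects}")
```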