Is there more information on xtreg?
Title: Fixed-, between-, and random-effects and xtreg
Author: James Hardin, StataCorp
Date: October 1996; revisions April 2011
The xtreg commands are for dealing with longitudinal or panel data. The
model is
yit = alpha + XB + ui + eit
for units i=1,...,n, each measured at times t=1,...,Ti (we use
Ti instead of T to allow for unbalanced data).
For expository purposes, let’s say that we have data of the following
form:
y     Response variable
x1    1st predictor
x2    2nd predictor
x3    3rd predictor
x4    4th predictor
id    Categorical variable denoting unit (has 100 different values)
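If your Stata release provides xtset, you can declare the panel variable once
instead of repeating the i() option on each command. The following is a
minimal sketch under that assumption; xtdescribe then reports the number of
units and the pattern of observations per unit, which is a quick way to check
whether the data are balanced.
. * assumes id is the unit identifier from the list above
. xtset id
. xtdescribe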
The simplest model that we could run is to ignore the fact that we have
repeated measures on units and simply ask for
. regress y x1 x2 x3 x4
  Source |       SS       df       MS                  Number of obs =     500
---------+------------------------------               F( 4, 495)    =   20.05
   Model |  152.436199     4  38.1090499               Prob > F      =  0.0000
Residual |  941.073278   495  1.90115814               R-squared     =  0.1394
---------+------------------------------               Adj R-squared =  0.1324
   Total |  1093.50948   499  2.19140176               Root MSE      =  1.3788
------------------------------------------------------------------------------
       y |      Coef.  Std. Err.          t   P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
      x1 |   .6628054   .2155956      3.074   0.002         .23921    1.086401
      x2 |   .3696673   .1752119      2.110   0.035       .0254167     .713918
      x3 |   .8149278   .2021136      4.032   0.000       .4178216    1.212034
      x4 |   .9284717   .1455638      6.378   0.000       .6424726    1.214471
   _cons |   .2567164   .0622353      4.125   0.000       .1344385    .3789943
------------------------------------------------------------------------------
Fixed-effects model
However, we would like to use the information that certain sets of
observations came from certain units. In a fixed-effects model, we are
interested in the coefficients B and alpha and we assume that the
ui are fixed quantities. We model
yit = (alpha+ui) + XB + eit
allowing for different intercepts for our units, but constraining the slopes
to be the same across units.
In xtreg notation, this model is fit specifying
. xtreg y x1 x2 x3 x4, i(id) fe
Fixed-effects (within) regression
sd(u_id)          = 1.096278                     Number of obs =     500
sd(e_id_t)        = .9693631                                 n =     100
sd(e_id_t + u_id) = 1.463383                                 T =       5
corr(u_id, Xb)    =   0.1221                       R-sq within =  0.1299
                                                       between =  0.1581
                                                       overall =  0.1238
                                                    F( 4, 396) =   14.79
                                                      Prob > F =  0.0000
------------------------------------------------------------------------------
       y |      Coef.  Std. Err.          t   P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
      x1 |   .6964505   .1665487      4.182   0.000       .3690204    1.023881
      x2 |    .509578   .1358135      3.752   0.000       .2425723    .7765837
      x3 |   .4204604   .1681315      2.501   0.013       .0899186    .7510023
      x4 |   .5181118   .1178406      4.397   0.000       .2864405    .7497832
   _cons |   .2869807   .0439059      6.536   0.000       .2006628    .3732986
------------------------------------------------------------------------------
      id |          F(99,396) =    6.116   0.000         (100 categories)
and can also be obtained using areg by specifying
. areg y x1 x2 x3 x4, absorb(id)
  Source |       SS       df       MS                  Number of obs =     500
---------+------------------------------               F(103, 396)   =    7.45
   Model | 721.4021756   103 7.003904617               Prob > F      =  0.0000
Residual | 372.1073022   396 .9396649046               R-squared     =  0.6597
---------+------------------------------               Adj R-squared =  0.5712
   Total | 1093.509478   499 2.191401759               Root MSE      = .969363
------------------------------------------------------------------------------
       y |      Coef.  Std. Err.          t   P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
      x1 |   .6964505   .1665487      4.182   0.000       .3690204    1.023881
      x2 |    .509578   .1358135      3.752   0.000       .2425723    .7765837
      x3 |   .4204604   .1681315      2.501   0.013       .0899186    .7510023
      x4 |   .5181118   .1178406      4.397   0.000       .2864405    .7497832
   _cons |   .2869807   .0439059      6.536   0.000       .2006628    .3732986
------------------------------------------------------------------------------
      id |          F(99,396) =    6.116   0.000         (100 categories)
In the output of areg, the overall F test is for the model
including the dummy variables (even though we absorbed them and do not see
the estimated coefficients). In the fixed-effects model of xtreg, we
present the overall F test only of those terms in which we are
interested (the ones that are listed in the output) instead of showing the
F test including the absorbed terms.
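The word "within" in the xtreg header points to another way of seeing what the
fixed-effects estimator does: subtracting each unit's mean from every variable
removes the ui, and ordinary least squares on the demeaned data gives the same
slopes. The following is only a sketch under the example's variable names; the
names m_* and w_* are hypothetical and chosen for illustration.
. * m_* and w_* are hypothetical variable names used only in this sketch
. foreach v of varlist y x1 x2 x3 x4 {
      egen double m_`v' = mean(`v'), by(id)
      generate double w_`v' = `v' - m_`v'
  }
. regress w_y w_x1 w_x2 w_x3 w_x4, noconstant
The slope coefficients should agree with those from xtreg, fe. The standard
errors will not, because regress does not know that 100 unit means were
estimated first and so uses 496 residual degrees of freedom rather than 396.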
Finally, you could also generate the dummy variables yourself and run the
regression using regress:
. quietly tab id, gen(dum)
. regress y x1 x2 x3 x4 dum1-dum99
  Source |       SS       df       MS                  Number of obs =     500
---------+------------------------------               F(103, 396)   =    7.45
   Model |  721.402175   103  7.00390461               Prob > F      =  0.0000
Residual |  372.107303   396  .939664907               R-squared     =  0.6597
---------+------------------------------               Adj R-squared =  0.5712
   Total |  1093.50948   499  2.19140176               Root MSE      =  .96936
------------------------------------------------------------------------------
       y |      Coef.  Std. Err.          t   P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
      x1 |   .6964505   .1665487      4.182   0.000       .3690204    1.023881
      x2 |    .509578   .1358135      3.752   0.000       .2425723    .7765837
      x3 |   .4204604   .1681315      2.501   0.013       .0899186    .7510023
      x4 |   .5181118   .1178406      4.397   0.000       .2864405    .7497832
    dum1 |   -.973649   .6147949     -1.584   0.114      -2.182319    .2350209
    dum2 |   .9924268    .616741      1.609   0.108      -.2200691    2.204923
...
   dum98 |  -3.691905   .6175698     -5.978   0.000       -4.90603    -2.47778
   dum99 |   .7967596    .616487      1.292   0.197      -.4152368    2.008756
   _cons |   .0237916   .4362536      0.055   0.957       -.833871    .8814541
------------------------------------------------------------------------------
In the xtreg and areg output above, the F test reported on the id line is the
test that the coefficients on all the dummy variables are zero.
Note: If we were to specify the dummies for 1 through n,
regress would drop one of them because of collinearity.
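In Stata releases that support factor-variable notation, the dummies need not
be created by hand. A sketch, assuming such a release: i.id expands to
indicators for the levels of id with one level omitted as the base, and
testparm then tests that all the indicator coefficients are zero, which
corresponds to the F(99,396) test on the id line of the xtreg and areg output.
. * assumes factor-variable support; Stata picks the omitted base level of id
. regress y x1 x2 x3 x4 i.id
. testparm i.id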
Between-effects model
In the between-effects model, we attempt to model the mean response where
the means are calculated for each of the units. To use this model, you
should have many units.
In xtreg notation, you may specify this model as
. xtreg y x1 x2 x3 x4, i(id) be
Between-id regression
                                                 Number of obs =     500
                                                             n =     100
                                                             T =       5
                                                   R-sq within =  0.0627
                                                       between =  0.2444
                                                       overall =  0.1145
sd(u_id + e_id) = 1.02915                            F( 4, 95) =    7.68
where e_id = avg(e_id_t)                              Prob > F =  0.0000
------------------------------------------------------------------------------
       y |      Coef.  Std. Err.          t   P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
      x1 |   .6094896   .8923401      0.683   0.496      -1.162029    2.381009
      x2 |  -.4360071   .7018847     -0.621   0.536      -1.829424    .9574101
      x3 |    1.37073   .6650347      2.061   0.042       .0504691    2.690991
      x4 |   2.101814   .5036762      4.173   0.000        1.10189    3.101738
   _cons |   .1742767   .1066317      1.634   0.105      -.0374139    .3859674
------------------------------------------------------------------------------
or you can fit this model yourself by collapsing the data to unit means and
running regress on the result:
. collapse (mean) y x1 x2 x3 x4, by(id)
. reg y x1 x2 x3 x4
  Source |       SS       df       MS                  Number of obs =     100
---------+------------------------------               F( 4, 95)     =    7.68
   Model |  32.5465229     4  8.13663074               Prob > F      =  0.0000
Residual |  100.619272    95  1.05915023               R-squared     =  0.2444
---------+------------------------------               Adj R-squared =  0.2126
   Total |  133.165795    99  1.34510904               Root MSE      =  1.0292
------------------------------------------------------------------------------
       y |      Coef.  Std. Err.          t   P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
      x1 |   .6094896   .8923401      0.683   0.496      -1.162029    2.381009
      x2 |  -.4360071   .7018847     -0.621   0.536      -1.829424    .9574101
      x3 |    1.37073   .6650347      2.061   0.042       .0504691    2.690991
      x4 |   2.101814   .5036762      4.173   0.000        1.10189    3.101738
   _cons |   .1742767   .1066317      1.634   0.105      -.0374139    .3859674
------------------------------------------------------------------------------
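Because collapse replaces the data in memory with the unit means, you may want
to wrap the manual calculation so that the original observations come back
afterward. A minimal sketch using preserve and restore, written with the
(mean) statistic prefix of collapse:
. * preserve keeps a copy of the 500 observations; restore brings them back
. preserve
. collapse (mean) y x1 x2 x3 x4, by(id)
. regress y x1 x2 x3 x4
. restore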
Random-effects model
In the random-effects model, we are taking a weighted average of the
fixed-effects and between-effects estimates. People generally want to use the
random-effects model because they wish to estimate the coefficients on
variables that are constant within unit; the fixed-effects model cannot
estimate these because they are collinear with the unit intercepts. The GLS
random-effects model xtreg, re and the ML random-effects model xtreg, mle both
require that we treat the ui terms as random variables and assume that there
is no correlation between ui and X.
The ML random-effects method xtreg, mle also requires that the
ui follow the normal distribution. Do not take this distributional
assumption lightly; it may not hold for your data.
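One way to see the weighting at work is to ask xtreg, re to report theta, the
fraction of each unit mean that the GLS estimator subtracts from the data. The
sketch below assumes the example variables above. A theta near 0 means that
the estimates are close to those of regress on the pooled data, and a theta
near 1 means that they are close to the fixed-effects estimates. Fitting the
ML version afterward shows how much the normality assumption changes the
results for your data.
. * the theta option adds the estimated theta to the displayed output
. xtreg y x1 x2 x3 x4, i(id) re theta
. xtreg y x1 x2 x3 x4, i(id) mle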
You may be wondering what treating the ui as random means in practice, given
that no dummy variables appear in the output of the random-effects models.
The estimated ui are, however, available using predict with the u option. You
can compare these values with the estimates that you get from the
fixed-effects model where you include all of the dummy variables.
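The comparison can be sketched as follows, assuming the example variables
above; u_re and u_fe are hypothetical variable names chosen for illustration.
. * u_re and u_fe are hypothetical names for the predicted unit effects
. quietly xtreg y x1 x2 x3 x4, i(id) re
. predict double u_re, u
. quietly xtreg y x1 x2 x3 x4, i(id) fe
. predict double u_fe, u
. correlate u_re u_fe
. summarize u_re u_fe
The two sets of values will usually be highly correlated, with the
random-effects predictions typically showing less spread than the
fixed-effects ones.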
For the between-effects model, we are talking about the averages of the units
over time. Why isn’t the ui term equal to zero, given that it is a mean
of the residual?
In the between-effects model, the mean of the ui terms is zero,
but the individual terms are not necessarily zero.
What is the xttest0 command testing?
It is a test of sd(ui) = 0, where sd(ui) is the
standard deviation of the ui terms. If this is true, then there
is no within-unit correlation.
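xttest0 is run after xtreg, re and reports the Breusch and Pagan Lagrange
multiplier test. A minimal sketch, assuming the example variables above:
. * xttest0 uses the results of the immediately preceding xtreg, re
. quietly xtreg y x1 x2 x3 x4, i(id) re
. xttest0
A small p-value is evidence against sd(ui) = 0, that is, evidence of
within-unit correlation.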