How can there be an intercept in the fixed-effects model estimated by
xtreg, fe?

Title:   Interpreting the intercept in the fixed-effects model
Author:  William Gould, StataCorp
Date:    October 1997; updated July 2011
The results that
xtreg, fe reports
have simply been reformulated so that the reported intercept is the average
value of the fixed effects.
Intuition
One way of writing the fixed-effects model is
yit = a + xitb + vi + eit (1)
where vi (i=1, ..., n) are simply the fixed effects to be
estimated. With no further constraints, the parameters a and vi
do not have a unique solution. You can see that by rearranging the terms in
(1):
yit = (a + vi) + xitb + eit
Consider some solution that has, say, a=3. Then we could just as well say
that a=4 and subtract the value 1 from each of the estimated vi.
Thus, before (1) can be estimated, we must place another constraint on the
system. Any constraint will do, and the choice we make will have no effect
on the estimated b. One popular constraint is a=0, but we could just as
well constrain a=3. Changing the value of a would merely change the
corresponding values of vi. Nor do we have to constrain a; we
could place a constraint on vi. We could, for instance,
constrain v1=0 or v5=3.
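The lack of identification is easy to verify numerically. Here is a minimal
Python sketch (hypothetical toy numbers, not the FAQ's dataset): shifting a up
by 1 while shifting every vi down by 1 leaves the fitted values unchanged, so
the data cannot distinguish the two parameterizations.

```python
import numpy as np

# Hypothetical toy data: 2 groups, slope b = 2 (values chosen for illustration)
group = np.array([0, 0, 1, 1])
x = np.array([1.0, 2.0, 3.0, 4.0])
b = 2.0

# Parameterization 1: a = 3, fixed effects v = (1, -2)
a1, v1 = 3.0, np.array([1.0, -2.0])
fit1 = a1 + x * b + v1[group]

# Parameterization 2: a = 4, each vi reduced by 1
a2, v2 = 4.0, v1 - 1.0
fit2 = a2 + x * b + v2[group]

print(np.allclose(fit1, fit2))  # True: the fitted values are identical
```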
The constraint that xtreg, fe places on the system is
computationally more difficult: it constrains the average of the vi to be 0.
Because the constraint we choose is arbitrary, we chose a constraint that
makes interpreting the results more convenient. The random-effects
estimator proceeds under the *ASSUMPTION* that E(vi)=0 and
hence can estimate an intercept. We parameterize the fixed-effects
estimator so that it proceeds under the *CONSTRAINT*
average(vi)=0. This constraint has no substantive implication, since we had
to choose some constraint anyway.
The primary advantage of this constraint is that if you fit some model and
then obtain the predictions
. xtreg y x1 x2 x3, fe
. predict yhat
then the average value of yhat will equal the average value of y. To obtain
estimates with the fixed-effects estimator, we had to impose an arbitrary
constraint, and had we instead constrained a=0, predict yhat would
have produced predictions whose average differed from that of y. That would
be the only difference; the predictions would differ by a constant (namely,
by their respective values of a).
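The mean-preserving property is easy to verify by hand. Here is a minimal
numpy sketch (plain Python, not Stata code; it uses the 11-observation dataset
listed later in this FAQ) that computes the within slope, forms the intercept
implied by the average(vi)=0 constraint, and checks that the default linear
prediction averages to the average of y:

```python
import numpy as np

# The 11-observation dataset listed later in this FAQ
group = np.array([1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4])
x = np.array([0, 8, 17, 10, 16, 4, 11, 5, 18, 5, 2], dtype=float)
y = np.array([-5, 23, 44, 29, 26, 17, 17, 31, 50, 26, 17], dtype=float)

# Within (fixed-effects) slope: regress within-group-demeaned y on demeaned x
xd = x - np.array([x[group == g].mean() for g in group])
yd = y - np.array([y[group == g].mean() for g in group])
b = (xd @ yd) / (xd @ xd)

# Under the constraint average(vi)=0, the reported intercept is the grand
# mean of y minus b times the grand mean of x
a = y.mean() - b * x.mean()

# The linear prediction a + x*b then averages to the average of y
yhat = a + x * b
print(round(b, 6), round(a, 6))           # 2.0 7.545455
print(np.isclose(yhat.mean(), y.mean()))  # True
```

The slope and intercept match the xtreg, fe output shown in the demonstration
below.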
Using the constraint Sum vi=0 has another advantage. Let us draw
a distinction between models and estimators. The *MODEL* is
yit = a + xitb + vi + eit (1)
Under the random-effects *MODEL*, it is assumed that
E(vi)=0 and that vi and xit are
uncorrelated. From that model, we can derive the random-effects
*ESTIMATOR*.
Under the fixed-effects *MODEL*, no assumptions are made about vi
except that they are fixed parameters. From that model, we can derive the
fixed-effects *ESTIMATOR*.
It turns out that the fixed-effects *ESTIMATOR* is an admissible estimator
for the random-effects *MODEL*; it is merely less efficient than the
random-effects *ESTIMATOR*. That is,
| ----------------- model ---------------------
Estimator | fixed effects random effects
------------------------+---------------------------------------------------
fixed effects | appropriate appropriate
random effects | inappropriate appropriate
------------------------+---------------------------------------------------
When you use the fixed-effects *ESTIMATOR* for the random-effects *MODEL*,
the intercept a reported by xtreg, fe is the appropriate
estimate for the intercept of the random-effects model.
Derivation
The fixed-effects model is
yit = a + xit b + vi + eit (1)
From which it follows that

        ybar_i = a + xbar_i b + vi + ebar_i                      (2)

where ybar_i, xbar_i, and ebar_i are the averages of
yit, xit, and eit
within group i.
Subtracting (2) from (1), we obtain

        yit - ybar_i = (xit - xbar_i)b + (eit - ebar_i)          (3)
Equation (3) is the way many people think about the fixed-effects estimator.
a remains unestimated in this formula. From (1), it also follows that
        ybarbar = a + xbarbar b + vbar + ebarbar                 (4)

where ybarbar, xbarbar, and ebarbar are the grand averages of
yit, xit, and eit, and vbar is the average of the vi.
For instance,

        ybarbar = ( Sum over i, Sum over t of yit ) / total_number_of_observations
Summing (3) and (4), we obtain
        yit - ybar_i + ybarbar
              = a + (xit - xbar_i + xbarbar)b
                  + (eit - ebar_i + ebarbar) + vbar              (5)

xtreg, fe estimates the above equation under the constraint

        vbar = 0

which is to say, it estimates

        yit - ybar_i + ybarbar = a + (xit - xbar_i + xbarbar)b + noise
Thus the left-side variable is yit minus the within-group
means but with the grand mean added back in, and the right-side
variables are xit minus the within-group means but with the grand
mean added back in. Obviously, adding the grand means to the left and
right sides has no effect on the estimated b.
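To make the algebra concrete, here is a numpy sketch (not the actual xtreg
code; just equation (5) applied directly to the dataset from the demonstration
below): subtract within-group means, add the grand means back, and run an
ordinary regression with a constant.

```python
import numpy as np

# The 11-observation dataset from the demonstration below
group = np.array([1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4])
x = np.array([0, 8, 17, 10, 16, 4, 11, 5, 18, 5, 2], dtype=float)
y = np.array([-5, 23, 44, 29, 26, 17, 17, 31, 50, 26, 17], dtype=float)

# yit - ybar_i + ybarbar  and  xit - xbar_i + xbarbar
gmean = lambda z: np.array([z[group == g].mean() for g in group])
ystar = y - gmean(y) + y.mean()
xstar = x - gmean(x) + x.mean()

# Ordinary least squares of ystar on xstar, with a constant
X = np.column_stack([np.ones_like(xstar), xstar])
a, b = np.linalg.lstsq(X, ystar, rcond=None)[0]

print(round(b, 6), round(a, 6))  # 2.0 7.545455 -- matching xtreg, fe
```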
Demonstration
Fixed-effects regression is supposed to produce the same coefficient
estimates and standard errors as ordinary regression when indicator (dummy)
variables are included for each of the groups. Because the fixed-effects
model is
        yit = xit b + vi + eit
and vi are fixed parameters to be estimated, this is the same as
        yit = xit b + v1 d1i + v2 d2i + ... + eit
where d1 is 1 when i=1 and 0 otherwise, d2 is 1 when i=2 and 0 otherwise,
and so on. d1, d2, ..., are just dummy variables indicating the groups, and
v1, v2, ..., are their regression coefficients, which
we must estimate.
The problem is that we typically have lots of groups, perhaps thousands,
and including thousands of dummy variables is too computationally
expensive, so we look for a shortcut.
Nevertheless, we could take a little dataset with just a few groups and
compare the methods. Here is my little dataset:
. list
+-----------------+
| group x y |
|-----------------|
1. | 1 0 -5 |
2. | 1 8 23 |
3. | 1 17 44 |
4. | 2 10 29 |
5. | 2 16 26 |
|-----------------|
6. | 3 4 17 |
7. | 3 11 17 |
8. | 3 5 31 |
9. | 4 18 50 |
10. | 4 5 26 |
|-----------------|
11. | 4 2 17 |
+-----------------+
I am going to show you

1.  what regress with group dummies reports;

2.  that xtreg, fe reports the same results;

3.  that removing the within-group means and estimating a regression on
    the deviations without an intercept (as given in equation 3) produces
    the same coefficients but different standard errors.
How can method 3 be wrong? Because it fails to account for the fact that
the means we removed are *ESTIMATES*. As a consequence, it understates
standard errors.
1. What regress with group dummies reports
. regress y x i.group
Source | SS df MS Number of obs = 11
-------------+------------------------------ F( 4, 6) = 4.01
Model | 1554.16667 4 388.541667 Prob > F = 0.0643
Residual | 581.833333 6 96.9722222 R-squared = 0.7276
-------------+------------------------------ Adj R-squared = 0.5460
Total | 2136 10 213.6 Root MSE = 9.8474
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | 2 .5372223 3.72 0.010 .6854644 3.314536
|
group |
2 | -2.5 9.332493 -0.27 0.798 -25.33579 20.33579
3 | 4.333333 8.090107 0.54 0.611 -15.46245 24.12911
4 | 10.33333 8.040407 1.29 0.246 -9.340834 30.0075
|
_cons | 4 7.236455 0.55 0.600 -13.70697 21.70697
------------------------------------------------------------------------------
2. xtreg, fe reports the same results
. xtset group
panel variable: group (unbalanced)
. xtreg y x, fe
Fixed-effects (within) regression Number of obs = 11
Group variable: group Number of groups = 4
R-sq: within = 0.6979 Obs per group: min = 2
between = 0.1716 avg = 2.8
overall = 0.6146 max = 3
F(1,6) = 13.86
corr(u_i, Xb) = -0.1939 Prob > F = 0.0098
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | 2 .5372223 3.72 0.010 .6854644 3.314536
_cons | 7.545455 5.549554 1.36 0.223 -6.033816 21.12472
-------------+----------------------------------------------------------------
sigma_u | 5.6213466
sigma_e | 9.8474475
rho | .24577354 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(3, 6) = 0.83 Prob > F = 0.5241
If you compare, you will find that regress with group dummies
reported the same coefficient (2) and the same standard error (.5372223) for
x as xtreg, fe just did. In both cases, the t statistic is
3.72.
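If you want to check this agreement outside Stata, the following numpy sketch
(textbook OLS formulas, nothing Stata-specific) fits the dummy-variable
regression on the same data and reports the slope and its standard error:

```python
import numpy as np

# The same 11-observation dataset
group = np.array([1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4])
x = np.array([0, 8, 17, 10, 16, 4, 11, 5, 18, 5, 2], dtype=float)
y = np.array([-5, 23, 44, 29, 26, 17, 17, 31, 50, 26, 17], dtype=float)

# Design matrix: constant, x, and dummies for groups 2-4 (group 1 is the base)
D = np.column_stack([(group == g).astype(float) for g in (2, 3, 4)])
X = np.column_stack([np.ones(len(y)), x, D])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
s2 = resid @ resid / (len(y) - X.shape[1])  # residual variance, df = 11 - 5 = 6
se = np.sqrt(s2 * np.linalg.inv(X.T @ X).diagonal())

print(round(beta[1], 6), round(se[1], 7))  # 2.0 0.5372223 -- as in both outputs
```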
3. Fitting the deviation model reports incorrect standard errors
. egen double ybar = mean(y), by(group)
. egen double xbar = mean(x), by(group)
. gen yd = y-ybar
. gen xd = x-xbar
. regress yd xd, noconstant
Source | SS df MS Number of obs = 11
-------------+------------------------------ F( 1, 10) = 23.10
Model | 1343.99999 1 1343.99999 Prob > F = 0.0007
Residual | 581.833327 10 58.1833327 R-squared = 0.6979
-------------+------------------------------ Adj R-squared = 0.6677
Total | 1925.83332 11 175.075756 Root MSE = 7.6278
------------------------------------------------------------------------------
yd | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
xd | 2 .4161306 4.81 0.001 1.072803 2.927197
------------------------------------------------------------------------------
So, to summarize:
x
| Coefficient Std. Err. t
-------------------------+------------------------------------
regress with dummies | 2 .5372223 3.72
xtreg, fe | 2 .5372223 3.72
removing the means | 2 .4161306 4.81
-------------------------+------------------------------------
regress with dummies produces correct results by definition;
xtreg, fe matches them.
Removing the means and estimating on the deviations with the
noconstant option produces correct coefficients but incorrect
standard errors. Why? Because we did not account for the fact that the
means we removed from y and x were estimated.
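The understatement is purely a degrees-of-freedom problem, and you can see
exactly how large it is. The deviation regression believes it has 11 - 1 = 10
residual degrees of freedom, whereas the correct count is 11 - 1 - 4 = 6,
because the four group means were themselves estimated. A quick arithmetic
check (not something you would do in practice; use xtreg, fe): scaling the
wrong standard error by the square root of the degrees-of-freedom ratio
recovers the right one.

```python
import math

n, k_slopes, n_groups = 11, 1, 4

se_wrong = 0.4161306                 # from the deviation regression above
df_wrong = n - k_slopes              # 10: treats the group means as known
df_right = n - k_slopes - n_groups   # 6: accounts for the 4 estimated means

se_right = se_wrong * math.sqrt(df_wrong / df_right)
print(round(se_right, 7))  # 0.5372223 -- the standard error xtreg, fe reports
```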