Title: Interpreting the intercept in the fixed-effects model
Author: William Gould, StataCorp
The results that xtreg, fe reports have simply been reformulated so that the reported intercept is the average value of the fixed effects.
One way of writing the fixed-effects model is
$$y_{it} = a + x_{it}b + v_i + e_{it} \qquad (1)$$
where v_{i} (i=1, ..., n) are simply the fixed effects to be estimated. With no further constraints, the parameters a and v_{i} do not have a unique solution. You can see that by rearranging the terms in (1):
$$y_{it} = (a + v_i) + x_{it}b + e_{it}$$
Consider some solution that has, say, a=3. Then we could just as well say that a=4 and subtract 1 from each of the estimated v_{i}; every sum a + v_{i}, and hence every fitted value, would be unchanged.
Thus, before (1) can be estimated, we must place another constraint on the system. Any constraint will do, and the choice we make will have no effect on the estimated b. One popular constraint is a=0, but we could just as well constrain a=3. Changing the value of a would merely change the corresponding values of v_{i}. Nor do we have to constrain a; we could place a constraint on v_{i}. We could, for instance, constrain v_{1}=0 or v_{5}=3.
The constraint that xtreg, fe places on the system is computationally more difficult:
$$\sum_{i=1}^{n} T_i v_i = 0 \qquad \text{(c1)}$$
Here T_{i} is the number of observations in panel i. The constraint means that the panel fixed effects sum to 0 across all observations in the sample; if the panels are unbalanced, each v_{i} is effectively weighted by the number of observations in its panel.
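Any solution can be moved onto (c1) without changing the fitted values. Writing $\bar{v}$ for the $T_i$-weighted average of the $v_i$ (notation introduced here only for this sketch), one way is to raise the intercept by $\bar{v}$ and lower each fixed effect by $\bar{v}$:

$$
\begin{aligned}
\bar{v} &= \frac{\sum_{i=1}^{n} T_i v_i}{\sum_{i=1}^{n} T_i}, \qquad a^{*} = a + \bar{v}, \qquad v_i^{*} = v_i - \bar{v} \\
a^{*} + v_i^{*} &= a + v_i \quad \text{(fitted values unchanged)} \\
\sum_{i=1}^{n} T_i v_i^{*} &= \sum_{i=1}^{n} T_i v_i - \bar{v}\sum_{i=1}^{n} T_i = 0 \quad \text{((c1) holds)}
\end{aligned}
$$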
Because the constraint is arbitrary, we chose one that makes the results more convenient to interpret. The random-effects estimator proceeds under the *ASSUMPTION* that E(v)=0 and hence can estimate an intercept. We parameterize the fixed-effects estimator so that it proceeds under the *CONSTRAINT* (c1). This constraint has no substantive implications, since we had to choose some constraint anyway.
The primary advantage of this constraint is that if you fit a model with xtreg, fe and then obtain the predictions with predict yhat, the average value of yhat will equal the average value of y. To obtain estimates with the fixed-effects estimator, we had to impose some arbitrary constraint. Had we instead constrained a=0, predict yhat would have produced yhat whose average equals b times the average of x rather than the average of y. That would be the only difference; the two sets of predictions would differ by a constant (namely, by the difference in their values of a).
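A minimal Python sketch of this property on the small example dataset listed later in this FAQ, taking b = 2 from the xtreg, fe output shown there. It assumes, consistent with that output, that under (c1) the intercept is the grand mean of y minus b times the grand mean of x; the a = 0 predictions are the hypothetical alternative described above. Exact fractions are used so that the equalities hold without rounding.

```python
from fractions import Fraction

# The 11-observation example dataset from this FAQ: (group, x, y).
data = [
    (1, 0, -5), (1, 8, 23), (1, 17, 44),
    (2, 10, 29), (2, 16, 26),
    (3, 4, 17), (3, 11, 17), (3, 5, 31),
    (4, 18, 50), (4, 5, 26), (4, 2, 17),
]
b = Fraction(2)  # slope reported by xtreg, fe for this dataset

n_obs = len(data)
x_grand = Fraction(sum(x for _, x, _ in data), n_obs)
y_grand = Fraction(sum(y for _, _, y in data), n_obs)

# Intercept under (c1): grand mean of y minus b times grand mean of x.
a_c1 = y_grand - b * x_grand

yhat_c1 = [a_c1 + b * x for _, x, _ in data]  # predictions under (c1)
yhat_a0 = [b * x for _, x, _ in data]         # predictions had we set a = 0

print(a_c1)                             # 83/11
print(sum(yhat_c1) / n_obs == y_grand)  # True: average of yhat = average of y
print(sum(yhat_a0) / n_obs)             # 192/11, i.e. b times the average of x
print({p - q for p, q in zip(yhat_c1, yhat_a0)} == {a_c1})  # True
```

83/11 is about 7.545455, the _cons that xtreg, fe reports for this dataset.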
Using the constraint (c1) has another advantage. Let us draw a distinction between models and estimators. The *MODEL* is
$$y_{it} = a + x_{it}b + v_i + e_{it} \qquad (1)$$
Under the random-effects *MODEL*, it is assumed that E(v)=0 and that v_{i} and x_{it} are uncorrelated. From that model, we can derive the random-effects *ESTIMATOR*.
Under the fixed-effects *MODEL*, no assumptions are made about v_{i} except that they are fixed parameters. From that model, we can derive the fixed-effects *ESTIMATOR*.
It turns out that the fixed-effects *ESTIMATOR* is an admissible estimator for the random-effects *MODEL*; it is merely less efficient than the random-effects *ESTIMATOR*. That is,
| Estimator | Model: fixed effects | Model: random effects |
|---|---|---|
| fixed effects | appropriate | appropriate |
| random effects | inappropriate | appropriate |
When you use the fixed-effects *ESTIMATOR* for the random-effects *MODEL*, the intercept a reported by xtreg, fe is the appropriate estimate for the intercept of the random-effects model.
The fixed-effects model is
$$y_{it} = a + x_{it}b + v_i + e_{it} \qquad (1)$$
Averaging (1) over t within each panel i, it follows that
$$\bar{y}_i = a + \bar{x}_i b + v_i + \bar{e}_i \qquad (2)$$
where $\bar{y}_i$, $\bar{x}_i$, and $\bar{e}_i$ are the averages of $y_{it}$, $x_{it}$, and $e_{it}$ within panel i.
Subtracting (2) from (1), we obtain
$$y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)b + (e_{it} - \bar{e}_i) \qquad (3)$$
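Equation (3) can be applied directly to the small example dataset listed later in this FAQ. The minimal Python sketch below removes each group's means from x and y and regresses the deviations with no constant; exact rational arithmetic (fractions.Fraction) is chosen here only so that the result prints exactly.

```python
from collections import defaultdict
from fractions import Fraction

# The 11-observation example dataset from this FAQ: (group, x, y).
data = [
    (1, 0, -5), (1, 8, 23), (1, 17, 44),
    (2, 10, 29), (2, 16, 26),
    (3, 4, 17), (3, 11, 17), (3, 5, 31),
    (4, 18, 50), (4, 5, 26), (4, 2, 17),
]

totals = defaultdict(lambda: [0, 0, 0])  # group -> [T_i, sum of x, sum of y]
for g, x, y in data:
    totals[g][0] += 1
    totals[g][1] += x
    totals[g][2] += y
means = {g: (Fraction(sx, t), Fraction(sy, t)) for g, (t, sx, sy) in totals.items()}

xd = [x - means[g][0] for g, x, _ in data]  # x_it - xbar_i
yd = [y - means[g][1] for g, _, y in data]  # y_it - ybar_i

# Least squares through the origin on the deviations: a and v_i have dropped out.
b = sum(p * q for p, q in zip(xd, yd)) / sum(p * p for p in xd)
print(b)  # 2
```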
Equation (3) is the way many people think about the fixed-effects estimator. Both a and v_{i} drop out of (3), so a remains unestimated in this formula. Averaging (1) over all observations, it also follows that
$$\bar{\bar{y}} = a + \bar{\bar{x}}b + \bar{v} + \bar{\bar{e}} \qquad (4)$$
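where $\bar{\bar{y}}$, $\bar{\bar{x}}$, $\bar{v}$, and $\bar{\bar{e}}$ are the grand averages of $y_{it}$, $x_{it}$, $v_i$, and $e_{it}$ over all observations. For instance,

$$\bar{\bar{y}} = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T_i} y_{it}}{\text{total number of observations}}$$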
Summing (3) and (4), we obtain
$$y_{it} - \bar{y}_i + \bar{\bar{y}} = a + (x_{it} - \bar{x}_i + \bar{\bar{x}})b + (e_{it} - \bar{e}_i + \bar{v}) + \bar{\bar{e}} \qquad (5)$$
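xtreg, fe estimates (5) under the constraint

$$\bar{v} = 0$$

which is (c1) divided by the total number of observations. That is, it estimates

$$y_{it} - \bar{y}_i + \bar{\bar{y}} = a + (x_{it} - \bar{x}_i + \bar{\bar{x}})b + \text{noise}$$

where the noise term is $e_{it} - \bar{e}_i + \bar{\bar{e}}$.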
Thus the left-side variable is y_{it} minus its within-group mean, with the grand mean added back in, and the right-side variables are x_{it} minus their within-group means, with the grand means added back in. Obviously, adding the grand means to the left and right sides has no effect on the estimated b.
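A minimal Python sketch of this transformation on the example dataset listed later in this FAQ: subtract each group mean, add back the grand mean, then run ordinary least squares with a constant. The slope matches the within estimate, and the constant comes out as grand mean of y minus b times grand mean of x. This is an illustration of the algebra above, not StataCorp's implementation.

```python
from fractions import Fraction

# The 11-observation example dataset from this FAQ: (group, x, y).
data = [
    (1, 0, -5), (1, 8, 23), (1, 17, 44),
    (2, 10, 29), (2, 16, 26),
    (3, 4, 17), (3, 11, 17), (3, 5, 31),
    (4, 18, 50), (4, 5, 26), (4, 2, 17),
]
n_obs = len(data)

# Grand means over all observations.
x_grand = Fraction(sum(x for _, x, _ in data), n_obs)
y_grand = Fraction(sum(y for _, _, y in data), n_obs)

# Within-group means.
by_group = {}
for g, x, y in data:
    by_group.setdefault(g, []).append((x, y))
x_mean = {g: Fraction(sum(x for x, _ in obs), len(obs)) for g, obs in by_group.items()}
y_mean = {g: Fraction(sum(y for _, y in obs), len(obs)) for g, obs in by_group.items()}

# Transformed variables: remove the group mean, add back the grand mean.
xt = [x - x_mean[g] + x_grand for g, x, _ in data]
yt = [y - y_mean[g] + y_grand for g, _, y in data]

# Ordinary least squares of yt on xt, this time WITH a constant.
xt_bar = sum(xt) / n_obs
yt_bar = sum(yt) / n_obs
b = sum((p - xt_bar) * (q - yt_bar) for p, q in zip(xt, yt)) / sum(
    (p - xt_bar) ** 2 for p in xt
)
a = yt_bar - b * xt_bar
print(b, a)  # 2 83/11
```

83/11 is about 7.545455. Because the within deviations average to zero, the transformed variables keep the original grand means, which is why the constant lands where (c1) puts it.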
Fixed-effects regression is supposed to produce the same coefficient estimates and standard errors as ordinary regression when indicator (dummy) variables are included for each of the groups. Because the fixed-effects model is
$$y_{it} = x_{it}b + v_i + e_{it}$$
and v_{i} are fixed parameters to be estimated, this is the same as
$$y_{it} = x_{it}b + v_1\,\mathit{d1}_i + v_2\,\mathit{d2}_i + \cdots + e_{it}$$
where d1 is 1 when i=1 and 0 otherwise, d2 is 1 when i=2 and 0 otherwise, and so on. d1, d2, ..., are just dummy variables indicating the groups, and v_{1}, v_{2}, ..., are their regression coefficients, which we must estimate.
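For a dataset as small as the example listed later in this FAQ, the dummy-variable regression can be written out directly. The minimal Python sketch below builds x plus one dummy per group (no separate constant, matching the equation above), forms the normal equations, and solves them by Gauss-Jordan elimination in exact fractions. Forming normal equations is a simplification chosen for readability; it is fine here only because the arithmetic is exact.

```python
from fractions import Fraction

# The 11-observation example dataset from this FAQ: (group, x, y).
data = [
    (1, 0, -5), (1, 8, 23), (1, 17, 44),
    (2, 10, 29), (2, 16, 26),
    (3, 4, 17), (3, 11, 17), (3, 5, 31),
    (4, 18, 50), (4, 5, 26), (4, 2, 17),
]
groups = sorted({g for g, _, _ in data})

# Design matrix: x, then dummies d1..d4; no separate constant.
X = [[Fraction(x)] + [Fraction(int(g == h)) for h in groups] for g, x, _ in data]
y = [Fraction(val) for _, _, val in data]
k = len(X[0])

# Augmented normal equations [X'X | X'y].
A = [
    [sum(row[i] * row[j] for row in X) for j in range(k)]
    + [sum(row[i] * yi for row, yi in zip(X, y))]
    for i in range(k)
]

# Gauss-Jordan elimination.
for col in range(k):
    pivot = next(r for r in range(col, k) if A[r][col] != 0)
    A[col], A[pivot] = A[pivot], A[col]
    pivot_value = A[col][col]
    A[col] = [entry / pivot_value for entry in A[col]]
    for r in range(k):
        if r != col and A[r][col] != 0:
            factor = A[r][col]
            A[r] = [er - factor * ec for er, ec in zip(A[r], A[col])]
beta = [A[i][k] for i in range(k)]

print(beta[0])                     # 2: coefficient on x
print([str(v) for v in beta[1:]])  # ['4', '3/2', '25/3', '43/3']: v_1..v_4
```

regress with a constant and group 1 as the base reports the same fit in another parameterization: its _cons of 4 is v_1, and its group coefficients -2.5, 4.333333, and 10.33333 are v_2 - v_1, v_3 - v_1, and v_4 - v_1.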
The problem is that we typically have lots of groups—perhaps thousands—and including lots of dummy variables is too computationally expensive, so we look for a shortcut.
Nevertheless, we could take a little dataset with just a few groups and compare the methods. Here is my little dataset:
| obs | group | x | y |
|---:|---:|---:|---:|
| 1 | 1 | 0 | -5 |
| 2 | 1 | 8 | 23 |
| 3 | 1 | 17 | 44 |
| 4 | 2 | 10 | 29 |
| 5 | 2 | 16 | 26 |
| 6 | 3 | 4 | 17 |
| 7 | 3 | 11 | 17 |
| 8 | 3 | 5 | 31 |
| 9 | 4 | 18 | 50 |
| 10 | 4 | 5 | 26 |
| 11 | 4 | 2 | 17 |
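I am going to show you three ways of fitting this model: (1) regress with a dummy variable for each group, (2) xtreg, fe, and (3) removing the group means from y and x and running regress on the deviations with the noconstant option. All three produce the same coefficient, but method 3 produces the wrong standard error.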
How can method 3 be wrong? Because it fails to account for the fact that the means we removed are *ESTIMATES*. As a consequence, it understates standard errors.
Method 1, regress with a dummy variable for each group (group 1 is the omitted base group):

| Source | SS | df | MS |
|---|---:|---:|---:|
| Model | 1554.16667 | 4 | 388.541667 |
| Residual | 581.833333 | 6 | 96.9722222 |
| Total | 2136 | 10 | 213.6 |

Number of obs = 11, F(4, 6) = 4.01, Prob > F = 0.0643, R-squared = 0.7276, Adj R-squared = 0.5460, Root MSE = 9.8474

| y | Coef. | Std. Err. | t | P>\|t\| | [95% Conf. | Interval] |
|---|---:|---:|---:|---:|---:|---:|
| x | 2 | .5372223 | 3.72 | 0.010 | .6854644 | 3.314536 |
| group 2 | -2.5 | 9.332493 | -0.27 | 0.798 | -25.33579 | 20.33579 |
| group 3 | 4.333333 | 8.090107 | 0.54 | 0.611 | -15.46245 | 24.12911 |
| group 4 | 10.33333 | 8.040407 | 1.29 | 0.246 | -9.340834 | 30.0075 |
| _cons | 4 | 7.236455 | 0.55 | 0.600 | -13.70697 | 21.70697 |
Method 2, xtreg, fe (coefficient table):

| y | Coef. | Std. Err. | t | P>\|t\| | [95% Conf. | Interval] |
|---|---:|---:|---:|---:|---:|---:|
| x | 2 | .5372223 | 3.72 | 0.010 | .6854644 | 3.314536 |
| _cons | 7.545455 | 5.549554 | 1.36 | 0.223 | -6.033816 | 21.12472 |

sigma_u = 5.6213466, sigma_e = 9.8474475, rho = .24577354 (fraction of variance due to u_i)
If you compare, you will find that regress with group dummies reported the same coefficient (2) and the same standard error (.5372223) for x as xtreg, fe just did. In both cases, the t statistic is 3.72.
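The two outputs are also linked through (c1). A minimal Python sketch, assuming the displayed dummy coefficients 4.333333 and 10.33333 are rounded forms of 13/3 and 31/3 (which the example data imply): each group's own intercept, a + v_{i}, is _cons plus its dummy coefficient, and averaging those intercepts with weights T_{i} reproduces the _cons reported by xtreg, fe.

```python
from fractions import Fraction

# Estimates from the regress-with-dummies output (group 1 is the base).
cons = Fraction(4)
dummy_coef = {1: Fraction(0), 2: Fraction(-5, 2), 3: Fraction(13, 3), 4: Fraction(31, 3)}
T = {1: 3, 2: 2, 3: 3, 4: 3}  # observations per group in the example dataset

# Each group's own intercept, a + v_i, needs no constraint to be identified.
group_intercept = {g: cons + d for g, d in dummy_coef.items()}

# Impose (c1): a is the T_i-weighted average of the group intercepts.
a = sum(T[g] * group_intercept[g] for g in T) / sum(T.values())
v = {g: group_intercept[g] - a for g in T}

print(a)                            # 83/11, about 7.545455
print(sum(T[g] * v[g] for g in T))  # 0, so (c1) holds
```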
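Method 3, regression of yd on xd with the noconstant option, where yd and xd are y and x with their group means removed:

| Source | SS | df | MS |
|---|---:|---:|---:|
| Model | 1343.99999 | 1 | 1343.99999 |
| Residual | 581.833327 | 10 | 58.1833327 |
| Total | 1925.83332 | 11 | 175.075756 |

Number of obs = 11, F(1, 10) = 23.10, Prob > F = 0.0007, R-squared = 0.6979, Adj R-squared = 0.6677, Root MSE = 7.6278

| yd | Coef. | Std. Err. | t | P>\|t\| | [95% Conf. | Interval] |
|---|---:|---:|---:|---:|---:|---:|
| xd | 2 | .4161306 | 4.81 | 0.001 | 1.072803 | 2.927197 |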
So, to summarize:
| x | Coefficient | Std. Err. | t |
|---|---:|---:|---:|
| regress with dummies | 2 | .5372223 | 3.72 |
| xtreg, fe | 2 | .5372223 | 3.72 |
| removing the means | 2 | .4161306 | 4.81 |
regress with dummies calculates correct results by definition.
xtreg, fe matches them.
Removing the means and estimating on the deviations with the noconstant option produces correct coefficients but incorrect standard errors. Why? Because we did not account for the fact that the means we removed from y and x were estimated. regress therefore computes the residual variance with 11 - 1 = 10 degrees of freedom instead of 11 - 4 - 1 = 6, which allows for the four estimated group means.
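A minimal Python sketch of where the two standard errors come from, assuming the usual single-regressor OLS formula se(b) = sqrt(s^2 / sum of squared x deviations), with s^2 = residual sum of squares / residual degrees of freedom. The only difference between the correct and the naive figure is the degrees of freedom: N - n - k when the n group means are counted as estimated parameters, N - k when they are not (N, n, and k are labels introduced here for the sketch).

```python
import math
from fractions import Fraction

# The 11-observation example dataset from this FAQ: (group, x, y).
data = [
    (1, 0, -5), (1, 8, 23), (1, 17, 44),
    (2, 10, 29), (2, 16, 26),
    (3, 4, 17), (3, 11, 17), (3, 5, 31),
    (4, 18, 50), (4, 5, 26), (4, 2, 17),
]

by_group = {}
for g, x, y in data:
    by_group.setdefault(g, []).append((x, y))

n_obs = len(data)         # N: total number of observations
n_groups = len(by_group)  # n: number of estimated group means
k = 1                     # k: number of regressors (just x)

# Within deviations x_it - xbar_i and y_it - ybar_i, kept in matching order.
xd, yd = [], []
for obs in by_group.values():
    x_mean = Fraction(sum(x for x, _ in obs), len(obs))
    y_mean = Fraction(sum(y for _, y in obs), len(obs))
    xd += [x - x_mean for x, _ in obs]
    yd += [y - y_mean for _, y in obs]

sxx = sum(p * p for p in xd)
b = sum(p * q for p, q in zip(xd, yd)) / sxx
rss = sum((q - b * p) ** 2 for p, q in zip(xd, yd))  # residual sum of squares

# Correct: each estimated group mean uses up one degree of freedom.
se_correct = math.sqrt(rss / (n_obs - n_groups - k) / sxx)
# Naive: regressing the deviations with no constant charges only for x.
se_naive = math.sqrt(rss / (n_obs - k) / sxx)

print(b, round(se_correct, 7), round(se_naive, 7))  # 2 0.5372223 0.4161306
```

The two standard errors differ by the factor sqrt(10/6), about 1.29, the square root of the ratio of the two residual degrees of freedom.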