# Re: st: Re: xtreg v areg

From: "Clive Nicholas"
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Re: xtreg v areg
Date: Sat, 19 Jun 2004 05:38:42 +0100 (BST)

Kit Baum wrote:

> Without looking carefully at the formulas used in xtreg, fe, I think
> conceptually the issue is that xtreg, fe considers the explanation of
> \sum_t (y_{it} - \bar{y}_i)^2 : that is, after demeaning the data by
> the individual means, how much of the remaining variation is explained
> by your regressors? In areg, I suspect that in absorbing the factor
> pano, the amount of variation absorbed is also included in r^2, just as
> it would be if you included the dummies explicitly. That is, a one-way
> ANOVA of your depvar on pano would explain some amount of the
> variation. Do an ANOCOVA including pano and a bunch of regressors, and
> you explain more. But the xtreg, fe model considers that the only thing
> to be explained is y net of individual mean y.

After running a couple of -anova- tests, I see what you mean, Kit: PANO on
its own explains 31.03% of the variation in EDCONCH, nearly half of the
total in the full model (66.78%: both adj R^2s). Of course, section 1 of
ch.14 in Wooldridge (2003) goes into the mechanics in a bit more detail,
inter alia.
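For anyone wanting to see the difference directly, the two R^2s can be read off the stored results after fitting the same specification both ways (a sketch only; -y-, -x1- and -x2- are placeholders for whatever model is at hand):

```stata
* -areg-'s e(r2) counts the variation soaked up by the absorbed
* dummies; -xtreg, fe- reports the within R^2, e(r2_w), which is
* computed on the demeaned data only, so it will usually be smaller.
tsset pano edyear
areg y x1 x2, absorb(pano)
display "areg overall R-sq = " e(r2)
xtreg y x1 x2, fe
display "xtreg within R-sq = " e(r2_w)
```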

In fitting that model, I overlooked something quite basic. I discovered
that I should have fitted time dummies. The result was dramatic:

. areg edconch ed2-ed13 edpollch lagconch laglabch lagldmch clmargin
> cdmargin conplace edenp class if edmarker==1 [pw=weight], absorb(pano)
> cluster(pano)

Regression with robust standard errors               Number of obs =    1875
                                                     F( 18,  1552) =   84.75
                                                     Prob > F      =  0.0000
                                                     R-squared     =  0.7618
                                                     Root MSE      =  6.1587

(standard errors adjusted for clustering on pano)
-----------------------------------------------------------------------------
             |               Robust
     edconch |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+---------------------------------------------------------------
         ed2 |  (dropped)
         ed3 |  (dropped)
         ed4 |   1.706714   3.270239     0.52   0.602    -4.707839    8.121267
         ed5 |   4.584752   3.005362     1.53   0.127    -1.310248    10.47975
         ed6 |   4.106024   2.795745     1.47   0.142    -1.377812    9.589861
         ed7 |   2.216951   2.656637     0.83   0.404    -2.994025    7.427927
         ed8 |  -1.576794   2.641104    -0.60   0.551    -6.757303    3.603714
         ed9 |   1.673161   2.288051     0.73   0.465    -2.814837    6.161158
        ed10 |   2.835353   2.825758     1.00   0.316    -2.707352    8.378059
        ed11 |  -6.985576   2.506125    -2.79   0.005    -11.90132   -2.069828
        ed12 |  -.8350404   2.366703    -0.35   0.724    -5.477313    3.807233
        ed13 |   6.787137   2.954843     2.30   0.022     .9912307    12.58304
    edpollch |   .1638447   .0581991     2.82   0.005     .0496876    .2780018
    lagconch |  -.1384667   .1818321    -0.76   0.446    -.4951292    .2181958
    laglabch |  -.0874243    .178278    -0.49   0.624    -.4371156     .262267
    lagldmch |  -.0721165   .1699159    -0.42   0.671    -.4054054    .2611724
    clmargin |  -.2637521   .0463728    -5.69   0.000     -.354712   -.1727921
    cdmargin |  -.2226632   .0477287    -4.67   0.000    -.3162827   -.1290438
    conplace |   1.289732   .9699538     1.33   0.184    -.6128259     3.19229
       edenp |  -9.368422    .974361    -9.61   0.000    -11.27962   -7.457219
       class |   .0221002   .0536338     0.41   0.680    -.0831022    .1273025
       _cons |   28.84836   4.044046     7.13   0.000     20.91599    36.78073
-------------+---------------------------------------------------------------
        pano |   absorbed                                     (304 categories)

EDCONCH = change (%) in Con ED vote from the previous general election.
Note that I've taken out the four time/cyclical variables that were in
there originally (including the time trend) and CDMARGIN has replaced
LDMARGIN (the swap itself has no effect whatsoever on the fit, which
improves by 5% overall). Best of all (from my anorak point of view) I've
now got a sensible
value for the constant term! Leaving the time trend variable in produces
the exact same model except that two more time dummies become significant,
the time trend itself is significant and the constant takes on a
ridiculous value (-1004.363, in this case). It would be nice to understand
why/how the time dummies produce this effect (if they do at all).

However, a couple of issues remain. Note that the first two time dummies
have been dropped. I don't understand why, since:

. mrtab ed1-ed13

                         |                Pct. of     Pct. of
                         |      Freq.   responses       cases
-------------------------+-----------------------------------
 ed1 edyear==  1976.0000 |        303       11.55       11.55
 ed2 edyear==  1978.0000 |         55        2.10        2.10
 ed3 edyear==  1979.0000 |        297       11.32       11.32
 ed4 edyear==  1980.0000 |        125        4.77        4.77
 ed5 edyear==  1982.0000 |        128        4.88        4.88
 ed6 edyear==  1983.0000 |        305       11.63       11.63
 ed7 edyear==  1984.0000 |        167        6.37        6.37
 ed8 edyear==  1986.0000 |        168        6.40        6.40
 ed9 edyear==  1987.0000 |        305       11.63       11.63
ed10 edyear==  1988.0000 |        155        5.91        5.91
ed11 edyear==  1990.0000 |        157        5.99        5.99
ed12 edyear==  1991.0000 |        305       11.63       11.63
ed13 edyear==  1992.0000 |        153        5.83        5.83
-------------------------+-----------------------------------
                   Total |       2623      100.00      100.00

(Thank the heavens for Ben Jann, incidentally.) OK, ED2 may be an outlier
as far as Ns are concerned, but ED3 isn't, so why does my LSDV model drop
this too?
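(A hedged guess, for what it's worth: the drops are decided on the *estimation* sample, after the -if edmarker==1- restriction and the absorption of pano, not on the full data that -mrtab- tabulates. A quick diagnostic sketch, using the variable names from the model above:)

```stata
* Check which EDYEARs actually survive the sample restriction.
* Run directly after -areg- so that e(sample) is defined:
tab edyear if edmarker==1 & e(sample)

* _rmcoll flags regressors that would be dropped for collinearity
* on the restricted sample:
_rmcoll ed2-ed13 if edmarker==1
```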

The second problem is that I wish to include a lagged term on the depvar.
I know how to do this correctly (gen lag = L.edconch), but there are two
problems here. First, there are gaps in the time series. I could use
-tsfill-, but I want the lag to latch onto the correct EDYEAR, rather than
incorrectly onto a year in which there were no elections. The second
problem is that adding the lagged term knocks out a lot of the time
dummies. Is there a way round this?
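(One possible way round the gaps, sketched but untested on these data: build a within-panel election counter and -tsset- on that instead of EDYEAR, so that L. refers to the previous election rather than the previous calendar year. The name -eseq- is made up for the sketch.)

```stata
* Number the elections consecutively within each panel unit,
* then declare that counter as the time variable:
bysort pano (edyear): gen eseq = _n
tsset pano eseq

* L. now reaches back to the previous election, however many
* calendar years ago it was:
gen lag_edconch = L.edconch
```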

Naturally, anyone is invited to take a stab at these posers. Ta.

CLIVE NICHOLAS        |t: 0(044)191 222 5969
Politics              |e: clive.nicholas@ncl.ac.uk
Newcastle University  |http://www.ncl.ac.uk/geps
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/