The Stata listserver

Re: st: Re: xtreg v areg


From   "Clive Nicholas" <Clive.Nicholas@newcastle.ac.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Re: xtreg v areg
Date   Sat, 19 Jun 2004 05:38:42 +0100 (BST)

Kit Baum wrote:

> Without looking carefully at the formulas used in xtreg, fe, I think
> conceptually the issue is that xtreg, fe considers the explanation of
> \sum (y_{it} - \bar{y}_i)^2 : that is, after demeaning the data by
> the individual means, how much of the remaining variation is explained
> by your regressors? In areg, I suspect that in absorbing the factor
> pano, the amount of variation absorbed is also included in r^2, just as
> it would be if you included the dummies explicitly. That is, a one-way
> ANOVA of your depvar on pano would explain some amount of the
> variation. Do an ANCOVA including pano and a bunch of regressors, and
> you explain more. But the xtreg, fe model considers that the only thing
> to be explained is y net of individual mean y.

After running a couple of -anova- tests, I see what you mean, Kit: PANO on
its own explains 31.03% of the variation in EDCONCH, nearly half of the
total in the full model (66.78%: both adj R^2s). Of course, section 1 of
ch.14 in Wooldridge (2003) goes into the mechanics in a bit more detail,
inter alia.
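
For anyone wanting to see the two R^2s side by side, here is a minimal sketch on simulated data (not the EDCONCH data; id, t, x, y and u are made-up names). It assumes only that -xtreg, fe- leaves the within R^2 in e(r2_w) and -areg- leaves its R^2 in e(r2). The slope on x should be identical in both fits, and the -areg- R^2 should be no smaller than the within R^2, because it also credits the variation absorbed by the unit effects.

```stata
* Simulated panel: 300 units, 6 periods each (all names hypothetical)
clear
set seed 2004
set obs 300
gen id = _n
* unit effect, constant within id once the data are expanded
gen u = rnormal()
expand 6
bysort id: gen t = _n
gen x = rnormal() + 0.5*u
gen y = 1 + 2*x + 3*u + rnormal()
xtset id t

* within R^2: variation in y net of unit means that x explains
xtreg y x, fe
scalar r2_within = e(r2_w)

* areg R^2: also counts the variation soaked up by the absorbed factor
areg y x, absorb(id)
scalar r2_areg = e(r2)

display "within R2 (xtreg, fe) = " %6.4f r2_within
display "R2 (areg)             = " %6.4f r2_areg
```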

In fitting that model, I overlooked something quite basic. I discovered
that I should have fitted time dummies. The result was dramatic:

. areg edconch ed2-ed13 edpollch lagconch laglabch lagldmch clmargin
cdmargin conplace edenp class if edmarker==1 [pw=weight], absorb(pano)
cluster(pano)

Regression with robust standard errors               Number of obs =    1875
                                                     F( 18,  1552) =   84.75
                                                     Prob > F      =  0.0000
                                                     R-squared     =  0.7618
                                                     Adj R-squared =  0.7124
                                                     Root MSE      =  6.1587

                           (standard errors adjusted for clustering on pano)
----------------------------------------------------------------------------
           |               Robust
   edconch |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------+----------------------------------------------------------------
       ed2 |  (dropped)
       ed3 |  (dropped)
       ed4 |   1.706714   3.270239     0.52   0.602    -4.707839    8.121267
       ed5 |   4.584752   3.005362     1.53   0.127    -1.310248    10.47975
       ed6 |   4.106024   2.795745     1.47   0.142    -1.377812    9.589861
       ed7 |   2.216951   2.656637     0.83   0.404    -2.994025    7.427927
       ed8 |  -1.576794   2.641104    -0.60   0.551    -6.757303    3.603714
       ed9 |   1.673161   2.288051     0.73   0.465    -2.814837    6.161158
      ed10 |   2.835353   2.825758     1.00   0.316    -2.707352    8.378059
      ed11 |  -6.985576   2.506125    -2.79   0.005    -11.90132   -2.069828
      ed12 |  -.8350404   2.366703    -0.35   0.724    -5.477313    3.807233
      ed13 |   6.787137   2.954843     2.30   0.022     .9912307    12.58304
  edpollch |   .1638447   .0581991     2.82   0.005     .0496876    .2780018
  lagconch |  -.1384667   .1818321    -0.76   0.446    -.4951292    .2181958
  laglabch |  -.0874243    .178278    -0.49   0.624    -.4371156     .262267
  lagldmch |  -.0721165   .1699159    -0.42   0.671    -.4054054    .2611724
  clmargin |  -.2637521   .0463728    -5.69   0.000     -.354712   -.1727921
  cdmargin |  -.2226632   .0477287    -4.67   0.000    -.3162827   -.1290438
  conplace |   1.289732   .9699538     1.33   0.184    -.6128259     3.19229
     edenp |  -9.368422    .974361    -9.61   0.000    -11.27962   -7.457219
     class |   .0221002   .0536338     0.41   0.680    -.0831022    .1273025
     _cons |   28.84836   4.044046     7.13   0.000     20.91599    36.78073
-----------+----------------------------------------------------------------
      pano |   absorbed                                     (304 categories)

EDCONCH = change (%) in Con ED vote from the previous general election.
Note that I've taken out the four time/cyclical variables that were in
there originally (including the time trend), and CDMARGIN has replaced
LDMARGIN (this swap has no effect whatsoever on the fit, which improves by
5% overall). Best of all (from my anorak point of view), I've now got a
sensible value for the constant term! Leaving the time trend variable in
produces exactly the same model, except that two more time dummies become
significant, the time trend itself is significant, and the constant takes
on a ridiculous value (-1004.363, in this case). It would be nice to
understand why/how time dummies produce this effect (if they do at all),
but I'm very pleased with this.
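
One possible reason for a constant like that, and it is only a guess because the coding of the trend variable isn't shown: if the trend is measured in calendar years, the constant is the fitted value at year 0, far outside the data. The sketch below uses simulated data (year, cyear and y are made-up names) to show that centring the trend changes the constant but leaves the slope alone.

```stata
* Simulated data: years 1976-1992, true trend of 0.5 per year (names hypothetical)
clear
set seed 2004
set obs 500
gen year = 1976 + floor(17*runiform())
gen y = 5 + 0.5*(year - 1984) + rnormal()

* Trend in calendar years: _cons is the fitted value at year 0
regress y year

* Trend centred on 1984: same slope, _cons is the fitted value in 1984
gen cyear = year - 1984
regress y cyear
```

In the first fit the constant should come out near 5 - 0.5*1984 = -987; in the second, near 5.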

However, a couple of issues remain. Note that the first two time dummies
have been dropped. I don't understand why, since:

. mrtab ed1- ed13

                         |                Pct. of     Pct. of
                         |      Freq.   responses       cases
-------------------------+-----------------------------------
 ed1 edyear==  1976.0000 |        303       11.55       11.55
 ed2 edyear==  1978.0000 |         55        2.10        2.10
 ed3 edyear==  1979.0000 |        297       11.32       11.32
 ed4 edyear==  1980.0000 |        125        4.77        4.77
 ed5 edyear==  1982.0000 |        128        4.88        4.88
 ed6 edyear==  1983.0000 |        305       11.63       11.63
 ed7 edyear==  1984.0000 |        167        6.37        6.37
 ed8 edyear==  1986.0000 |        168        6.40        6.40
 ed9 edyear==  1987.0000 |        305       11.63       11.63
ed10 edyear==  1988.0000 |        155        5.91        5.91
ed11 edyear==  1990.0000 |        157        5.99        5.99
ed12 edyear==  1991.0000 |        305       11.63       11.63
ed13 edyear==  1992.0000 |        153        5.83        5.83
-------------------------+-----------------------------------
                   Total |       2623      100.00      100.00

(Thank the heavens for Ben Jann, incidentally.) OK, ED2 may be an outlier
as far as Ns are concerned, but ED3 isn't, so why does my LSDV model drop
this too?
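
One thing worth checking, offered as a guess rather than a diagnosis: the -mrtab- table covers all 2623 cases, whereas the regression uses 1875 (the -if- condition plus any missing values, for instance in the lagged regressors). If the earliest years never make it into the estimation sample, the dummy for such a year is all zero there, and the remaining dummies then sum to one and are collinear with the constant, so a second one goes too. A minimal sketch on simulated data (all names made up) reproduces that pattern and shows the -tabulate ... if e(sample)- check:

```stata
* Simulated panel: 200 units, 5 periods each (all names hypothetical)
clear
set seed 2004
set obs 200
gen id = _n
expand 5
bysort id: gen year = _n
gen y = rnormal()
* a regressor that is missing in the first two periods, as a lagged term might be
gen z = rnormal() if year > 2
* creates the period dummies d1-d5
tabulate year, generate(d)

* d1 is the intended base category; d2 is all zero in the estimation sample,
* and d3-d5 then sum to one, so one of them is collinear with the constant
regress y d2-d5 z

* which periods actually entered the estimation sample?
tabulate year if e(sample)
```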

The second issue is that I wish to include a lagged term on the depvar.
I know how to do this correctly (gen lag = l.edconch), but there are two
difficulties. First, there are gaps in the time series. I could use
-tsfill-, but I want the lag to latch onto the correct EDYEAR, rather than
incorrectly onto a year in which there were no elections. Second, adding
the lagged term knocks out a lot of the time dummies. Is there a way round
this?
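
On the gaps, two common ways of making the lag refer to the previous election rather than the previous calendar year are sketched below on a toy dataset (id, year, y, seq and the lag names are made up; with the real data the panel and time variables would presumably be pano and edyear). The first uses explicit subscripting within each unit; the second builds an election counter and declares that as the time variable, so the L. operator steps from one election to the one before it.

```stata
* Toy panel with unevenly spaced election years (all names hypothetical)
clear
input id year y
1 1976 10
1 1979 12
1 1983 15
2 1976 20
2 1979 18
2 1983 25
end

* Option 1: previous observed election within each unit
bysort id (year): gen lag_prev = y[_n-1]

* Option 2: index elections 1, 2, 3, ... and let L. work on that index
bysort id (year): gen seq = _n
xtset id seq
gen lag_seq = L.y

* The two should agree; the first election in each unit has no lag
assert lag_prev == lag_seq
list, sepby(id)
```

Either way the first election in each unit has a missing lag and drops out of the estimation sample, which is worth bearing in mind when time dummies disappear.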

Naturally, anyone is invited to take a stab at these posers. Ta.

CLIVE NICHOLAS        |t: 0(044)191 222 5969
Politics              |e: clive.nicholas@ncl.ac.uk
Newcastle University  |http://www.ncl.ac.uk/geps