The Stata listserver

Re: st: Cross-Sectional Time Series


From   ddrukker@stata.com (David M. Drukker, Stata Corp)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Cross-Sectional Time Series
Date   Wed, 26 Jun 2002 10:31:05 -0500

John Neumann <neumannj@bu.edu> began an interesting thread on this list
yesterday when he asked whether he should use -reg ,cluster(id)-, -xtreg ,
re- or -xtreg, fe- for estimation and inference about his panel data model.

Then anirban basu <abasu@midway.uchicago.edu> and Mark Schaffer
<M.E.Schaffer@hw.ac.uk> both provided interesting responses to John's
original question.

Both Anirban and Mark have pointed out that -regress , cluster(id)- will
provide consistent estimates of the coefficients.  There is also agreement
that -xtreg, re- will provide consistent estimates of the coefficients.  But
there seems to be some discussion of whether -xtreg, fe- will provide
consistent estimates.

In theory, all three estimators (-regress ,cluster(id)-, -xtreg,re- ,
-xtreg, fe-) are consistent estimators of the coefficients for
random-effects data generating processes.  To flesh out the details, consider
the random-effects data generating process

	y_it = X_it b + u_i + e_it

where X_it is a 1 x K vector of covariates, b is a K x 1 vector of
coefficients, u_i is independently and identically distributed (iid) over
ids, e_it is iid over the observations, and there is no correlation between
X_it and u_i.  Under these assumptions, all three estimators provide
consistent estimates of the coefficients, each also provides a consistent
estimate of its own VCE matrix, and the resulting Wald tests will obtain
nominal coverage, given enough data.
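To see why all three are consistent here, it may help to write each one as OLS on a transformed version of the model. This is a standard textbook sketch, not taken from the thread; it assumes a balanced panel with T observations per id and introduces the symbols sigma_u^2 = Var(u_i) and sigma_e^2 = Var(e_it). Bars denote id-level means.

```latex
\begin{align*}
\text{pooled OLS (regress, cluster(id)):}\quad
  & y_{it} = X_{it} b + (u_i + e_{it}) \\
\text{within (xtreg, fe):}\quad
  & y_{it} - \bar{y}_i = (X_{it} - \bar{X}_i) b + (e_{it} - \bar{e}_i) \\
\text{quasi-demeaned GLS (xtreg, re):}\quad
  & y_{it} - \theta \bar{y}_i = (X_{it} - \theta \bar{X}_i) b
    + \bigl[(1-\theta) u_i + e_{it} - \theta \bar{e}_i\bigr], \\
  & \theta = 1 - \sqrt{\frac{\sigma_e^2}{\sigma_e^2 + T \sigma_u^2}}
\end{align*}
```

When X_it is uncorrelated with u_i (and with e_it in every period), the composite error on each line is uncorrelated with the transformed regressors, so all three are consistent. Only the within transformation removes u_i entirely. With the values used in the simulation (sigma_u = 2, sigma_e = 1, T = 5), theta = 1 - sqrt(1/21), or about 0.78, so the random-effects transformation sits much closer to the within transformation than to pooled OLS.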

Another theoretical point is in order.  For the random-effects data
generating process, -xtreg, re- should provide more efficient estimates of
the coefficients than either of the other two, while -xtreg, fe- should
produce more efficient estimates than -regress, cluster(id)-.  (One caveat in
this case is that inference after -xtreg, fe- is said to be conditional on
the random effects in the sample.)
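A rough way to see this ranking is a textbook-style large-sample approximation (an illustration, not part of the original argument). Suppose, as in the simulation appended below, that the covariates are drawn iid across all observations with covariance matrix Sigma_x, write sigma_u^2 = Var(u_i) and sigma_e^2 = Var(e_it), and let N ids each be observed T times. Then the slope estimators have approximately these variances:

```latex
\begin{align*}
\operatorname{Var}(\hat b_{\text{OLS}}) &\approx \frac{\sigma_u^2 + \sigma_e^2}{NT}\,\Sigma_x^{-1} \\
\operatorname{Var}(\hat b_{\text{FE}})  &\approx \frac{\sigma_e^2}{N(T-1)}\,\Sigma_x^{-1} \\
\operatorname{Var}(\hat b_{\text{RE}})  &\approx \left[\frac{N(T-1)}{\sigma_e^2}
    + \frac{N}{\sigma_e^2 + T\sigma_u^2}\right]^{-1}\Sigma_x^{-1}
\end{align*}
```

The -xtreg, re- bracket adds a nonnegative between-id term to the -xtreg, fe- information N(T-1)/sigma_e^2, so under this approximation RE is at least as precise as FE. FE is more precise than pooled OLS exactly when (T-1) sigma_u^2 > sigma_e^2, which holds comfortably in the simulation (T = 5, sigma_u = 2, sigma_e = 1).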

To illustrate these points, I have written a small simulation.  The do-file
is appended below my signature.  Briefly, the program

        i)   produces 1000 draws from a parameterization of the
             random-effects data generating process
        ii)  runs the three estimators on each sample, saving off the
             coefficients
        iii) uses -test- to test that the coefficients are equal to their
             true values, saving off the p-values
        iv)  then computes the rejection rates obtained by each estimator
             on each test
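For readers on Stata 10.1 or later, the four steps above could also be written with -simulate- in place of the -postfile- loop. This is a sketch under that assumption, not the program that produced the results reported here: -reone- is a hypothetical program name, and the modern functions and options (rnormal(), -xtset-, vce(cluster id)) draw different random numbers, so the summary statistics will differ somewhat from the tables below.

```stata
* Sketch of steps i)-iv) with -simulate- (assumes Stata 10.1 or later).
* reone is a hypothetical program name; its design mirrors fevclust.do:
* 100 ids x 5 observations, sd(u_i) = 2, sd(e_it) = 1, true b = 3.
capture program drop reone
program define reone, rclass
    drop _all
    set obs 100
    generate ui = 2*rnormal()
    generate id = _n
    expand 5
    sort id
    generate x1 = rnormal()
    generate x2 = rnormal() + .3*x1
    generate y  = 3 + 3*x1 + 3*x2 + ui + rnormal()
    xtset id

    * pooled OLS with a cluster-robust VCE
    regress y x1 x2, vce(cluster id)
    return scalar x1_crg = _b[x1]
    return scalar x2_crg = _b[x2]
    test x1 = 3
    return scalar x1_prg = r(p)
    test x2 = 3
    return scalar x2_prg = r(p)

    * fixed effects (within estimator)
    xtreg y x1 x2, fe
    return scalar x1_cfe = _b[x1]
    return scalar x2_cfe = _b[x2]
    test x1 = 3
    return scalar x1_pfe = r(p)
    test x2 = 3
    return scalar x2_pfe = r(p)

    * random effects (GLS)
    xtreg y x1 x2, re
    return scalar x1_cre = _b[x1]
    return scalar x2_cre = _b[x2]
    test x1 = 3
    return scalar x1_pre = r(p)
    test x2 = 3
    return scalar x2_pre = r(p)
end

clear
set seed 1234567
simulate x1_crg=r(x1_crg) x2_crg=r(x2_crg) x1_prg=r(x1_prg) x2_prg=r(x2_prg) ///
    x1_cfe=r(x1_cfe) x2_cfe=r(x2_cfe) x1_pfe=r(x1_pfe) x2_pfe=r(x2_pfe)      ///
    x1_cre=r(x1_cre) x2_cre=r(x2_cre) x1_pre=r(x1_pre) x2_pre=r(x2_pre),     ///
    reps(1000) nodots: reone

* rejection indicators for the true nulls at the 5% level
foreach m in rg fe re {
    generate x1_rj`m' = (x1_p`m' < .05)
    generate x2_rj`m' = (x2_p`m' < .05)
}
summarize x1_c* x2_c* x1_rj* x2_rj*
```

Using -simulate- lets Stata handle the bookkeeping that -postfile- and -post- do by hand in fevclust.do; the returned scalar names are chosen to match that program's variable names.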

Let's begin looking at the results for the coefficients.

First, we need to understand the variable names.  As can be seen from the
program fevclust.do, appended below,

      Variable name     Meaning
      x1_crg            coefficient on x1 from -regress, cluster(id)-
      x2_crg            coefficient on x2 from -regress, cluster(id)-
      x1_cfe            coefficient on x1 from -xtreg, fe-
      x2_cfe            coefficient on x2 from -xtreg, fe-
      x1_cre            coefficient on x1 from -xtreg, re-
      x2_cre            coefficient on x2 from -xtreg, re-

Now for these results.  The table below presents the summary statistics from
these variables obtained over the 1000 samples that were generated.

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
      x1_crg |    1000     2.99791   .1070938   2.540678   3.372749
      x2_crg |    1000    3.001715   .1044514    2.71244   3.303503
      x1_cfe |    1000    3.000676    .051932   2.847759   3.144626
      x2_cfe |    1000    3.003031   .0496754   2.845199   3.160065
      x1_cre |    1000    3.000508   .0515863   2.846256   3.150228
      x2_cre |    1000     3.00292   .0498531   2.839417   3.163176

There are several points to note.  First, the mean of the estimates from each
estimator is very close to the true value of 3.0.  Second, the standard
deviation of the estimates from -regress, cluster(id)- is about twice the
standard deviations from the other two estimators.  This indicates that
-xtreg, re- and -xtreg, fe- are more efficient than -regress, cluster(id)-.
Third, the standard deviations of the estimates from -xtreg, fe- are
surprisingly close to those of -xtreg, re-.  This indicates that, for this
parameterization of the data generating process and this sample size,
-xtreg, fe- is essentially as efficient an estimator as -xtreg, re-.
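A back-of-the-envelope check of these standard deviations, assuming the fevclust.do design (sd(u_i) = 2, sd(e_it) = 1, T = 5, covariates iid across observations) and standard large-sample variance approximations for the pooled OLS, within, and GLS estimators. The factors in the comments come from that approximation, not from the simulation output.

```stata
* Variance factors relative to the common (N*T*Sigma_x)^-1, which cancels
* in the ratios (assumed large-sample approximations):
*   pooled OLS: su2 + se2
*   within:     se2*T/(T-1)
*   GLS (RE):   1/[(T-1)/(T*se2) + 1/(T*(se2 + T*su2))]
local su2 = 2^2
local se2 = 1^2
local T   = 5
local v_ols = `su2' + `se2'
local v_fe  = `se2'*`T'/(`T' - 1)
local v_re  = 1/((`T' - 1)/(`T'*`se2') + 1/(`T'*(`se2' + `T'*`su2')))
display "sd ratio, regress vs. xtreg fe:  " %6.3f sqrt(`v_ols'/`v_fe')
display "sd ratio, xtreg fe vs. xtreg re: " %6.3f sqrt(`v_fe'/`v_re')
```

The sketch gives ratios of about 2.000 and 1.006, in line with the table (for x1, .1071/.0519 is about 2.06 and .0519/.0516 about 1.007).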

Now let's consider coverage.  The table below contains the means of 6 binary
variables over the 1000 generated samples.  In each sample, each variable is
1 if the true null hypothesis in question was rejected at the 5% level and 0
otherwise.  Thus the means in the table below are empirical rejection rates;
a test with correct size should reject about 5% of the time, which is the
same as its 95% confidence interval covering the true value about 95% of the
time.

x1_rjrg   fraction of tests in which the true null that x1=3 was rejected
          after -regress, cluster(id)-
x2_rjrg   fraction of tests in which the true null that x2=3 was rejected
          after -regress, cluster(id)-
x1_rjfe   fraction of tests in which the true null that x1=3 was rejected
          after -xtreg, fe-
x2_rjfe   fraction of tests in which the true null that x2=3 was rejected
          after -xtreg, fe-
x1_rjre   fraction of tests in which the true null that x1=3 was rejected
          after -xtreg, re-
x2_rjre   fraction of tests in which the true null that x2=3 was rejected
          after -xtreg, re-

And the results are

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
     x1_rjrg |    1000         .05    .218054          0          1
     x2_rjrg |    1000         .07   .2552747          0          1
     x1_rjfe |    1000        .054   .2261308          0          1
     x2_rjfe |    1000        .039   .1936918          0          1
     x1_rjre |    1000        .059   .2357426          0          1
     x2_rjre |    1000        .042   .2006895          0          1

Note that all the tests are reasonably close to nominal coverage.  Also note
that the rejection rates after -xtreg, re- and -xtreg, fe- are very similar;
neither is consistently closer to .05 than the other.
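Whether a rejection rate is "reasonably close" to .05 can be judged against its Monte Carlo standard error, sqrt(p(1-p)/R) for R independent replications. A minimal sketch, assuming p = .05 and R = 1000 as in this simulation:

```stata
* Monte Carlo standard error of a rejection rate estimated from R
* independent replications when the true rate is p (here the nominal .05)
local R = 1000
local p = .05
local mcse = sqrt(`p'*(1 - `p')/`R')
display "MCSE: " %6.4f `mcse'
display "approx. 95% band: " %6.4f `p' - 1.96*`mcse' " to " %6.4f `p' + 1.96*`mcse'
```

This gives a standard error of about .0069 and a band of roughly .036 to .064. Five of the six rates above fall inside it; x2_rjrg (.07) is slightly above.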


There is one final point that must be made.  The crucial assumption in the
above data generating process is that X_it and u_i are not correlated.  If
they are correlated, only -xtreg, fe- will provide consistent estimates.
Below -fevclust.do-, I have appended a second program, called fe_ex.do, that
illustrates this point.  -fe_ex.do- generates a single large sample from the
same structure as above, EXCEPT that X_it is correlated with u_i.

Here are the crucial correlations in our sample:

. corr x1 x2 ui
(obs=5000)

             |       x1       x2       ui
-------------+---------------------------
          x1 |   1.0000
          x2 |   0.6081   1.0000
          ui |   0.6349   0.7170   1.0000
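For reference, the population correlations implied by fe_ex.do can be worked out by hand. There u_i has variance 4, x1 = v1 + .4 u_i, and x2 = v2 + .3 x1 + .4 u_i, where v1 and v2 are labels introduced here for the independent standard normal invnorm(uniform()) draws:

```latex
\begin{align*}
x_2 &= v_2 + 0.3\,v_1 + 0.52\,u_i \\
\operatorname{Var}(x_1) &= 1 + 0.16 \cdot 4 = 1.64, \qquad
\operatorname{Var}(x_2) = 1 + 0.09 + 0.2704 \cdot 4 = 2.1716 \\
\operatorname{Corr}(x_1, u_i) &= \frac{0.4 \cdot 4}{\sqrt{1.64 \cdot 4}} \approx 0.62 \\
\operatorname{Corr}(x_2, u_i) &= \frac{0.52 \cdot 4}{\sqrt{2.1716 \cdot 4}} \approx 0.71 \\
\operatorname{Corr}(x_1, x_2) &= \frac{0.3 + 0.4 \cdot 0.52 \cdot 4}{\sqrt{1.64 \cdot 2.1716}} \approx 0.60
\end{align*}
```

These are close to the sample correlations in the table (.63, .72, and .61).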


Since the true values of all the coefficients are 3.0, the output below
illustrates that -regress, cluster(id)- is not consistent for this data
generating process: both slope estimates are well above 3, and their
confidence intervals exclude it.

. regress y x1 x2,cluster(id)

Regression with robust standard errors                 Number of obs =    5000
                                                       F(  2,   999) =34624.91
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.9674
Number of clusters (id) = 1000                         Root MSE      =  1.6459

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   3.498473    .024788   141.14   0.000     3.449831    3.547116
          x2 |   3.701595   .0232616   159.13   0.000     3.655948    3.747242
       _cons |    2.97323   .0336344    88.40   0.000     2.907228    3.039232
------------------------------------------------------------------------------



In contrast, the output below illustrates that -xtreg, fe- is a consistent
estimator of the coefficients for this data generating process.

. xtreg y x1 x2, fe i(id)

Fixed-effects (within) regression               Number of obs      =      5000
Group variable (i) : id                         Number of groups   =      1000

R-sq:  within  = 0.9602                         Obs per group: min =         5
       between = 0.9885                                        avg =       5.0
       overall = 0.9672                                        max =         5

                                                F(2,3998)          =  48238.88
corr(u_i, Xb)  = 0.7384                         Prob > F           =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   2.997751   .0164867   181.83   0.000     2.965428    3.030074
          x2 |   2.993408   .0157054   190.60   0.000     2.962616    3.024199
       _cons |   2.942511   .0140139   209.97   0.000     2.915036    2.969986
-------------+----------------------------------------------------------------
     sigma_u |  2.0660572
     sigma_e |  .99033698
         rho |  .81316439   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(999, 3998) =     9.81           Prob > F = 0.0000

Finally, the output below illustrates that -xtreg, re- is not a consistent
estimator of the coefficients for this data generating process; its slope
estimates lie between those of -regress, cluster(id)- and -xtreg, fe-.

. xtreg y x1 x2, re i(id)

Random-effects GLS regression                   Number of obs      =      5000
Group variable (i) : id                         Number of groups   =      1000

R-sq:  within  = 0.9601                         Obs per group: min =         5
       between = 0.9885                                        avg =       5.0
       overall = 0.9674                                        max =         5

Random effects u_i ~ Gaussian                   Wald chi2(2)       = 109595.77
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   3.259649   .0197243   165.26   0.000      3.22099    3.298308
          x2 |   3.363142   .0179413   187.45   0.000     3.327977    3.398306
       _cons |   2.958558   .0344846    85.79   0.000      2.89097    3.026147
-------------+----------------------------------------------------------------
     sigma_u |  .72744824
     sigma_e |  .99033698
         rho |  .35046296   (fraction of variance due to u_i)
------------------------------------------------------------------------------
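In practice, the no-correlation assumption can be checked with a Hausman test that compares the -xtreg, fe- and -xtreg, re- estimates. The sketch below regenerates data in the spirit of fe_ex.do using modern functions (rnormal(), -xtset-), so its numbers will not match the output above; -estimates store- and -hausman- are standard Stata commands, and the stored names fe and re are arbitrary.

```stata
* Hausman test sketch on data generated in the spirit of fe_ex.do.
* Assumes modern syntax (rnormal(), -xtset-); the draws differ from the
* post's invnorm(uniform()) draws, so numbers will not match the output above.
clear
set seed 1234567
set obs 1000
generate ui = 2*rnormal()
generate id = _n
expand 5
sort id
generate x1 = rnormal() + .4*ui
generate x2 = rnormal() + .3*x1 + .4*ui
generate y  = 3 + 3*x1 + 3*x2 + ui + rnormal()
xtset id

xtreg y x1 x2, fe
estimates store fe
xtreg y x1 x2, re
estimates store re

* H0: the FE and RE coefficients differ only by sampling error
* (consistent estimator listed first, efficient-under-H0 estimator second)
hausman fe re
display "Hausman p-value: " r(p)
```

With this design the RE and FE slopes differ by far more than sampling error, so one would expect a very small p-value, that is, evidence against the assumption that the covariates are uncorrelated with u_i.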


There are many other points that could be made about the results presented
above.  However, I hope that these simulations help to illustrate the basic
points that

        i)   -regress, cluster(id)-, -xtreg, re-, and -xtreg, fe- are all
             consistent estimators of the coefficients with the
             random-effects data generating process.

        ii)  Tests performed on the coefficients after -regress,
             cluster(id)-, -xtreg, re-, and -xtreg, fe- will have close to
             nominal coverage when the data were generated by a
             random-effects data generating process.

        iii) -xtreg, re- and -xtreg, fe- produce more efficient estimates
             than -regress, cluster(id)-.

        iv)  If the covariates are correlated with the id-level error
             component, u_i, then only -xtreg, fe- produces consistent
             estimates.


	--David 
 	  ddrukker@stata.com


----------------- begin fevclust.do---------------------------------------
clear

capture log close 
log using fevclust.log , replace 
set seed 1234567

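* one record per replication: coefficients (x?_c??) and p-values (x?_p??);
* suffixes rg, fe, re = -regress, cluster(id)-, -xtreg, fe-, -xtreg, re-
postfile redat x1_crg x2_crg x1_prg x2_prg x1_cfe x2_cfe x1_pfe x2_pfe /*
	*/ x1_cre x2_cre x1_pre x2_pre using redat, replace double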

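* 1000 replications of the random-effects data generating process
forvalues i=1/1000 {
	qui {
		drop _all
		* 100 ids with u_i ~ N(0, 2^2), each expanded to 5 observations
		set obs 100
		gen ui=2*invnorm(uniform())
		gen id =_n
		expand 5
		sort id
		* x1, x2 uncorrelated with u_i; e_it ~ N(0,1); true coefficients are 3
		gen x1=invnorm(uniform())
		gen x2=invnorm(uniform())+.3*x1
		gen eit=invnorm(uniform())
		gen y=3+3*x1+3*x2+ui+eit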
		regress y x1 x2,cluster(id)
		scalar x1_crg = _b[x1]
		scalar x2_crg = _b[x2]
		test x1 = 3
		scalar x1_prg = r(p)
		test x2 = 3
		scalar x2_prg = r(p)

		xtreg y x1 x2, fe i(id)
		scalar x1_cfe = _b[x1]
		scalar x2_cfe = _b[x2]
		test x1 = 3
		scalar x1_pfe = r(p)
		test x2 = 3
		scalar x2_pfe = r(p)

		xtreg y x1 x2, re i(id)
		scalar x1_cre = _b[x1]
		scalar x2_cre = _b[x2]
		test x1 = 3
		scalar x1_pre = r(p)
		test x2 = 3
		scalar x2_pre = r(p)

		post redat (x1_crg) (x2_crg) (x1_prg) (x2_prg) (x1_cfe) /*
			*/ (x2_cfe) (x1_pfe) (x2_pfe) (x1_cre) (x2_cre) /*
			*/ (x1_pre) (x2_pre) 
	}	
}

postclose redat

use redat, clear 

* 1 if the true null was rejected at the 5% level, 0 otherwise
gen x1_rjrg=(x1_prg<.05)
gen x2_rjrg=(x2_prg<.05)
gen x1_rjfe=(x1_pfe<.05)
gen x2_rjfe=(x2_pfe<.05)
gen x1_rjre=(x1_pre<.05)
gen x2_rjre=(x2_pre<.05)

sum

save redat, replace

capture log close 
----------------- end fevclust.do---------------------------------------


----------------- begin fe_ex.do---------------------------------------

clear

set seed 1234567
drop _all
set obs 1000
gen ui=2*invnorm(uniform())
gen id =_n
expand 5
sort id
* unlike fevclust.do, x1 and x2 load on ui, so they are correlated with u_i
gen x1=invnorm(uniform())+.4*ui
gen x2=invnorm(uniform())+.3*x1 + .4*ui
gen eit=invnorm(uniform())
gen y=3+3*x1+3*x2+ui+eit

corr x1 x2 ui

regress y x1 x2,cluster(id)
xtreg y x1 x2, fe i(id)
xtreg y x1 x2, re i(id)

----------------- end fe_ex.do---------------------------------------


