
From: ddrukker@stata.com (David M. Drukker, Stata Corp)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Cross-Sectional Time Series
Date: Wed, 26 Jun 2002 10:31:05 -0500

John Neumann <neumannj@bu.edu> began an interesting thread on this list yesterday when he asked whether he should use -reg, cluster(id)-, -xtreg, re- or -xtreg, fe- for estimation and inference about his panel data model. Then Anirban Basu <abasu@midway.uchicago.edu> and Mark Schaffer <M.E.Schaffer@hw.ac.uk> both provided interesting responses to John's original question.

Both Anirban and Mark have pointed out that -regress, cluster(id)- will provide consistent estimates of the coefficients. There is also agreement that -xtreg, re- will provide consistent estimates of the coefficients. But there seems to be some discussion of whether -xtreg, fe- will provide consistent estimates.

In theory, all three estimators (-regress, cluster(id)-, -xtreg, re-, -xtreg, fe-) are consistent estimators of the coefficients for random-effects data generating processes. To flesh out the details, consider the random-effects data generating process

        y_it = X_it b + u_i + e_it

where X_it is a 1 x K vector of covariates, b is a K x 1 vector of coefficients, u_i is independently and identically distributed (iid) over ids, e_it is iid over the observations, and there is no correlation between X_it and u_i. Under these assumptions, all three estimators provide consistent estimators of the VCE matrix, and the resulting Wald tests will obtain nominal coverage, given enough data.

Another theoretical point is in order. For the random-effects data generating process, -xtreg, re- should provide more efficient estimates of the coefficients than either of the other two, while -xtreg, fe- should produce more efficient estimates than -regress, cluster(id)-. (One caveat in the -xtreg, fe- case is that inference is said to be conditional on the random effects in the sample.)

To illustrate these points, I have written a small simulation. The do-file is appended below my signature.
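The consistency claims above rest on a short piece of algebra, sketched here in the notation of the model. The bar notation for within-id means and the assumption of a balanced panel with T observations per id are introduced only for this illustration.

```latex
\begin{align*}
y_{it} &= X_{it} b + u_i + e_{it} \\
\bar{y}_i &= \bar{X}_i b + u_i + \bar{e}_i,
  \qquad \bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it} \\
y_{it} - \bar{y}_i &= (X_{it} - \bar{X}_i)\, b + (e_{it} - \bar{e}_i)
\end{align*}
```

Subtracting the second line from the first removes u_i. The within (fixed-effects) estimator is least squares on the third line, so its consistency does not depend on whether X_it and u_i are correlated. -regress- and -xtreg, re- both leave u_i in the error term, which is why they need X_it and u_i to be uncorrelated.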
Briefly, the program

  i)   produces 1000 draws from a parameterization of the random-effects data generating process
  ii)  runs the three estimators on each sample, saving off the coefficients
  iii) uses -test- to test that the coefficients are equal to their true values, saving off the p-values
  iv)  then computes the coverage rates obtained by each estimator on each test

Let's begin by looking at the results for the coefficients. First, we need to understand the variable names. As can be seen from the program fevclust.do, appended below, the variables are

  Variable name   Meaning
  x1_crg          coefficient on x1 from -regress, cluster(id)-
  x2_crg          coefficient on x2 from -regress, cluster(id)-
  x1_cfe          coefficient on x1 from -xtreg, fe-
  x2_cfe          coefficient on x2 from -xtreg, fe-
  x1_cre          coefficient on x1 from -xtreg, re-
  x2_cre          coefficient on x2 from -xtreg, re-

Now for the results. The table below presents the summary statistics for these variables obtained over the 1000 samples that were generated.

    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+------------------------------------------------------
      x1_crg |    1000     2.99791   .1070938   2.540678   3.372749
      x2_crg |    1000    3.001715   .1044514    2.71244   3.303503
      x1_cfe |    1000    3.000676    .051932   2.847759   3.144626
      x2_cfe |    1000    3.003031   .0496754   2.845199   3.160065
      x1_cre |    1000    3.000508   .0515863   2.846256   3.150228
      x2_cre |    1000     3.00292   .0498531   2.839417   3.163176

There are several points to note. First, the mean of the estimates from each estimator is very close to the true value of 3.0. Second, the standard deviation of the estimates from -regress, cluster(id)- is about twice the standard deviations from the other two estimators. This indicates that -xtreg, re- and -xtreg, fe- are more efficient than -regress, cluster(id)-. Third, the standard deviations of the estimates from -xtreg, fe- are surprisingly close to those of -xtreg, re-. This indicates that for this parameterization of the data generating process and sample size, -xtreg, fe- is about as efficient an estimator as -xtreg, re-.
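The efficiency comparison can be put on a single scale by taking ratios of the simulated standard deviations. The lines below are a minimal sketch that assumes fevclust.do has already been run, so that redat.dta exists in the current directory; the scalar names sd_rg, sd_fe, and sd_re are made up for this illustration.

```stata
* Sketch: relative spread of the x1 estimates across the simulated samples.
* Assumes redat.dta was created by fevclust.do; scalar names are illustrative.
use redat, clear

quietly summarize x1_crg
scalar sd_rg = r(sd)

quietly summarize x1_cfe
scalar sd_fe = r(sd)

quietly summarize x1_cre
scalar sd_re = r(sd)

* Ratios above 1 mean the estimator is less efficient than -xtreg, re-
display "sd(regress, cluster) / sd(xtreg, re) = " %5.2f sd_rg/sd_re
display "sd(xtreg, fe)        / sd(xtreg, re) = " %5.2f sd_fe/sd_re
```

Using the standard deviations reported in the table above, these ratios work out to roughly 2.08 and 1.01 for x1.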
Now let's consider coverage. The results table below contains the means of 6 binary variables from the 1000 generated samples. In each sample, each variable is 1 if the test in question rejected at the 5% level for that sample and 0 otherwise. Thus the means in the table below can be interpreted as empirical rejection rates, which should be close to .05 if the tests have nominal coverage.

  x1_rjrg   fraction of tests in which the true null that x1=3 was rejected after -reg, cluster(id)-
  x2_rjrg   fraction of tests in which the true null that x2=3 was rejected after -reg, cluster(id)-
  x1_rjfe   fraction of tests in which the true null that x1=3 was rejected after -xtreg, fe-
  x2_rjfe   fraction of tests in which the true null that x2=3 was rejected after -xtreg, fe-
  x1_rjre   fraction of tests in which the true null that x1=3 was rejected after -xtreg, re-
  x2_rjre   fraction of tests in which the true null that x2=3 was rejected after -xtreg, re-

And the results are

    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+------------------------------------------------------
     x1_rjrg |    1000         .05    .218054          0          1
     x2_rjrg |    1000         .07   .2552747          0          1
     x1_rjfe |    1000        .054   .2261308          0          1
     x2_rjfe |    1000        .039   .1936918          0          1
     x1_rjre |    1000        .059   .2357426          0          1
     x2_rjre |    1000        .042   .2006895          0          1

Note that all the tests are reasonably close to nominal coverage. Also note that the tests after -xtreg, re- and -xtreg, fe- are about equally close to nominal: -xtreg, fe- is closer for x1 (.054 versus .059), while -xtreg, re- is marginally closer for x2 (.042 versus .039).

There is one final point that must be made. The crucial assumption in the above data generating process is that X_it and u_i are not correlated. If they are correlated, only -xtreg, fe- will provide consistent estimates. Below -fevclust.do-, I have appended a second program, called fe_ex.do, that illustrates this point. -fe_ex.do- generates a single large sample from the same structure as above, EXCEPT that there is correlation between X_it and u_i. Here are the crucial correlations in our sample.
. corr x1 x2 ui
(obs=5000)

             |       x1       x2       ui
-------------+---------------------------
          x1 |   1.0000
          x2 |   0.6081   1.0000
          ui |   0.6349   0.7170   1.0000

Since the true values of all the coefficients are 3.0, the output below illustrates that -regress, cluster(id)- is not consistent for this data generating process.

. regress y x1 x2,cluster(id)

Regression with robust standard errors                 Number of obs =    5000
                                                       F(  2,   999) =34624.91
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.9674
Number of clusters (id) = 1000                         Root MSE      =  1.6459

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   3.498473    .024788   141.14   0.000     3.449831    3.547116
          x2 |   3.701595   .0232616   159.13   0.000     3.655948    3.747242
       _cons |    2.97323   .0336344    88.40   0.000     2.907228    3.039232
------------------------------------------------------------------------------

In contrast, the output below illustrates that -xtreg, fe- is a consistent estimator for the coefficients with this data generating process.

. xtreg y x1 x2, fe i(id)

Fixed-effects (within) regression               Number of obs      =      5000
Group variable (i) : id                         Number of groups   =      1000

R-sq:  within  = 0.9602                         Obs per group: min =         5
       between = 0.9885                                        avg =       5.0
       overall = 0.9672                                        max =         5

                                                F(2,3998)          =  48238.88
corr(u_i, Xb)  = 0.7384                         Prob > F           =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   2.997751   .0164867   181.83   0.000     2.965428    3.030074
          x2 |   2.993408   .0157054   190.60   0.000     2.962616    3.024199
       _cons |   2.942511   .0140139   209.97   0.000     2.915036    2.969986
-------------+----------------------------------------------------------------
     sigma_u |  2.0660572
     sigma_e |  .99033698
         rho |  .81316439   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(999, 3998) =     9.81           Prob > F = 0.0000

Finally, the output below illustrates that -xtreg, re- is not a consistent estimator for the coefficients with this data generating process.

. xtreg y x1 x2, re i(id)

Random-effects GLS regression                   Number of obs      =      5000
Group variable (i) : id                         Number of groups   =      1000

R-sq:  within  = 0.9601                         Obs per group: min =         5
       between = 0.9885                                        avg =       5.0
       overall = 0.9674                                        max =         5

Random effects u_i ~ Gaussian                   Wald chi2(2)       = 109595.77
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   3.259649   .0197243   165.26   0.000      3.22099    3.298308
          x2 |   3.363142   .0179413   187.45   0.000     3.327977    3.398306
       _cons |   2.958558   .0344846    85.79   0.000      2.89097    3.026147
-------------+----------------------------------------------------------------
     sigma_u |  .72744824
     sigma_e |  .99033698
         rho |  .35046296   (fraction of variance due to u_i)
------------------------------------------------------------------------------

There are many other points that could be made about the results presented above. However, I hope that these simulations help to illustrate the basic points that

i) -regress, cluster(id)-, -xtreg, re-, and -xtreg, fe- are all consistent estimators for the coefficients with the random-effects data generating process.
ii) Tests performed on the coefficients after -regress, cluster(id)-, -xtreg, re-, and -xtreg, fe- will have close to nominal coverage when the data were generated by a random-effects data generating process.

iii) -xtreg, re- and -xtreg, fe- produce more efficient estimates than -regress, cluster(id)-.

iv) If the covariates are correlated with the id-level error component, u_i, then only -xtreg, fe- produces consistent estimates.

--David
ddrukker@stata.com

----------------- begin fevclust.do---------------------------------------
clear
capture log close
log using fevclust.log , replace
set seed 1234567

* One posted row per sample: coefficients (_c*) and p-values (_p*)
* from regress (rg), xtreg fe (fe), and xtreg re (re)
postfile redat x1_crg x2_crg x1_prg x2_prg x1_cfe x2_cfe x1_pfe x2_pfe   /*
        */ x1_cre x2_cre x1_pre x2_pre using redat, replace double

forvalues i=1/1000 {
        qui {
                * Draw one sample: 100 ids, 5 observations per id,
                * with u_i uncorrelated with x1 and x2
                drop _all
                set obs 100
                gen ui=2*invnorm(uniform())
                gen id =_n
                expand 5
                sort id
                gen x1=invnorm(uniform())
                gen x2=invnorm(uniform())+.3*x1
                gen eit=invnorm(uniform())
                gen y=3+3*x1+3*x2+ui+eit

                * Pooled OLS with cluster-robust standard errors
                regress y x1 x2,cluster(id)
                scalar x1_crg = _b[x1]
                scalar x2_crg = _b[x2]
                test x1 = 3
                scalar x1_prg = r(p)
                test x2 = 3
                scalar x2_prg = r(p)

                * Fixed effects
                xtreg y x1 x2, fe i(id)
                scalar x1_cfe = _b[x1]
                scalar x2_cfe = _b[x2]
                test x1 = 3
                scalar x1_pfe = r(p)
                test x2 = 3
                scalar x2_pfe = r(p)

                * Random effects
                xtreg y x1 x2, re i(id)
                scalar x1_cre = _b[x1]
                scalar x2_cre = _b[x2]
                test x1 = 3
                scalar x1_pre = r(p)
                test x2 = 3
                scalar x2_pre = r(p)

                post redat (x1_crg) (x2_crg) (x1_prg) (x2_prg) (x1_cfe)    /*
                        */ (x2_cfe) (x1_pfe) (x2_pfe) (x1_cre) (x2_cre)    /*
                        */ (x1_pre) (x2_pre)
        }
}

postclose redat

use redat, clear

* Rejection indicators for 5% tests of the true nulls
gen x1_rjrg=(x1_prg<.05)
gen x2_rjrg=(x2_prg<.05)
gen x1_rjfe=(x1_pfe<.05)
gen x2_rjfe=(x2_pfe<.05)
gen x1_rjre=(x1_pre<.05)
gen x2_rjre=(x2_pre<.05)

sum
save redat, replace
capture log close
----------------- end fevclust.do---------------------------------------

----------------- begin fe_ex.do---------------------------------------
clear
set seed 1234567

* One large sample: 1000 ids, 5 observations per id,
* with x1 and x2 correlated with u_i
drop _all
set obs 1000
gen ui=2*invnorm(uniform())
gen id =_n
expand 5
sort id
gen x1=invnorm(uniform())+.4*ui
gen x2=invnorm(uniform())+.3*x1 + .4*ui
gen eit=invnorm(uniform())
gen y=3+3*x1+3*x2+ui+eit

corr x1 x2 ui
regress y x1 x2,cluster(id)
xtreg y x1 x2, fe i(id)
xtreg y x1 x2, re i(id)
----------------- end fe_ex.do---------------------------------------

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
