
From: "David M. Drukker, StataCorp" <ddrukker@stata.com>
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: Stata-SE and Stata on the same server
Date: Wed, 16 Jun 2004 10:02:54 -0500

Daniel Feenberg <feenberg@nber.org> wrote

> Most requests here at NBER for Stata-SE are from users with fixed
> effect models who expect to add a dummy variable for each respondent
> in a panel. They are usually easily convinced that this is not
> necessary. However sometimes users want to interact a time trend
> with the fixed effect. Is there a way to estimate such a model
> without adding a variable for each respondent?

Short-answer
------------

One way to estimate this type of model is to double difference the data
and estimate the parameters via ordinary least squares with cluster-robust
standard errors.

Long-answer
------------

Consider the model

        y_it = u_i + a_i*t + B x_it + e_it

where

        y_it  is the dependent variable
        u_i   is the unobserved individual-specific intercept, which may be
              correlated with a_i and x_it
        a_i   is the unobserved individual-specific trend, which may be
              correlated with u_i and x_it
        x_it  is a vector of time-varying covariates, which may be
              correlated with a_i and u_i
        B     is a vector of coefficients on x_it
        e_it  is idiosyncratic error that is independently distributed
              over the panels

(Notes: the e_it may have some serial correlation, and the assumption of
independence over the panels is stronger than necessary.)

Let's begin with the case in which there are no gaps within the panels.
(We drop this assumption below.) The number of observations per panel may
vary.

First differencing the data removes the individual-specific intercept:

        D.y_it = a_i + B D.x_it + D.e_it

This is a standard fixed-effects model, the parameters of which could be
estimated by -xtreg, fe-. As with the simple fixed-effects model, we could
also estimate the parameters by differencing the data and applying ordinary
least squares. Differencing the data again yields

        D2.y_it = B D2.x_it + D2.e_it

Recall that at the beginning of this example, I assumed that there were no
gaps in the data. The assumption of no gaps is crucial if one wants to
apply the standard FE estimator to the first-differenced data. The
assumption is not necessary for the double-difference model because the
gaps will simply cause a loss of observations.

Here is an example that simulates some data and runs the regressions.

First, let's simulate some data.

------------------- begin data generation section -------------------------
. clear
. set seed 12345
. set mem 50m
(51200k)
.
. set obs 500
obs was 0, now 500
.
. gen id = _n
.
. gen ui = invchi2(2,uniform())
.
. gen ai = invnorm(uniform()) +.3*ui
.
. expand 10
(4500 observations created)
.
. sort id
. by id: gen t = _n
.
. tsset id t
       panel variable:  id, 1 to 500
        time variable:  t, 1 to 10
.
. gen x1 = invchi2(2,uniform()) + .5*t + .3*ui
. gen x2 = invchi2(2,uniform()) + .7*t + .4*ui
.
. gen eit =invchi2(2,uniform())
.
. gen y = ui + ai*t + 1*x1 + 2*x2 + eit
------------------- end data generation section -------------------------

The data generating process is standard. Note that a_i, u_i and x_it are
all correlated with each other. Removing these correlations would allow
you to use other estimators. Just to highlight that normality is not
required, I avoided using normal errors. (I made the a_i normal to
illustrate that the individual-specific time trends need not all have the
same sign.)

The correlation between a_i and u_i is such that the FE estimator will be
inconsistent.

. xtreg y x1 x2, fe

Fixed-effects (within) regression               Number of obs      =      5000
Group variable (i): id                          Number of groups   =       500

R-sq:  within  = 0.8094                         Obs per group: min =        10
       between = 0.5605                                        avg =      10.0
       overall = 0.6106                                        max =        10

                                                F(2,4498)          =   9552.53
corr(u_i, Xb)  = 0.1865                         Prob > F           =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.258295   .0282743    44.50   0.000     1.202864    1.313727
          x2 |   2.424745   .0244308    99.25   0.000     2.376848    2.472641
       _cons |   3.570195   .1781392    20.04   0.000     3.220954    3.919435
-------------+----------------------------------------------------------------
     sigma_u |  7.2627525
     sigma_e |  4.2590515
         rho |  .74410687   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(499, 4498) =    27.96           Prob > F = 0.0000

Ordinary least squares on the double-differenced data produces consistent
estimates. I clustered on -id- to account for the within-panel serial
correlation that is present even if the original error e_it has no serial
correlation.

. reg d2.(y x1 x2), nocons cluster(id)

Regression with robust standard errors                 Number of obs =    4000
                                                       F(  2,   499) = 5358.14
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.8375
Number of clusters (id) = 500                          Root MSE      =  4.9086

------------------------------------------------------------------------------
             |               Robust
        D2.y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1           |
          D2 |    1.00929   .0212335    47.53   0.000     .9675721    1.051008
x2           |
          D2 |   2.010881   .0213753    94.07   0.000     1.968884    2.052877
------------------------------------------------------------------------------

Now let's illustrate that gaps in the panels cause the expected loss of
observations. Setting y to missing at t == 5 removes D2.y at t = 5, 6, and
7, because D2.y_it = y_it - 2*y_i,t-1 + y_i,t-2 needs three consecutive
values of y. Together with the two observations lost at the start of each
panel, that leaves 5 usable observations per panel, or 2500 in all.

. replace y = . if t == 5
(500 real changes made, 500 to missing)
.
. reg d2.(y x1 x2), nocons cluster(id)

Regression with robust standard errors                 Number of obs =    2500
                                                       F(  2,   499) = 3906.51
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.8376
Number of clusters (id) = 500                          Root MSE      =  4.9251

------------------------------------------------------------------------------
             |               Robust
        D2.y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1           |
          D2 |   1.038029   .0258706    40.12   0.000     .9872002    1.088858
x2           |
          D2 |   2.006183   .0253875    79.02   0.000     1.956303    2.056062
------------------------------------------------------------------------------

David
ddrukker@stata.com
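The message states that the double-differenced error is serially correlated within a panel even when e_it is not. A short worked calculation shows why. As an illustrative assumption only (the message itself allows for serial correlation in e_it), take e_it to be i.i.d. over t with variance \sigma^2:

```latex
\begin{align*}
\Delta^2 e_{it} &= e_{it} - 2e_{i,t-1} + e_{i,t-2},\\
\operatorname{Var}(\Delta^2 e_{it}) &= (1 + 4 + 1)\,\sigma^2 = 6\sigma^2,\\
\operatorname{Cov}(\Delta^2 e_{it},\, \Delta^2 e_{i,t-1}) &= (-2)(1)\,\sigma^2 + (1)(-2)\,\sigma^2 = -4\sigma^2,\\
\operatorname{Cov}(\Delta^2 e_{it},\, \Delta^2 e_{i,t-2}) &= (1)(1)\,\sigma^2 = \sigma^2,\\
\operatorname{Cov}(\Delta^2 e_{it},\, \Delta^2 e_{i,t-k}) &= 0 \qquad (k \ge 3).
\end{align*}
```

Under that assumption D2.e_it behaves like an MA(2) process within each panel, which is the kind of within-panel correlation that clustering on id is meant to handle.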

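The log in the message uses functions from the Stata release current in 2004 (uniform(), invnorm(), set mem, tsset). What follows is a minimal sketch of the same steps in more recent syntax, assuming a Stata version that provides rchi2(), rnormal(), xtset, and vce(cluster ...). The random draws come from a different generator, so the estimates will not reproduce the numbers in the log. The variable names dy, dx1, and dx2 are illustrative and do not appear in the original.

```stata
* Sketch only: same data generating process as in the message,
* written with newer random-number functions (an assumption about
* the installed Stata version, not the author's original code).
clear
set seed 12345
set obs 500

generate id = _n
generate ui = rchi2(2)                 // individual-specific intercept
generate ai = rnormal() + .3*ui        // individual-specific trend, correlated with ui

expand 10
bysort id: generate t = _n
xtset id t

generate x1  = rchi2(2) + .5*t + .3*ui
generate x2  = rchi2(2) + .7*t + .4*ui
generate eit = rchi2(2)                // non-normal idiosyncratic error
generate y   = ui + ai*t + 1*x1 + 2*x2 + eit

* Double-differenced OLS, no constant, cluster-robust standard errors
regress D2.(y x1 x2), noconstant vce(cluster id)

* Alternative route from the message: fixed effects on first-differenced
* data, which relies on there being no gaps within the panels
generate dy  = D.y
generate dx1 = D.x1
generate dx2 = D.x2
xtreg dy dx1 dx2, fe vce(cluster id)
```

Both regressions target the same coefficient vector B. The second one is included only to illustrate the intermediate first-difference step described in the message.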