Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org**, is already up and running.


| Field | Value |
|---|---|
| From | clivelists@googlemail.com |
| To | statalist@hsphsun2.harvard.edu |
| Subject | Re: st: Regression with about 5000 (dummy) variables |
| Date | Thu, 19 Apr 2012 15:26:55 +0000 |

I'm pretty sure Paul D. Allison, in his excellent 2009 monograph "Fixed Effects Regression Models" (Sage QASS Paper 160, Thousand Oaks, CA: Sage), said you should also add the mean-deviation variables, along with the cluster means, to the -test- statement, so you would have eight variables in this example (assuming that x1-x4 are already demeaned by -xtreg-, which surely they would be?). Indeed, he also says that this procedure has better statistical properties than the Hausman test. I'm having to transcribe some boring but intelligence-sensitive phone conversations at work right now, so I'm not near a copy to confirm this.

C

-----Original Message-----
From: John Antonakis <John.Antonakis@unil.ch>
Sender: owner-statalist@hsphsun2.harvard.edu
Date: Thu, 19 Apr 2012 16:57:27
To: <statalist@hsphsun2.harvard.edu>
Reply-To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Regression with about 5000 (dummy) variables

Hi:

Let me let you in on a trick that is relatively unknown. One way around the problem of a huge number of dummy variables is to use the Mundlak procedure:

Mundlak, Y. (1978). Pooling of time-series and cross-section data. Econometrica, 46(1), 69-85.

For an intuitive explanation, see:

Antonakis, J., Bendahan, S., Jacquart, P., & Lalive, R. (2010). On making causal claims: A review and recommendations. The Leadership Quarterly, 21(6), 1086-1120. http://www.hec.unil.ch/jantonakis/Causal_Claims.pdf

Basically, for each time-varying independent variable (x1-x4), take its cluster (panel) mean and include that mean in the regression. That is, do:

    foreach var of varlist x1-x4 {
        bys panelvar: egen cl_`var' = mean(`var')
    }

Then run your regression like this:

    xtreg y x1-x4 cl_x1-cl_x4, cluster(panelvar)

The Hausman test for fixed versus random effects is then:

    testparm cl_x1-cl_x4

This will save you degrees of freedom and computing time, and the estimator of the coefficients on x1-x4 is consistent. Try it out with a subsample of your dataset to see. Many econometricians have been amazed by this.
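A minimal sketch of the Mundlak steps above, run on simulated data so that it is self-contained. The balanced panel (200 units, 5 periods), the seed, and the data-generating coefficients are invented for illustration; only the names panelvar, y, x1-x4 and cl_x1-cl_x4 come from the message. It compares the within (fixed-effects) slope with the slope from the random-effects regression that includes the cluster means; in a balanced panel the two should coincide up to rounding, which is one way to "try it out" as suggested. The sketch writes the options as -re vce(cluster panelvar)- and stores variables as double to limit rounding.

```stata
* Simulated balanced panel (hypothetical data, for illustration only)
clear
set seed 12345
set obs 200                              // 200 panels
generate panelvar = _n
generate double u = rnormal()            // unobserved panel effect
expand 5                                 // 5 periods per panel
bysort panelvar: generate t = _n
xtset panelvar t
forvalues k = 1/4 {
    generate double x`k' = rnormal() + 0.5*u   // regressors correlated with u
}
generate double y = 1 + x1 - x2 + 0.5*x3 + 2*x4 + u + rnormal()

* Benchmark: within (fixed-effects) estimator
xtreg y x1-x4, fe
scalar b_fe = _b[x1]

* Mundlak step: add the panel mean of each time-varying regressor
foreach var of varlist x1-x4 {
    bysort panelvar: egen double cl_`var' = mean(`var')
}

* Random-effects regression with the means, cluster-robust SEs
xtreg y x1-x4 cl_x1-cl_x4, re vce(cluster panelvar)

* Joint test that the coefficients on the means are all zero
testparm cl_x1-cl_x4

* Compare the slope on x1 across the two estimators
display "FE: " b_fe "   Mundlak RE: " _b[x1]
```

The same comparison can be run on a random subsample of real data (for example after -sample 10-) before committing to the full dataset; with unbalanced panels the two slopes are close rather than identical.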
HTH,
J.

__________________________________________
Prof. John Antonakis
Faculty of Business and Economics
Department of Organizational Behavior
University of Lausanne
Internef #618
CH-1015 Lausanne-Dorigny
Switzerland
Tel ++41 (0)21 692-3438
Fax ++41 (0)21 692-3305
http://www.hec.unil.ch/people/jantonakis
Associate Editor
The Leadership Quarterly
__________________________________________

On 19.04.2012 16:39, Suryadipta Roy wrote:
> Dear Statalisters,
>
> I am trying to run a fixed-effects panel regression that has more than 4000 dummies (based on theory in the gravity-model literature in international economics), and hence close to 5000 variables in the regression. The coefficients of the dummy variables are not of any interest. The code is as follows:
>
>     xtreg y x1 x2 ... imp_time_* exp_time_*, fe cluster(panelvar)
>
> where panelvar has been set using -xtset-, and imp_time and exp_time are importer-time and exporter-time fixed effects, respectively. However, the regression ran for close to 2 hours without producing any result, at which point I stopped it using -Break-. I had set the memory to 5000m and the matsize to 5000 using -set-.
>
> My Stata specification is Stata/SE 11.2 for Windows (64-bit x86-64). My PC specification: processor Intel Core i5-2430M CPU @ 2.40 GHz; RAM 8 GB, on a 64-bit OS.
>
> I would greatly appreciate some help to find out whether it is normal for Stata to take this much time (or more) with a large number of variables, and whether there is a way to accomplish the task faster. The gravity literature has suggested a couple of ways to do this without the dummy-variable approach, but I was trying to find out if there is a better way to do it if I persist with the dummy variables. Any help is greatly appreciated.
>
> Best regards,
> Suryadipta.
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
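Clive's recollection of Allison (2009) is not checked against the book here. One common reading, stated as an assumption, is the "hybrid" parameterization: enter each regressor's deviation from its panel mean together with the panel mean, then test whether each pair of coefficients is equal, which involves the eight variables he mentions. Because x = (x - mean) + mean, this is a linear reparameterization of the Mundlak regression, so the two Wald statistics are expected to agree closely; the sketch prints both so this can be checked. The simulated data and the dm_ variable names are hypothetical.

```stata
* Simulated balanced panel (hypothetical data, for illustration only)
clear
set seed 12345
set obs 200
generate panelvar = _n
generate double u = rnormal()
expand 5
bysort panelvar: generate t = _n
xtset panelvar t
forvalues k = 1/4 {
    generate double x`k' = rnormal() + 0.5*u
}
generate double y = 1 + x1 - x2 + 0.5*x3 + 2*x4 + u + rnormal()

* Panel means (cl_*) first, then deviations from them (dm_*), in separate
* loops so that cl_x1-cl_x4 and dm_x1-dm_x4 each stay contiguous
foreach var of varlist x1-x4 {
    bysort panelvar: egen double cl_`var' = mean(`var')
}
foreach var of varlist x1-x4 {
    generate double dm_`var' = `var' - cl_`var'
}

* Mundlak form: x and its mean; test that the means have zero coefficients
xtreg y x1-x4 cl_x1-cl_x4, re vce(cluster panelvar)
testparm cl_x1-cl_x4
scalar chi2_mundlak = r(chi2)

* Hybrid form: deviation and mean; test within = between, pair by pair
xtreg y dm_x1-dm_x4 cl_x1-cl_x4, re vce(cluster panelvar)
test (dm_x1 = cl_x1) (dm_x2 = cl_x2) (dm_x3 = cl_x3) (dm_x4 = cl_x4)
display "Mundlak chi2: " chi2_mundlak "   hybrid chi2: " r(chi2)
```

In the hybrid form the coefficients on dm_x1-dm_x4 are the within (fixed-effects) slopes and those on cl_x1-cl_x4 are between-type slopes, which is why testing their equality plays the role of a Hausman-type test.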

**References**:

- **st: Regression with about 5000 (dummy) variables** *From:* Suryadipta Roy <sroy2138@gmail.com>
- **Re: st: Regression with about 5000 (dummy) variables** *From:* John Antonakis <John.Antonakis@unil.ch>
