Re: st: Regression with about 5000 (dummy) variables


From   clivelists@googlemail.com
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Regression with about 5000 (dummy) variables
Date   Thu, 19 Apr 2012 15:26:55 +0000

I'm pretty sure Paul D. Allison, in his excellent 2009 monograph "Fixed Effects Regression Models" (Sage QASS Paper 160, Thousand Oaks, CA: Sage), says you should also include the mean-deviation variables, along with the cluster means, in the -test- statement, so you would have eight variables in this example (assuming that x1-x4 are already demeaned in the -xtreg-, which surely they would be?).

Indeed, he also says that this procedure has better statistical properties than the Hausman test. 
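For anyone who wants to compare the two formulations, here is a hedged Stata sketch of the "hybrid" form described above: enter the within-panel deviations and the cluster means, then test whether each deviation coefficient equals the corresponding mean coefficient. This is my reading of the approach, not a quotation from the monograph; the simulated data and the names d_x1, cl_x1, chi2_mundlak, etc. are hypothetical. Note that in the Mundlak regression in the quoted message, x1-x4 enter undemeaned. Because x = (x - mean) + mean, the hybrid model is a reparameterization of that regression, so the equality test should give the same chi-squared statistic as testing the cluster means against zero.

```stata
* Sketch: Mundlak form vs. hybrid (deviation + mean) form on simulated data.
* Every variable name here is a hypothetical stand-in.
clear
set seed 160
set obs 150
generate long panelvar = _n
generate double u = rnormal()              // unobserved unit effect
expand 8
bysort panelvar: generate int t = _n
generate double x1 = 0.5*u + rnormal()     // x1 is correlated with u
generate double x2 = rnormal()
generate double y = 1 + 2*x1 - x2 + u + rnormal()

foreach var of varlist x1 x2 {
    bysort panelvar: egen double cl_`var' = mean(`var')   // cluster mean
    generate double d_`var' = `var' - cl_`var'            // deviation from it
}
xtset panelvar t

* Mundlak form: raw x's plus cluster means; test the means against zero
xtreg y x1 x2 cl_x1 cl_x2, re vce(cluster panelvar)
testparm cl_x1 cl_x2
scalar chi2_mundlak = r(chi2)

* Hybrid form: deviations plus cluster means; test pairwise equality
xtreg y d_x1 d_x2 cl_x1 cl_x2, re vce(cluster panelvar)
test (d_x1 = cl_x1) (d_x2 = cl_x2)
scalar chi2_hybrid = r(chi2)

display "Mundlak chi2: " scalar(chi2_mundlak) "   hybrid chi2: " scalar(chi2_hybrid)
```

With four regressors, the hybrid -test- line would name the eight variables d_x1-d_x4 and cl_x1-cl_x4.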

I'm having to transcribe some boring but intelligence-sensitive phone conversations at work right now, so I'm not near a copy to confirm this.

C

-----Original Message-----
From: John Antonakis <John.Antonakis@unil.ch>
Sender: owner-statalist@hsphsun2.harvard.edu
Date: Thu, 19 Apr 2012 16:57:27 
To: <statalist@hsphsun2.harvard.edu>
Reply-To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Regression with about 5000 (dummy) variables

Hi:

Let me let you in on a trick that is relatively unknown.

One way around the problem of a huge number of dummy variables is to use 
the Mundlak procedure:

Mundlak, Y. (1978). Pooling of Time-Series and Cross-Section Data. 
Econometrica, 46(1), 69-85.

For an intuitive explanation, see:

Antonakis, J., Bendahan, S., Jacquart, P., & Lalive, R. (2010). On 
making causal claims: A review and recommendations. The Leadership 
Quarterly, 21(6), 1086-1120.
http://www.hec.unil.ch/jantonakis/Causal_Claims.pdf

Basically, for each time-varying independent variable (x1-x4), compute its 
cluster (panel) mean and include that mean in the regression. That is, do:

foreach var of varlist x1-x4 {
    bysort panelvar: egen double cl_`var' = mean(`var')
}

Then, run your regression like this:

xtreg y x1-x4 cl_x1-cl_x4, re vce(cluster panelvar)

The Hausman-type test of fixed versus random effects (robust to 
within-panel clustering, given the clustered standard errors above) is:

testparm cl_x1-cl_x4

This saves degrees of freedom and computation. The estimator is 
consistent: the coefficients on x1-x4 match the fixed-effects (within) 
estimates when the means are computed over the estimation sample. Try it 
out on a subsample of your dataset to see. Many econometricians have been 
amazed by this.
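The "try it out" check can be sketched in Stata along the following lines. This is a minimal sketch on simulated data; the seed, the data-generating process, and names such as touse, b_fe and b_mundlak are assumptions for illustration, not part of the original post. One refinement is built in: if y or any x has missing values, the cluster means should be taken over the estimation sample only, otherwise the point estimates will no longer match the fixed-effects ones exactly.

```stata
* Sketch: does the Mundlak regression reproduce the fixed-effects estimates?
* Simulated data; every name below is a hypothetical stand-in.
clear
set seed 20120419
set obs 200
generate long panelvar = _n
generate double u = rnormal()               // unobserved unit effect
expand 10
bysort panelvar: generate int t = _n
generate double x1 = 0.5*u + rnormal()      // x1 is correlated with u
generate double x2 = rnormal()
generate double y = 1 + 2*x1 - x2 + u + rnormal()
replace x2 = . if runiform() < 0.05         // introduce some missing values

* Flag the estimation sample, then take cluster means over that sample only
mark touse
markout touse y x1 x2
foreach var of varlist x1 x2 {
    bysort panelvar: egen double cl_`var' = mean(cond(touse, `var', .))
}
xtset panelvar t

* Fixed-effects (within) estimator
xtreg y x1 x2 if touse, fe
scalar b_fe = _b[x1]

* Mundlak: x's plus their cluster means, fit by random effects
xtreg y x1 x2 cl_x1 cl_x2 if touse, re vce(cluster panelvar)
scalar b_mundlak = _b[x1]
testparm cl_x1 cl_x2                        // robust FE-vs-RE test

display "FE: " scalar(b_fe) "   Mundlak: " scalar(b_mundlak)
```

The two displayed coefficients should agree up to rounding; the standard errors differ, because the second model is fit by random-effects GLS with cluster-robust errors.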

HTH,
J.

__________________________________________

Prof. John Antonakis
Faculty of Business and Economics
Department of Organizational Behavior
University of Lausanne
Internef #618
CH-1015 Lausanne-Dorigny
Switzerland
Tel ++41 (0)21 692-3438
Fax ++41 (0)21 692-3305
http://www.hec.unil.ch/people/jantonakis

Associate Editor
The Leadership Quarterly
__________________________________________


On 19.04.2012 16:39, Suryadipta Roy wrote:
 > Dear Statalisters,
 >
 > I am trying to run a fixed-effects panel regression which has more
 > than 4000 dummies (based on theory in the gravity-model literature in
 > international economics), and hence close to 5000 variables in the
 > regression. The coefficients of the dummy variables are not of any
 > interest. The code is as follows: xtreg y x1 x2...... imp_time_*
 > exp_time_*, fe cluster(panelvar), where panelvar has been set using
 > -xtset-, and imp_time and exp_time are importer-time and exporter-time
 > fixed effects respectively. However, the regression ran for close to 2
 > hours without producing any result, at which point I stopped it using
 > -Break-. I had set the memory to 5000m and the matsize to 5000 using
 > -set-.
 >
 > My Stata specification is Stata/SE 11.2 for Windows (64-bit x86-64).
 > My PC specification: Processor: Intel Core i5-2430M CPU @ 2.40 GHz;
 > RAM: 8 GB, in a 64-bit OS.
 >
 > I would greatly appreciate some help in finding out whether it is
 > normal for Stata to take this much time (or more) with such a
 > large number of variables, and whether there is a way to accomplish the
 > task faster. The gravity literature has suggested a couple of ways to
 > do this without the dummy-variable approach, but I was trying to find
 > out if there is a better way to do it if I persist with the dummy
 > variables. Any help is greatly appreciated.
 >
 > Best regards,
 > Suryadipta.
 > *
 > *   For searches and help try:
 > *   http://www.stata.com/help.cgi?search
 > *   http://www.stata.com/support/statalist/faq
 > *   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP