Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Regression with about 5000 (dummy) variables


From   Austin Nichols <[email protected]>
To   [email protected]
Subject   Re: st: Regression with about 5000 (dummy) variables
Date   Thu, 19 Apr 2012 11:16:29 -0400

John Antonakis <[email protected]>:
The poster asked about multiple dimensions of fixed effects--how does
the advice below relate?
The approach shown actually adds to the size of the matrix to be inverted.
You assert that
"This will save you on degrees of freedom and computational requirements."
--can you clarify that claim?
Your
 xtreg y x1-x4 cl_x1-cl_x4, cluster(panelvar)
is nearly the same as
 xtreg y x1-x4, fe robust
right? Note that inference is not identical, as the RE estimator
does not "know" the means are estimated.
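
A minimal sketch of how one might check that comparison, assuming Stata's
nlswork example dataset rather than the poster's data. The variables
ln_wage, age, tenure, hours and idcode come from that dataset and are
stand-ins for y, x1-x4 and panelvar; they are illustrative choices only.

```stata
* Illustrative only: nlswork stands in for the poster's data
webuse nlswork, clear
xtset idcode year

* Use one common sample so the cluster means match the estimation sample
keep if !missing(ln_wage, age, tenure, hours)

* Cluster (panel) means of each time-varying regressor
foreach var of varlist age tenure hours {
    bysort idcode: egen cl_`var' = mean(`var')
}

* Mundlak-style RE regression with cluster-robust standard errors
xtreg ln_wage age tenure hours cl_age cl_tenure cl_hours, re vce(cluster idcode)

* Joint test on the cluster means (the robust Hausman-type test)
testparm cl_age cl_tenure cl_hours

* Within (FE) estimator for comparison
xtreg ln_wage age tenure hours, fe vce(robust)
```

The coefficients on age, tenure and hours should be very close across the
two xtreg calls, while the standard errors need not agree. Note that the
first model estimates more parameters than the second, not fewer.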

On Thu, Apr 19, 2012 at 10:57 AM, John Antonakis <[email protected]> wrote:
> Hi:
>
> Let me let you in on a trick that is relatively unknown.
>
> One way around the problem of a huge number of dummy variables is to use the
> Mundlak procedure:
>
> Mundlak, Y. (1978). Pooling of Time-Series and Cross-Section Data.
> Econometrica, 46(1), 69-85.
>
> ....for an intuitive explanation, see:
>
> Antonakis, J., Bendahan, S., Jacquart, P., & Lalive, R. (2010). On making
> causal claims: A review and recommendations. The Leadership Quarterly,
> 21(6), 1086-1120. http://www.hec.unil.ch/jantonakis/Causal_Claims.pdf
>
> Basically, for each time varying independent variable (x1-x4), take the
> cluster mean and include that in the regression.  That is, do:
>
> foreach var of varlist x1-x4 {
> bys panelvar: egen cl_`var'=mean(`var')
> }
>
> Then, run your regression like this:
>
> xtreg y x1-x4 cl_x1-cl_x4, cluster(panelvar)
>
> The Hausman test for fixed- versus random-effects is:
>
> testparm cl_x1-cl_x4
>
> This will save you on degrees of freedom and computational requirements.
> This estimator is consistent.  Try it out with a subsample of your dataset
> to see. Many econometricians have been amazed by this.
