Re: st: Regression with about 5000 (dummy) variables


From   John Antonakis <John.Antonakis@unil.ch>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Regression with about 5000 (dummy) variables
Date   Fri, 20 Apr 2012 16:06:17 +0200

OK; I did call it a trick, didn't I? :-)

Still, what matters is that the estimator is consistent and computationally efficient. That is the problem it addresses in high-dimensional fixed-effects data: it takes a second to estimate something that takes hours with dummy variables. So I take back the comment on the DFs.

Best,
J.

__________________________________________

Prof. John Antonakis
Faculty of Business and Economics
Department of Organizational Behavior
University of Lausanne
Internef #618
CH-1015 Lausanne-Dorigny
Switzerland
Tel ++41 (0)21 692-3438
Fax ++41 (0)21 692-3305
http://www.hec.unil.ch/people/jantonakis

Associate Editor
The Leadership Quarterly
__________________________________________


On 20.04.2012 15:33, Christopher Baum wrote:
On Apr 20, 2012, at 2:33 AM, John wrote:

The estimator still seems good. Notice, though, that the F-test
numerator DFs are only 3. So that's what I meant when I said we save on
DF (as compared to the OLS fixed-effects estimator).

You're cheating, John, by overlooking that theorem about the absence of a free lunch.
The number of DF is not 3. When you gave the commands

bys idcode : egen double cl_age_id = mean(age)
bys south : egen double cl_age_south = mean(age)

you computed a large number of sample means, so the cluster-mean regressors in the Mundlak
model must each be charged not a single DF but as many DF as it took to compute them,
following the logic of the FE model.

I could make the same mistake by applying the within transformation to Y and X and
running OLS, which would have one DF for the one slope estimated. But if I used the
FE estimator it would properly account for the fact that the within transformation is not
free, as this is just the LSDV (dummy-variable) model, and putting in all those dummies is not free.
Neither is this method. Your DF should be adjusted to reflect the creation of those
cl_* variables.
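
A minimal sketch of the within-transformation point, not taken from the original exchange: it assumes the nlswork example data and an illustrative model of ln_wage on age and tenure, and the m_* and w_* variable names are made up for the illustration. Demeaning by hand and running OLS gives the same slopes as the FE estimator, but the naive standard errors use N - K residual DF, whereas the FE estimator uses N - G - K, with G the number of idcode groups. Rescaling by the ratio of the two should recover the FE standard error.

```stata
* Sketch only: dataset and regressors are assumptions chosen for illustration
webuse nlswork, clear
keep if !missing(ln_wage, age, tenure)   // same estimation sample for both approaches
xtset idcode

* FE estimator: residual DF are N - G - K
xtreg ln_wage age tenure, fe
scalar se_fe = _se[age]
scalar df_fe = e(df_r)

* Within transformation by hand (m_* and w_* are hypothetical names)
foreach v of varlist ln_wage age tenure {
    bysort idcode: egen double m_`v' = mean(`v')
    generate double w_`v' = `v' - m_`v'
}

* OLS on the demeaned data: same slopes, but residual DF are N - K,
* so the G group means computed above are treated as free
regress w_ln_wage w_age w_tenure, noconstant
scalar se_naive = _se[w_age]
scalar df_naive = e(df_r)

* Charge the G means to the residual DF and compare with the FE result
display "naive SE     = " se_naive
display "corrected SE = " se_naive*sqrt(df_naive/df_fe)
display "xtreg, fe SE = " se_fe
```

The same bookkeeping applies to the cl_* cluster means: each one is a set of estimated sample means, not a single free regressor.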

Kit

Kit Baum | Boston College Economics & DIW Berlin | http://ideas.repec.org/e/pba1.html
An Introduction to Stata Programming | http://www.stata-press.com/books/isp.html
An Introduction to Modern Econometrics Using Stata | http://www.stata-press.com/books/imeus.html


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

