



Re: st: Stata13 Wishlist- dealing with large number of fixed effects and dummy variables


From   Suryadipta Roy <sroy2138@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Stata13 Wishlist- dealing with large number of fixed effects and dummy variables
Date   Tue, 5 Feb 2013 12:58:14 -0500

Dear Billy and Nick,
Thank you very much for your kind attention to the problem! A brief
summary of my dataset below:
summ imp_group exp_group pair year importer_year exporter_year

     Variable |       Obs        Mean    Std. Dev.       Min        Max
--------------+--------------------------------------------------------
     importer |    744869    93.93408     54.62891         1        192
     exporter |    744869    104.5717      60.8408         1        214
  countrypair |    744869    16842.43     9770.748         1      34089
         year |    744869    1998.026     7.607853      1984       2010
importer_year |    744869     2354.97     1370.195         1       4773
exporter_year |    744869    2656.112      1539.67         1       5385

The incorporation of various kinds of fixed effects in the regressions
to explain bilateral imports/exports (e.g. country FE, i.e. importer and
exporter; country-pair FE, i.e. importer-exporter pair; or importer_year
and exporter_year FE) is driven by theoretical concerns to control for
multilateral heterogeneity among the country pairs. In this regard, I
have found -areg- to be very helpful when I have tried to control for
country-pair and year fixed effects:
areg bilateral import/export explanatory_variables year*, absorb(pair)
robust cluster(pair)

However, I have run into problems when I have tried to run the above
regression with country-year fixed effects, e.g. I have not been able
to obtain a result with my dataset when I have tried something like:
areg bilateral import/export explanatory_variables importer_year*,
absorb(exporter_year*) robust cluster(pair), OR
areg bilateral import/export explanatory_variables exporter_year*
importer_year*, absorb(pair) robust cluster(pair).
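
For reference, a minimal sketch of how the country-year specification
could be set up with a single absorbed factor. It assumes that
importer_year, exporter_year and pair are the numeric group identifiers
shown in the summary above; lnimports and x1 are hypothetical names for
the dependent variable and a regressor. -areg- takes only one
categorical variable in absorb(), so the second set of effects has to
enter as factor-variable indicators, which are limited by matsize (at
most 11,000 in Stata/SE and Stata/MP 12, as far as I know). This may
still be slow on a dataset of this size.

```stata
* Hypothetical names: lnimports (dependent variable), x1 (regressor).
* importer_year, exporter_year and pair are assumed to be numeric ids.

* Allow enough room for the ~4,773 importer-year indicators.
set matsize 11000

* absorb() accepts a single variable, not a varlist or a wildcard;
* the other set of fixed effects enters through i.importer_year.
areg lnimports x1 i.importer_year, absorb(exporter_year) vce(cluster pair)
```

The same approach would not work with i.pair as the explicit set, since
34,089 pair indicators exceed the matsize limit.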

Similarly, -heckman- has been running forever without any result:
heckman dep_var ind_vars i.pair i.year, select(select_depvar =
ind_vars excluded_var i.pair i.year) twostep.

These are some examples of the problems that I have run into with large
datasets and regressions with a large number of fixed effects. I wish
there were easier ways to incorporate a large number of fixed effects.
I am familiar with the paper by Andrews, Schank, and Upward, "Practical
fixed-effects estimation methods for the three-way error-components
model", Stata Journal, 2006, 6, Number 4, pp. 461–481, but I am not
clear whether this can be applied to Heckman selection models.

Sincerely,
Suryadipta.


On Tue, Feb 5, 2013 at 12:14 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> I agree with Billy.
>
> Although I sympathise with Suryadipta, it is difficult to see what
> kind of help is being asked for here that Stata 13 should provide.
> That a program should signal that you are trying something too
> difficult? That the documentation should include advice on modelling
> strategy? Specific suggestions are surely needed here.
>
> Nick
>
> On Tue, Feb 5, 2013 at 1:40 PM, William Buchanan
> <william@williambuchanan.net> wrote:
>
>> How many observations are in your dataset and have you considered that the problem might be with the model you are fitting to the data?  It doesn't seem remotely parsimonious to use that many indicators (or it hardly seems that a model with that many variables is simplifying the true data generating process).  Maybe you could provide some example that others could replicate in order to get a better idea of the difficulty you're running into.
>
> On Feb 5, 2013, at 5:08, Suryadipta Roy <sroy2138@gmail.com> wrote:
>
>>> This is a general SOS call in dealing with large number of fixed
>>> effects (or dummy variables) with commonly used Stata commands, e.g.
>>> -reg- , -heckman- , -xtreg-, etc., viz. in dealing with large dyadic
>>> datasets (e.g. used in the gravity literature in international
>>> economics). For example, I was trying to run -heckman- with over
>>> 34,000 dummy variables in a panel data for over 150 importer-exporter
>>> countries over 25 years, where I need to control for various kinds of
>>> fixed effects, and Stata has not been able to complete a single
>>> regression after running for about 10 hours. I have had similar
>>> experiences with the above-mentioned commands as well for large
>>> datasets. -areg-, to a certain extent, addresses the problem, but then
>>> runs into problems when one needs to control for different kinds of
>>> fixed effects. For the record, I am using StataMP 12.1 on a dual-core
>>> processor machine (8 GB memory), and the form of the -heckman- command
>>> that I was using is below:
>>> heckman dep_var ind_vars, select(select_depvar = ind_vars excluded_var
>>> i.fixed_effects1 i.fixed_effects2) twostep, where fixed_effects1 and
>>> fixed_effects2 each comprise a large number of dummy variables.
>>>
>>> I wish there was a general help for this kind of problem with Stata 13.
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/


