From: Suryadipta Roy <sroy2138@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Stata13 Wishlist- dealing with large number of fixed effects and dummy variables
Date: Tue, 5 Feb 2013 12:58:14 -0500

Dear Billy and Nick,

Thank you very much for your kind attention to the problem! A brief summary of my dataset below:

summ imp_group exp_group pair year importer_year exporter_year

     Variable |       Obs        Mean    Std. Dev.       Min        Max
--------------+--------------------------------------------------------
     importer |    744869    93.93408    54.62891          1        192
     exporter |    744869    104.5717     60.8408          1        214
  countrypair |    744869    16842.43    9770.748          1      34089
         year |    744869    1998.026    7.607853       1984       2010
importer_year |    744869     2354.97    1370.195          1       4773
exporter_year |    744869    2656.112     1539.67          1       5385

The incorporation of various kinds of fixed effects in the regressions to explain bilateral import/export (e.g. country FE i.e. importer & exporter/ countrypair FE i.e. importer-exporter pair/ or importer_year & exporter_year FE) is driven by theoretical concerns to control for multilateral heterogeneity among the country pairs. In this regard, I have found -areg- to be very helpful when I have tried to control for country-pair and year fixed effects:

    areg bilateral import/export explanatory_variables year*, absorb(pair) robust cluster(pair)

However, I have run into problems when I have tried to run the above regression with country-year fixed effects, e.g. I have not been able to obtain a result with my dataset when I have tried something like:

    areg bilateral import/export explanatory_variables importer_year*, absorb(exporter_year*) robust cluster(pair),

OR

    areg bilateral import/export explanatory_variables exporter_year* importer_year*, absorb(pair) robust cluster(pair).

Similarly, -heckman- has been running forever without any result:

    heckman dep_var ind_vars i.pair i.year, select(select_depvar = ind_vars ind_vars excluded_var i.pair i.year) twostep.

These are some examples of problems that I have run into with large datasets and regressions with large number of fixed effects. I wish there are easier ways to incorporate a large number of fixed effects.

I am familiar with the paper by Andrews, Schank, Upward, "Practical fixed-effects estimation methods for the three-way error-components model", Stata Journal, 2006, 6, Number 4, pp. 461–481, but I am not clear if this can be applied for Heckman selection models.

Sincerely,
Suryadipta.

On Tue, Feb 5, 2013 at 12:14 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> I agree with Billy.
>
> Although I sympathise with Suryadipta, it is difficult to see what
> kind of help is being asked for here that Stata 13 should provide.
> That a program should signal that you are trying something too
> difficult? That the documentation should include advice on modelling
> strategy? Specific suggestions are surely needed here.
>
> Nick
>
> On Tue, Feb 5, 2013 at 1:40 PM, William Buchanan
> <william@williambuchanan.net> wrote:
>
>> How many observations are in your dataset and have you considered
>> that the problem might be with the model you are fitting to the
>> data? It doesn't seem remotely parsimonious to use that many
>> indicators (or it hardly seems that a model with that many variables
>> is simplifying the true data generating process). Maybe you could
>> provide some example that others could replicate in order to get a
>> better idea of the difficulty you're running into.
>
> On Feb 5, 2013, at 5:08, Suryadipta Roy <sroy2138@gmail.com> wrote:
>
>>> This is a general SOS call in dealing with large number of fixed
>>> effects (or dummy variables) with commonly used Stata commands, e.g.
>>> -reg- , -heckman- , -xtreg-, etc., viz. in dealing with large dyadic
>>> datasets (e.g. used in the gravity literature in international
>>> economics). For example, I was trying to run -heckman- with over
>>> 34,000 dummy variables in a panel data for over 150 importer-exporter
>>> countries over 25 years, where I need to control for various kinds of
>>> fixed effects, and Stata has not been able to complete a single
>>> regression after running for about 10 hours. I have had similar
>>> experiences with the above-mentioned commands as well for large
>>> datasets. -areg- , to a certain extent addresses the problem, but then
>>> runs into problems when one needs to control for a different kinds of
>>> fixed effects. For the record, I am using StataMP 12.1 in a dual core
>>> processor machine (8 GB memory), and the form of the -heckman-command
>>> that I was using is below:
>>> heckman dep_var ind_vars, select(selct_depvar = ind_vars excluded_var
>>> i.fixed_effects1 i.fixed_effects2) twostep, where fixed_effects1 and
>>> fixed_effects2 each comprises of a large number of dummy variables.
>>>
>>> I wish there was a general help for this kind of problem with Stata 13.
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
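
For readers following the thread: the -areg- trick of absorbing one set of fixed effects rather than creating dummies can, in a linear model, be extended to two sets by sweeping out group means alternately. The sketch below is only an illustration of that idea on the small auto dataset shipped with Stata, with rep78 and foreign standing in for importer_year and exporter_year (an assumption made purely for a runnable example). It is not claimed to be the method of Andrews, Schank and Upward, and it says nothing about whether a similar shortcut is valid for -heckman-.

```stata
* Toy data shipped with Stata; rep78 and foreign are stand-ins for the
* two high-dimensional group identifiers discussed in the thread.
sysuse auto, clear
keep if !missing(rep78)

* Benchmark: both sets of fixed effects entered as explicit dummies.
regress price weight i.rep78 i.foreign
scalar b_dummy = _b[weight]

* Alternating demeaning: subtract group means of y and x for each
* grouping variable in turn, and repeat so both sets are swept out.
generate double d_price  = price
generate double d_weight = weight
* A fixed number of sweeps is used here for simplicity; a convergence
* check on the size of the remaining group means would be more careful.
forvalues iter = 1/50 {
    foreach g in rep78 foreign {
        foreach v in d_price d_weight {
            egen double grpmean = mean(`v'), by(`g')
            quietly replace `v' = `v' - grpmean
            drop grpmean
        }
    }
}

* Regress the demeaned outcome on the demeaned regressor, no constant.
regress d_price d_weight, noconstant
scalar b_sweep = _b[d_weight]
display "dummies: " b_dummy "   sweep: " b_sweep
```

The slope from the last regression should agree with the dummy-variable slope up to convergence error. The standard errors it reports should not be used as they stand, because -regress- does not know how many degrees of freedom the swept-out effects consumed.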

**References**:

- **st: Stata13 Wishlist- dealing with large number of fixed effects and dummy variables** *From:* Suryadipta Roy <sroy2138@gmail.com>
- **Re: st: Stata13 Wishlist- dealing with large number of fixed effects and dummy variables** *From:* William Buchanan <william@williambuchanan.net>
- **Re: st: Stata13 Wishlist- dealing with large number of fixed effects and dummy variables** *From:* Nick Cox <njcoxstata@gmail.com>
