
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Three Fixed Effects with Millions of Observations


From   Fernando Rios Avila <[email protected]>
To   [email protected]
Subject   Re: st: Three Fixed Effects with Millions of Observations
Date   Wed, 19 Mar 2014 17:28:05 -0400

Hi George,
There are a couple of options you can use to estimate your model.
However, because you are dealing with such a large data set, you will
need a lot of patience.
You can always try to estimate the model using the standard -areg- or
-xtreg, fe-, among other possibilities.
For a quick review of some of the available commands you can check the
following paper:
http://www.stata-journal.com/article.html?article=st0267

Now, for a nonlinear model such as a Poisson model, I would suggest
checking the paper by Paulo Guimaraes and Pedro Portugal:

www.stata-journal.com/article.html?article=st0212

Their method can potentially be applied to more than two fixed effects,
with the third one included as a set of dummy variables.

For a more direct approach, the paper "OLS with multiple high
dimensional category variables" by Simen Gaure proposes a method and
provides an implementation in R:
Paper: http://www.sciencedirect.com/science/article/pii/S0167947313001266
R code: http://cran.r-project.org/web/packages/lfe/index.html

Finally, although it does not include the cluster correction, I
suggest some Stata code that implements an algorithm similar to the
Guimaraes and Portugal strategy, and closer to Gaure's, and that
directly handles the case of three or more fixed effects:
http://www.levyinstitute.org/publications/?docid=1971
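
To give a rough idea of what these iterative approaches do in the
linear case, here is a minimal Python sketch. It is not the code from
any of the papers or packages above, only an illustration of the
general idea: sweep out each fixed effect by demeaning within its
groups, repeat until nothing changes, and then run OLS on the swept
variables. The function names (demean_by, sweep_fixed_effects) and the
toy data are made up for this example, and it assumes a single
regressor and no clustering of the standard errors.

```python
from collections import defaultdict


def demean_by(values, groups):
    """Subtract each group's mean from the values that belong to it."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def sweep_fixed_effects(values, factors, tol=1e-10, max_iter=10_000):
    """Demean by each factor in turn until a full pass changes every
    value by less than tol."""
    current = list(values)
    for _ in range(max_iter):
        previous = current
        for groups in factors:
            current = demean_by(current, groups)
        if max(abs(a - b) for a, b in zip(current, previous)) < tol:
            return current
    raise RuntimeError("demeaning did not converge")


# Toy unbalanced data (hypothetical): y = 2*x + firm + time + county effects.
rows = [(f, t, c) for f in range(3) for t in range(3) for c in range(4)
        if (f + 2 * t + c) % 3 != 0]
firm = [r[0] for r in rows]
time = [r[1] for r in rows]
county = [r[2] for r in rows]
x = [float(f * t + t * c) for f, t, c in rows]
y = [2.0 * xi + 1.5 * f - 0.7 * t + 0.3 * c * c
     for xi, (f, t, c) in zip(x, rows)]

factors = [firm, time, county]
x_res = sweep_fixed_effects(x, factors)
y_res = sweep_fixed_effects(y, factors)

# OLS slope of the swept y on the swept x (Frisch-Waugh-Lovell logic).
beta = sum(a * b for a, b in zip(x_res, y_res)) / sum(a * a for a in x_res)
print(round(beta, 6))  # should be very close to the true slope of 2
```

With 10 million observations you would want a vectorized or compiled
implementation rather than plain Python lists, but the loop structure
is the same, and the number of dummies never enters the regression.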
Best


On Wed, Mar 19, 2014 at 5:06 PM, George Shoukry <[email protected]> wrote:
> I have a data set with over 10 million observations and each
> observation is uniquely identified by three variables (say time, firm,
> county). I would like to include fixed effects for the three
> identifying variables, cluster the standard errors at the firm level,
> and run OLS and Poisson regressions for some variables in the data. I
> have two questions:
>
> 1. Ideally I want to do "reg y x i.firm i.time i.county, vce(cluster
> firm)", but this takes too long (not sure exactly how long because I
> stopped it after a while). So far I've been able to get OLS estimates
> on my computer using the undocumented _regress command with the
> absorb() option. The county identifier has the largest number of values,
> so I do something like "_regress y x i.firm i.time, absorb(county)".
> The problem is that I cannot seem to cluster the errors at the firm
> level with the _regress command and I can't find documentation for it.
> Any ideas on the fastest way in Stata to obtain OLS estimates in this
> case with clustered errors?
> Note: I tried some other options but they seem to take too long (how
> long do you leave commands running before you stop them?).
>
> 2. Any experience with the best way to run a fixed-effects Poisson
> regression with a large dataset and several fixed effects?
>
> Thanks!
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

