
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Three Fixed Effects with Millions of Observations


From   William Buchanan <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: Three Fixed Effects with Millions of Observations
Date   Wed, 19 Mar 2014 16:45:46 -0500

You could use the group() function of -egen- to create a single index that you feed into -areg- or something of that sort. But it isn't clear what your end goal is, or whether other approaches (e.g., mixed-effects models) would be viable alternatives. Do you have enough degrees of freedom left after including all of the fixed effects to estimate the clustered VCE?
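
To make the single-index idea concrete, here is a minimal sketch in Python (not Stata) of the kind of mapping that the group() function of -egen- produces: every distinct combination of the identifying variables gets one integer id. The function name group_index and the toy firm/time values are hypothetical and only for illustration; the ordering of the ids is an assumption of this sketch.

```python
def group_index(*columns):
    """Map each distinct combination of values to one integer id.

    Ids start at 1 and follow the sorted order of the combinations
    (an assumption of this sketch, chosen to keep the ids deterministic).
    """
    rows = list(zip(*columns))
    ids = {combo: i for i, combo in enumerate(sorted(set(rows)), start=1)}
    return [ids[row] for row in rows]


# Toy data: five observations identified by (firm, time).
firm = ["A", "A", "B", "B", "A"]
time = [1, 2, 1, 2, 1]

fe_id = group_index(firm, time)
print(fe_id)  # [1, 2, 3, 4, 1]
```

Note that an index built from two variables identifies their combinations, so absorbing it is not the same as including the two sets of additive effects separately.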

It would help if you provided a little more information about your data. You could also take a random sample from your data and use it as a training dataset to test different model specifications before fitting the model to the full dataset.
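
The random-sample suggestion could look roughly like the following Python sketch (not Stata). It draws a reproducible subsample of row positions; the function name draw_subsample, the 1% fraction, the seed, and the one-million-row size are all assumptions made for illustration.

```python
import random


def draw_subsample(n_obs, fraction, seed=12345):
    """Return sorted row positions for a simple random sample without replacement.

    A fixed seed makes the draw reproducible, so every candidate model
    specification is tested on the same subsample.
    """
    rng = random.Random(seed)
    k = max(1, round(n_obs * fraction))
    return sorted(rng.sample(range(n_obs), k))


# Hypothetical sizes: 1% of one million rows.
idx = draw_subsample(1_000_000, 0.01)
print(len(idx))  # 10000
```

Because the standard errors in the original question are clustered by firm, sampling whole firms rather than individual rows may be preferable, since it keeps each cluster intact; that is a design choice beyond what the message above states.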


> On Mar 19, 2014, at 16:06, George Shoukry <[email protected]> wrote:
> 
> I have a data set with over 10 million observations and each
> observation is uniquely identified by three variables (say time, firm,
> county). I would like to include fixed effects for the three
> identifying variables, cluster the standard errors at the firm level,
> and run OLS and Poisson regressions for some variables in the data. I
> have two questions:
> 
> 1. Ideally I want to do "reg y x i.firm i.time i.county, vce(cluster
> firm)", but this takes too long (not sure exactly how long because I
> stopped it after a while). So far I've been able to get OLS estimates
> on my computer using the undocumented _regress command with the
> absorb() option. The county identifier has the largest number of values,
> so I do something like "_regress y x i.firm i.time, absorb(county)".
> The problem is that I cannot seem to cluster the errors at the firm
> level with the _regress command and I can't find documentation for it.
> Any ideas on the fastest way in Stata to obtain OLS estimates in this
> case with clustered errors?
> Note: I tried some other options but they seem to take too long (how
> long do you leave commands running before you stop them?).
> 
> 2. Any experience with the best way to run a fixed-effects Poisson
> regression with a large dataset and several fixed effects?
> 
> Thanks!
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/


