Re: st: Re: How to speed up estimation and coefficient collection in large dataset with many parameters?


From   Phil Schumm <[email protected]>
To   Statalist Statalist <[email protected]>
Subject   Re: st: Re: How to speed up estimation and coefficient collection in large dataset with many parameters?
Date   Mon, 13 Jan 2014 04:48:38 -0600

On Jan 13, 2014, at 3:47 AM, Michael Boehm <[email protected]> wrote:
> I want to run a regression in a large dataset (7 million observations) with many regressors by groups (ca 800 different ones) and save the results in a variable.

<snip>

> (1) Is there a way to save the skill premia from the regression directly in one variable that varies for each group instead of first creating 800 different variables for each group?


Use the -statsby- prefix (see [D] statsby).  This will do exactly what you want.
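A minimal sketch of what that might look like. The variable names (-lwage-, -skill-, -female-, -educ-, -group-) and the file name -premia- are assumptions for illustration only, since the original code is snipped above:

```stata
* Hypothetical names throughout; substitute your own variables.
* Collect only the coefficient on skill, one observation per group,
* and write the results to premia.dta.
statsby premium=_b[skill], by(group) saving(premia, replace): ///
    regress lwage skill i.female##i.educ

* Attach the group-level result to the full dataset as a single variable
merge m:1 group using premia, keepusing(premium) nogenerate
```

Because only _b[skill] is named in the expression list, the intercepts and the female-by-education coefficients are never stored, so you end up with one variable rather than 800.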


> (2) Is there a way to faster run the wage regression? For example, I am not interested in the intercepts by group and the coefficients from i.female*i.educ. Can I somehow absorb all of those?


I see that you're using the -xi- prefix -- is that because you're using a version of Stata prior to Stata 11?  If not, spend some time learning about factor variables ([U] 11.4.3 Factor variables) which have replaced -xi- and are both more capable and easier to use (and may even be faster).
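For comparison, here is a sketch of the two notations side by side, again with assumed variable names (-lwage-, -skill-, -female-, -educ-) rather than the ones in your actual code:

```stata
* Older -xi- syntax: generates _I* dummy variables in the dataset
xi: regress lwage skill i.female*i.educ

* Factor-variable syntax (Stata 11 or later): same main effects and
* interaction, with no prefix and no generated variables
regress lwage skill i.female##i.educ
```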

You may be able to speed things up a little by creating your covariates (including the interaction terms) once up front, rather than re-creating them for each regression using either -xi- or factor variables.  However, I would expect that these (especially the latter) are pretty fast, so you may not save much time with this.
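One way this might be done, assuming the same hypothetical variable names as before (-lwage-, -skill-, -female-, -educ-, -group-), is to call -xi- once as a standalone command and then refer to the generated variables in each regression:

```stata
* Create the indicator and interaction variables once (all named _I*)
xi i.female*i.educ

* Reuse them in every group-level regression instead of rebuilding them
statsby premium=_b[skill], by(group) saving(premia, replace): ///
    regress lwage skill _I*
```

Within a given group some of the _I* variables may be constant or collinear; -regress- will drop those itself.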

With some thought, you may be able to come up with a clever way of speeding up the regressions across all the groups.  However, unless this is something you are going to be doing repeatedly, it's probably not worth it.  Stata can almost certainly perform the 800 OLS regressions (via -statsby-) faster than you can (1) come up with a strategy, (2) implement it, and (3) check to make sure you didn't make a mistake.


-- Phil

