Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Re: How to speed up estimation and coefficient collection in large dataset with many parameters?

From	Michael Boehm <[email protected]>
To	[email protected]
Subject	Re: st: Re: How to speed up estimation and coefficient collection in large dataset with many parameters?
Date	Mon, 13 Jan 2014 14:23:53 +0000

Dear Phil,

Thanks so much. The statsby command is super convenient, and it
substantially increases the speed of the whole program.

Michael
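
For readers following along, here is a minimal sketch of the -statsby- approach Phil recommends below. The actual regression was snipped from the thread, so the variable names (lnwage, female, educ, group), the simulated data, and the use of _b[3.educ] as the "skill premium" are all hypothetical. The per-group coefficients end up in a single variable, which is then merged back onto the individual-level data.

```stata
* Hypothetical simulated data standing in for the 7-million-observation file:
* 800 groups, a female dummy, 3 education levels, and log wages
clear
set seed 20140113
set obs 80000
generate int group = ceil(_n/100)               // 800 groups of 100 observations
generate byte female = runiform() < 0.5
generate byte educ = 1 + floor(3*runiform())    // education levels 1, 2, 3
generate double lnwage = 2 + 0.3*(educ == 3) - 0.1*female + rnormal()

* Run one regression per group and keep a single coefficient from each.
* _b[3.educ] is the (hypothetical) premium for education level 3
* relative to level 1, for the base category of female.
preserve
statsby premium=(_b[3.educ]), by(group) clear: regress lnwage i.female##i.educ
tempfile premia
save `premia'
restore

* Attach the group-level premium to every observation in its group
merge m:1 group using `premia', nogenerate
```

Wrapping -statsby- in -preserve-/-restore- keeps the individual-level data in memory; the saving() option of -statsby- is another way to write the group-level results to a file.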

On Mon, Jan 13, 2014 at 10:48 AM, Phil Schumm <[email protected]> wrote:
> On Jan 13, 2014, at 3:47 AM, Michael Boehm <[email protected]> wrote:
>> I want to run a regression with many regressors separately by group (ca. 800 groups) in a large dataset (7 million observations) and save the results in a variable.
>
> <snip>
>
>> (1) Is there a way to save the skill premia from the regression directly in one variable that varies by group, instead of first creating 800 different variables, one for each group?
>
>
> Use the -statsby- prefix (see [D] statsby).  This will do exactly what you want.
>
>
>> (2) Is there a way to run the wage regression faster? For example, I am not interested in the group intercepts or the coefficients from i.female*i.educ. Can I somehow absorb all of those?
>
>
> I see that you're using the -xi- prefix -- is that because you're using a version of Stata prior to Stata 11?  If not, spend some time learning about factor variables ([U] 11.4.3 Factor variables) which have replaced -xi- and are both more capable and easier to use (and may even be faster).
>
> You may be able to speed things up a little by creating your covariates (including the interaction terms) once up front, rather than re-creating them for each regression using either -xi- or factor variables.  However, I would expect that these (especially the latter) are pretty fast, so you may not save much time with this.
>
> With some thought, you may be able to come up with a clever way of speeding up the regressions across all the groups.  However, unless this is something you are going to be doing repeatedly, it's probably not worth it.  Stata can almost certainly perform the 800 OLS regressions (via -statsby-) faster than you can (1) come up with a strategy, (2) implement it, and (3) check to make sure you didn't make a mistake.
>
>
> -- Phil
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
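
Phil's points about -xi-, factor variables, and creating covariates once up front can be illustrated with the hedged sketch below, again on hypothetical simulated data with assumed variable names. All three specifications fit the same model, so they should return the same coefficient; how much time each one saves depends on the data and the Stata version, and nothing here measures it.

```stata
* Hypothetical simulated data (same structure as the thread's wage regression)
clear
set seed 20140113
set obs 10000
generate byte female = runiform() < 0.5
generate byte educ = 1 + floor(3*runiform())
generate double lnwage = 2 + 0.3*(educ == 3) - 0.1*female + rnormal()

* (1) Pre-Stata-11 style: -xi- builds _I* dummy variables on every call
xi: regress lnwage i.female*i.educ
scalar b_xi = _b[_Ieduc_3]

* (2) Factor variables (Stata 11+): no new variables are created
regress lnwage i.female##i.educ
scalar b_fv = _b[3.educ]

* (3) Covariates created once up front; the same variables can then be
*     reused in every per-group regression
generate byte educ2 = (educ == 2)
generate byte educ3 = (educ == 3)
generate byte fem_educ2 = female*educ2
generate byte fem_educ3 = female*educ3
regress lnwage female educ2 educ3 fem_educ2 fem_educ3
scalar b_pre = _b[educ3]

display "xi: " b_xi "   factor variables: " b_fv "   precomputed: " b_pre
```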


