st: Re: How to speed up estimation and coefficient collection in large dataset with many parameters?


From   Michael Boehm <[email protected]>
To   [email protected]
Subject   st: Re: How to speed up estimation and coefficient collection in large dataset with many parameters?
Date   Mon, 13 Jan 2014 09:47:16 +0000

Dear all,

I want to run a regression on a large dataset (7 million observations)
with many group-specific regressors (about 800 groups) and save the
results in a variable. In particular, I want to estimate the skill
premium within each group, i.e. a log wage regression of the form:

xi: regress loghourlywage i.group*skill i.female*i.educ
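
For reference, on Stata 11 or newer the same model can be written with
factor-variable syntax instead of -xi- (a sketch, assuming skill is
continuous):

regress loghourlywage i.group##c.skill i.female##i.educ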

I am interested in the skill premia by group from the above regression
(i.e. the group*skill interaction terms) and want to save them in a
new variable, skillprembygroup. I then want to collapse the dataset to
the group level, e.g.

collapse (mean) skillprembygroup, by(group)

I am looking for the most efficient way to carry this out, since with
so many observations and regression parameters each run takes a long
time (and I run this for many different years). Hence my questions:

(1) Is there a way to save the skill premia from the regression
directly into one variable that varies by group, instead of first
creating 800 different variables, one per group? So far I have been
running commands along the lines of -matrix b = e(b)-, -matrix b =
b[801,1]-, and -svmat b, names(b1)- separately for each regressor and
then putting everything together into one variable
(skillprembygroup)...
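
A sketch of what I have in mind, assuming Stata 11+ factor-variable
syntax and an integer-valued group identifier. With ##, each
interaction coefficient is the deviation of that group's skill slope
from the base group's, and -_b[]- returns 0 for the base level, so
the sum below gives the full group-specific slope:

regress loghourlywage i.group##c.skill i.female##i.educ
gen double skillprembygroup = .
levelsof group, local(groups)
foreach g of local groups {
    quietly replace skillprembygroup = ///
        _b[skill] + _b[`g'.group#c.skill] if group == `g'
}

This avoids -svmat- and the 800 intermediate variables, though the
loop still makes one pass over the data per group.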

(2) Is there a way to run the wage regression faster? For example, I
am not interested in the group-specific intercepts or in the
coefficients on i.female*i.educ. Can I somehow absorb all of those?
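
One idea I have sketched is to absorb the group intercepts with
official Stata's -areg- (though the ca 800 interaction terms remain in
the regressor matrix, so I suspect the saving is modest):

areg loghourlywage c.skill i.group#c.skill i.female##i.educ, absorb(group)

Another sketch uses -statsby- to run the regression separately within
each group, which avoids the huge design matrix altogether and
directly produces the group-level dataset I would otherwise -collapse-
to. Note that it fits a slightly different model, since the female and
educ coefficients are then allowed to vary by group:

statsby skillprem=_b[skill], by(group) clear: ///
    regress loghourlywage skill i.female##i.educ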

Thanks in advance for your help.
Michael


