From: Michael Boehm <michael.boehm1@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: st: Re: How to speed up estimation and coefficient collection in large dataset with many parameters?
Date: Mon, 13 Jan 2014 09:47:16 +0000
Dear all,

I want to run a regression in a large dataset (7 million observations) with many regressors by group (ca. 800 different groups) and save the results in a variable. In particular, I want to estimate the skill premium in each group, i.e. a log wage regression of the form

    xi: regress loghourlywage i.group*skill i.female*i.educ

I am interested in the skill premia by group from the above regression (i.e. the interaction terms of group*skill) and want to save them in a new variable, skillprembygroup. I then want to collapse the dataset to the group level, e.g.

    collapse (mean) skillprembygroup, by(group)

I am looking for the most efficient way to carry this out, because with so many observations and regression parameters it takes a lot of time every time I run it (and I run it for many different years). Thus my questions:

(1) Is there a way to save the skill premia from the regression directly into one variable that varies across groups, instead of first creating 800 different variables, one per group? So far I have been running commands along the lines of

    matrix b = e(b)
    matrix b = b[1,801]
    svmat b, names(b1)

separately for each regressor and then putting everything together into one variable (skillprembygroup)...

(2) Is there a way to run the wage regression faster? For example, I am not interested in the group intercepts or in the coefficients from i.female*i.educ. Can I somehow absorb all of those?

Thanks in advance for your help.
Michael
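
P.S. To make the two questions more concrete, here are rough sketches of what I have in mind.

For (1), the consolidation I am after would look roughly like the loop below. The variable names are the ones above; the _IgroXskill_ stub is only my guess at the names -xi- gives the interaction terms, so the actual names would need to be checked (e.g. with -describe _I*-) and the stub adjusted:

    xi: regress loghourlywage i.group*skill i.female*i.educ

    generate double skillprembygroup = .
    levelsof group, local(grouplevels)
    foreach g of local grouplevels {
        * -capture- skips groups for which no such coefficient exists,
        * e.g. the base group whose interaction term -xi- omits
        capture replace skillprembygroup = _b[_IgroXskill_`g'] if group == `g'
    }

Note that these coefficients are differentials relative to the omitted base group; _b[skill] would have to be added to obtain the premia in levels.

For (2), what I have in mind is something like -areg- (assuming a Stata version with factor-variable syntax), which would absorb the roughly 800 group intercepts instead of estimating them explicitly, although the i.female*i.educ cells would still be estimated, since -areg- absorbs only one categorical variable:

    * i.group#c.skill gives one skill slope per group; no base level is
    * dropped because there is no main c.skill term in the model
    areg loghourlywage i.group#c.skill i.female##i.educ, absorb(group)

With that parameterisation the premium for, say, group 12 could be read off directly as _b[12.group#c.skill], so a loop like the one above would work with those names instead of the -xi- stub. I do not know whether this is actually faster in practice, or whether a user-written command that absorbs several sets of fixed effects (such as -reghdfe- from SSC, if installed) would be the better route; any advice would be appreciated.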