Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: Efficient way to predict values from regressions on subsets of the data?

 From Michael Hecker <[email protected]> To [email protected] Subject Re: st: Efficient way to predict values from regressions on subsets of the data? Date Fri, 15 Apr 2011 23:56:00 +0200

Hi Daniel,

```
local StatsBy_str "year SIC_Industry_Name"
* one robust OLS regression per year-industry cell; the results replace the data in memory
statsby N=e(N) _b adj_R2=e(r2_a) tValue_GAMMA1=(_b[Po_Bo]/_se[Po_Bo]) tValue_GAMMA0=(_b[_cons]/_se[_cons]) df_r=e(df_r), by(`StatsBy_str') clear: regress XcT_Bo Po_Bo, robust
```

Afterwards you will have all your regression parameters in a new dataset.

Kind regards,
Michael

University of Mannheim

On 15.04.2011 23:35, [email protected] wrote:
```
Hello all,

I have a project that involves assembling a panel of data in long format
and running (quantile) regressions for each institution.  My basic problem
involves running estimations on subsets of the data and keeping predicted
values  from each of the regressions.  I can't use -by:- unless I write a
wrapper, but this will be slow anyway because it uses if qualifiers (see
below).   I have implemented this in both SAS and Stata and my SAS code is
about 100 times faster than my best Stata implementation.

The panel is unbalanced, but to give you an idea the average number of
time periods is 650 and the number of firms is over a thousand.   For each
firm I need to run three regressions, taking predicted values from two and
a coefficient from the third, and combining these three items into a new
variable.  I have been having trouble finding a way to do this
efficiently.

One way would be to loop over all firms and use if qualifiers in the
regressions and predictions. I have found this to be very slow: with if
clauses on such a long dataset, the procedure seems to take around 4 to
40 seconds per firm!

My current code is faster but a bit cumbersome: it involves reshaping the
data into wide format to avoid using if qualifiers. I split the data into
10 pieces by firm, then reshape each of these 10 pieces into wide
format. I split into 10 files because Stata's reshape command is quite
slow (25-30 minutes for me) in reshaping my whole panel from long to
wide, whereas each of the 10 smaller reshapes takes only a few seconds.
Then I have 2 layers of loops, one over the 10 files and one over the
firms inside each file, running the estimation and generating new
variables for each firm's results. This method is much faster: there are
no if qualifiers because the data is in wide format. It takes about
0.5-1.2 seconds to run each firm. Overall, including the reshaping, this
procedure takes maybe 20-30 minutes to run.

Unfortunately for Stata fans (including myself), I was able to get this
entire thing to run in about 50 seconds in SAS, or about 0.04 seconds per
firm!  The trick is that SAS can automatically run quantile regressions
-by- a panel variable AND output predicted values at the same time. But
I would like to keep everything in Stata if I can. Does anyone have a
suggestion on a more efficient method of implementing what I am doing?
Would using the -in- qualifier instead of -if- be worth it?

Thanks,

Daniel
_______________________________
Daniel Green
Research & Statistics Group
Federal Reserve Bank of New York
212-720-6320
[email protected]

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```
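On the closing question about -in- versus -if-: an if qualifier makes Stata test the condition on every observation in the dataset, whereas an in range goes straight to the listed rows. Once the panel is sorted by firm, each firm is one contiguous block, so a loop can walk from block to block with in ranges and never scan the full dataset. The sketch below is a minimal illustration under stated assumptions: simulated data, hypothetical variable names firm, t, y and x, and a single median regression (-qreg-) standing in for the three per-firm regressions, which the question does not spell out. It has not been timed against the approaches described above.

```stata
* Sketch: firm-by-firm quantile regressions using -in- ranges instead of -if-.
* Simulated stand-in data; firm, t, y and x are hypothetical names.
clear
set seed 2011
set obs 3000
generate long firm = ceil(_n/300)      // 10 firms with 300 periods each
bysort firm: generate long t = _n
generate double x = rnormal()
generate double y = 1 + 0.5*x + rnormal()

* After sorting, each firm occupies one contiguous block of rows
sort firm t
by firm: generate long nobs = _N       // block length, repeated on every row of the block

generate double yhat = .
local first = 1
while `first' <= _N {
    local last = `first' + nobs[`first'] - 1
    quietly qreg y x in `first'/`last'               // median regression for one firm
    quietly predict double p in `first'/`last', xb   // linear prediction for that block only
    quietly replace yhat = p in `first'/`last'
    drop p
    local first = `last' + 1                         // jump to the next firm's block
}
```

The only bookkeeping is the block length stored in nobs; reading it with an explicit subscript at the first row of each block gives the end of the range without any if condition.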

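Michael's -statsby- call returns one row of results per group rather than predicted values. One possible way to get fitted values from it anyway, assuming the linear prediction is what is wanted, is to save the per-firm coefficients to a temporary file, merge them back onto the panel, and build the prediction by hand. This is a sketch, not a tested recipe: the simulated data and the names firm, y, x, b_x and b_cons are hypothetical, and a median regression again stands in for the unspecified models.

```stata
* Sketch: per-firm coefficients via -statsby-, then fitted values via -merge-.
* Simulated stand-in data; firm, y, x, b_x and b_cons are hypothetical names.
clear
set seed 2011
set obs 3000
generate long firm = ceil(_n/300)
generate double x = rnormal()
generate double y = 1 + 0.1*firm + 0.5*x + rnormal()

* One median regression per firm; with saving(), the data in memory are left as they are
tempfile coefs
statsby b_x=(_b[x]) b_cons=(_b[_cons]), by(firm) saving(`coefs'): qreg y x

* Attach each firm's coefficients to its own rows and form the linear prediction
merge m:1 firm using `coefs', nogenerate
generate double yhat = b_cons + b_x*x
```

With more regressors, each coefficient needs its own named expression in the statsby list and its own term in the final generate line.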