
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


st: RE: Efficient way to predict values from regressions on subsets of the data?

From   "David Radwin" <>
To   <>
Subject   st: RE: Efficient way to predict values from regressions on subsets of the data?
Date   Fri, 15 Apr 2011 16:06:20 -0700 (PDT)

Apparently -in- is faster than -if-, but perhaps only twice as fast.

See Blasnik's Law in

So this fix probably will not solve your problem.
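The difference between the two qualifiers can be sketched outside Stata. The Python below is a language-neutral analogue, not Stata code. `subset_if` checks the condition on every observation each time it is called, as an `if` qualifier does. `subset_in` instead uses start/stop rows that are computed once after the data are sorted by firm, as an `in` range over sorted data does. The function names and sample data are hypothetical.

```python
# Python analogue (not Stata) of subsetting with -if- versus -in-.
# Assumes rows are already sorted by firm id, as -in- requires.
from itertools import groupby


def group_bounds(sorted_ids):
    """Return {id: (start, stop)} half-open row ranges, computed in one pass."""
    bounds = {}
    start = 0
    for gid, rows in groupby(sorted_ids):
        if gid in bounds:
            raise ValueError("ids must be sorted so each firm is contiguous")
        stop = start + sum(1 for _ in rows)
        bounds[gid] = (start, stop)
        start = stop
    return bounds


def subset_if(ids, values, firm):
    # Like -if firm == x-: the condition is checked on every row, every call.
    return [v for i, v in zip(ids, values) if i == firm]


def subset_in(values, bounds, firm):
    # Like -in a/b-: go straight to the firm's contiguous block of rows.
    start, stop = bounds[firm]
    return values[start:stop]


if __name__ == "__main__":
    ids = [1, 1, 1, 2, 2, 3, 3, 3, 3]  # hypothetical firm ids, sorted
    y = [5.0, 6.0, 7.0, 1.0, 2.0, 9.0, 8.0, 7.0, 6.0]
    bounds = group_bounds(ids)
    print(bounds)  # {1: (0, 3), 2: (3, 5), 3: (5, 9)}
    for firm in bounds:
        assert subset_if(ids, y, firm) == subset_in(y, bounds, firm)
```

Both functions return the same rows. They differ only in how many rows must be examined to find them. If most of the per-firm time goes to the regressions themselves rather than to subsetting, switching from `if` to `in` alone may save relatively little.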

David Radwin
Research Associate
MPR Associates, Inc.
2150 Shattuck Ave., Suite 800
Berkeley, CA 94704
Phone: 510-849-4942
Fax: 510-849-0794

> -----Original Message-----
> From: [mailto:owner-
>] On Behalf Of
> Sent: Friday, April 15, 2011 2:35 PM
> To:
> Subject: st: Efficient way to predict values from regressions on subsets
> of the data?
> Hello all,
> I have a project that involves assembling a panel of data in long format
> and running (quantile) regressions for each institution. My basic approach
> involves running estimations on subsets of the data and keeping
> values from each of the regressions. I can't use -by:- unless I write a
> wrapper, but this will be slow anyway because it uses if qualifiers (see
> below). I have implemented this in both SAS and Stata, and my SAS code runs
> about 100 times faster than my best Stata implementation.
> The panel is unbalanced, but to give you an idea, the average number of
> time periods is 650 and the number of firms is over a thousand. For each
> firm I need to run three regressions, taking predicted values from two and
> a coefficient from the third, and combining these three items into a new
> variable. I have been having trouble finding a way to do this
> efficiently.
> One way would be to loop over all firms and use if qualifiers in the
> regressions and predictions. I have found this to be very slow: with if
> clauses on such a long dataset, the procedure seems to take around 4 to 40
> seconds per firm!
> My current code is a bit cumbersome but faster. It involves reshaping the
> data into wide format to avoid using if qualifiers. I split the data into
> 10 pieces by firm, then reshape each of these 10 pieces into wide
> format. I split into 10 files because Stata's reshape command is
> quite slow (25-30 minutes for me) at reshaping my whole panel from long to
> wide, whereas after splitting, each reshape takes only a few seconds. Then I
> have 2 layers of loops: one over the 10 files and one over the firms
> inside each file, running the estimations and generating new variables from
> each firm's results. This method is much faster; there are no if
> qualifiers because the data are in wide format. It takes about 0.5-1.2
> seconds per firm. Overall, including the reshaping, the
> procedure takes about 20-30 minutes to run.
> Unfortunately for Stata fans (including myself), I was able to get this
> entire thing to run in about 50 seconds in SAS, or about 0.04 seconds per
> firm! The trick is that SAS can automatically run quantile regressions
> -by- a panel variable AND output predicted values at the same time.
> I would like to keep everything in Stata if I can.  Does anyone have a
> suggestion on a more efficient method of implementing what I am doing?
> Would using the -in- qualifier instead of -if- be worth it?
> Thanks,
> Daniel
> _______________________________
> Daniel Green
> Research & Statistics Group
> Federal Reserve Bank of New York
> 212-720-6320
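The SAS behaviour described in the question, a single pass over sorted data that fits a model per group and writes predictions back, can be sketched as follows. This is a Python analogue under stated assumptions, not Stata or SAS code. Simple one-regressor ordinary least squares stands in for the quantile regressions, the data are assumed sorted by firm, and all names (`fit_by_group`, `ols_fit`) are hypothetical. Each firm's rows are visited once, so the subsetting work grows with the number of observations. A loop with if qualifiers, by contrast, performs roughly (firms × observations) condition checks.

```python
# Python analogue (not Stata or SAS) of by-group estimation in one pass.
# Simple OLS stands in for the quantile regressions in the question, and
# the data are assumed sorted by firm id.
from itertools import groupby


def ols_fit(x, y):
    """Intercept and slope of y = a + b*x by ordinary least squares."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation within this group")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def fit_by_group(ids, x, y):
    """Fit one regression per firm; return row-aligned fitted values and slopes."""
    fitted = [0.0] * len(y)  # output column, filled in place
    slopes = {}
    start = 0
    for gid, rows in groupby(ids):
        if gid in slopes:
            raise ValueError("ids must be sorted so each firm is contiguous")
        stop = start + sum(1 for _ in rows)
        a, b = ols_fit(x[start:stop], y[start:stop])
        for i in range(start, stop):
            fitted[i] = a + b * x[i]
        slopes[gid] = b
        start = stop
    return fitted, slopes


if __name__ == "__main__":
    ids = [1, 1, 1, 2, 2, 2]  # hypothetical sorted firm ids
    x = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
    y = [1.0, 3.0, 5.0, 2.0, 2.0, 2.0]
    fitted, slopes = fit_by_group(ids, x, y)
    print(fitted)  # [1.0, 3.0, 5.0, 2.0, 2.0, 2.0]
    print(slopes)  # {1: 2.0, 2: 0.0}
```

The question does not say how the two sets of predicted values and the third model's coefficient are combined, so the sketch stops at producing them. In the setting described, the loop body would fit all three models for a firm before moving on to the next firm.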
*   For searches and help try:

© Copyright 1996–2018 StataCorp LLC