st: RE: Efficient way to predict values from regressions on subsets of the data?


From   "David Radwin" <dradwin@mprinc.com>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Efficient way to predict values from regressions on subsets of the data?
Date   Fri, 15 Apr 2011 16:06:20 -0700 (PDT)

Apparently -in- is faster than -if-, but perhaps only twice as fast.

See Blasnik's Law in
http://www.stata.com/statalist/archive/2007-09/msg00361.html
and http://www.stata.com/statalist/archive/2007-08/msg00668.html

So this fix probably will not solve your problem.
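
For reference, a minimal sketch of what an -in- based loop could look like, assuming the data are sorted by a firm identifier so that each firm occupies one contiguous block of observations. The variable names firm, y, x, nobs, and yhat are hypothetical, and the sketch has not been timed on a panel of the size described, so whether it helps is an open question:

```stata
* Assumed (hypothetical) variables: firm (identifier), y (outcome), x (regressor)
sort firm
by firm: generate long nobs = _N      // number of observations in each firm's block
generate double yhat = .

local start = 1
while `start' <= _N {
    * Last observation of the block that begins at `start'
    local last = `start' + nobs[`start'] - 1
    * Fit and predict on this block only, with no -if- qualifier
    capture quietly qreg y x in `start'/`last'
    if _rc == 0 {
        quietly predict double tmp in `start'/`last'
        quietly replace yhat = tmp in `start'/`last'
        drop tmp
    }
    local start = `last' + 1
}
drop nobs
```

The loop relies on the sort order staying fixed; any command that re-sorts the data inside the loop would invalidate the block boundaries. Firms for which -qreg- fails are skipped and keep missing values in yhat.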

David
--
David Radwin
Research Associate
MPR Associates, Inc.
2150 Shattuck Ave., Suite 800
Berkeley, CA 94704
Phone: 510-849-4942
Fax: 510-849-0794

www.mprinc.com


> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of
> Daniel.Green@ny.frb.org
> Sent: Friday, April 15, 2011 2:35 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: st: Efficient way to predict values from regressions on subsets
> of the data?
> 
> Hello all,
> 
> I have a project that involves assembling a panel of data in long format
> and running (quantile) regressions for each institution.  My basic
> problem involves running estimations on subsets of the data and keeping
> predicted values from each of the regressions.  I can't use -by:- unless
> I write a wrapper, but this will be slow anyway because it uses if
> qualifiers (see below).  I have implemented this in both SAS and Stata
> and my SAS code is about 100 times faster than my best Stata
> implementation.
> 
> The panel is unbalanced, but to give you an idea the average number of
> time periods is 650 and the number of firms is over a thousand.  For
> each firm I need to run three regressions, taking predicted values from
> two and a coefficient from the third, and combining these three items
> into a new variable.  I have been having trouble finding a way to do
> this efficiently.
> 
> One way would be to loop over all firms and use if qualifiers in the
> regressions and predictions.  I have found this to be very slow: using
> if clauses on such a long dataset means the procedure takes around 4 to
> 40 seconds per firm!
> 
> My current code is a bit cumbersome but faster; it involves reshaping
> the data into wide format to avoid using if qualifiers.  I split the
> data into 10 pieces by firm, then reshape each of these 10 pieces into
> wide format.  I am splitting into 10 files because Stata's reshape
> command is quite slow (25-30 minutes for me) in reshaping my whole panel
> from long to wide, but after splitting into 10 each reshape takes only a
> few seconds.  Then I have 2 layers of loops: one over the 10 files and
> then one over the firms inside each file, running the estimation and
> generating new variables for each firm's results.  This method is much
> faster because there are no if qualifiers when the data are in wide
> format.  It takes about 0.5-1.2 seconds to run each firm.  Overall,
> including the reshaping, this procedure takes maybe 20-30 minutes to
> run.
> 
> Unfortunately for Stata fans (including myself), I was able to get this
> entire thing to run in about 50 seconds in SAS, or about 0.04 seconds
> per firm!  The trick is that SAS can automatically run quantile
> regressions -by- a panel variable AND output predicted values at the
> same time.  But, I would like to keep everything in Stata if I can.
> Does anyone have a suggestion on a more efficient method of implementing
> what I am doing?  Would using the -in- qualifier instead of -if- be
> worth it?
> 
> Thanks,
> 
> Daniel
> _______________________________
> Daniel Green
> Research & Statistics Group
> Federal Reserve Bank of New York
> 212-720-6320
> Daniel.Green@ny.frb.org
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP