
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



RE: st: how can i make my loop run faster?


From   Stefano Rossi <[email protected]>
To   "[email protected]" <[email protected]>
Subject   RE: st: how can i make my loop run faster?
Date   Tue, 20 Sep 2011 00:50:45 -0400

Dear Partho,

many thanks for this, which is very useful. I can see how -rolling- works and how it can generate efficiency gains, but I agree that the whole procedure may still be quite slow and may require splitting the sample into subsets to run faster.

I am currently considering a different path: generating a cross-section of observations by firm-period, in which each firm-period unit contains the 12 observations from -1 to -12 (I would also have a separate dataset for +1 to +12). This would produce a dataset 12 times larger than my current one, but it would get around the -rolling- issue and would let me use -statsby- (or an equivalent command) without worrying about the length of the estimation sample, with potentially large efficiency gains (i.e., no "ifs").

Provided my intuition is correct, my one concern is how to create such a dataset, 12 times bigger than the current one. Is there a built-in Stata command that can do this efficiently?
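One way to build the stacked dataset, sketched here under assumptions rather than as a tested recipe, is -expand- followed by -merge-. The toy data, the file names panel and panel_before, and the variable names t and lag are hypothetical; the sketch assumes that firm and period uniquely identify observations and that period is an integer. Each firm-period t receives up to 12 rows carrying y, x and indicator from periods t-1 ... t-12; the +1 ... +12 dataset would be built the same way with period = t + lag.

```stata
* Toy panel (hypothetical): 2 firms x 20 periods with y, x, indicator
clear
set obs 40
generate firm = cond(_n <= 20, 1, 2)
bysort firm: generate period = _n
set seed 2011
generate x = rnormal()
generate y = 1 + 2*x + rnormal()
generate indicator = 1
save panel, replace                     // hypothetical file name

* Stack the "before" windows: one row per (firm, t, lag), lag = 1..12
keep firm period
rename period t                         // t = the firm-period being described
expand 12                               // 12 copies of every firm-period row
bysort firm t: generate lag = _n        // lag = 1, ..., 12 within each (firm, t)
generate period = t - lag               // source period t-1, ..., t-12
* Bring in the data for each source period; periods with no data are dropped
merge m:1 firm period using panel, keepusing(y x indicator) keep(match) nogenerate
keep if indicator == 1                  // same sample restriction as the original loop
save panel_before, replace              // hypothetical file name
```

Each (firm, t) group could then be passed to something like -statsby _b, by(firm t) clear: regress y x-. With 30,000 firms and 200 periods that is still up to six million separate regressions, so the speed gain is not guaranteed.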

Many thanks for your support.

Kind regards,

Stefano


________________________________________
From: [email protected] [[email protected]] On Behalf Of Partho Sarkar [[email protected]]
Sent: Tuesday, September 20, 2011 12:33 AM
To: [email protected]
Subject: Re: st: how can i make my loop run faster?

I guess Stefano might have solved his problem by now, but for
completeness, here is a post by Brian R. Landy from an older thread
that gives the complete code for -rolling-, including merging the
results files.

http://www.stata.com/statalist/archive/2009-09/msg01239.html

The thread also points out the speed problems with rolling for panel data.

P.Sarkar

On Mon, Sep 19, 2011 at 10:18 PM, Partho Sarkar
<[email protected]> wrote:
> Sorry, I made a mistake in that post. -rolling- will only work on one
> panel at a time, so you could do:
>
> levelsof firm, local(firms)
> foreach j of local firms {
>     rolling _b if firm==`j', w(20) saving(tryroll`j'): regress y x
> }
>
> Partho
>
> On Mon, Sep 19, 2011 at 10:02 PM, Partho Sarkar
> <[email protected]> wrote:
>> I think the -rolling- time-series command can help here. For example,
>> once you (a) -tsset- the panel as before and (b) sort the dataset with
>> -sort panelvar datevar-,
>>
>> rolling _b, w(20) saving(tryroll): regress y x
>>
>> would divide up your entire time span into overlapping windows of
>> width 20, run a regression for each panel in each window, and save the
>> panel ids, the start & end of each window, and the regression
>> coefficients, in a Stata data file called "tryroll".
>>
>> See -help rolling- and the manual entry for details & examples.  Given
>> your special requirements, you will probably have to do this in 2 or
>> more steps, and manipulate the results further to get exactly what you
>> want.
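To see how -rolling- windows map onto Stefano's t-12 ... t-1 requirement, here is a minimal sketch for a single firm's series; the toy data and the variable names period, y and x are hypothetical. As I understand -rolling-, each saved row carries the window bounds start and end, and a window includes its end period, so the regression on t-12 ... t-1 is the row whose end equals t-1.

```stata
* Toy single-firm series (hypothetical data, 20 periods)
clear
set obs 20
generate period = _n
set seed 2011
generate x = rnormal()
generate y = 1 + 2*x + rnormal()
tsset period

* One regression per 12-period window; the results replace the data in memory
rolling _b, window(12) clear: regress y x

* Rows hold the window bounds (start, end) and the coefficients;
* the row with end == t-1 is the regression on periods t-12 ... t-1
list in 1/3
```

With 20 periods and a window of 12 this should give 9 overlapping windows.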
>>
>> Partho
>>
>> On Mon, Sep 19, 2011 at 8:20 PM, Stefano Rossi <[email protected]> wrote:
>>> Partho,
>>>
>>> Many thanks for this, it is very helpful.
>>>
>>> This raises one question, though: a crucial part of my procedure is that I need to run regressions only on 12 observations for each firm-period pair; that is, if firm i has data back to period t=-50, say, I still have to run the regression only on the 12 observations from -1 to -12, ignoring all others. This worked well with my loop, but I do not readily see how to do this with -statsby-. Can you please advise?
>>>
>>> Best,
>>>
>>> Stefano
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: [email protected] [mailto:[email protected]] On Behalf Of Partho Sarkar
>>> Sent: Monday, September 19, 2011 1:06 AM
>>> To: [email protected]
>>> Subject: Re: st: how can i make my loop run faster?
>>>
>>> Stefano
>>>
>>> You don't seem to be actually making any use of the panel structure of
>>> the data.  Stata has very neat built-in procedures for dealing with
>>> such data.
>>>
>>> Very briefly, two pointers (I am ignoring the special wrinkle in your
>>> problem that you want to run 20 separate regressions for each "firm
>>> i-period t" pair; you would have to adapt the procedure accordingly):
>>>
>>> A.  I would use -tsfill, full- to fill in the time values and balance the panel.
>>>
>>> B. If you use tsset panelvar datevar (or xtset), where panelvar is
>>> your panel identifier and datevar the date variable, you can use:
>>>
>>> statsby _b _se, by(panelvar): regress y x
>>>
>>> to do all the regressions in one go (one regression per panel, i.e.
>>> per firm), rather than separately within a long loop. You can collect
>>> any stored results this way, as with the coefficients _b and standard
>>> errors _se above. See -help statsby-
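Pointers A and B together might look like the following minimal sketch; the toy panel and the names firm, period, y and x are hypothetical. Note that -statsby- with by(firm) returns one row per firm, so per firm-period regressions would still need the windows to be built first. The rows added by -tsfill- have missing y and x, so -regress- simply ignores them; filling mainly matters for lag and lead operators.

```stata
* Toy unbalanced panel (hypothetical): firm 2 has no period 3
clear
set obs 19
generate firm = cond(_n <= 10, 1, 2)
bysort firm: generate period = _n
replace period = period + 1 if firm == 2 & period >= 3
set seed 2011
generate x = rnormal()
generate y = 1 + 2*x + rnormal()

* Pointer A: declare the panel and fill in the missing firm-period
tsset firm period
tsfill, full
assert _N == 20                 // 2 firms x 10 periods; the new row has y, x missing

* Pointer B: one regression per firm; the results replace the data in memory
statsby _b _se, by(firm) clear: regress y x
list
```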
>>>
>>> Having said all that, I have never tried to run a set of regressions
>>> with 30,000 firms & 200 time periods in a single run of a program!!!
>>> I suspect this will be painfully slow no matter how efficient your
>>> code. An obvious alternative would be to split the firms into, say, 10
>>> subsets, do the regression for each subset, and put all the results
>>> together.
>>>
>>> Hope this helps
>>>
>>> Partho Sarkar
>>> Consultant Econometrician
>>> Indicus Analytics
>>> New Delhi, India
>>>
>>>
>>> On Mon, Sep 19, 2011 at 5:22 AM, Stefano Rossi <[email protected]> wrote:
>>>> Dear Statalist Users,
>>>>
>>>> I wonder if you can help me make a faster loop?
>>>> I have an unbalanced panel of about 30,000 firms and 200 periods, and for each "firm i-period t" pair I need to run 10 regressions on the 12 observations from t-1 to t-12 of the same firm i, and another 10 regressions on the observations from t+1 to t+12 of the same firm i. I have come up with the following program, which does what it should, but it is very slow (I suspect because of the many ifs). Here is a simplified version of it with just two regressions:
>>>>
>>>> forval z = 1/30000 {
>>>>     levelsof period if firm==`z', local(sample)
>>>>     foreach j of local sample {
>>>>         * "before" regression on periods j-12 ... j-1
>>>>         local k = `j' - 13
>>>>         capture reg y x if firm ==`z' & period<`j' & period>`k' & indicator==1
>>>>         if _rc==0 {
>>>>             predict y_hat, xb
>>>>             replace before = y_hat[_n-1] if firm == `z' & period == `j'
>>>>             drop y_hat
>>>>         }
>>>>         * "after" regression on periods j+1 ... j+12
>>>>         local w = `j' + 13
>>>>         capture reg y x if firm ==`z' & period>`j' & period<`w' & indicator==1
>>>>         if _rc==0 {
>>>>             predict y_hat, xb
>>>>             replace after = y_hat[_n+1] if firm == `z' & period == `j'
>>>>             drop y_hat
>>>>         }
>>>>     }
>>>> }
>>>>
>>>> Right now, it takes several minutes for each firm, so if I run it for the whole sample it would take weeks.
>>>> Is there any way to make it (a lot) faster?
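Because both regressions in the simplified loop have a single regressor, one alternative not proposed in this thread is to skip -regress- entirely and compute the OLS slope and intercept from window sums, b = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2) and a = (Sy - b*Sx) / n, building the sums with the lag and lead operators L#. and F#. in one pass over the data. This is a minimal sketch, assuming the panel is -tsset firm period- with integer periods; the toy data and all new names (ok, xw, S*_L, b_L, before_fast, ...) are hypothetical. It should match -regress- on each 12-period window but differs in edge cases: it returns missing where x has no variation in the window (where -regress- would omit x and fit a constant), and it uses the calendar period t-1 (L.x) rather than the previous observation y_hat[_n-1], which coincide only when there are no gaps. It does not extend directly to regressions with several regressors.

```stata
* Toy panel (hypothetical): 2 firms x 30 periods; indicator drops every 7th period
clear
set obs 60
generate firm = cond(_n <= 30, 1, 2)
bysort firm: generate period = _n
set seed 2011
generate x = rnormal()
generate y = 1 + 2*x + rnormal()
generate indicator = mod(period, 7) != 0

tsset firm period
* Zero out observations the original loop would not use
generate byte ok = indicator == 1 & !missing(y, x)
generate double xw  = cond(ok, x, 0)
generate double yw  = cond(ok, y, 0)
generate double xyw = xw*yw
generate double xxw = xw*xw

* L = window t-12 ... t-1 ("before"), F = window t+1 ... t+12 ("after")
foreach op in L F {
    foreach v in ok xw yw xyw xxw {
        generate double S`v'_`op' = 0
        forvalues k = 1/12 {
            * a lag/lead outside the panel is missing and contributes 0
            quietly replace S`v'_`op' = S`v'_`op' + ///
                cond(missing(`op'`k'.`v'), 0, `op'`k'.`v')
        }
    }
    * One-regressor OLS from the window sums; missing if < 2 usable obs or no variation in x
    generate double b_`op' = (Sok_`op'*Sxyw_`op' - Sxw_`op'*Syw_`op') / ///
        (Sok_`op'*Sxxw_`op' - Sxw_`op'^2) if Sok_`op' >= 2
    generate double a_`op' = (Syw_`op' - b_`op'*Sxw_`op') / Sok_`op'
}

* Fitted values at t-1 and t+1, mirroring y_hat[_n-1] and y_hat[_n+1] in the loop
generate double before_fast = a_L + b_L*L.x
generate double after_fast  = a_F + b_F*F.x
```

The loops run 2 x 5 x 12 = 120 vectorized -replace- statements over the whole dataset, instead of one -regress- per firm-period with an -if- scan of all observations.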
>>>>
>>>>
>>>> *
>>>> *   For searches and help try:
>>>> *   http://www.stata.com/help.cgi?search
>>>> *   http://www.stata.com/support/statalist/faq
>>>> *   http://www.ats.ucla.edu/stat/stata/
>>>>
>>>
>>>
>>
>



© Copyright 1996–2018 StataCorp LLC