Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: Slow -rolling- regressions on panel data

From	Richard Herron <[email protected]>
To	[email protected]
Subject	Re: st: Slow -rolling- regressions on panel data
Date	Tue, 27 Sep 2011 19:45:16 -0400

Nick and Partho, thanks for the discussion!

I certainly don't intend to recreate -regress- in Mata (I am coming
from R -- at the prodding of an advisor -- and love that in Stata I
can focus on the analysis and results, not the coding), but it's nice
to know that I can use Mata to get a sharp knife to combine with
-rolling- and shave some time off these calculations.
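
A minimal sketch of how the -correlate- route discussed below might look
(an illustration only, not the code actually used in this thread). It
relies on the fact that the slope from regressing y on its first lag
equals cov(y, L.y) / var(L.y). The variable names id, t, y and Ly, the
program name ar1cov, and the window length of 60 are all hypothetical.

```stata
* Hypothetical panel: id (firm), t (time), y (outcome); adjust to your data
xtset id t
generate double Ly = L.y          // first lag of y within each panel

* Wrapper that returns only the AR(1) slope; -rolling- needs a command
* that accepts the -if- qualifier
capture program drop ar1cov
program define ar1cov, rclass
    syntax [if]
    quietly correlate y Ly `if', covariance
    * slope of y on Ly = cov(y, Ly) / var(Ly)
    return scalar beta = r(cov_12) / r(Var_2)
end

* 60-period moving window (assumed length), run within each panel
rolling beta = r(beta), window(60) clear: ar1cov
```

Whether this is actually faster than -regress- inside -rolling- would
need timing on real data, since the per-window overhead of -rolling-
itself remains.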

On Tue, Sep 27, 2011 at 11:50, Nick Cox <[email protected]> wrote:
> Thanks for your good-humoured response.
>
> I should stress that "oddball", like "pedant", is not for me a
> pejorative word and that I included myself.
>
> A key point you make here is that Mata is about much more than matrix
> algebra. Recently, a thread started by Adam Ozimek raised the awkward
> question of handling strings longer than 244 characters, which cannot
> fit into string variables. Not only are variables out of the
> question, but such strings are difficult to work with even in locals.
> With Mata it starts looking like a Mickey Mouse
> problem:
>
> http://www.stata.com/statalist/archive/2011-09/msg01001.html
> http://www.stata.com/statalist/archive/2011-09/msg01033.html
>
> Nick
>
> On Tue, Sep 27, 2011 at 3:36 PM, Partho Sarkar
> <[email protected]> wrote:
>> Touché, Nick!  In fact, I had half thought of issuing a partial
>> retraction soon after shooting off that last post, before I left the
>> workplace.  Just as well that I didn't, as we got an(other) excellent
>> post from Nick, as full of wit as good sense!  (I especially liked the
>> digs at hobbyist "oddball" programmers- here is a self-confessed
>> case!)
>>
>> Certainly, I didn't mean to suggest that -regress- in particular, or
>> Stata built-in functions in general, can be beaten by (user-written)
>> Mata code.  At least, not in Stata 11 or later, as they already
>> internalise many of the Mata tricks.  (But for the problem at hand,
>> -regress- was certainly overkill, especially if it is going to be
>> invoked a few thousand times.)
>>
>> Nonetheless, for some problems, Mata is pretty well worth the extra
>> effort.  In particular, you do NOT have to code everything in terms of
>> basic matrix algebra.  E.g., as Nick says: "The case of -correlate- is
>> similar but not identical, as there are functions you can use directly
>> in Mata to get correlations".  And so are other Mata functions, I am
>> sure.
>>
>> Incidentally, this "Stata vs. Mata" issue seems to have been discussed
>> several times on the list.
>>
>> Best
>> Partho
>>
>>
>> On Tue, Sep 27, 2011 at 7:31 PM, Nick Cox <[email protected]> wrote:
>>> I don't want to dampen Partho's or Richard's enthusiasm for Mata,
>>> which I share, as the public record also shows. But I think this
>>> contrast is too broad-brush to be really helpful. You need to look at
>>> it problem by problem.
>>>
>>> Here I guess I am going some way beyond what Partho and Richard said,
>>> but the issues are of some general interest and there is some chance
>>> that some people will jump to the wrong impressions.
>>>
>>> For example, -regress- in Stata is a thin veneer of .ado code on top
>>> of an internal command which is compiled C code. Not only is that
>>> pretty fast, it also includes code for handling difficult regressions
>>> that is going to be absent from almost anybody's hand-coded regression
>>> code written in Mata using textbook formulae. There is plenty of scope
>>> for misunderstanding if "using Stata to do regressions" turns out to
>>> mean "using my own Mata code".
>>>
>>> The case of -correlate- is similar but not identical, as there are
>>> functions you can use directly in Mata to get correlations.
>>>
>>> The context is one of panel problems and rolling estimation to boot.
>>> That clearly implies that anyone taking the Mata route has to set up a
>>> framework of looping over panels and over moving windows, somehow or
>>> other. There are helper functions for that, but it does not qualify as
>>> a trivial problem for most people.
>>>
>>> Spending minutes or more likely hours or days to shave seconds off a
>>> program's time is one of the curious habits of many programmers. Of
>>> course, if you are going to use that program again and again, or
>>> programming is a kind of hobby anyway, that's fine and I am one of
>>> the people who do that now and then. But any implication that it's
>>> generally best to program in Mata is not going to be accurate or
>>> realistic.
>>>
>>> Also, Stata was not developed to be slow and Mata to be fast. Stata
>>> was developed to be as fast as possible and various wrappers of
>>> interpreted code are then added when that is helpful, protective and
>>> doesn't slow things too much.
>>>
>>> Some statistical software seems to be developed on the presumption
>>> that deep down every data analyst would like to be a programmer. All I
>>> can say is that in my experience this does not appear to be generally
>>> true, although every extensible language depends on the oddballs for
>>> whom it is exactly right.
>>>
>>> On Tue, Sep 27, 2011 at 1:09 PM, Partho Sarkar
>>> <[email protected]> wrote:
>>>> That's the real point- -regress- is too much firepower for just
>>>> finding a set of correlations.  In fact, even -correlate- may be
>>>> rather heavier than necessary.  My guess about Mata being likely to be
>>>> faster (certainly more elegant!) is based on the general premise that
>>>> Mata is designed to be (much) faster than Stata for the things it can
>>>> do.  You might find "Mata, the missing manual" by William Gould a good
>>>> introduction.  Also "Programming in Stata and Mata", by Christopher F
>>>> Baum.  (Must say I have not had much occasion to actually use Mata so
>>>> far, but coming from a background in C, R & Matlab, Mata was a
>>>> thrilling find within Stata!)
>>>
>>> On Tue, Sep 27, 2011 at 4:29 PM, Richard Herron wrote:
>>>
>>>>> Thanks, all, for the input!
>>>>>
>>>>> I was able to get a serviceable solution using -correlate- to find beta.
>>>>>
>>>>> The next thing I need to learn in Stata is writing my own .ado files
>>>>> and using Mata (when you loop over the existing functions, I think
>>>>> there can be too much overhead).
>>>>>
>>>>> On Tue, Sep 27, 2011 at 03:59, Nick Cox <[email protected]> wrote:
>>>
>>>>>> Actually, I would guess that Austin's suggestion will run faster than
>>>>>> this, but we're just trading speculation.
>>>
>>> On Tue, Sep 27, 2011 at 7:32 AM, Partho Sarkar wrote:
>>>
>>>>>>> If all you really want is the autocorrelation coefficient, of course
>>>>>>> you don't really need -regress-, which does much more than just
>>>>>>> generate the regression coefficients.  As an alternative to Austin's
>>>>>>> suggestion (and a priori I would expect this to be faster)
>>>>>>> you could also get the ACs via matrix computations in Mata, successively
>>>>>>> passing the y-vector (and the lagged y-vector?) for each firm to Mata
>>>>>>> within a loop, computing the sums, inner products etc., and passing
>>>>>>> the result back to Stata.
>>>>>>>
>>>>>>> Of course, Nick's point still holds: given your data size, this is
>>>>>>> likely to be time-consuming in any case.
>>>>>>>
>>>>>>> As a last thought, you are presumably interested in doing this for
>>>>>>> some "real" data- I think you might have an ill-conditioned matrix
>>>>>>> with your artificial example, which would partly account for the slow
>>>>>>> regressions.
>>>
>>> Richard Herron <[email protected]>
>>>
>>>>>>> I am using -rolling- for rolling regressions on panel data, but it is
>>>>>>> exceedingly slow. I found a Statalist thread
>>>>>>> (http://www.stata.com/statalist/archive/2009-09/msg01239.html) with a
>>>>>>> more manual solution, but it is equally slow (both are too slow to run
>>>>>>> to completion in a reasonable amount of time).
>>>>>>>
>>>>>>> Is -regress- the bottleneck? I only want the AR(1) coefficient; is
>>>>>>> there a different approach I should take? Are rolling
>>>>>>> regressions/calculations best done in different software?
>>>
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/statalist/faq
>>> *   http://www.ats.ucla.edu/stat/stata/
>>>
