Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Slow -rolling- regressions on panel data

From	Richard Herron <[email protected]>
To	[email protected]
Subject	Re: st: Slow -rolling- regressions on panel data
Date	Tue, 4 Oct 2011 23:09:04 -0400
For anyone following this thread in real time or years later: I
learned enough Stata/Mata to code and compile a simple OLS in Mata and
found no speed improvement (in fact, a slight worsening).

So it appears the slowdown was due to the scale of the problem.

On Tue, Sep 27, 2011 at 19:45, Richard Herron
<[email protected]> wrote:
> Nick and Partho, thanks for the discussion!
>
> I certainly don't intend to recreate -regress- in Mata (I am coming
> from R -- at the prodding of an advisor -- and love that in Stata I
> can focus on the analysis and results, not the coding), but it's nice
> to know that I can use Mata to get a sharp knife to combine with
> -rolling- and shave some time off these calculations.
>
> On Tue, Sep 27, 2011 at 11:50, Nick Cox <[email protected]> wrote:
>> Thanks for your good-humoured response.
>>
>> I should stress that "oddball", like "pedant", is not for me a
>> pejorative word and that I included myself.
>>
>> A key point you make here is that Mata is about much more than matrix
>> algebra. Recently, a thread started by Adam Ozimek raised the awkward
>> question of handling strings longer than 244 characters, which can't
>> be held in string variables. Not only are variables out of the
>> question, but such strings are difficult to work with even using
>> locals longer than 244 characters. With Mata it starts looking like a
>> Mickey Mouse problem:
>>
>> http://www.stata.com/statalist/archive/2011-09/msg01001.html
>> http://www.stata.com/statalist/archive/2011-09/msg01033.html
>>
>> Nick
>>
>> On Tue, Sep 27, 2011 at 3:36 PM, Partho Sarkar
>> <[email protected]> wrote:
>>> Touché, Nick!  In fact, I had half thought of issuing a partial
>>> retraction soon after shooting off that last post, before I left the
>>> workplace.  Just as well that I didn't, as we got an(other) excellent
>>> post from Nick, as full of wit as good sense!  (I especially liked the
>>> digs at hobbyist "oddball" programmers; here is a self-confessed
>>> case!)
>>>
>>> Certainly, I didn't mean to suggest that -regress- in particular, or
>>> Stata built-in functions in general, can be beaten by (user-written)
>>> Mata code. At least, not in Stata 11 or later, as they already
>>> internalise many of the Mata tricks. (But for the problem at hand,
>>> -regress- was certainly overkill, especially if it is going to be
>>> invoked a few thousand times.)
>>>
>>> Nonetheless, for some problems, Mata is pretty well worth the extra
>>> effort. In particular, you do NOT have to code everything in terms of
>>> basic matrix algebra. E.g., as Nick says: "The case of -correlate- is
>>> similar but not identical, as there are functions you can use directly
>>> in Mata to get correlations". The same is true of other Mata
>>> functions, I am sure.
>>>
>>> Incidentally, this "Stata vs. Mata" issue seems to have been discussed
>>> several times on the list.
>>>
>>> Best
>>> Partho
>>>
>>>
>>> On Tue, Sep 27, 2011 at 7:31 PM, Nick Cox <[email protected]> wrote:
>>>> I don't want to dampen Partho's or Richard's enthusiasm for Mata,
>>>> which I share, as the public record also shows. But I think this
>>>> contrast is too broad-brush to be really helpful. You need to look at
>>>> it problem by problem.
>>>>
>>>> Here I guess I am going some way beyond what Partho and Richard said,
>>>> but the issues are of some general interest and there is some chance
>>>> that some people will jump to the wrong conclusions.
>>>>
>>>> For example, -regress- in Stata is a thin veneer of .ado code on top
>>>> of an internal command which is compiled C code. Not only is that
>>>> pretty fast, it also includes code for handling difficult regressions
>>>> that is going to be absent from almost anybody's hand-coded regression
>>>> code written in Mata using textbook formulae. There is plenty of scope
>>>> for misunderstanding if "using Stata to do regressions" turns out to
>>>> mean "using my own Mata code".
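Nick's point about textbook formulae can be illustrated with one well-known numerical pitfall. The sketch below is in Python purely for illustration; it is not Mata, and it is not how -regress- works internally (the thread does not describe that). It assumes only that a hand-coded regression uses the one-pass formula sum(x^2) - (sum x)^2 / n for the sum of squared deviations, which can suffer catastrophic cancellation when the data have a large mean relative to their spread; subtracting the mean first avoids most of that. The function names are hypothetical.

```python
def sxx_naive(x):
    """One-pass 'textbook' formula: sum(x^2) - (sum x)^2 / n."""
    n = len(x)
    return sum(v * v for v in x) - sum(x) ** 2 / n


def sxx_centered(x):
    """Two-pass formula: compute the mean first, then sum squared deviations."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x)


# Large mean, small spread: the exact answer is 2.0
x = [1e8, 1e8 + 1.0, 1e8 + 2.0]
print(sxx_naive(x))     # may not equal 2.0 because of cancellation
print(sxx_centered(x))  # 2.0
```

Compiled routines such as the internal code behind -regress- can guard against this and other problems; a quick hand-coded version usually does not.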
>>>>
>>>> The case of -correlate- is similar but not identical, as there are
>>>> functions you can use directly in Mata to get correlations.
>>>>
>>>> The context is one of panel problems and rolling estimation to boot.
>>>> That clearly implies that anyone taking the Mata route has to set up a
>>>> framework of looping over panels and over moving windows, somehow or
>>>> other. There are helper functions for that, but it does not qualify as
>>>> a trivial problem for most people.
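A minimal sketch of such a framework, in Python for illustration, assuming the data arrive as hypothetical (panel, time, value) tuples with no gaps in time within a panel. The name `rolling_by_panel` and its signature are made up for this example. It simply recomputes the statistic on every window, so the work grows with the number of observations times the window length.

```python
from collections import defaultdict
from typing import Callable, Iterable, List, Tuple


def rolling_by_panel(
    rows: Iterable[Tuple[str, int, float]],
    window: int,
    stat: Callable[[List[float]], float],
) -> List[Tuple[str, int, float]]:
    """Apply `stat` to every full window of `window` consecutive observations
    within each panel; return (panel, time of last obs in window, statistic)."""
    # Group observations by panel
    by_panel = defaultdict(list)
    for panel, time, value in rows:
        by_panel[panel].append((time, value))

    results = []
    for panel in sorted(by_panel):
        series = sorted(by_panel[panel])  # order by time within the panel
        values = [v for _, v in series]
        # Loop over moving windows, recomputing the statistic each time
        for end in range(window, len(values) + 1):
            window_values = values[end - window:end]
            results.append((panel, series[end - 1][0], stat(window_values)))
    return results


rows = [("B", 2, 20.0), ("A", 1, 1.0), ("A", 3, 3.0), ("B", 1, 10.0), ("A", 2, 2.0)]
print(rolling_by_panel(rows, 2, lambda w: sum(w) / len(w)))
# [('A', 2, 1.5), ('A', 3, 2.5), ('B', 2, 15.0)]
```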
>>>>
>>>> Spending minutes or more likely hours or days to shave seconds off a
>>>> program's time is one of the curious habits of many programmers. Of
>>>> course, if you are going to use that program again and again, or
>>>> programming is a kind of hobby anyway, that's fine and I am one of
>>>> the people who do that now and then. But any implication that it's
>>>> generally best to program in Mata is not going to be accurate or
>>>> realistic.
>>>>
>>>> Also, Stata was not developed to be slow and Mata to be fast. Stata
>>>> was developed to be as fast as possible and various wrappers of
>>>> interpreted code are then added when that is helpful, protective and
>>>> doesn't slow things too much.
>>>>
>>>> Some statistical software seems to be developed on the presumption
>>>> that deep down every data analyst would like to be a programmer. All I
>>>> can say is that in my experience this does not appear to be generally
>>>> true, although every extensible language depends on the oddballs for
>>>> whom it is exactly right.
>>>>
>>>> On Tue, Sep 27, 2011 at 1:09 PM, Partho Sarkar
>>>> <[email protected]> wrote:
>>>>> That's the real point: -regress- is too much firepower for just
>>>>> finding a set of correlations.  In fact, even -correlate- may be
>>>>> rather heavier than necessary.  My guess about Mata being likely to be
>>>>> faster (certainly more elegant!) is based on the general premise that
>>>>> Mata is designed to be (much) faster than Stata for the things it can
>>>>> do.  You might find "Mata, the missing manual" by William Gould a good
>>>>> introduction.  Also "Programming in Stata and Mata", by Christopher F
>>>>> Baum.  (Must say I have not had much occasion to actually use Mata so
>>>>> far, but coming from a background in C, R & Matlab, Mata was a
>>>>> thrilling find within Stata!)
>>>>
>>>> On Tue, Sep 27, 2011 at 4:29 PM, Richard Herron wrote:
>>>>
>>>>>> Thanks, all, for the input!
>>>>>>
>>>>>> I was able to get a serviceable solution using -correlate- to find beta.
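Richard does not show his code, but one plausible route from -correlate- to a slope is the identity b = r * s_y / s_x, where r is the correlation and s_x, s_y are the standard deviations. A hedged Python sketch (the function name is hypothetical) checking that this identity reproduces the OLS slope of y on x with a constant:

```python
import math


def slope_via_correlation(x, y):
    """OLS slope of y on x (with a constant), computed as r * s_y / s_x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation
    # s_y / s_x = sqrt(syy / sxx), since the (n - 1) divisors cancel
    return r * math.sqrt(syy / sxx)


x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 5.0]
print(slope_via_correlation(x, y))  # about 1.1, the same as sxy / sxx
```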
>>>>>>
>>>>>> The next thing I need to learn in Stata is writing my own .ado files
>>>>>> and using Mata (when you loop over the existing functions, I think
>>>>>> there can be too much overhead).
>>>>>>
>>>>>> On Tue, Sep 27, 2011 at 03:59, Nick Cox <[email protected]> wrote:
>>>>
>>>>>>> Actually, I would guess that Austin's suggestion will run faster than
>>>>>>> this, but we're just trading speculation.
>>>>
>>>> On Tue, Sep 27, 2011 at 7:32 AM, Partho Sarkar wrote:
>>>>
>>>>>>>> If all you really want is the autocorrelation coefficient, of course
>>>>>>>> you don't really need -regress-, which does much more than just
>>>>>>>> generate the regression coefficients.  As an alternative to Austin's
>>>>>>>> suggestion (and a priori I would expect this to be faster)
>>>>>>>> you could also get the ACs via matrix computations in Mata, successively
>>>>>>>> passing the y-vector (and the lagged y-vector?) for each firm to Mata
>>>>>>>> within a loop, computing the sums, inner products etc., and passing
>>>>>>>> the result back to Stata.
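Partho's suggestion amounts to computing the slope from inner products of the centred y and lagged-y vectors. A minimal Python sketch for a single firm's series, assuming the observations are already in time order with no gaps; the function name is hypothetical, and this gives only the slope from regressing y on its first lag with a constant, not a replacement for -regress-:

```python
def ar1_coefficient(y):
    """Slope from regressing y_t on y_{t-1} with a constant, for one firm."""
    lagged, current = y[:-1], y[1:]  # pairs (y_{t-1}, y_t); first obs is lost
    n = len(current)
    mx, my = sum(lagged) / n, sum(current) / n
    # Inner products of the centred lagged and current vectors
    sxy = sum((a - mx) * (b - my) for a, b in zip(lagged, current))
    sxx = sum((a - mx) ** 2 for a in lagged)
    return sxy / sxx


print(ar1_coefficient([1.0, 2.0, 4.0, 8.0, 16.0]))  # 2.0
```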
>>>>>>>>
>>>>>>>> Of course, Nick's point still holds: given your data size, this is
>>>>>>>> likely to be time-consuming in any case.
>>>>>>>>
>>>>>>>> As a last thought, you are presumably interested in doing this for
>>>>>>>> some "real" data; I think you might have an ill-conditioned matrix
>>>>>>>> with your artificial example, which would partly account for the slow
>>>>>>>> regressions.
>>>>
>>>> Richard Herron <[email protected]> wrote:
>>>>
>>>>>>>> I am using -rolling- for rolling regressions on panel data, but it is
>>>>>>>> exceedingly slow. I found a Statalist thread
>>>>>>>> (http://www.stata.com/statalist/archive/2009-09/msg01239.html) with a
>>>>>>>> more manual solution, but it is equally slow (both are too slow to run
>>>>>>>> to completion in a reasonable amount of time).
>>>>>>>>
>>>>>>>> Is -regress- the bottleneck? I only want the AR(1) coefficient; is
>>>>>>>> there a different approach I should take? Are rolling
>>>>>>>> regressions/calculations best done in different software?
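Since only the AR(1) coefficient is wanted, one alternative technique (not proposed in this thread) is to keep running sums over the window and update them as it slides, instead of refitting every window from scratch. A minimal Python sketch for one panel's series, assuming consecutive observations with no gaps; the function name is hypothetical, and uncentred running sums can lose precision when the data have a large mean relative to their spread.

```python
def rolling_ar1(y, window):
    """Rolling AR(1) slope (y_t on y_{t-1}, with a constant) over windows of
    `window` consecutive (y_{t-1}, y_t) pairs, using running sums."""
    pairs = list(zip(y[:-1], y[1:]))
    if window < 2 or len(pairs) < window:
        return []
    sx = sy = sxx = sxy = 0.0
    slopes = []
    for i, (x_new, y_new) in enumerate(pairs):
        # Add the pair entering the window
        sx += x_new
        sy += y_new
        sxx += x_new * x_new
        sxy += x_new * y_new
        if i >= window:
            # Drop the pair leaving the window
            x_old, y_old = pairs[i - window]
            sx -= x_old
            sy -= y_old
            sxx -= x_old * x_old
            sxy -= x_old * y_old
        if i >= window - 1:
            denom = sxx - sx * sx / window
            num = sxy - sx * sy / window
            slopes.append(num / denom if denom != 0 else float("nan"))
    return slopes


print(rolling_ar1([1.0, 3.0, 2.0, 5.0, 4.0], 3))  # about [-0.5, -0.142857]
```

Each step costs the same regardless of window length, which matters when the number of firms and windows is large.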
>>>>
>>>> *
>>>> *   For searches and help try:
>>>> *   http://www.stata.com/help.cgi?search
>>>> *   http://www.stata.com/support/statalist/faq
>>>> *   http://www.ats.ucla.edu/stat/stata/
>>>>