Re: st: Slow -rolling- regressions on panel data
Nick Cox <firstname.lastname@example.org>
Tue, 27 Sep 2011 15:01:55 +0100
I don't want to dampen Partho's or Richard's enthusiasm for Mata,
which I share, as the public record also shows. But I think this
contrast is too broad-brush to be really helpful. You need to look at
it problem by problem.
Here I guess I am going some way beyond what Partho and Richard said,
but the issues are of some general interest and there is some chance
that some people will jump to the wrong impressions.
For example, -regress- in Stata is a thin veneer of .ado code on top
of an internal command which is compiled C code. Not only is that
pretty fast, it also includes code for handling difficult regressions
that is going to be absent from almost anybody's hand-coded regression
code written in Mata using textbook formulae. There is plenty of scope
for misunderstanding if "using Stata to do regressions" turns out to
mean "using my own Mata code".
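
To make the point about textbook formulae concrete, here is a minimal sketch. It is written in Python purely for illustration: it is not Stata or Mata code, and it does not reproduce what -regress- does internally. The function names and the toy data are made up. It compares the raw-sums slope formula found in many textbooks with a version that centers the data first, on a predictor with a large offset.

```python
def slope_textbook(x, y):
    """Slope from raw sums: (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    denom = n * sxx - sx * sx
    if denom == 0:
        return float("nan")  # cancellation wiped out the denominator
    return (n * sxy - sx * sy) / denom


def slope_centered(x, y):
    """Slope from deviations about the means, which avoids most cancellation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return float("nan")  # predictor is constant, slope undefined
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


# Hypothetical data: y = 2*i + 1, with the predictor either small or shifted by 1e9.
y = [2.0 * i + 1.0 for i in range(10)]
x_small = [float(i) for i in range(10)]
x_big = [1e9 + i for i in range(10)]

print(slope_textbook(x_small, y), slope_centered(x_small, y))
# With the shifted predictor the raw sums are near 1e20, far beyond what
# double precision can resolve, so the textbook result is not trustworthy.
print(slope_textbook(x_big, y), slope_centered(x_big, y))
```

On the small predictor both functions agree. On the shifted one only the centered version recovers the slope of 2, which is one small example of the kind of care that hand-written code tends to omit.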
The case of -correlate- is similar but not identical, as there are
functions you can use directly in Mata to get correlations.
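
The link between a correlation and the AR(1) slope is simple: with one predictor, the slope equals the correlation times the ratio of the standard deviations. A hedged sketch of that identity follows, again in Python for illustration only (not Mata, and not necessarily how anyone in this thread computed it); the function name is hypothetical.

```python
import math


def lag1_corr_and_slope(y):
    """Return (r, slope) for the regression of y[t] on y[t-1].

    r is the Pearson correlation between the series and its first lag;
    slope is recovered as r * sd(y[t]) / sd(y[t-1]).
    Assumes an evenly spaced series with no gaps and non-constant values.
    """
    x, z = y[:-1], y[1:]          # lagged values and current values
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((b - mz) ** 2 for b in z)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    r = sxz / math.sqrt(sxx * szz)
    slope = r * math.sqrt(szz / sxx)  # same as sxz / sxx, up to rounding
    return r, slope
```

For example, `lag1_corr_and_slope([1.0, 2.0, 4.0, 8.0, 16.0])` gives a correlation of 1 and a slope of 2 (up to rounding), because each value is exactly twice the previous one.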
The context is one of panel problems and rolling estimation to boot.
That clearly implies that anyone taking the Mata route has to set up a
framework of looping over panels and over moving windows, somehow or
other. There are helper functions for that, but it does not qualify as
a trivial problem for most people.
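
As a rough picture of what such a framework involves, here is a minimal sketch of the two nested loops, over panels and over windows within each panel. It is in Python for illustration only and is not a substitute for -rolling- or for Mata's panel helper functions. The function names, the (panel, time, y) row layout, and the assumption of evenly spaced data with no gaps are all mine.

```python
from collections import defaultdict


def ar1_slope(y):
    """OLS slope of y[t] on y[t-1] with an intercept; None if undefined."""
    x, z = y[:-1], y[1:]
    n = len(x)
    if n < 2:
        return None
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return None  # lagged values are constant within the window
    return sum((a - mx) * (b - mz) for a, b in zip(x, z)) / sxx


def rolling_ar1(rows, window):
    """rows: iterable of (panel_id, time, y) tuples.

    Returns a dict mapping (panel_id, time of last observation in window)
    to the AR(1) slope estimated from that window of `window` observations.
    """
    panels = defaultdict(list)
    for pid, t, y in rows:
        panels[pid].append((t, y))

    out = {}
    for pid, obs in panels.items():          # loop over panels
        obs.sort()                           # order each panel by time
        for end in range(window, len(obs) + 1):  # loop over moving windows
            chunk = obs[end - window:end]
            out[(pid, chunk[-1][0])] = ar1_slope([y for _, y in chunk])
    return out
```

Even this toy version has to decide what to do with short panels, constant windows, and sort order. Real data add gaps and missing values, which is part of why the problem is not trivial.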
Spending minutes or more likely hours or days to shave seconds off a
program's time is one of the curious habits of many programmers. Of
course, if you are going to use that program again and again, or
programming is a kind of hobby anyway, that's fine and I am one of
the people who do that now and then. But any implication that it's
generally best to program in Mata is not going to be accurate or
Also, Stata was not developed to be slow and Mata to be fast. Stata
was developed to be as fast as possible and various wrappers of
interpreted code are then added when that is helpful, protective and
doesn't slow things too much.
Some statistical software seems to be developed on the presumption
that deep down every data analyst would like to be a programmer. All I
can say is that in my experience this does not appear to be generally
true, although every extensible language depends on the oddballs for
whom it is exactly right.
On Tue, Sep 27, 2011 at 1:09 PM, Partho Sarkar wrote:
> That's the real point- -regress- is too much firepower for just
> finding a set of correlations. In fact, even -correlate- may be
> rather heavier than necessary. My guess about Mata being likely to be
> faster (certainly more elegant!) is based on the general premise that
> Mata is designed to be (much) faster than Stata for the things it can
> do. You might find "Mata, the missing manual" by William Gould a good
> introduction. Also "Programming in Stata and Mata", by Christopher F
> Baum. (Must say I have not had much occasion to actually use Mata so
> far, but coming from a background in C, R & Matlab, Mata was a
> thrilling find within Stata!)
On Tue, Sep 27, 2011 at 4:29 PM, Richard Herron wrote:
>> Thanks, all, for the input!
>> I was able to get a serviceable solution using -correlate- to find beta.
>> The next thing I need to learn in Stata is writing my own .ado files
>> and using Mata (when you loop over the existing functions, I think
>> there can be too much overhead).
>> On Tue, Sep 27, 2011 at 03:59, Nick Cox <email@example.com> wrote:
>>> Actually, I would guess that Austin's suggestion will run faster than
>>> this, but we're just trading speculation.
On Tue, Sep 27, 2011 at 7:32 AM, Partho Sarkar wrote:
>>>> If all you really want is the autocorrelation coefficient, of course
>>>> you don't really need -regress-, which does much more than just
>>>> generate the regression coefficients. As an alternative to Austin's
>>>> suggestion (and a priori I would expect this to be faster)
>>>> you could also get the AC's via matrix computations in Mata, successively
>>>> passing the y-vector (and the lagged y-vector?) for each firm to Mata
>>>> within a loop, computing the sums, inner products etc., and passing
>>>> the result back to Stata.
>>>> Of course, Nick's point still holds: given your data size, this is
>>>> likely to be time-consuming in any case.
>>>> As a last thought, you are presumably interested in doing this for
>>>> some "real" data- I think you might have an ill-conditioned matrix
>>>> with your artificial example, which would partly account for the slow
Richard Herron <firstname.lastname@example.org>
>>>> I am using -rolling- for rolling regressions on panel data, but it is
>>>> exceedingly slow. I found a Statalist thread
>>>> (http://www.stata.com/statalist/archive/2009-09/msg01239.html) with a
>>>> more manual solution, but it is equally slow (both are too slow to run
>>>> to completion in a reasonable amount of time).
>>>> Is -regress- the bottleneck? I only want the AR(1) coefficient; is
>>>> there a different approach I should take? Are rolling
>>>> regressions/calculations best done in different software?