Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Slow -rolling- regressions on panel data
Partho Sarkar <email@example.com>
Tue, 27 Sep 2011 20:06:42 +0530
Touché, Nick! In fact, I had half thought of issuing a partial
retraction soon after shooting off that last post, before I left the
workplace. Just as well that I didn't, as we got an(other) excellent
post from Nick, as full of wit as good sense! (I especially liked the
digs at hobbyist "oddball" programmers- here is a self-confessed
Certainly, I didn't mean to suggest that -regress- in particular, or
Stata built-in functions in general, can be beaten by (user-written)
Mata code. At least, not in Stata 11 or later, as they already
internalise many of the Mata tricks. (But for the problem at hand,
-regress- was certainly overkill, especially if it is going to be
invoked a few thousand times.)
Nonetheless, for some problems, Mata is pretty well worth the extra
effort. In particular, you do NOT have to code everything in terms of
basic matrix algebra. E.g., as Nick says: "The case of -correlate- is
similar but not identical, as there are functions you can use directly
in Mata to get correlations". And so are other Mata functions, I am
Incidentally, this "Stata vs. Mata" issue seems to have been discussed
several times on the list.
On Tue, Sep 27, 2011 at 7:31 PM, Nick Cox <firstname.lastname@example.org> wrote:
> I don't want to dampen Partho's or Richard's enthusiasm for Mata,
> which I share, as the public record also shows. But I think this
> contrast is too broad-brush to be really helpful. You need to look at
> it problem by problem.
> Here I guess I am going some way beyond what Partho and Richard said,
> but the issues are of some general interest and there is some chance
> that some people will jump to the wrong impressions.
> For example, -regress- in Stata is a thin veneer of .ado code on top
> of an internal command which is compiled C code. Not only is that
> pretty fast, it also includes code for handling difficult regressions
> that is going to be absent from almost anybody's hand-coded regression
> code written in Mata using textbook formulae. There is plenty of scope
> for misunderstanding if "using Stata to do regressions" turns out to
> mean "using my own Mata code".
> The case of -correlate- is similar but not identical, as there are
> functions you can use directly in Mata to get correlations.
> The context is one of panel problems and rolling estimation to boot.
> That clearly implies that anyone taking the Mata route has to set up a
> framework of looping over panels and over moving windows, somehow or
> other. There are helper functions for that, but it does not qualify as
> a trivial problem for most people.
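
The framework described above (an outer loop over panels, an inner loop over moving windows) might look like the following. This is a minimal Python/NumPy sketch rather than Mata, offered only to show the structure; the function names `rolling_ar1` and `_csum`, the argument layout, and the sample data are all hypothetical, and it assumes rows sorted by firm and time with no gaps or missing values.

```python
import numpy as np

def _csum(v):
    # Cumulative sum with a leading zero, so a window sum is a difference.
    return np.concatenate(([0.0], np.cumsum(v)))

def rolling_ar1(firm, y, window):
    """Rolling slope of y[t] on y[t-1] (with intercept), firm by firm.

    Each window holds `window` consecutive (y[t-1], y[t]) pairs. The result
    is stored at the row of the last y[t] in the window; rows without a
    full window are NaN. Hypothetical helper for illustration only.
    """
    firm = np.asarray(firm)
    y = np.asarray(y, dtype=float)
    out = np.full(y.size, np.nan)
    for f in np.unique(firm):                      # loop over panels
        idx = np.flatnonzero(firm == f)
        cur, lag = y[idx][1:], y[idx][:-1]
        if cur.size < window:
            continue
        sx, sy = _csum(lag), _csum(cur)
        sxx, sxy = _csum(lag * lag), _csum(lag * cur)
        for end in range(window, cur.size + 1):    # loop over moving windows
            start = end - window
            wx, wy = sx[end] - sx[start], sy[end] - sy[start]
            wxx, wxy = sxx[end] - sxx[start], sxy[end] - sxy[start]
            denom = window * wxx - wx ** 2
            if denom != 0:
                out[idx[end]] = (window * wxy - wx * wy) / denom
    return out

# Made-up panel: two firms, six periods each, windows of three pairs
firm = np.repeat([1, 2], 6)
y = np.array([1.0, 2.0, 1.5, 3.0, 2.5, 4.0, 0.5, 0.7, 0.2, 0.9, 1.4, 1.1])
betas = rolling_ar1(firm, y, 3)
```

The cumulative sums make each window cost a handful of subtractions instead of a fresh regression, at the price of some floating-point cancellation when values are large; that trade-off is a design choice of this sketch, not something proposed in the thread.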
> Spending minutes or more likely hours or days to shave seconds off a
> program's time is one of the curious habits of many programmers. Of
> course, if you are going to use that program again and again, or
> programming is a kind of hobby anyway, that's fine and I am one of
> the people who do that now and then. But any implication that it's
> generally best to program in Mata is not going to be accurate or
> Also, Stata was not developed to be slow and Mata to be fast. Stata
> was developed to be as fast as possible and various wrappers of
> interpreted code are then added when that is helpful, protective and
> doesn't slow things too much.
> Some statistical software seems to be developed on the presumption
> that deep down every data analyst would like to be a programmer. All I
> can say is that in my experience this does not appear to be generally
> true, although every extensible language depends on the oddballs for
> whom it is exactly right.
> On Tue, Sep 27, 2011 at 1:09 PM, Partho Sarkar
> <email@example.com> wrote:
>> That's the real point- -regress- is too much firepower for just
>> finding a set of correlations. In fact, even -correlate- may be
>> rather heavier than necessary. My guess about Mata being likely to be
>> faster (certainly more elegant!) is based on the general premise that
>> Mata is designed to be (much) faster than Stata for the things it can
>> do. You might find "Mata, the missing manual" by William Gould a good
>> introduction. Also "Programming in Stata and Mata", by Christopher F
>> Baum. (Must say I have not had much occasion to actually use Mata so
>> far, but coming from a background in C, R & Matlab, Mata was a
>> thrilling find within Stata!)
> On Tue, Sep 27, 2011 at 4:29 PM, Richard Herron wrote:
>>> Thanks, all, for the input!
>>> I was able to get a serviceable solution using -correlate- to find beta.
>>> The next thing I need to learn in Stata is writing my own .ado files
>>> and using Mata (when you loop over the existing functions, I think
>>> there can be too much overhead).
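
For readers wondering how -correlate- can yield beta: with one regressor and an intercept, the OLS slope equals the sample covariance divided by the regressor's sample variance, or equivalently the correlation rescaled by the ratio of standard deviations. The notation here is an assumption for illustration and does not come from the thread: $y_{t-1}$ is the lagged series, $r$ the sample correlation, and $s$ a sample standard deviation.

```latex
\hat{\beta}
  = \frac{\widehat{\operatorname{Cov}}(y_t,\, y_{t-1})}{\widehat{\operatorname{Var}}(y_{t-1})}
  = r_{y_t,\,y_{t-1}} \cdot \frac{s_{y_t}}{s_{y_{t-1}}}
```
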
>>> On Tue, Sep 27, 2011 at 03:59, Nick Cox <firstname.lastname@example.org> wrote:
>>>> Actually, I would guess that Austin's suggestion will run faster than
>>>> this, but we're just trading speculation.
> On Tue, Sep 27, 2011 at 7:32 AM, Partho Sarkar wrote:
>>>>> If all you really want is the autocorrelation coefficient, of course
>>>>> you don't really need -regress-, which does much more than just
>>>>> generate the regression coefficients. As an alternative to Austin's
>>>>> suggestion (and a priori I would expect this to be faster),
>>>>> you could also get the ACs via matrix computations in Mata, successively
>>>>> passing the y-vector (and the lagged y-vector?) for each firm to Mata
>>>>> within a loop, computing the sums, inner products etc., and passing
>>>>> the result back to Stata.
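
The sums-and-inner-products idea above can be illustrated outside Stata. What follows is a minimal Python/NumPy sketch, not Mata code and not anything posted in the thread; the function name `ar1_slope` and the sample series are hypothetical. It computes the slope of y[t] on y[t-1] (with an intercept) for one firm's vector from two sums and two inner products, which is all the AR(1) coefficient requires.

```python
import numpy as np

def ar1_slope(y):
    """OLS slope of y[t] on y[t-1] (with intercept), from sums and inner products.

    Hypothetical helper for illustration; assumes y is ordered in time
    and has no missing values.
    """
    y = np.asarray(y, dtype=float)
    cur, lag = y[1:], y[:-1]        # y[t] and y[t-1]
    n = cur.size
    sx, sy = lag.sum(), cur.sum()   # simple sums
    sxx = lag @ lag                 # inner product of the lag with itself
    sxy = lag @ cur                 # inner product of lag and current values
    return (n * sxy - sx * sy) / (n * sxx - sx ** 2)

# Made-up series for a single firm
y = [1.0, 2.0, 1.5, 3.0, 2.5, 4.0]
beta = ar1_slope(y)
```

Nothing here handles collinearity, missing values, or standard errors, which is part of what a full regression command does beyond returning the coefficient.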
>>>>> Of course, Nick's point still holds: given your data size, this is
>>>>> likely to be time-consuming in any case.
>>>>> As a last thought, you are presumably interested in doing this for
>>>>> some "real" data- I think you might have an ill-conditioned matrix
>>>>> with your artificial example, which would partly account for the slow
> Richard Herron <email@example.com> wrote:
>>>>> I am using -rolling- for rolling regressions on panel data, but it is
>>>>> exceedingly slow. I found a Statalist thread
>>>>> (http://www.stata.com/statalist/archive/2009-09/msg01239.html) with a
>>>>> more manual solution, but it is equally slow (both are too slow to run
>>>>> to completion in a reasonable amount of time).
>>>>> Is -regress- the bottleneck? I only want the AR(1) coefficient; is
>>>>> there a different approach I should take? Are rolling
>>>>> regressions/calculations best done in different software?
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/