Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Slow -rolling- regressions on panel data
Date: Tue, 27 Sep 2011 15:01:55 +0100

I don't want to dampen Partho's or Richard's enthusiasm for Mata, which I share, as the public record also shows. But I think this contrast is too broad-brush to be really helpful. You need to look at it problem by problem. Here I guess I am going some way beyond what Partho and Richard said, but the issues are of some general interest and there is some chance that some people will jump to the wrong impressions.

For example, -regress- in Stata is a thin veneer of .ado code on top of an internal command which is compiled C code. Not only is that pretty fast, it also includes code for handling difficult regressions that is going to be absent from almost anybody's hand-coded regression code written in Mata using textbook formulae. There is plenty of scope for misunderstanding if "using Stata to do regressions" turns out to mean "using my own Mata code". The case of -correlate- is similar but not identical, as there are functions you can use directly in Mata to get correlations.

The context is one of panel problems and rolling estimation to boot. That clearly implies that anyone taking the Mata route has to set up a framework of looping over panels and over moving windows, somehow or other. There are helper functions for that, but it does not qualify as a trivial problem for most people.

Spending minutes or more likely hours or days to shave seconds off a program's time is one of the curious habits of many programmers. Of course, if you are going to use that program again and again, or programming is a kind of hobby anyway, that's fine and I am one of the people who do that now and then. But any implication that it's generally best to program in Mata is not going to be accurate or realistic.

Also, Stata was not developed to be slow and Mata to be fast. Stata was developed to be as fast as possible and various wrappers of interpreted code are then added when that is helpful, protective and doesn't slow things too much.
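To make the "looping over panels and over moving windows" framework concrete, here is a minimal language-neutral sketch in Python. It is not Stata or Mata code and not anyone's posted solution. The function names `ar1_slope` and `rolling_ar1`, the `(panel_id, time, y)` row layout, and the assumption that each panel has no gaps in time are all illustrative assumptions. The slope uses the textbook formula (sum of cross-products over sum of squares of the lag), so it has none of the safeguards for difficult regressions that -regress- provides.

```python
from collections import defaultdict


def ar1_slope(window):
    """Textbook OLS slope of y[t] on y[t-1] (with intercept) for one window.

    Assumes `window` holds consecutive observations with no gaps.
    Returns None when the slope is undefined (too few points or a constant lag).
    """
    x = window[:-1]  # lagged values y[t-1]
    y = window[1:]   # current values y[t]
    n = len(x)
    if n < 2:
        return None
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0:
        return None
    return sxy / sxx


def rolling_ar1(rows, width):
    """Loop over panels, then over moving windows of `width` observations.

    rows: iterable of (panel_id, time, y) tuples (hypothetical layout).
    Returns {panel_id: [(window_end_time, slope), ...]}.
    """
    panels = defaultdict(list)
    for panel_id, time, value in rows:
        panels[panel_id].append((time, value))

    result = {}
    for panel_id, obs in panels.items():
        obs.sort()  # order each panel by time
        times = [t for t, _ in obs]
        values = [v for _, v in obs]
        result[panel_id] = [
            (times[end - 1], ar1_slope(values[end - width:end]))
            for end in range(width, len(values) + 1)
        ]
    return result
```

Even this small sketch needs explicit decisions about sorting, gaps, short panels and constant series, which is the sense in which the framework is not trivial. It recomputes every window from scratch; updating running sums would be faster but is exactly the kind of extra programming effort weighed above.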
Some statistical software seems to be developed on the presumption that deep down every data analyst would like to be a programmer. All I can say is that in my experience this does not appear to be generally true, although every extensible language depends on the oddballs for whom it is exactly right.

On Tue, Sep 27, 2011 at 1:09 PM, Partho Sarkar <partho.ss+lists@gmail.com> wrote:
> That's the real point: -regress- is too much firepower for just
> finding a set of correlations. In fact, even -correlate- may be
> rather heavier than necessary. My guess about Mata being likely to be
> faster (certainly more elegant!) is based on the general premise that
> Mata is designed to be (much) faster than Stata for the things it can
> do. You might find "Mata, the missing manual" by William Gould a good
> introduction. Also "Programming in Stata and Mata", by Christopher F
> Baum. (Must say I have not had much occasion to actually use Mata so
> far, but coming from a background in C, R & Matlab, Mata was a
> thrilling find within Stata!)
>
> On Tue, Sep 27, 2011 at 4:29 PM, Richard Herron wrote:
>> Thanks, all, for the input!
>>
>> I was able to get a serviceable solution using -correlate- to find beta.
>>
>> The next thing I need to learn in Stata is writing my own .ado files
>> and using Mata (when you loop over the existing functions, I think
>> there can be too much overhead).
>>
>> On Tue, Sep 27, 2011 at 03:59, Nick Cox <njcoxstata@gmail.com> wrote:
>>> Actually, I would guess that Austin's suggestion will run faster than
>>> this, but we're just trading speculation.
>>>
>>> On Tue, Sep 27, 2011 at 7:32 AM, Partho Sarkar wrote:
>>>> If all you really want is the autocorrelation coefficient, of course
>>>> you don't really need -regress-, which does much more than just
>>>> generate the regression coefficients. As an alternative to Austin's
>>>> suggestion (and a priori I would expect this to be faster)
>>>> you could also get the ACs via matrix computations in Mata, successively
>>>> passing the y-vector (and the lagged y-vector?) for each firm to Mata
>>>> within a loop, computing the sums, inner products etc., and passing
>>>> the result back to Stata.
>>>>
>>>> Of course, Nick's point still holds: given your data size, this is
>>>> likely to be time-consuming in any case.
>>>>
>>>> As a last thought, you are presumably interested in doing this for
>>>> some "real" data. I think you might have an ill-conditioned matrix
>>>> with your artificial example, which would partly account for the slow
>>>> regressions.
>>>>
>>>> Richard Herron <richard.c.herron@gmail.com> wrote:
>>>> I am using -rolling- for rolling regressions on panel data, but it is
>>>> exceedingly slow. I found a Statalist thread
>>>> (http://www.stata.com/statalist/archive/2009-09/msg01239.html) with a
>>>> more manual solution, but it is equally slow (both are too slow to run
>>>> to completion in a reasonable amount of time).
>>>>
>>>> Is -regress- the bottleneck? I only want the AR(1) coefficient; is
>>>> there a different approach I should take? Are rolling
>>>> regressions/calculations best done in different software?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

