

From: Richard Herron <richard.c.herron@gmail.com>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: Slow -rolling- regressions on panel data

Date: Tue, 27 Sep 2011 19:45:16 -0400

Nick and Partho, thanks for the discussion! I certainly don't intend to recreate -regress- in Mata (I am coming from R -- at the prodding of an advisor -- and love that in Stata I can focus on the analysis and results, not the coding), but it's nice to know that I can use Mata to get a sharp knife to combine with -rolling- and shave some time off these calculations.

On Tue, Sep 27, 2011 at 11:50, Nick Cox <njcoxstata@gmail.com> wrote:
> Thanks for your good-humoured response.
>
> I should stress that "oddball", like "pedant", is not for me a
> pejorative word and that I included myself.
>
> A key point you make here is that Mata is about much more than matrix
> algebra. Recently, a thread started by Adam Ozimek raised the awkward
> question of handling strings longer than 244 characters, which can't
> be fit into string variables. Not only are variables out of the
> question, this is difficult even to work with using locals longer than
> 244 characters. With Mata it starts looking like a Mickey Mouse
> problem:
>
> http://www.stata.com/statalist/archive/2011-09/msg01001.html
> http://www.stata.com/statalist/archive/2011-09/msg01033.html
>
> Nick
>
> On Tue, Sep 27, 2011 at 3:36 PM, Partho Sarkar
> <partho.ss+lists@gmail.com> wrote:
>> Touché, Nick! In fact, I had half thought of issuing a partial
>> retraction soon after shooting off that last post, before I left the
>> workplace. Just as well that I didn't, as we got an(other) excellent
>> post from Nick, as full of wit as good sense! (I especially liked the
>> digs at hobbyist "oddball" programmers- here is a self-confessed
>> case!)
>>
>> Certainly, I didn't mean to suggest that -regress- in particular, or
>> Stata built-in functions in general, can be beaten by (user-written)
>> Mata code. At least, not in Stata 11 or later, as they already
>> internalise much of the Mata tricks. (But for the problem at hand,
>> regress was certainly overkill, especially if it is going to be
>> invoked a few thousand times).
>>
>> Nonetheless, for some problems, Mata is pretty well worth the extra
>> effort. In particular, you do NOT have to code everything in terms of
>> basic matrix algebra. E.g., as Nick says: "The case of -correlate- is
>> similar but not identical, as there are functions you can use directly
>> in Mata to get correlations". And so are other Mata functions, I am
>> sure.
>>
>> Incidentally, this "Stata vs. Mata" issue seems to have been discussed
>> several times on the list.
>>
>> Best
>> Partho
>>
>>
>> On Tue, Sep 27, 2011 at 7:31 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>> I don't want to dampen Partho's or Richard's enthusiasm for Mata,
>>> which I share, as the public record also shows. But I think this
>>> contrast is too broad-brush to be really helpful. You need to look at
>>> it problem by problem.
>>>
>>> Here I guess I am going some way beyond what Partho and Richard said,
>>> but the issues are of some general interest and there is some chance
>>> that some people will jump to the wrong impressions.
>>>
>>> For example, -regress- in Stata is a thin veneer of .ado code on top
>>> of an internal command which is compiled C code. Not only is that
>>> pretty fast, it also includes code for handling difficult regressions
>>> that is going to be absent from almost anybody's hand-coded regression
>>> code written in Mata using textbook formulae. There is plenty of scope
>>> for misunderstanding if "using Stata to do regressions" turns out to
>>> mean "using my own Mata code".
>>>
>>> The case of -correlate- is similar but not identical, as there are
>>> functions you can use directly in Mata to get correlations.
>>>
>>> The context is one of panel problems and rolling estimation to boot.
>>> That clearly implies that anyone taking the Mata route has to set up a
>>> framework of looping over panels and over moving windows, somehow or
>>> other. There are helper functions for that, but it does not qualify as
>>> a trivial problem for most people.
>>>
>>> Spending minutes or more likely hours or days to shave seconds off a
>>> program's time is one of the curious habits of many programmers. Of
>>> course, if you are going to use that program again and again, or
>>> programming is a kind of hobby any way, that's fine and I am one of
>>> the people who do that now and then. But any implication that it's
>>> generally best to program in Mata is not going to be accurate or
>>> realistic.
>>>
>>> Also, Stata was not developed to be slow and Mata to be fast. Stata
>>> was developed to be as fast as possible and various wrappers of
>>> interpreted code are then added when that is helpful, protective and
>>> doesn't slow things too much.
>>>
>>> Some statistical software seems to be developed on the presumption
>>> that deep down every data analyst would like to be a programmer. All I
>>> can say is that in my experience this does not appear to be generally
>>> true, although every extensible language depends on the oddballs for
>>> whom it is exactly right.
>>>
>>> On Tue, Sep 27, 2011 at 1:09 PM, Partho Sarkar
>>> <partho.ss+lists@gmail.com> wrote:
>>>> That's the real point- -regress- is too much firepower for just
>>>> finding a set of correlations. In fact, even -correlate- may be
>>>> rather heavier than necessary. My guess about Mata being likely to be
>>>> faster (certainly more elegant!) is based on the general premise that
>>>> Mata is designed to be (much) faster than Stata for the things it can
>>>> do. You might find "Mata, the missing manual" by William Gould a good
>>>> introduction. Also "Programming in Stata and Mata", by Christopher F
>>>> Baum. (Must say I have not had much occasion to actually use Mata so
>>>> far, but coming from a background in C, R & Matlab, Mata was a
>>>> thrilling find within Stata!)
>>>
>>> On Tue, Sep 27, 2011 at 4:29 PM, Richard Herron
>>>
>>>>> Thanks, all, for the input!
>>>>>
>>>>> I was able to get a serviceable solution using -correlate- to find beta.
>>>>>
>>>>> The next thing I need to learn in Stata is writing my own .ado files
>>>>> and using Mata (when you loop over the existing functions, I think
>>>>> there can be too much overhead).
>>>>>
>>>>> On Tue, Sep 27, 2011 at 03:59, Nick Cox <njcoxstata@gmail.com> wrote:
>>>
>>>>>> Actually, I would guess that Austin's suggestion will run faster than
>>>>>> this, but we're just trading speculation.
>>>
>>> On Tue, Sep 27, 2011 at 7:32 AM, Partho Sarkar
>>>
>>>>>>> If all you really want is the autocorrelation coefficient, of course
>>>>>>> you don't really need -regress-, which does much more than just
>>>>>>> generate the regression coefficients. As an alternative to Austin's
>>>>>>> suggestion (and a priori I would expect this to be faster)
>>>>>>> you could also get the ACs via matrix computations in Mata, successively
>>>>>>> passing the y-vector (and the lagged y-vector?) for each firm to Mata
>>>>>>> within a loop, computing the sums, inner products etc., and passing
>>>>>>> the result back to Stata.
>>>>>>>
>>>>>>> Of course, Nick's point still holds: given your data size, this is
>>>>>>> likely to be time-consuming in any case.
>>>>>>>
>>>>>>> As a last thought, you are presumably interested in doing this for
>>>>>>> some "real" data- I think you might have an ill-conditioned matrix
>>>>>>> with your artificial example, which would partly account for the slow
>>>>>>> regressions.
>>>
>>> Richard Herron <richard.c.herron@gmail.com>
>>>
>>>>>>> I am using -rolling- for rolling regressions on panel data, but it is
>>>>>>> exceedingly slow. I found a Statalist thread
>>>>>>> (http://www.stata.com/statalist/archive/2009-09/msg01239.html) with a
>>>>>>> more manual solution, but it is equally slow (both are too slow to run
>>>>>>> to completion in a reasonable amount of time).
>>>>>>>
>>>>>>> Is -regress- the bottleneck? I only want the AR(1) coefficient; is
>>>>>>> there a different approach I should take? Are rolling
>>>>>>> regressions/calculations best done in different software?

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
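
For readers following the thread, the idea discussed above (getting the AR(1) slope for each firm and each moving window from sums and inner products, rather than calling a full regression command thousands of times) can be sketched roughly as follows. This is only an illustration in Python, not Mata and not anyone's code from the thread. The function names `ols_slope` and `rolling_ar1`, the `(firm, time, y)` row layout, and the assumption that each firm's time periods are consecutive with no gaps are all hypothetical choices made for the example. The slope is the textbook one-regressor formula, cov(y_t, y_{t-1}) / var(y_{t-1}), so it has none of the safeguards for difficult regressions that Nick points out -regress- includes.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Slope from regressing y on x with an intercept, via sums of
    cross-products (textbook formula, no numerical safeguards)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        return None  # slope is undefined when the lagged series is constant
    return sxy / sxx


def rolling_ar1(rows, window):
    """rows: iterable of (firm, time, y) tuples.
    Assumes (hypothetically) consecutive time periods within each firm.
    Returns {(firm, end_time): slope}, one entry per full window of
    `window` pairs (y_t, y_{t-1})."""
    by_firm = defaultdict(list)
    for firm, t, y in rows:
        by_firm[firm].append((t, y))

    out = {}
    for firm, obs in by_firm.items():
        obs.sort()  # order each panel by time
        times = [t for t, _ in obs]
        ys = [y for _, y in obs]
        # Slide the window along the panel; each window pairs y_t with y_{t-1}.
        for end in range(window, len(ys)):
            lagged = ys[end - window:end]
            current = ys[end - window + 1:end + 1]
            out[(firm, times[end])] = ols_slope(lagged, current)
    return out
```

The loop over firms and windows is the "framework of looping over panels and over moving windows" that Nick mentions; whether a hand-rolled version like this is actually faster than -rolling- with -correlate- in a given setting would have to be timed, not assumed.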

**References**:

- **st: Slow -rolling- regressions on panel data** *From:* Partho Sarkar <partho.ss+lists@gmail.com>
- **Re: st: Slow -rolling- regressions on panel data** *From:* Nick Cox <njcoxstata@gmail.com>
- **Re: st: Slow -rolling- regressions on panel data** *From:* Richard Herron <richard.c.herron@gmail.com>
- **Re: st: Slow -rolling- regressions on panel data** *From:* Partho Sarkar <partho.ss+lists@gmail.com>
- **Re: st: Slow -rolling- regressions on panel data** *From:* Nick Cox <njcoxstata@gmail.com>
- **Re: st: Slow -rolling- regressions on panel data** *From:* Partho Sarkar <partho.ss+lists@gmail.com>
- **Re: st: Slow -rolling- regressions on panel data** *From:* Nick Cox <njcoxstata@gmail.com>
