

From: Richard Herron <richard.c.herron@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Slow -rolling- regressions on panel data
Date: Tue, 4 Oct 2011 23:09:04 -0400

For anyone following this thread in real time or years later: I learned enough Stata/Mata to code and compile a simple OLS in Mata and found no speed improvement (in fact, a slight worsening). So it appears the slowdown was due to the scale of the problem.

On Tue, Sep 27, 2011 at 19:45, Richard Herron <richard.c.herron@gmail.com> wrote:
> Nick and Partho, thanks for the discussion!
>
> I certainly don't intend to recreate -regress- in Mata (I am coming
> from R -- at the prodding of an advisor -- and love that in Stata I
> can focus on the analysis and results, not the coding), but it's nice
> to know that I can use Mata to get a sharp knife to combine with
> -rolling- and shave some time off these calculations.
>
> On Tue, Sep 27, 2011 at 11:50, Nick Cox <njcoxstata@gmail.com> wrote:
>> Thanks for your good-humoured response.
>>
>> I should stress that "oddball", like "pedant", is not for me a
>> pejorative word and that I included myself.
>>
>> A key point you make here is that Mata is about much more than matrix
>> algebra. Recently, a thread started by Adam Ozimek raised the awkward
>> question of handling strings longer than 244 characters, which can't
>> be fit into string variables. Not only are variables out of the
>> question, this is difficult even to work with using locals longer than
>> 244 characters. With Mata it starts looking like a Mickey Mouse
>> problem:
>>
>> http://www.stata.com/statalist/archive/2011-09/msg01001.html
>> http://www.stata.com/statalist/archive/2011-09/msg01033.html
>>
>> Nick
>>
>> On Tue, Sep 27, 2011 at 3:36 PM, Partho Sarkar
>> <partho.ss+lists@gmail.com> wrote:
>>> Touché, Nick! In fact, I had half thought of issuing a partial
>>> retraction soon after shooting off that last post, before I left the
>>> workplace. Just as well that I didn't, as we got an(other) excellent
>>> post from Nick, as full of wit as good sense! (I especially liked the
>>> digs at hobbyist "oddball" programmers- here is a self-confessed
>>> case!)
>>>
>>> Certainly, I didn't mean to suggest that -regress- in particular, or
>>> Stata built-in functions in general, can be beaten by (user-written)
>>> Mata code. At least, not in Stata 11 or later, as they already
>>> internalise much of the Mata tricks. (But for the problem at hand,
>>> regress was certainly overkill, especially if it is going to be
>>> invoked a few thousand times).
>>>
>>> Nonetheless, for some problems, Mata is pretty well worth the extra
>>> effort. In particular, you do NOT have to code everything in terms of
>>> basic matrix algebra. E.g., as Nick says: "The case of -correlate- is
>>> similar but not identical, as there are functions you can use directly
>>> in Mata to get correlations". And so are other Mata functions, I am
>>> sure.
>>>
>>> Incidentally, this "Stata vs. Mata" issue seems to have been discussed
>>> several times on the list.
>>>
>>> Best
>>> Partho
>>>
>>>
>>> On Tue, Sep 27, 2011 at 7:31 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>>> I don't want to dampen Partho's or Richard's enthusiasm for Mata,
>>>> which I share, as the public record also shows. But I think this
>>>> contrast is too broad-brush to be really helpful. You need to look at
>>>> it problem by problem.
>>>>
>>>> Here I guess I am going some way beyond what Partho and Richard said,
>>>> but the issues are of some general interest and there is some chance
>>>> that some people will jump to the wrong impressions.
>>>>
>>>> For example, -regress- in Stata is a thin veneer of .ado code on top
>>>> of an internal command which is compiled C code. Not only is that
>>>> pretty fast, it also includes code for handling difficult regressions
>>>> that is going to be absent from almost anybody's hand-coded regression
>>>> code written in Mata using textbook formulae. There is plenty of scope
>>>> for misunderstanding if "using Stata to do regressions" turns out to
>>>> mean "using my own Mata code".
>>>>
>>>> The case of -correlate- is similar but not identical, as there are
>>>> functions you can use directly in Mata to get correlations.
>>>>
>>>> The context is one of panel problems and rolling estimation to boot.
>>>> That clearly implies that anyone taking the Mata route has to set up a
>>>> framework of looping over panels and over moving windows, somehow or
>>>> other. There are helper functions for that, but it does not qualify as
>>>> a trivial problem for most people.
>>>>
>>>> Spending minutes or more likely hours or days to shave seconds off a
>>>> program's time is one of the curious habits of many programmers. Of
>>>> course, if you are going to use that program again and again, or
>>>> programming is a kind of hobby anyway, that's fine and I am one of
>>>> the people who do that now and then. But any implication that it's
>>>> generally best to program in Mata is not going to be accurate or
>>>> realistic.
>>>>
>>>> Also, Stata was not developed to be slow and Mata to be fast. Stata
>>>> was developed to be as fast as possible and various wrappers of
>>>> interpreted code are then added when that is helpful, protective and
>>>> doesn't slow things too much.
>>>>
>>>> Some statistical software seems to be developed on the presumption
>>>> that deep down every data analyst would like to be a programmer. All I
>>>> can say is that in my experience this does not appear to be generally
>>>> true, although every extensible language depends on the oddballs for
>>>> whom it is exactly right.
>>>>
>>>> On Tue, Sep 27, 2011 at 1:09 PM, Partho Sarkar
>>>> <partho.ss+lists@gmail.com> wrote:
>>>>> That's the real point- -regress- is too much firepower for just
>>>>> finding a set of correlations. In fact, even -correlate- may be
>>>>> rather heavier than necessary. My guess about Mata being likely to be
>>>>> faster (certainly more elegant!) is based on the general premise that
>>>>> Mata is designed to be (much) faster than Stata for the things it can
>>>>> do. You might find "Mata, the missing manual" by William Gould a good
>>>>> introduction. Also "Programming in Stata and Mata", by Christopher F
>>>>> Baum. (Must say I have not had much occasion to actually use Mata so
>>>>> far, but coming from a background in C, R & Matlab, Mata was a
>>>>> thrilling find within Stata!)
>>>>
>>>> On Tue, Sep 27, 2011 at 4:29 PM, Richard Herron
>>>>
>>>>>> Thanks, all, for the input!
>>>>>>
>>>>>> I was able to get a serviceable solution using -correlate- to find beta.
>>>>>>
>>>>>> The next thing I need to learn in Stata is writing my own .ado files
>>>>>> and using Mata (when you loop over the existing functions, I think
>>>>>> there can be too much overhead).
>>>>>>
>>>>>> On Tue, Sep 27, 2011 at 03:59, Nick Cox <njcoxstata@gmail.com> wrote:
>>>>
>>>>>>> Actually, I would guess that Austin's suggestion will run faster than
>>>>>>> this, but we're just trading speculation.
>>>>
>>>> On Tue, Sep 27, 2011 at 7:32 AM, Partho Sarkar
>>>>
>>>>>>>> If all you really want is the autocorrelation coefficient, of course
>>>>>>>> you don't really need -regress-, which does much more than just
>>>>>>>> generate the regression coefficients. As an alternative to Austin's
>>>>>>>> suggestion (and a priori I would expect this to be faster)
>>>>>>>> you could also get the AC's via matrix computations in Mata, successively
>>>>>>>> passing the y-vector (and the lagged y-vector?) for each firm to Mata
>>>>>>>> within a loop, computing the sums, inner products etc., and passing
>>>>>>>> the result back to Stata.
>>>>>>>>
>>>>>>>> Of course, Nick's point still holds: given your data size, this is
>>>>>>>> likely to be time-consuming in any case.
>>>>>>>>
>>>>>>>> As a last thought, you are presumably interested in doing this for
>>>>>>>> some "real" data- I think you might have an ill-conditioned matrix
>>>>>>>> with your artificial example, which would partly account for the slow
>>>>>>>> regressions.
>>>>
>>>> Richard Herron <richard.c.herron@gmail.com>
>>>>
>>>>>>>> I am using -rolling- for rolling regressions on panel data, but it is
>>>>>>>> exceedingly slow. I found a Statalist thread
>>>>>>>> (http://www.stata.com/statalist/archive/2009-09/msg01239.html) with a
>>>>>>>> more manual solution, but it is equally slow (both are too slow to run
>>>>>>>> to completion in a reasonable amount of time).
>>>>>>>>
>>>>>>>> Is -regress- the bottleneck? I only want the AR(1) coefficient; is
>>>>>>>> there a different approach I should take? Are rolling
>>>>>>>> regressions/calculations best done in different software?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

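The idea raised in the thread of getting the AR(1) coefficient from "sums, inner products etc." rather than a full regression per window can be sketched as below. This is an illustration in Python, not the Stata or Mata code anyone in the thread ran. The function names `rolling_ar1` and `rolling_ar1_by_panel` are hypothetical, and it assumes each panel's values are in time order with no gaps. The slope of y[t] on y[t-1] is computed as (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2), with the sums updated as the window slides.

```python
def rolling_ar1(series, window):
    """Slope of y[t] on y[t-1] over each run of `window` consecutive pairs.

    `series` holds one panel's values in time order (assumed gap-free).
    The result has one entry per (lag, current) pair: None until a full
    window is available, or when the lagged values do not vary.
    """
    x = series[:-1]   # lagged values y[t-1]
    y = series[1:]    # current values y[t]
    out = []
    sx = sy = sxx = sxy = 0.0
    for i in range(len(x)):
        # Add the newest pair to the running sums.
        sx += x[i]
        sy += y[i]
        sxx += x[i] * x[i]
        sxy += x[i] * y[i]
        if i >= window:
            # Drop the pair that just left the window.
            j = i - window
            sx -= x[j]
            sy -= y[j]
            sxx -= x[j] * x[j]
            sxy -= x[j] * y[j]
        if i + 1 < window:
            out.append(None)
            continue
        denom = window * sxx - sx * sx
        out.append((window * sxy - sx * sy) / denom if denom != 0 else None)
    return out


def rolling_ar1_by_panel(panels, window):
    """Apply rolling_ar1 to each panel in a dict {panel_id: [values]}."""
    return {pid: rolling_ar1(vals, window) for pid, vals in panels.items()}


# Panel "a" follows y[t] = 0.5 * y[t-1] + 1 exactly; panel "b" is constant.
example = {"a": [0.0, 1.0, 1.5, 1.75, 1.875], "b": [2.0, 2.0, 2.0, 2.0]}
result = rolling_ar1_by_panel(example, 3)
```

Each window costs a fixed number of additions once the sums are set up, so the work grows with the number of observations rather than observations times window length. Running sums can accumulate floating-point error on long or badly scaled series, and none of the safeguards built into -regress- for difficult data are present here.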