
RE: st: Re: mvsumm calculation time

From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: Re: mvsumm calculation time
Date   Tue, 5 Aug 2008 17:29:48 +0100

Kit is one of the authors of -mvsumm-, which can be downloaded from SSC.
I am the other. 

My guess is that Austin's code will be faster than anything -mvsumm- or
-rolling- can do. 

My guess is also that -- in this instance -- bringing in Mata would not
help at all. 

There's no contradiction. -mvsumm- and (even more) -rolling- are
moderately general wrapper commands that set up the machinery for a
variety of calculations. It so happens that the sd of windows of 5 is a
simple enough problem that you can attack it from first principles. 

That said, using -double-s might do no harm. 

[email protected] 

Austin Nichols

Kit and unnamed correspondent:
It will be even faster to use the -by: gen- construct, since that is
written in very fast C code.  If you want a SD over a five-period
window within firm, just do something like:

tsset i t
sort i t
by i: g m=(y+l.y+l2.y+l3.y+l4.y)/5
by i: g v=(y-m)^2+(l.y-m)^2+(l2.y-m)^2+(l3.y-m)^2+(l4.y-m)^2
g sd=sqrt(v/4)

for some existing variable y (the latter 3 commands can easily be
condensed into one to further increase speed at some small cost in
readability). Or am I misunderstanding the nature of the problem?
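
As a rough sketch of the condensation mentioned above (not tested on the original poster's data), the three -generate- commands could be folded into one. The local macro name m and the new variable name sd5 are illustrative assumptions. The result is stored as a -double-. The data are assumed to be -tsset- with panel variable i and time variable t, as in the commands above.

```stata
* Sketch only: five-period rolling SD in a single -generate-.
* Assumes panel variable i, time variable t, and an existing variable y.
* The names m and sd5 are hypothetical, chosen for illustration.
tsset i t

* Window mean held in a local macro, so it is expanded inline below.
local m "((y + L.y + L2.y + L3.y + L4.y)/5)"

* Sum of squared deviations over the window, divided by n-1 = 4.
* The /// line continuations work in a do-file, not at the command line.
generate double sd5 = sqrt(((y - `m')^2 + (L.y - `m')^2 + (L2.y - `m')^2 ///
    + (L3.y - `m')^2 + (L4.y - `m')^2)/4)
```

Because the lag operators respect the panel structure declared by -tsset-, sd5 is missing wherever any of the five values in the window is unavailable, for example in the first four periods of each firm.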

On 8/5/08, Kit Baum <[email protected]> wrote:

> mvsumm is written in ado-file code. It probably should be rewritten to
> take advantage of Mata. Since -mvsumm- was implemented, Stata added the
> -rolling- prefix. It might be faster to use -rolling- (which creates a
> separate dataset of summary statistics when combined with -summarize-)
> in this case.
>
> On Aug 5, 2008, at 02:33, statalist-digest wrote:
> > I have calculated the standard deviation of firm-level revenue using
> > the recommended mvsumm command such as:
> >
> > mvsumm Revenue, stat(sd) win(5) gen(rev5ysd) end
> >
> > I have the 64-bit version of Stata 10 SE (and a 64-bit computer). My
> > sample size is over 1.1 million observations covering over 200,000
> > firms over 6 years. It took my computer about 24 hours to compute this
> > (although it worked just as advertised and gave me exactly the result
> > needed).
> >
> > Does anyone have any recommendations to speed up computing time, since I
> > need to compute about 8 more similar commands and don't want to tie up
> > Stata for over a week? Or do I just need to accept the calculation
> > time since my data is so large?
