Statalist



RE: st: Re: mvsumm calculation time


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Re: mvsumm calculation time
Date   Tue, 5 Aug 2008 17:29:48 +0100

Kit is one of the authors of -mvsumm-, which can be downloaded from SSC.
I am the other. 

My guess is that Austin's code will be faster than anything -mvsumm- or
-rolling- can do. 

My guess is also that -- in this instance -- bringing in Mata would not
help at all. 

There's no contradiction. -mvsumm- and (even more) -rolling- are
moderately general wrapper commands that set up the machinery for a
variety of calculations. It so happens that the sd of windows of 5 is a
simple enough problem that you can attack it from first principles. 

That said, using -double-s might do no harm. 

Nick 
n.j.cox@durham.ac.uk 

Austin Nichols

Kit and unnamed correspondent:
It will be even faster to use the -by: gen- construct, since that is
written in very fast C code.  If you want a SD over a five-period
window within firm, just do something like:

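* declare the panel: firm i, period t
tsset i t
sort i t
* mean of y over the current and four preceding periods
by i: g m=(y+l.y+l2.y+l3.y+l4.y)/5
* sum of squared deviations from that window mean
by i: g v=(y-m)^2+(l.y-m)^2+(l2.y-m)^2+(l3.y-m)^2+(l4.y-m)^2
* sample SD: divide by n-1 = 4
g sd=sqrt(v/4)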

for some existing variable y (the last three commands can easily be
condensed into one to further increase speed at some small cost in
readability). Or am I misunderstanding the nature of the problem?
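
One possible way to condense the three -generate- commands above into a single command is sketched here. It is an illustration only, not code from the original posts: it assumes the data are already -tsset i t-, the name sd5 is made up for the example, and it uses the sum-of-squares shortcut (sum of y^2 minus (sum of y)^2/5, divided by 4) instead of computing the mean first. It stores the result as a -double-, since the shortcut is more sensitive to rounding than the mean-then-deviations version.

```stata
* Sketch only: assumes -tsset i t- has been run and y exists.
* sd5 is a hypothetical name for the new variable.
* Sample variance over 5 periods = (sum of squares - (sum)^2/5) / 4.
* The result is missing wherever any of the four lags is missing,
* as with the three-command version.
generate double sd5 = sqrt(( y^2 + l.y^2 + l2.y^2 + l3.y^2 + l4.y^2 ///
    - (y + l.y + l2.y + l3.y + l4.y)^2/5 ) / 4)
```

If y is constant within a window, rounding could in principle push the quantity under the square root slightly below zero, and -sqrt()- would then return missing, so it may be worth comparing sd5 with the three-command result on a subsample before relying on it.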

On 8/5/08, Kit Baum <baum@bc.edu> wrote:

> mvsumm is written in ado-file code. It probably should be rewritten to take
> advantage of Mata. Since -mvsumm- was implemented, Stata added the rolling:
> prefix. It might be faster to use -rolling- (which creates a separate
> dataset of summary statistics when combined with -summarize-) in this case.
>
>
> On Aug 5, 2008, at 02:33 , statalist-digest wrote:
>
> > I have calculated the standard deviation of firm-level revenue using the
> > recommended mvsumm command such as:
> >
> > mvsumm Revenue, stat(sd) win(5) gen(rev5ysd) end
> >
> > I have the 64-bit version of Stata 10 SE (and a 64-bit computer). My
> > sample size is over 1.1 million observations covering over 200,000 firms
> > over 6 years. It took my computer about 24-hours to compute this statistic
> > (although it worked just as advertised and gave me exactly the result I
> > needed).
> >
> > Does anyone have any recommendations to speed up computing time since I
> > need to compute about 8 more similar commands and don't want to tie up my
> > Stata for over a week? Or do I just need to accept the calculation time
> > since my data is so large?



