Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: performance benchmarking stata (and mata) programs and improving program performance


From   Sergiy Radyakin <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: performance benchmarking stata (and mata) programs and improving program performance
Date   Tue, 23 Jul 2013 00:00:27 -0400

On Mon, Jul 22, 2013 at 6:40 PM, Brent McSharry (ADHB)
<[email protected]> wrote:
> I am trying to write a command to plot a risk adjusted exponentially weighted moving average chart. Calculation of the variance is computationally expensive by the formula I have.
>
> When doing run lengths of my testing data set of 6800+ records, Stata slows down. Because I would like to put the command up on SSC, I would like it to be more performant.
>
> The formula requires [sigma 1->N] calculations (so for 6800 records ~23 million). I rewrote the code in Mata, as I thought a compiled language must be faster. I also used a number of C 'performance optimisation' tricks (assuming a mata vector represents a memory serialized array). But it seems the new code is slightly slower.
>
> Questions are
> 1. Is there a way to get system time in ticks or milliseconds, so that performance of different routines can be benchmarked (creturn is only in secs)?

there is a parallel discussion of it here:
http://hsphsun3.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.1307/date/article-800.html

> 2. Why is the mata code taking so long?
> 3. How do I optimize further (eg setting matastrict on or off, handling setbreakintr manually etc, colvector vs rowvector vs vector)?

Perhaps let's discuss the algorithm? What is the array you are
generating. The size of the array is nobs. There is no point computing
alpha^n after a reasonably reachable n - it is just going to be zero.
So perhaps your expensive procedure can be not populating an array of
6000, but an array of 600? For Stata anything smaller than 0.6^729 is
zero anyways. Have a look at your array and check whether the loop
must be so long.


> 4. Is it possible to tell how much memory mata is using? There should not be more than a handful of single dimension arrays with a length of ~6000, but it would be nice to check the problem is not related to memory allocation and having to move RAM into scratch space.

yes: mata mata des   reports a sizes of mata structures in bytes.

Best, Sergiy
>
> Thank you very much for your thoughts/wisdom. The relevant code is below and I am NOT a stata programmer, so very appreciative of style tips as well as performance.
>
> Brent McSharry
> Intensivist
> Starship children's hospital
> -----------------------------
> //Original stata
> count if `touse'
> local count `r(N)'
> gsort -`touse' `timevar'
> noi di as txt "calculating std errors " as res "`=`count'*(`count'+1)/2'" as txt " calculations started at " as res "`c(current_time)'"
> gen `ewma_o'=`startest' in 1
> replace `ewma_o'= `smooth'*`observed' + (`oneMinusSm'*`ewma_o'[_n-1]) in 2/`count'
> gen `ewma_p'=`startest' in 1
> replace `ewma_p'= `smooth'*`predicted' + (`oneMinusSm'*`ewma_p'[_n-1]) in 2/`count'
> gen `var_p' = `predicted' *(1-`predicted') in 1/`count'
> gen `workingsum' = .
> gen `ewma_se' = .forvalues i = 1(1)`count' {
>         replace `workingsum' = (`oneMinusSm'^(2*(`i'-_n)) * `var_p') in 1/`i'
>         sum `workingsum', meanonly
>         replace `ewma_se' = sqrt(`r(sum)')* `smooth' in `i'
> }
> noi di as txt "di calculations finished at " as res "`c(current_time)'"
> local z=invnormal(1-(`alpha1'/2))
> gen `cl' = `ewma_se'  * `z'
> gen `cl1_ub' = `ewma_p' + `cl'
> gen `cl1_lb' = `ewma_p' - `cl'
> local z=invnormal(1-(`alpha2'/2))
> replace `cl' = sqrt(`ewma_se') * `smooth' * `z'
> gen `cl2_ub' = `ewma_p' + `cl'
> gen `cl2_lb' = `ewma_p' - `cl'
> -----------------------------
> //mata performing same function in loop
> mata:
> mata set matastrict on
> mata set matadebug off
> void getRaewma(string scalar expectVarname,
>         string scalar obsVarname,
>         real scalar obsCount,
>         real scalar smooth,
>         real scalar startVal,
>         string scalar emwaObsName,
>         string scalar emwaExpectName,
>         string scalar emwaSeName)
> {
>         real colvector ewmaExp, ewmaObs, ewmaSe
>         real vector expectVec, obsVec, expectVar, smoothPow, colIndices
>         real scalar oneMinusSm, j, k, len, currExp, currObs, expectJ, cumSeCalc
>         oneMinusSm = 1 - smooth
>         st_view(expectVec, ., expectVarname)
>         st_view(obsVec, ., obsVarname)
>         smoothPow = getSmoothPow(oneMinusSm, obsCount-1)
>         len = rows(expectVec)
>         ewmaExp = J(len,1,.)
>         ewmaObs = J(len,1,.)
>         ewmaSe = J(len,1,.)
>         expectVar = J(obsCount,1,.)
>         currExp = currObs = startVal
>         for (j=1;j<=obsCount;j++)
>         {
>                 currObs = ewmaObs[j,1] = obsVec[j]*smooth + currObs *oneMinusSm
>                 expectJ = expectVec[j]
>                 currExp = ewmaExp[j,1] = expectJ*smooth + currExp *oneMinusSm
>                 cumSeCalc = expectVar[j] = expectJ*(1-expectJ)
>                 for (k=1;k<j;k++)
>                 {
>                         cumSeCalc = cumSeCalc + smoothPow[j-k] * expectVar[k]
>                 }
>                 ewmaSe[j,1] = smooth*sqrt(cumSeCalc)
>         }
>         (void) setbreakintr(0)
>         colIndices = st_addvar("double", (emwaExpectName, emwaObsName, emwaSeName), 1)
>         st_store(.,colIndices[1],ewmaExp)
>         st_store(.,colIndices[2],ewmaObs)
>         st_store(.,colIndices[3],ewmaSe)
>         (void) setbreakintr(1)
> }
> //build lookup for power calculations (which tend to be computationally expensive)
> real vector function getSmoothPow(real scalar oneMinusSm, real scalar length)
> {
>         real scalar i, oneMinusSm2, currPow
>         real vector returnVar
>         oneMinusSm2 = oneMinusSm * oneMinusSm
>         returnVar = J(1,length,.)
>         currPow = 1
>         for (i=1;i<=length;i++)
>         {
>                 returnVar[i] = currPow = oneMinusSm2 * currPow
>         }
>         return(returnVar)
> }
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index