Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
st: performance benchmarking stata (and mata) programs and improving program performance
From 
 
"Brent McSharry (ADHB)" <[email protected]> 
To 
 
"[email protected]" <[email protected]> 
Subject 
 
st: performance benchmarking stata (and mata) programs and improving program performance 
Date 
 
Tue, 23 Jul 2013 10:40:37 +1200 
I am trying to write a command to plot a risk adjusted exponentially weighted moving average chart. Calculation of the variance is computationally expensive by the formula I have.
When doing run lengths of my testing data set of 6800+ records, Stata slows down. Because I would like to put the command up on SSC, I would like it to be more performant.
The formula requires [sigma 1->N] calculations (so for 6800 records ~23 million). I rewrote the code in Mata, as I thought a compiled language must be faster. I also used a number of C 'performance optimisation' tricks (assuming a mata vector represents a memory serialized array). But it seems the new code is slightly slower. 
Questions are
1. Is there a way to get system time in ticks or milliseconds, so that performance of different routines can be benchmarked (creturn is only in secs)?
2. Why is the mata code taking so long?
3. How do I optimize further (eg setting matastrict on or off, handling setbreakintr manually etc, colvector vs rowvector vs vector)?
4. Is it possible to tell how much memory mata is using? There should not be more than a handful of single dimension arrays with a length of ~6000, but it would be nice to check the problem is not related to memory allocation and having to move RAM into scratch space.
Thank you very much for your thoughts/wisdom. The relevant code is below and I am NOT a stata programmer, so very appreciative of style tips as well as performance.
Brent McSharry
Intensivist
Starship children's hospital
-----------------------------
//Original stata
count if `touse'
local count `r(N)'
gsort -`touse' `timevar'
noi di as txt "calculating std errors " as res "`=`count'*(`count'+1)/2'" as txt " calculations started at " as res "`c(current_time)'"
gen `ewma_o'=`startest' in 1
replace `ewma_o'= `smooth'*`observed' + (`oneMinusSm'*`ewma_o'[_n-1]) in 2/`count'
gen `ewma_p'=`startest' in 1
replace `ewma_p'= `smooth'*`predicted' + (`oneMinusSm'*`ewma_p'[_n-1]) in 2/`count'
gen `var_p' = `predicted' *(1-`predicted') in 1/`count'
gen `workingsum' = .
gen `ewma_se' = .forvalues i = 1(1)`count' { 
	replace `workingsum' = (`oneMinusSm'^(2*(`i'-_n)) * `var_p') in 1/`i'
	sum `workingsum', meanonly
	replace `ewma_se' = sqrt(`r(sum)')* `smooth' in `i'
}
noi di as txt "di calculations finished at " as res "`c(current_time)'"
local z=invnormal(1-(`alpha1'/2))
gen `cl' = `ewma_se'  * `z'
gen `cl1_ub' = `ewma_p' + `cl'
gen `cl1_lb' = `ewma_p' - `cl'
local z=invnormal(1-(`alpha2'/2))
replace `cl' = sqrt(`ewma_se') * `smooth' * `z'
gen `cl2_ub' = `ewma_p' + `cl'
gen `cl2_lb' = `ewma_p' - `cl' 
-----------------------------
//mata performing same function in loop 
mata:
mata set matastrict on
mata set matadebug off
void getRaewma(string scalar expectVarname, 
	string scalar obsVarname,
	real scalar obsCount,
	real scalar smooth,
	real scalar startVal,
	string scalar emwaObsName,
	string scalar emwaExpectName,
	string scalar emwaSeName)
{
	real colvector ewmaExp, ewmaObs, ewmaSe
	real vector expectVec, obsVec, expectVar, smoothPow, colIndices
	real scalar oneMinusSm, j, k, len, currExp, currObs, expectJ, cumSeCalc
	oneMinusSm = 1 - smooth
	st_view(expectVec, ., expectVarname)
	st_view(obsVec, ., obsVarname)
	smoothPow = getSmoothPow(oneMinusSm, obsCount-1)
	len = rows(expectVec)
	ewmaExp = J(len,1,.)
	ewmaObs = J(len,1,.)
	ewmaSe = J(len,1,.)
	expectVar = J(obsCount,1,.)
	currExp = currObs = startVal
	for (j=1;j<=obsCount;j++) 
	{
		currObs = ewmaObs[j,1] = obsVec[j]*smooth + currObs *oneMinusSm
		expectJ = expectVec[j]
		currExp = ewmaExp[j,1] = expectJ*smooth + currExp *oneMinusSm
		cumSeCalc = expectVar[j] = expectJ*(1-expectJ)
		for (k=1;k<j;k++)
		{
			cumSeCalc = cumSeCalc + smoothPow[j-k] * expectVar[k]
		}
		ewmaSe[j,1] = smooth*sqrt(cumSeCalc)
	}
	(void) setbreakintr(0)
	colIndices = st_addvar("double", (emwaExpectName, emwaObsName, emwaSeName), 1)
	st_store(.,colIndices[1],ewmaExp)
	st_store(.,colIndices[2],ewmaObs)
	st_store(.,colIndices[3],ewmaSe)
	(void) setbreakintr(1)
}
//build lookup for power calculations (which tend to be computationally expensive)
real vector function getSmoothPow(real scalar oneMinusSm, real scalar length)
{
	real scalar i, oneMinusSm2, currPow
	real vector returnVar
	oneMinusSm2 = oneMinusSm * oneMinusSm
	returnVar = J(1,length,.)
	currPow = 1
	for (i=1;i<=length;i++) 
	{
		returnVar[i] = currPow = oneMinusSm2 * currPow
	}
	return(returnVar)
}
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/