Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# RE: st: efficient programming in mata - "meanby()" example

 From Andrew Maurer To "statalist@hsphsun2.harvard.edu" Subject RE: st: efficient programming in mata - "meanby()" example Date Tue, 25 Feb 2014 19:33:39 +0000

```Hi Statalist,

I have a few additional questions about efficient programming in mata and wonder if anyone has any insight:

1)
Can the storage type of a real matrix be changed in the same way as the storage type of a Stata variable?  (byte/int/long/etc). Eg) if I have a mata matrix that contains only 1s and 0s, I wouldn't want to waste memory having it stored in memory as a "float matrix" rather than a "byte matrix".

2)
How can I change the precision with which basic math functions are conducted (addition, multiplication, etc)? Eg, -help mata quadcross- says "sums are formed in quad precision rather than in double precision". I can't see the source for quadcross so I can't determine how this is programmed.

3)
Back to efficiency in my meanby() function, I want to try to add the option of having weights without slowing down the function when weights are not specified. In my new code at the bottom of this post, with the option for weights, the program is significantly slowed down due to having to access a filler weight matrix of 1s if weights aren't specified. Efficiency would be easy in stata syntax,
`by' gen `ty' `y' = sum(`w'*`x')/sum(cond(`x'<.,`w',0))
where `w' can come from a local = 1 if weights were not specified or a whole temp variable of weights if weights were specified. However, since mata doesn't have a conception of locals, this isn't possible in mata. Does anyone have any suggestions on how I could alter my meanby() syntax to not slow down when weights are specified ? The only efficient way I can think of would be to define two separate functions, meanby_noweights() and meanby_weights, and choose which one to call based on whether weights were specified.

4)
One last aside on style - is there a more elegant way to construct the "out" matrix in meanby()? I've run into this problem when programming in Stata as well. I want to loop and append a row to the matrix on each iteration. However, on the first iteration, where I define "out = out \ (previd, runsum/count)", out has to already exist. So before the loop, I have to define out = (., .) and after the loop remove this "filler row". This doesn't seem like good programming style, but I don't know of an alternative.

***** define meanby() function ********
mata
real matrix meanby(idname, xname, touse, |weight)
{
// sort data in stata
stata("sort " + idname)

st_view(id=0, ., idname, touse)
st_view(data=0, ., xname, touse)
if (args() == 3) aw = J(rows(data),1,1)
else {
st_view(aw=0, ., weight, touse)
aw = aw :* (aw :> 0)
}

// initialize mata objects
real matrix out
real scalar idnum, val, previd, count, runsum
runsum = 0
count = 0
previd = id[1,.]
numx = cols(data)
out = J(1, cols(id)+cols(data), .)

// loop through observations
for (i=1; i<=rows(data); i++) {
idnum = id[i,.]
val = data[i,.]
weight = aw[i,1]
if (idnum != previd) {
out = out \ (previd, runsum/count)
count = 0
runsum = J(1,numx,0)
}
count = count + weight
runsum = runsum :+ val*weight
previd = idnum
}

// final row
out = out \ (previd, runsum/count)

// output (exclude "filler" first row)
return(out[|(2,1)\(rows(out),cols(out))|])
}
end
***** end meanby() definition *********

Thank you,
Andrew Maurer

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Sergiy Radyakin
Sent: Monday, February 24, 2014 12:12 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: efficient programming in mata - "meanby()" example

Andrew, are you using MP2?
Try to set processors 1 before running the benchmark to equal the field. Stata wouldn't let you write parallel code in Mata, although it is using parallelization in its own built-ins. After setting processors to 1, your code is not that bad (in 12.0):

. timer list 1
1:      8.33 /        1 =       8.3290
. timer list 2
2:      7.60 /        1 =       7.6000

There are probably ways to make it faster, but because of the lack of parallelization in Mata there is no point to compete with true built-in C code.

Note also that plugins will not be a solution. Plugin interface is not afaik thread-safe.

Hope this helps, Sergiy

On Mon, Feb 24, 2014 at 11:10 AM, Andrew Maurer <Andrew.Maurer@qrm.com> wrote:
> Hi Statalist,
>
> I'm trying to get an idea of how to program in mata as efficiently as possible. I'm starting by trying to code a program in mata that calculates means-by-groups and runs at least as quickly as stata's "collapse (mean)..., by()" function. My objective at the moment is to learn more about programming technique as opposed to creating something new.
>
> I've whittled down my code to what's shown in the meanby() function below. The idea is to:
>         1) Sort data by panel variable(s)
>         2) Loop through observations, keep a running sum of the "x" variable and of the count of nonmissing x values
>         3) Whenever a new panel is reached, write the previous panel's
> average to an "out" object and reset the running sums
>
> My version below takes 9 seconds, while Stata's collapse takes 5 seconds with the example data shown. Does anyone have any feedback on how I could improve my code or insight as to what's going on at a low level in Stata's "by... : gen..." syntax that makes it more efficient than the loop I've written?
>
> Thank you,
> Andrew Maurer
>
> ******* excerpt from Stata's "collapse" for comparison **************
> `by' gen `ty' `y' = sum(`w'*`x')/sum(cond(`x'<.,`w',0))
> `by' replace `y' = `y'[_N]
> sort `by'
> quietly by `by': keep if _n==_N
> ******* end Sata excerpt ********************************************
>
>
> ***** define meanby() function ******** mata
>
> real matrix meanby(idname, xname)
> {
>         // sort data in stata
>         stata("sort " + idname)
>
>         // load data into mata
>         st_view(id=0, ., idname)
>         st_view(data=0, ., xname)
>
>         // initialize mata objects
>         real matrix out
>         real scalar idnum, val, previd, count, runsum
>         runsum = 0
>         count = 0
>         previd = id[1,.]
>         out = J(1, cols(id)+1, .)
>
>         // loop through observations
>         for (i=1; i<=rows(data); i++) {
>                 idnum = id[i,.]
>                 val = data[i,1]
>                 if (idnum != previd) {
>                         out = out \ (previd, runsum/count)
>                         count = 0
>                         runsum = 0
>                 }
>                 if (val != .) {
>                         count = count + 1
>                         runsum = runsum + val
>                 }
>                 previd = idnum
>         }
>
>         // final row
>         out = out \ (previd, runsum/count)
>
>         // output (exclude "filler" first row)
>         return(out[|(2,1)\(rows(out),2)|])
> }
>
> end
> ***** end meanby() definition *********
>
>
> ***** benchmark meanby vs stata collapse ***** // create some panel
> data // 30 panels, 100 dates long clear all local n 10000000 set obs
> `n'
> gen byte panelid = int( 30/`n' * (_n-1) ) gen int date = mod(_n,100)
> gen x = runiform()
>
> sort panelid date
>
> // time meanby() using same data
> timer on 1
>         qui mata: meanby("panelid date","x") timer off 1 timer list 1
>
> // time Stata's collapse
> timer on 2
> collapse (mean) x, by(panelid date)
> timer off 2
> timer list 2
> ***** end benchmark **************************
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
```