Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: egen and Mata


From   "C�lio J�nior" <[email protected]>
To   [email protected]
Subject   Re: st: egen and Mata
Date   Tue, 15 Jul 2008 16:13:50 -0300

I'm not much into Mata, but I think you should start trying st_data()
instead st_view().

On Tue, Jul 15, 2008 at 3:50 PM, Paulo Guimaraes <[email protected]> wrote:
> Dear all,
>
> Because I am working with large datasets I tried to code in Mata the
> equivalent to the following Stata command:
>
> egen newvar=mean(var), by(id)
>
> believing that the results would be much faster. I coded two very similar
> Mata routines, which are listed below.
> To my surprise the egen command was faster. One of the Mata routines is
> amazingly slow.
> My questions are:
>
> 1) Why is mean_by2() so slow?
>
> 2) Is it possible to make the Mata code more efficient/faster than Stata's
> egen?
>
> Any comments/suggestions are appreciated
>
> Paulo Guimaraes
>
>
>
> ************************************************************************
> ************************************************************************
>  set mem 500m
>
> Current memory allocation
>
>                    current                                 memory usage
>    settable          value     description                 (1M = 1024k)
>    --------------------------------------------------------------------
>    set maxvar         5000     max. variables allowed           1.909M
>    set memory          500M    max. data space                500.000M
>    set matsize         400     max. RHS vars in models          1.254M
>                                                            -----------
>                                                               503.163M
>
> . set obs 100000
> obs was 0, now 100000
>
> . set more off
>
> . egen id=seq(), from(1) to(10000) block(1)
>
> . gen y=uniform()
>
> . timer on 1
>
> . egen double x1=mean(y), by(id)
>
> . timer off 1
>
> . timer on 2
>
> . mata: mean_by1("y","id")
>
> . timer off 2
>
> . timer on 3
>
> . mata: mean_by2("y","id")
>
> . timer off 3
>
> . timer list
>   1:      0.59 /        1 =       0.5940
>   2:     65.81 /        1 =      65.8130
>   3:      0.86 /        1 =       0.8590
>
> *******************************************************************************************
> *******************************************************************************************
>
> mata:
> void function mean_by1(string scalar var, string scalar groupid)
> {
> real colvector order, means, ym
> numeric matrix data, info, X
> st_view(id=.,.,groupid)
> st_view(y=.,.,var)
> order=J(rows(y),1,1)
> order=runningsum(order)
> data=(order,y,id)
> _sort(data,3)
> info=panelsetup(data[.,3],1)
> means=J(rows(info),1,0)
> ym=J(rows(data),1,0)
> for (i=1; i<=rows(info); i++) {
>        X=panelsubmatrix(data[.,2],i,info)
>        means[i,1]=mean(X)
>        }
> for (i=1; i<=rows(info); i++) {
>        means[i,1]=mean(panelsubmatrix(data[.,2],i,info))
>        }
> data=(data[.,1],ym,data[.,3])
> _sort(data,1)
> index1=st_addvar("double","_y1")
> st_store((1,rows(y)),index1,data[.,2])
> }
> end
>
> mata:
> void function mean_by2(string scalar var, string scalar groupid)
> {
> real colvector order, means, ym
> numeric matrix data, info, X
> st_view(id=.,.,groupid)
> st_view(y=.,.,var)
> order=J(rows(y),1,1)
> order=runningsum(order)
> data=(order,y,id)
> _sort(data,3)
> info=panelsetup(data[.,3],1)
> means=J(rows(info),1,0)
> ym=J(rows(data),1,0)
> for (i=1; i<=rows(info); i++) {
>        X=data[| info[i,1],2\info[i,2],2|]
>        means[i,1]=mean(X)
>        }
> for (i=1; i<=rows(info); i++) {
>    for (j=info[i,1]; j<=info[i,2]; j++) {
>        ym[j,1]=means[i,1]
>    }
> }
> data=(data[.,1],ym,data[.,3])
> _sort(data,1)
> index1=st_addvar("double","_y2")
> st_store((1,rows(y)),index1,data[.,2])
> }
> end
>
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2024 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index