Re: st: Improving code speed

From	George Vega Yon <[email protected]>
To	[email protected]
Subject	Re: st: Improving code speed
Date	Wed, 22 May 2013 14:02:15 -0400

Dear Luis,

Without knowing much about what you are trying to calculate, from a
quick look I see two things that could help you:

(1) Try using Mata instead (loops run much faster in Mata than in
Stata). A Mata loop looks like this:

local nreps = 100
mata:
nreps = strtoreal(st_local("nreps"))   // copy the Stata local into a Mata scalar
for (i=1; i<=nreps; i++) {
      ... mata code...
}
end

(2) You can try my module "parallel", which can speed up your code
without much effort. Here is an example of how it works (you'll need
two do-files, one for the initial setup and another for the loop itself):
_____________________________________________________________________________________
clear all
vers 11

// Setup
set obs 1000
gen id = _n
gen z = rnormal()
gen Y=0
global reps=10 // define the number of simulations
gen epsilon=rnormal() // generate the random var for the simulations

// Parallel setup (if you have a quad-core computer)
// ssc install parallel, all
parallel setclusters 4


// Serial fashion
preserve
timer on 1
do mydofile
timer off 1

restore

// Parallel fashion
timer on 2

parallel do mydofile.do
timer off 2

// How fast??
timer list
_____________________________________________________________________________________

_________________________mydofile.do___________________________________________________
forvalues k=`=id[1]'(1)`=id[_N]'{

gen subs=(id<=`k')   // Define the subsample to be used
gen Y`k'=0      // gen the intermediate Y`k'

forvalues i=1(1)$reps{

gen x`i'=z // generate simulated variable
replace x`i'=z + rnormal() if id==`k' // Add the random part

gen t=(x`i')^2
bysort subs: egen tsum=sum(x`i')

gen Y_`i'=t/tsum if id ==`k' // Construct Y for simulation i
replace Y_`i'=0 if id!=`k'

replace Y`k'=Y`k' + Y_`i'
replace Y`k'=0 if id!=`k'

drop Y_`i' t tsum x`i'
}

replace Y`k'=Y`k'/$reps      // average Y over the $reps simulations
replace Y= Y + Y`k'
drop Y`k' subs
}
_____________________________________________________________________________________
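
As a rough sketch of point (1), and only under an assumption about what the loop is meant to compute: reading mydofile.do above, for each id k the result is the average over the simulations of x_k^2 / (z_1 + ... + z_(k-1) + x_k), where x_k = z_k plus a fresh standard normal draw. If that reading is right, the subsample total is just a running sum, so the loop over k can be dropped and all k handled at once in Mata. The names z, id, Y and reps follow the example; the seed, the 1000 observations and the Mata names csum and x are only illustrative.

```stata
clear all
set seed 12345                 // illustrative seed, not from the original post
set obs 1000
gen id = _n
gen z = rnormal()
gen double Y = .
local reps = 100

sort id                        // the running sum below assumes data sorted by id

mata:
reps = strtoreal(st_local("reps"))      // read the Stata local
z    = st_data(., "z")                  // N x 1 column vector
csum = runningsum(z)                    // csum[k] = z[1] + ... + z[k]
Y    = J(rows(z), 1, 0)
for (i=1; i<=reps; i++) {
    x = z + rnormal(rows(z), 1, 0, 1)   // each observation gets its own draw
    // t/tsum for id==k: x_k^2 over (sum of z for id<k, plus x_k)
    Y = Y + (x:^2) :/ (csum - z + x)
}
Y = Y / reps                            // average over the simulations
st_store(., "Y", Y)                     // write the result back to Stata
end

summarize Y
```

This does N*reps work instead of creating and dropping variables N*reps times over the whole dataset, so it should scale to 50000 observations without trouble. It is worth checking it against the original loop on a small sample before relying on it.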

Hope it helps,

Cheers!

George Vega Yon
7 647 2552
http://cl.linkedin.com/in/georgevegayon


2013/5/22 Luis <[email protected]>:
> Dear Statalist users,
>
> I am running into a "loop efficiency problem" in that I have to
> construct a variable using many iterations and I am not sure whether I
> am being as efficient as possible. Given the number of observations
> that I have and with my current code, I have to wait days for my code
> to finish running! Here's my problem:
>
> I have a total of 50000 observations and need to construct a variable
> Y that will be computed using different subsamples of these
> observations. In particular,
> Y=Y1 when the subsample contains only the first observation,
> Y=Y2 when the subsample contains observations 1 and 2,
> Y=Y3 when the subsample contains observations 1, 2 and 3 etc until
> Y=Y50000.
>
> The idea is therefore to loop over the sample and define the subsample
> which contains observations 1 until k and construct the variable
> Y`k'=Yk if id==k and Y`k'=0 if id!=k. Then sum the variables Y`k'
> after each loop to end up with the final variable Y.
>
> To further complicate things, the variable Y needs to be the average
> of 100 simulations that depend on draws taken from a normal
> distribution. Hence I need to do a loop within the initial loop in
> order to do the 100 simulations.
>
> My code therefore looks like this:
>
> _____________________________________________________________________________________
>
> gen Y=0
>
> local reps=100 // define the number of simulations
>
> gen epsilon=rnormal() // generate the random var for the simulations
>
> forvalues k=1(1)50000{
>
> gen subs=(id<=`k')   // Define the subsample to be used
> gen Y`k'=0      // gen the intermediate Y`k'
>
>         forvalues i=1(1)`reps'{
>
>                 gen x`i'=z // generate simulated variable
>                 replace x`i'=z + epsilon[`i'] if id==`k' // Add the random part
>
>         gen t=(x`i')^2
>         bysort subs: egen tsum=sum(x`i')
>
>         gen Y_`i'=t/tsum if id ==`k' // Construct Y for simulation i
>         replace Y_`i'=0 if id!=`k'
>
>         replace Y`k'=Y`k' + Y_`i'
>                 replace Y`k'=0 if id!=`k'
>
>         drop Y_`i' t tsum x`i'
>         }
>
> replace Y`k'=Y`k'/`reps'      // average Y from the 100 simulations
> replace Y= Y + Y`k'
> drop Y`k' subs
>         }
>
> ____________________________________________________________________________________
>
>
> The code runs fine, but it takes a lot of time since it has to
> construct 100 variables for each of the 50000 iterations. I have tried
> many different possibilities and I can't think of another way of
> constructing Y.
>
> Any tip or suggestion that would help improve the efficiency of my
> code would be greatly appreciated!!!
>
> Many thanks in advance!
> Luis
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/