Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org**, is already up and running.


From: George Vega Yon <g.vegayon@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Improving code speed
Date: Wed, 22 May 2013 14:02:15 -0400

Dear Luis,

Without knowing much about what you are trying to calculate, a quick look suggests two things that can help you:

(1) Try using Mata instead (loops run much faster in Mata than in Stata). A Mata loop looks like this:

local nreps = 100
mata:
for (i = 1; i <= `nreps'; i++) {
    ... mata code ...
}
end

(2) You can try my module "parallel", which can speed up your code without much effort. Here is an example of how it works (you'll need two do-files: one for the initial configuration and another for the loop itself):

_____________________________________________________________________________________
clear all
vers 11

// Setup
set obs 1000
gen id = _n
gen z = rnormal()
gen Y=0
global reps=10          // define the number of simulations
gen epsilon=rnormal()   // generate the random var for the simulations

// Parallel setup (if you have a quad-core computer)
// ssc install parallel, all
parallel setclusters 4

// Serial fashion
preserve
timer on 1
do mydofile
timer off 1
restore

// Parallel fashion
timer on 2
parallel do mydofile.do
timer off 2

// How fast??
timer list
_____________________________________________________________________________________

_________________________mydofile.do___________________________________________________
forvalues k=`=id[1]'(1)`=id[_N]' {
    gen subs=(id<=`k')                        // Define the subsample to be used
    gen Y`k'=0                                // gen the intermediate Y`k'
    forvalues i=1(1)$reps {
        gen x`i'=z                            // generate simulated variable
        replace x`i'=z + rnormal() if id==`k' // Add the random part
        gen t=(x`i')^2
        bysort subs: egen tsum=sum(x`i')
        gen Y_`i'=t/tsum if id==`k'           // Construct Y for simulation i
        replace Y_`i'=0 if id!=`k'
        replace Y`k'=Y`k' + Y_`i'
        replace Y`k'=0 if id!=`k'
        drop Y_`i' t tsum x`i'
    }
    replace Y`k'=Y`k'/$reps                   // average Y over the simulations
    replace Y= Y + Y`k'
    drop Y`k' subs
}
_____________________________________________________________________________________

Hope it helps,
Cheers!
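To make point (1) concrete, here is a minimal sketch of the double loop in mydofile.do written in Mata. It is an illustration, not George's code: it assumes toy data like the setup above (id = _n, z drawn from a standard normal, the global reps), and the names Ymata, x, and Y are hypothetical. For each k and each simulation it rebuilds the subsample (z_1, ..., z_k), perturbs only observation k, and adds x_k^2 / sum(x) to Y[k], which is what t/tsum computes for id==k. It still does O(N^2 * R) arithmetic, so with 50000 observations it will remain slow; the saving comes only from not creating, sorting, and dropping Stata variables inside the loops.

```stata
* Toy data mirroring the setup above (assumption: id = _n, z ~ N(0,1))
clear
set obs 1000
set seed 12345
gen id = _n
gen double z = rnormal()
global reps = 10

* Hypothetical Mata translation of mydofile.do (same work, no temporary variables)
gen double Ymata = .
mata:
    z = st_data(., "z")                       // N x 1 column vector of z
    N = rows(z)
    R = strtoreal(st_global("reps"))          // number of simulations
    Y = J(N, 1, 0)
    for (k = 1; k <= N; k++) {
        for (i = 1; i <= R; i++) {
            x = z[|1 \ k|]                    // subsample: observations 1..k
            x[k] = x[k] + rnormal(1, 1, 0, 1) // perturb observation k only
            Y[k] = Y[k] + x[k]^2 / sum(x)     // t / tsum for observation k
        }
    }
    st_store(., "Ymata", Y :/ R)              // average over the R simulations
end
```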
George Vega Yon
7 647 2552
http://cl.linkedin.com/in/georgevegayon

2013/5/22 Luis <stataluis@gmail.com>:
> Dear statalist users,
>
> I am running into a "loop efficiency problem" in that I have to
> construct a variable using many iterations and I am not sure whether I
> am being as efficient as possible. Given the number of observations
> that I have and with my current code, I have to wait days for my code
> to finish running! Here's my problem:
>
> I have a total of 50000 observations and need to construct a variable
> Y that will be computed using different subsamples of these
> observations. In particular,
> Y=Y1 when the subsample contains only the first observation,
> Y=Y2 when the subsample contains observations 1 and 2,
> Y=Y3 when the subsample contains observations 1, 2 and 3, etc., until
> Y=Y50000.
>
> The idea is therefore to loop over the sample and define the subsample
> which contains observations 1 until k and construct the variable
> Y`k'=Yk if id==k and Y`k'=0 if id!=k. Then sum the variables Y`k'
> after each loop to end up with the final variable Y.
>
> To further complicate things, the variable Y needs to be the average
> of 100 simulations that depend on draws taken from a normal
> distribution. Hence I need to do a loop within the initial loop in
> order to do the 100 simulations.
>
> My code therefore looks like this:
>
> _____________________________________________________________________________________
>
> gen Y=0
> local reps=100         // define the number of simulations
> gen epsilon=rnormal()  // generate the random var for the simulations
>
> forvalues k=1(1)50000 {
>     gen subs=(id<=`k')    // Define the subsample to be used
>     gen Y`k'=0            // gen the intermediate Y`k'
>
>     forvalues i=1(1)`reps' {
>         gen x`i'=z                                // generate simulated variable
>         replace x`i'=z + epsilon[`i'] if id==`k'  // Add the random part
>
>         gen t=(x`i')^2
>         bysort subs: egen tsum=sum(x`i')
>
>         gen Y_`i'=t/tsum if id==`k'    // Construct Y for simulation i
>         replace Y_`i'=0 if id!=`k'
>
>         replace Y`k'=Y`k' + Y_`i'
>         replace Y`k'=0 if id!=`k'
>
>         drop Y_`i' t tsum x`i'
>     }
>
>     replace Y`k'=Y`k'/`reps'  // average Y from the 100 simulations
>     replace Y= Y + Y`k'
>     drop Y`k' subs
> }
>
> _____________________________________________________________________________________
>
> The code runs fine, but it takes a lot of time since it has to
> construct 100 variables for each of the 50000 iterations. I have tried
> many different possibilities and I can't think of another way of
> constructing Y.
>
> Any tip or suggestion that would help improve the efficiency of my
> code would be greatly appreciated!!!
>
> Many thanks in advance!
> Luis
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
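Looking at the loop algebraically suggests a much larger saving than Mata or parallel alone. In both versions only observation k is perturbed, so for id==k the group sum from `bysort subs: egen tsum=sum(x)` is simply the running sum of z plus that draw. Writing $C_k = z_1 + \dots + z_k$ and $e_{k,i}$ for the draw in simulation $i$ (assuming id = _n, i.e. the data are sorted by id):

$$Y_k = \frac{1}{R}\sum_{i=1}^{R} \frac{(z_k + e_{k,i})^2}{C_k + e_{k,i}}$$

so one running sum plus R vectorized passes over the data replaces the N x R inner iterations. The sketch below is a reformulation of the posted loops, not code from this thread; it uses toy data, follows George's variant (a fresh draw per observation and simulation), and the names cumz, draw, and Yfast are hypothetical.

```stata
* Toy data mirroring the setup above (assumption: id = _n, z ~ N(0,1))
clear
set obs 1000
set seed 12345
gen id = _n
gen double z = rnormal()
global reps = 10

* O(N * R) sketch based on tsum = C_k + e
sort id
gen double cumz = sum(z)           // running sum: cumz = C_k = z_1 + ... + z_k
gen double Yfast = 0
forvalues i = 1/$reps {
    gen double draw = rnormal()    // e_{k,i}: one fresh draw for every observation k
    replace Yfast = Yfast + (z + draw)^2 / (cumz + draw)
    drop draw
}
replace Yfast = Yfast / $reps      // average over the simulations
drop cumz
```

For Luis's variant, where simulation i uses the same draw for every k, the draw line would become ``gen double draw = epsilon[`i']`` (this needs epsilon to exist with at least $reps observations). If the work is split across processes, for example with parallel, cumz should be computed on the full dataset first, because each C_k depends on every earlier observation.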

**References**: **st: Improving code speed**, *From:* Luis <stataluis@gmail.com>
