

From: Luis Aguiar <stataluis@gmail.com>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: Improving code speed

Date: Thu, 23 May 2013 10:01:09 +0200

Hi George and Nick,

Thanks a lot for your responses.

Nick: Yes, the code works but there are indeed a couple of mistakes in the one I copied. I actually use the command " mkmat epsilon if id<=`reps' " after generating the variable epsilon (id is indeed _n in a variable). The second line in your comment (b) should read " replace x`i'=z + epsilon[`i',1] if id==`k' " . Sorry about all that. Thanks for your helpful comments though, I will try to incorporate them into my code along with George's comments in order to speed things up.

George: Thanks for your suggestions as well. I wasn't sure if it would be worth going into Mata, but I will try it now. Your parallel code seems very interesting too. Do you think it would go faster than using Mata?

Again, thanks a lot to both of you!

Cheers,
Luis

2013/5/22 Nick Cox <njcoxstata@gmail.com>:
> Please use your full real name. See Statalist FAQ for that request and why.
>
> Some speed-ups are likely to be possible here, but first I note
> several puzzles with this code.
>
> You don't say so, but presumably -id- is _n in a variable.
>
> -z- is unexplained.
>
> More problematically,
>
> (a) You -generate- a variable -epsilon- but you refer to a matrix -epsilon-.
>
> (b) The two lines below won't work as once -x`i'- exists the second
> command will fail.
>
>     gen x`i'=z \\ generate simulated variable
>     gen x`i'=z + epsilon[`i',1] if id==`k' \\ Add the random part
>
> Why do you say that it works? Did you copy some buggy version by accident?
>
> Better to post self-contained code that works.
>
> Speed-ups, apart from using Mata. (See George Vega Yon's post.)
>
> 1. Too much copying from one variable to another. I could be wrong,
> but some variables appear to be mostly zero, and you are just copying
> constants. Think in terms of scalars instead.
>
> 2. Use -summarize, meanonly- to get sums. -egen- is very slow at this.
>
> 3. Use -in 1/`k'- or -in `k'- wherever possible. Whenever there is a
> choice between -if- and -in- for the same problem, -in- is faster.
>
> Some example code:
>
>     gen x`i'=z
>     replace x`i'= x`i' + epsilon[`i'] in `k'
>
>     su x`i' in 1/`k', meanonly
>     replace Y`k' = (x`i')^2/r(sum) in `k'
>
>
> Nick
> njcoxstata@gmail.com
>
>
> On 22 May 2013 18:25, Luis <stataluis@gmail.com> wrote:
>> Dear statalist users,
>>
>> I am running into a "loop efficiency problem" in that I have to
>> construct a variable using many iterations and I am not sure whether I
>> am being as efficient as possible. Given the number of observations
>> that I have and with my current code, I have to wait days for my code
>> to finish running! Here's my problem:
>>
>> I have a total of 50000 observations and need to construct a variable
>> Y that will be computed using different subsamples of these
>> observations. In particular,
>> Y=Y1 when the subsample contains only the first observation,
>> Y=Y2 when the subsample contains observations 1 and 2,
>> Y=Y3 when the subsample contains observations 1, 2 and 3 etc until
>> Y=Y50000.
>>
>> The idea is therefore to loop over the sample and define the subsample
>> which contains observations 1 until k and construct the variable
>> Y`k'=Yk if id==k and Y`k'=0 if id!=k. Then sum the variables Y`k'
>> after each loop to end up with the final variable Y.
>>
>> To further complicate things, the variable Y needs to be the average
>> of 100 simulations that depend on draws taken from a normal
>> distribution. Hence I need to do a loop within the initial loop in
>> order to do the 100 simulations.
>>
>> My code therefore looks like this:
>>
>> _____________________________________________________________________________________
>>
>> gen Y=0
>>
>> local reps=100 \\ define the number of simulations
>>
>> gen epsilon=rnormal() \\ generate the random var for the simulations
>>
>> forvalues k=1(1)50000{
>>
>>     gen subs=(id<=`k') \\ Define the subsample to be used
>>     gen Y`k'=0 \\ gen the intermediate Y`k'
>>
>>     forvalues i=1(1)`reps'{
>>
>>         gen x`i'=z \\ generate simulated variable
>>         gen x`i'=z + epsilon[`i',1] if id==`k' \\ Add the random part
>>
>>         gen t=(x`i')^2
>>         bysort subs: egen tsum=sum(x`i')
>>
>>         gen Y_`i'=t/tsum if id ==`k' \\ Construct Y for simulation i
>>         replace Y_`i'=0 if id!=`k'
>>
>>         replace Y`k'=Y`k' + Y_`i'
>>         replace Y`k'=0 if id!=`k'
>>
>>         drop Y_`i' t tsum x`i'
>>     }
>>
>>     replace Y`k'=Y`k'/`reps' // average Y from the 100 simulations
>>     replace Y= Y + Y`k'
>>     drop Y`k' subs
>> }
>>
>> ____________________________________________________________________________________
>>
>>
>> The code runs fine, but it takes a lot of time since it has to
>> construct 100 variables for each of the 50000 iterations. I have tried
>> many different possibilities and I can't think of another way of
>> constructing Y.
>>
>> Any tip or suggestion that would help improve the efficiency of my
>> code would be greatly appreciated!!!
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
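Editorial sketch, not code from the thread: one way to read the corrected loop is that, for observation k and draw i, the quantity computed is (z[k] + epsilon[i])^2 / (z[1] + ... + z[k] + epsilon[i]), averaged over the draws. If that reading is right, the outer loop over k can be replaced by a running sum, so that only the loop over the draws remains. The toy data below (1000 observations, a made-up -z-, the seed) and the variable name -S- are assumptions for illustration only; -z- is never defined in the thread.

```stata
* Toy data standing in for the real dataset (assumption: z is made up here)
clear
set obs 1000
set seed 12345
gen long id = _n
gen double z = runiform() + 1
gen double epsilon = rnormal()

* Running sum of z: S in observation k is z[1] + ... + z[k]
sort id
gen double S = sum(z)

* Accumulate the ratio over the draws; epsilon[`i'] is the value of
* epsilon in observation i, matching the thread's epsilon[`i',1]
local reps = 100
gen double Y = 0
forvalues i = 1/`reps' {
    quietly replace Y = Y + (z + epsilon[`i'])^2 / (S + epsilon[`i'])
}

* Average over the draws
quietly replace Y = Y / `reps'
```

This follows the direction of Nick's advice (fewer temporary variables, no -egen- inside the loop) but goes one step further by dropping the loop over observations entirely. Whether it reproduces the intended Y depends on the reading stated above, so it would be worth checking against the original loop on a small sample first.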

**Follow-Ups**:

- **Re: st: Improving code speed** *From:* Nick Cox <njcoxstata@gmail.com>

**References**:

- **st: Improving code speed** *From:* Luis <stataluis@gmail.com>
- **Re: st: Improving code speed** *From:* Nick Cox <njcoxstata@gmail.com>
