
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org is already up and running.



Re: st: Improving code speed


From   Luis Aguiar <stataluis@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Improving code speed
Date   Thu, 23 May 2013 10:01:09 +0200

Hi George and Nick,

Thanks a lot for your responses.

Nick: Yes, my actual code works, but the version I posted does contain
a couple of mistakes. After generating the variable epsilon I run
-mkmat epsilon if id<=`reps'-, so epsilon[`i',1] refers to a matrix
(and id is indeed _n stored in a variable). The second line in your
comment (b) should read -replace x`i'=z + epsilon[`i',1] if id==`k'-.
Sorry about that, and thanks for your helpful comments. I will try to
incorporate them, along with George's, to speed things up.

George: Thanks for your suggestions as well. I wasn't sure if it would
be worth going into mata, but I will try it now. Your parallel code
seems very interesting too. Do you think it would go faster than using
mata?

Again, thanks a lot to both of you!

Cheers,
Luis
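
On the Mata question, here is a minimal sketch rather than a tested
solution. It assumes that, with the correction above, the loop computes
for observation k the average over the draws e_i of
(z_k + e_i)^2 / (S_k + e_i), where S_k is the sum of z over observations
1 to k. The data for z, the sample size of 1000 and the name Y_mata are
hypothetical stand-ins, since the thread does not define them.

```stata
clear
set seed 2013
set obs 1000                        // small hypothetical sample, not the full 50000
gen long id = _n
gen double z = 1 + runiform()       // stand-in for z, which the thread leaves undefined
gen double epsilon = rnormal()      // the first 100 values serve as the draws
sort id

mata:
reps = 100
z = st_data(., "z")
e = st_data((1, reps), "epsilon")   // observations 1..reps of epsilon
S = runningsum(z)                   // S[k] = z[1] + ... + z[k]
Y = J(rows(z), 1, 0)
for (i = 1; i <= reps; i++) {
    // contribution of draw i to every observation at once
    Y = Y + ((z :+ e[i]):^2) :/ (S :+ e[i])
}
Y = Y :/ reps                       // average over the draws
st_store(., st_addvar("double", "Y_mata"), Y)
end
```

If that reading of the loop is right, the work drops from one pass per
(k, i) pair to one vector operation per draw, which is where most of the
gain would come from, whether in Mata or not.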

2013/5/22 Nick Cox <njcoxstata@gmail.com>:
> Please use your full real name. See Statalist FAQ for that request and why.
>
> Some speed-ups are likely to be possible here, but first I note
> several puzzles with this code.
>
> You don't say so, but presumably -id- is _n in a variable.
>
> -z- is unexplained.
>
> More problematically,
>
> (a) You -generate- a variable -epsilon- but you refer to a matrix -epsilon-.
>
> (b) The two lines below won't work as once -x`i'- exists the second
> command will fail.
>
>                 gen x`i'=z // generate simulated variable
>                 gen x`i'=z + epsilon[`i',1] if id==`k' // Add the random part
>
> Why do you say that it works? Did you copy some buggy version by accident?
>
> Better to post self-contained code that works.
>
> Speed-ups, apart from using Mata. (See George Vega Yon's post.)
>
> 1. Too much copying from one variable to another. I could be wrong,
> but some variables appear to be mostly zero, and you are just copying
> constants. Think in terms of scalars instead.
>
> 2. Use -summarize, meanonly- to get sums. -egen- is very slow at this.
>
> 3. Use -in 1/`k'- or -in `k'- wherever possible. Whenever there is a
> choice between -if- and -in- for the same problem, -in- is faster.
>
> Some example code:
>
>   gen x`i'=z
>   replace x`i'= x`i' + epsilon[`i'] in `k'
>
>   su x`i' in 1/`k', meanonly
>   replace Y`k' = (x`i')^2/r(sum) in `k'
>
>
> Nick
> njcoxstata@gmail.com
>
>
> On 22 May 2013 18:25, Luis <stataluis@gmail.com> wrote:
>> Dear statalist  users,
>>
>> I am running into a "loop efficiency problem" in that I have to
>> construct a variable using many iterations and I am not sure whether I
>> am being as efficient as possible. Given the number of observations
>> that I have and with my current code, I have to wait days for my code
>> to finish running! Here's my problem:
>>
>> I have a total of 50000 observations and need to construct a variable
>> Y that will be computed using different subsamples of these
>> observations. In particular,
>> Y=Y1 when the subsample contains only the first observation,
>> Y=Y2 when the subsample contains observations 1 and 2,
>> Y=Y3 when the subsample contains observations 1, 2 and 3 etc until
>> Y=Y50000.
>>
>> The idea is therefore to loop over the sample and define the subsample
>> which contains observations 1 until k and construct the variable
>> Y`k'=Yk if id==k and Y`k'=0 if id!=k. Then sum the variables Y`k'
>> after each loop to end up with the final variable Y.
>>
>> To further complicate things, the variable Y needs to be the average
>> of 100 simulations that depend on draws taken from a normal
>> distribution. Hence I need to do a loop within the initial loop in
>> order to do the 100 simulations.
>>
>> My code therefore looks like this:
>>
>> _____________________________________________________________________________________
>>
>> gen Y=0
>>
>> local reps=100 // define the number of simulations
>>
>> gen epsilon=rnormal() // generate the random var for the simulations
>>
>> forvalues k=1(1)50000{
>>
>>         gen subs=(id<=`k')   // define the subsample to be used
>>         gen Y`k'=0           // gen the intermediate Y`k'
>>
>>         forvalues i=1(1)`reps'{
>>
>>                 gen x`i'=z // generate simulated variable
>>                 gen x`i'=z + epsilon[`i',1] if id==`k' // add the random part
>>
>>                 gen t=(x`i')^2
>>                 bysort subs: egen tsum=sum(x`i')
>>
>>                 gen Y_`i'=t/tsum if id ==`k' // construct Y for simulation i
>>                 replace Y_`i'=0 if id!=`k'
>>
>>                 replace Y`k'=Y`k' + Y_`i'
>>                 replace Y`k'=0 if id!=`k'
>>
>>                 drop Y_`i' t tsum x`i'
>>         }
>>
>>         replace Y`k'=Y`k'/`reps'      // average Y from the 100 simulations
>>         replace Y= Y + Y`k'
>>         drop Y`k' subs
>> }
>>
>> ____________________________________________________________________________________
>>
>>
>> The code runs fine, but it takes a lot of time since it has to
>> construct 100 variables for each of the 50000 iterations. I have tried
>> many different possibilities and I can't think of another way of
>> constructing Y.
>>
>> Any tip or suggestion that would help improve the efficiency of my
>> code would be greatly appreciated!!!
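
A possible way to avoid the loop over k altogether, sketched in plain
Stata under some assumptions: that the second -gen x`i'- in the posted
loop is meant to be a -replace- at observation k, that the denominator
is the sum of x (not of t) over observations 1 to k as posted, and that
the same 100 draws are reused for every k. Under that reading, Y at
observation k is the mean over draws e_i of (z_k + e_i)^2 / (S_k + e_i),
with S_k the running sum of z. The data for z and the sample size of
1000 are hypothetical stand-ins.

```stata
clear
set seed 2013
set obs 1000                        // small hypothetical sample, not the full 50000
gen long id = _n
gen double z = 1 + runiform()       // stand-in for z, which the post leaves undefined

local reps = 100
gen double epsilon = rnormal()      // observations 1..reps hold the draws

sort id
gen double S = sum(z)               // running sum of z over observations 1..k
gen double Y = 0
forvalues i = 1/`reps' {
    // epsilon[`i'] is the value of epsilon in observation i, a constant here
    quietly replace Y = Y + (z + epsilon[`i'])^2 / (S + epsilon[`i'])
}
quietly replace Y = Y / `reps'      // average over the draws
```

This makes 100 passes over the data instead of 50000 x 100, and creates
no temporary variables inside the loop. If the denominator was meant to
be the sum of squares, the expression would need to change accordingly.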
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

