From
"Tiago V. Pereira" <tiago.pereira@mbe.bio.br>

To
statalist@hsphsun2.harvard.edu

Subject
st: How to perform very simple manipulations in large data sets more efficiently

Date
Fri, 12 Aug 2011 11:43:23 -0300 (BRT)

Dear Statalisters,

I have to perform some extremely simple tasks, but I am struggling with the low efficiency of my naive implementations. Perhaps you have smarter ideas.

Here is an example. Suppose I have two variables, X and Y. I need to get the value of Y that is associated with the smallest value of X.

What I usually do is:

(1) Simple approach 1

    */ ------ start --------
    sum X, meanonly
    keep if X==r(min)
    local my_value = Y[1]
    */ ------ end --------

(2) Simple approach 2

    */ ------ start --------
    sort X
    local my_value = Y[1]
    */ ------ end --------

These approaches are simple and work very well for small data sets. Now, however, I have to repeat the procedure 10k times, on data sets that range from 500k to 1000k observations, and both approaches become noticeably slow.

If you have any tips, I will be very grateful.

All the best,

Tiago
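One possible direction, offered only as a sketch: the lookup described above can be done in a single pass through Mata, without sorting and without dropping observations. The function name y_at_min_x and the scalar name y_at_min are made up for this illustration. The sketch assumes X and Y are numeric and that X has at least one nonmissing value. If the minimum of X is tied, which of the tied observations is used is not controlled here.

```stata
* Hypothetical helper, defined once before the repetitions start.
mata:
void y_at_min_x(string scalar xvar, string scalar yvar)
{
    real matrix    x, y, w
    real colvector i

    st_view(x, ., xvar)        // view onto X: no copy of the data
    st_view(y, ., yvar)        // view onto Y
    minindex(x, 1, i, w)       // i holds the row index (or indices) of min(X)
    st_numscalar("y_at_min", y[i[1]])   // pass the result back to Stata
}
end

* Inside each repetition: one linear scan, data left untouched.
mata: y_at_min_x("X", "Y")
local my_value = scalar(y_at_min)
```

Defining the function once and calling it with a one-line mata: command keeps it usable inside a loop. Whether it is faster than approach 1 in practice would need to be timed on the real data.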

