
st: How to perform very simple manipulations in large data sets more efficiently


From   "Tiago V. Pereira" <tiago.pereira@mbe.bio.br>
To   statalist@hsphsun2.harvard.edu
Subject   st: How to perform very simple manipulations in large data sets more efficiently
Date   Fri, 12 Aug 2011 11:43:23 -0300 (BRT)

Dear statalisters,

I have to perform some extremely simple tasks, but I am struggling with the
inefficiency of my naive implementations. Perhaps you have smarter ideas.

Here is an example:

Suppose I have two variables, X and Y.

I need to get the value of Y that is associated with the smallest value of X.

What I usually do is:

(1) simple approach 1

* ------ start --------
sum X, meanonly                // r(min) now holds the minimum of X
keep if X==r(min)              // note: this drops every other observation
local my_value = Y[1]
* ------ end --------

(2) simple approach 2

* ------ start --------
sort X                         // ascending sort puts the smallest X first
local my_value = Y[1]
* ------ end --------

These approaches are simple and work very well for small data sets. However, I
have to repeat this procedure about 10,000 times, on data sets that range from
500,000 to 1,000,000 observations, and at that scale both approaches become
clearly slow.
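
For reference, here is a minimal sketch of a non-destructive variant (it assumes
the minimum of X is unique, and it keeps the full data in memory, so nothing has
to be re-sorted or restored between repetitions):

* ------ sketch: avoid keep and sort ------
sum X, meanonly                // full data stays in memory
local xmin = r(min)            // save the minimum before r() is overwritten
sum Y if X==`xmin', meanonly   // look only at the observation(s) where X is minimal
local my_value = r(min)        // with a unique minimum, this is the Y we want
* ------ end sketch ------

I do not know whether this actually beats sorting at this scale, which is
exactly the kind of tip I am after.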

If you have any tips, I will be very grateful.

All the best,

Tiago





