



Re: st: How to perform very simple manipulations in large data sets more efficiently


From   Stas Kolenikov <skolenik@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: How to perform very simple manipulations in large data sets more efficiently
Date   Fri, 12 Aug 2011 10:51:23 -0400

These procedures look pretty reasonable, but for very large data sets
you might indeed want a different solution.
Sorting is an O(_N log(_N)) operation, and if you can find an O(_N)
operation (which should be possible), that would help you. What
exactly do you do with `my_value' afterwards? And how exactly do you
organize your work flow with your 10K data sets?
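
To illustrate what an O(_N) version could look like, here is a minimal
sketch that avoids both the sort and the -keep-. It makes two linear
passes with -summarize, meanonly-: one to find the minimum of X, one to
find the first observation where that minimum occurs. The toy data and the
helper variable name obsno are made up for illustration, and I am assuming
that ties in X may be broken by taking the first observation in the
current order.

```stata
* Toy data standing in for one large data set (values are made up)
clear
input X Y
3 10
1 20
2 30
end

* Pass 1: find the minimum of X without sorting
summarize X, meanonly

* Pass 2: store the observation number wherever X equals that minimum
* (obsno is a hypothetical helper variable; r(min) still refers to X here)
generate long obsno = _n if X == r(min)

* The smallest stored observation number is the first row with minimal X
summarize obsno, meanonly
local my_value = Y[r(min)]
drop obsno

display "`my_value'"
```

Unlike approach 1 in the quoted message, this leaves the data in memory
untouched, so it can be repeated in a loop without reloading the file.
Whether it is fast enough for 10K data sets would need to be timed on the
real data.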

On Fri, Aug 12, 2011 at 10:43 AM, Tiago V. Pereira
<tiago.pereira@mbe.bio.br> wrote:
> Dear statalisters,
>
> I have to perform extremely simple tasks, but I am struggling with the low
> efficiency of my naive implementations. Perhaps you might have smarter
> ideas.
>
> Here is an example:
>
> Suppose I have two variables, X and Y.
>
> I need to get the value of Y that is associated with the smallest value of X.
>
> What I usually do is:
>
> (1) simple approach 1
>
> */ ------ start --------
> sum X, meanonly
> keep if X==r(min)
> local my_value = Y[1]
> */ ------ end --------
>
> (2) simple approach 2
>
> */ ------ start --------
> sort X
> local my_value = Y[1]
> */ ------ end --------
>
> These approaches are simple, and work very well for small data sets. Now,
> I have to repeat that procedure 10k times, for data sets that range from
> 500k to 1000k observations. Hence, both procedures 1 and 2 become very
> slow.
>
> If you have any tips, I will be very grateful.
>
> All the best,
>
> Tiago
>
>
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>



-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.


