Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: How to perform very simple manipulations in large data sets more efficiently
From: Stas Kolenikov <[email protected]>
To: [email protected]
Subject: Re: st: How to perform very simple manipulations in large data sets more efficiently
Date: Fri, 12 Aug 2011 10:51:23 -0400
These procedures look pretty reasonable, but for data sets this large
you might indeed want a different solution. Sorting is an
O(_N log(_N)) operation, so if you can find an O(_N) operation (which
should be possible), that would help. What exactly do you do with
`my_value' afterwards? And how exactly do you organize your workflow
with your 10K data sets?
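
For what it is worth, here is a minimal sketch of what an O(_N) version
might look like: two passes with -summarize, meanonly-, with no -sort-
and no -keep-, so the data stay in memory untouched. The simulated X and
Y and the temporary scalar are assumptions made for illustration only;
they are not taken from the original question. If several observations
tie for the smallest X, this returns the smallest Y among them, whereas
Y[1] after -sort X- returns whichever tied observation happens to come
first.

```stata
* Simulated data standing in for one large data set (an assumption)
clear
set obs 1000000
set seed 12345
generate double X = runiform()
generate double Y = rnormal()

* Pass 1: find the minimum of X without sorting
summarize X, meanonly
tempname xmin
scalar `xmin' = r(min)    // a scalar keeps full double precision

* Pass 2: pick up Y where X attains that minimum; nothing is dropped
summarize Y if X == `xmin', meanonly
local my_value = r(min)
display "Y at the smallest X: " `my_value'
```

Whether this is actually faster than the -keep- approach on a given
machine would need to be timed; most of the saving is likely to come
from not having to reload the data after each -keep-.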
On Fri, Aug 12, 2011 at 10:43 AM, Tiago V. Pereira
<[email protected]> wrote:
> Dear statalisters,
>
> I have to perform extremely simple tasks, but I am struggling with the low
> efficiency of my naive implementations. Perhaps you might have smarter
> ideas.
>
> Here is an example:
>
> Suppose I have two variables, X and Y.
>
> I need to get the value of Y that is associated with the smallest value of X.
>
> What I usually do is:
>
> (1) simple approach 1
>
> */ ------ start --------
> sum X, meanonly
> keep if X==r(min)
> local my_value = Y[1]
> */ ------ end --------
>
> (2) simple approach 2
>
> */ ------ start --------
> sort X
> local my_value = Y[1]
> */ ------ end --------
>
> These approaches are simple and work very well for small data sets. Now
> I have to repeat that procedure 10k times, for data sets that range from
> 500k to 1000k observations. At that scale, both procedures 1 and 2 become
> noticeably slow.
>
> If you have any tips, I will be very grateful.
>
> All the best,
>
> Tiago
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.