Re: st: How to perform very simple manipulations in large data sets more efficiently

From	Stas Kolenikov <[email protected]>
To	[email protected]
Subject	Re: st: How to perform very simple manipulations in large data sets more efficiently
Date	Fri, 12 Aug 2011 10:51:23 -0400

These procedures look pretty reasonable, but for extremely large data
processing, you might indeed want a different solution.
Sorting is an O(_N log(_N)) operation, and if you can find an O(_N)
operation (which should be possible), that would help you. What
exactly do you do with `my_value' afterwards? And how exactly do you
organize your workflow with your 10K data sets?
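
One possible O(_N) route, sketched here as an illustration rather than as a tested recommendation, is two linear passes with -summarize, meanonly-: the first finds the minimum of X, the second reads Y where X equals that minimum. The toy data, the temporary scalar, and the choice to take the smallest Y when several observations tie on X are all assumptions made for this sketch.

```stata
* Toy data, for illustration only (values are made up)
clear
input X Y
3 10
1 20
2 30
end

* Hold the minimum in a temporary scalar so it keeps full double precision
tempname xmin

* Pass 1: one linear scan to find the minimum of X
summarize X, meanonly
scalar `xmin' = r(min)

* Pass 2: one linear scan of Y over the observations where X is at its minimum
* (assumption: if several observations tie on X, the smallest such Y is taken)
summarize Y if X == `xmin', meanonly
local my_value = r(min)

display "my_value = `my_value'"
```

Nothing is sorted or dropped here, so the data in memory are left untouched between repetitions. If X has ties at its minimum, the result can differ from Y[1] after -sort X-, which picks whichever tied observation happens to land first.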

On Fri, Aug 12, 2011 at 10:43 AM, Tiago V. Pereira
<[email protected]> wrote:
> Dear statalisters,
>
> I have to perform extremely simple tasks, but I am struggling with the low
> efficiency of my dummy implementations. Perhaps you might have smarter
> ideas.
>
> Here is an example:
>
> Suppose I have two variables, X and Y.
>
> I need to get the value of Y that is associated with the smallest value of X.
>
> What I usually do is:
>
> (1) simple approach 1
>
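> * ------ start --------
> * find the minimum of X (meanonly skips the variance calculation)
> summarize X, meanonly
> * keep only the observation(s) at the minimum; this alters the data in memory
> keep if X==r(min)
> local my_value = Y[1]
> * ------ end --------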
>
> (2) simple approach 2
>
> * ------ start --------
> * sort ascending so that the smallest X comes first
> sort X
> local my_value = Y[1]
> * ------ end --------
>
> These approaches are simple and work very well for small data sets. Now
> I have to repeat that procedure 10k times, for data sets that range from
> 500k to 1000k observations. Hence, both procedures 1 and 2 become very
> slow.
>
> If you have any tips, I will be very grateful.
>
> All the best,
>
> Tiago
>
>
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>



-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.

