

From: Stas Kolenikov <skolenik@gmail.com>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: How to perfom very simple manipulations in large data sets more efficiently

Date: Fri, 12 Aug 2011 10:51:23 -0400

These procedures look pretty reasonable, but for extremely large data
processing you might indeed want a different solution. Sorting is an
O(_N log(_N)) operation, and if you can find an O(_N) operation (which
should be possible), that would help you. What exactly do you do with
`my_value' afterwards? And how exactly do you organize your workflow with
your 10K data sets?

On Fri, Aug 12, 2011 at 10:43 AM, Tiago V. Pereira
<tiago.pereira@mbe.bio.br> wrote:
> Dear statalisters,
>
> I have to perform extremely simple tasks, but I am struggling with the low
> efficiency of my dummy implementations. Perhaps you might have smarter
> ideas.
>
> Here is an example:
>
> Suppose I have two variables, X and Y.
>
> I need to get the value of Y that is associated with the smallest value of X.
>
> What I usually do is:
>
> (1) simple approach 1
>
> */ ------ start --------
> sum X, meanonly
> keep if X==r(min)
> local my_value = Y[1]
> */ ------ end --------
>
> (2) simple approach 2
>
> */ ------ start --------
> sort X
> local my_value = Y[1]
> */ ------ end --------
>
> These approaches are simple, and work very well for small data sets. Now,
> I have to repeat that procedure 10k times, for data sets that range from
> 500k to 1000k observations. Hence, both procedures 1 and 2 become clearly
> slow.
>
> If you have any tips, I will be very grateful.
>
> All the best,
>
> Tiago
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
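One way to get the O(_N) behaviour Stas describes is sketched below; it is an illustration, not code posted in the thread. It assumes numeric variables X and Y as in Tiago's example, and the simulated data, the seed, and the temporary variable are hypothetical stand-ins for one of the real data sets. Instead of sorting, it makes two linear passes with `summarize, meanonly`: one to find the minimum of X, and one to find the first observation number at which that minimum occurs.

```stata
* Hypothetical demo data standing in for one of the large data sets
clear
set obs 1000
set seed 12345
generate double X = runiform()
generate double Y = rnormal()

* Pass 1: find the minimum of X without sorting (O(_N))
summarize X, meanonly

* Pass 2: store the observation number wherever X equals its minimum;
* all other observations (including missing X) get a missing index
tempvar idx
generate long `idx' = _n if X == r(min)

* The smallest stored observation number is the first minimizer of X
summarize `idx', meanonly
local my_value = Y[r(min)]

display "Y at the smallest X: `my_value'"
```

Unlike approach (1), this leaves the data in memory intact, so no `preserve`/`restore` is needed between repetitions. If X has ties at its minimum, it picks the first such observation in the current order, whereas the result of approach (2) depends on how `sort` orders tied values.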

**References**: **st: How to perfom very simple manipulations in large data sets more efficiently**, *From:* "Tiago V. Pereira" <tiago.pereira@mbe.bio.br>
