

From: Clyde B Schechter <clyde.schechter@einstein.yu.edu>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: Re: st: creating random groups of observations
Date: Thu, 6 Dec 2012 19:24:20 +0000

Luca Campanelli wants to randomly assort 4000 words into 1000 groups of 4 words each, and he wants to ensure that each group has a satisfactory mix of long and short words. He doesn't specify exactly what criterion defines a satisfactory mix, so it is hard to be concrete. But here are a few thoughts.

First, depending on the frequency distribution of long and short words (and even what is meant by long and short in this context), it may not even be possible. For example, if there are only 100 "short" words in the data set, then clearly the goal cannot be achieved.

Assuming that long and short words are both prevalent in sufficient numbers, then randomly creating 1000 groups of 2 long words and 1000 groups of 2 short words, and combining each long word group with its correspondingly numbered short word group, might do it, again depending on exactly what you have in mind.

If Luca has in mind some more complex criterion, such as constraints on the mean and variance of the number of characters in each group's words, that is something I would not try to accomplish in Stata. It could be done in C++ or a similar programming language using a branch-and-bound algorithm. But expect it to take a long time to run even on a fast machine: you are trying to tame a combinatorial explosion by imposing a few constraints. And, again, be prepared for the possibility that the actual distribution of word lengths precludes the existence of any solution at all, which you would only find out after a very long time.

Best of luck.

Clyde Schechter
Dept. of Family & Social Medicine
Albert Einstein College of Medicine
Bronx, NY, USA
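
P.S. For concreteness, the simple "2 long + 2 short" construction described above might be sketched as follows. This is only an illustration in Python, not Stata, and it rests on an assumption that is not in the original question: "short" is taken to mean the shorter half of the word list by character count (ties broken at random), and "long" the other half. The function name `make_groups` and the `seed` parameter are made up for this sketch.

```python
import random

def make_groups(words, seed=12345):
    """Randomly form groups of 4 words: 2 from the shorter half, 2 from the longer half.

    Assumption (not from the original post): "short" means the shorter half
    of the list by character count, with ties broken at random.
    """
    if len(words) % 4 != 0:
        raise ValueError("number of words must be a multiple of 4")

    rng = random.Random(seed)          # fixed seed so the assignment is reproducible

    # Shuffle first, then stable-sort by length: words of equal length
    # end up in random order, so the median split does not favor any word.
    pool = list(words)
    rng.shuffle(pool)
    pool.sort(key=len)

    half = len(pool) // 2
    short_words, long_words = pool[:half], pool[half:]

    # Randomize the order within each half, then take consecutive pairs.
    rng.shuffle(short_words)
    rng.shuffle(long_words)

    n_groups = half // 2
    return [
        short_words[2 * i: 2 * i + 2] + long_words[2 * i: 2 * i + 2]
        for i in range(n_groups)
    ]

if __name__ == "__main__":
    # Tiny made-up word list, just to show the call.
    demo = ["a", "bb", "ccc", "dddd", "eeeee", "ffffff", "ggggggg", "hhhhhhhh"]
    for group in make_groups(demo):
        print(group)
```

With 4000 words this yields 1000 groups. If "long" and "short" are instead defined by a fixed length cutoff, the two pools need not be of equal size, and the feasibility check at the top would have to be replaced by a check that each pool holds at least 2000 words.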
