Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <njcoxstata@gmail.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: randomly drop duplicates
Date: Fri, 31 May 2013 16:54:58 +0100
You are correct that -duplicates drop- does not include anything like this. It's an official command, but I can comment on the thinking behind it. Certainly nothing like that was considered when -duplicates- was first written, and in retrospect I think that was right.

The main intent behind -duplicates- is essentially that you can identify and, if necessary, -drop- redundant data. Through its -force- option, -duplicates- offers you a way to say that you don't care about losing some kinds of information. Note, among other details, that what it then does is well-defined and reproducible.

What Ann wants is, to my mind, more like subsampling. There are easy ways to do that, and provided that you document the random-number seed, the result is reproducible. Nevertheless, -duplicates- is already a moderately complicated command, and it seems best that its purpose not be muddied by considerably broadening its scope.

This is in no sense critical of what Ann wants to do. I just wanted to comment briefly on the logic of -duplicates-.

Nick
njcoxstata@gmail.com

On 31 May 2013 15:35, Ann Montgomery <ann.montgomery@mail.utoronto.ca> wrote:

> I'd like to drop duplicates randomly instead of dropping the first duplicate row. I can't find reference to this in -duplicates drop-?
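The subsampling route mentioned above might look like the following. This is a minimal sketch under stated assumptions, not a feature of -duplicates-. It assumes duplicates are defined by a single identifier; the variable names id, value and rnd, the toy data, and the seed 2803 are all hypothetical. The idea is to fix a known sort order, draw one uniform random number per observation, and then keep the observation with the smallest draw within each group, which amounts to keeping one randomly chosen member of each group.

```stata
* Hypothetical toy data: id defines the groups of duplicates
clear
input id value
1 10
1 11
1 12
2 20
2 21
3 30
end

* Any documented seed makes the draws reproducible (2803 is arbitrary)
set seed 2803

* Fix a known starting order, because runiform() assigns draws in observation order
sort id value

* One uniform draw per observation; double precision makes ties very unlikely
generate double rnd = runiform()

* Within each id, sort on the draw and keep the first observation,
* i.e. one randomly chosen member of each group
bysort id (rnd): keep if _n == 1

drop rnd
list
```

With several identifying variables, list them all before the parentheses, as in -bysort id1 id2 (rnd): keep if _n == 1- (id1 and id2 again hypothetical). For the result to be reproducible, both the seed and the sort order in force before -runiform()- is called need to be recorded.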