The Stata listserver

RE: st: RE: -finddup- for panel?


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: RE: -finddup- for panel?
Date   Wed, 21 Apr 2004 13:42:16 +0100

Fred Wolfe
> 
> As the author of -finddup-, I have to agree with Nick that -duplicates-
> has more bells and whistles and seems to do everything that -finddup-
> does.
>
> Joe J., however, finds that -finddup- is useful when one has to decide
> which among the duplicates to include and which to exclude. I agree. I
> haven't used -duplicates- much and I may be mistaken about its
> capabilities in tagging duplicates. I believe that -duplicates- tags the
> duplicated observation with a number that represents the number of
> duplicates.

Correct. 

> -finddup- tags the duplicates with a sequential number based on a sorted
> list, such that if there are 3 duplicates they will be numbered 1, 2, 3
> (for example).
> 
> I find that feature to be very useful. In situations where there are
> duplicated keys but not duplicated observations, one may need to decide
> which of the duplicates to retain and which to drop. Being able to tag
> them with a sequential number facilitates that task. For example,
> -drop if inrange(dupval, 2, 99)-.
> 
> Here are some examples. We survey people with arthritis. Inexplicably,
> some persons complete 2 surveys (!) and are assigned duplicate keys for
> the major data set keys. The question arises which observation should
> be deleted (or retained), as they are not true duplicates. One might
> want to make a rule to delete the first observation or the second, or
> might want to look at the data before making such a choice. For me,
> -finddup- is a little easier to use in that circumstance. Nick will
> correct me if I have misread -duplicates-. Perhaps sequential numbering
> of the duplicates could be added to -duplicates-.

That is an interesting idea for StataCorp to consider. 
Sequential tagging can, however, be done in at least two 
ways, starting at 0 and starting at 1. Fred likes 1; 
others might prefer 0. 
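For what it is worth, both conventions are easy to produce directly with -by:- and _n, alongside the count that -duplicates tag- gives. What follows is a minimal sketch only: the toy dataset and the variable names -id-, -score-, -ndup-, -dupval- and -dupval0- are hypothetical, sorting on -score- within -id- is just one assumed way of making the numbering reproducible, and it is not claimed to reproduce -finddup- exactly.

```stata
* Hypothetical toy data: id is the survey key; ids 2 and 3 are duplicated
clear
input id score
1 10
2 20
2 25
3 30
3 35
3 40
end

* -duplicates tag- stores, for each observation, the number of OTHER
* observations sharing the same key (0 = unique key)
duplicates tag id, generate(ndup)

* Sequential numbering within each key, starting at 1;
* sorting on score within id (an assumption) makes the order reproducible
bysort id (score): generate dupval = _n

* The same numbering starting at 0, for those who prefer it
generate dupval0 = dupval - 1

* Inspect the groups before deciding what to drop
list, sepby(id)

* Keep only the first observation for each key
drop if dupval > 1
```

Replacing the last line with -drop if dupval == 1- would instead discard the first observation in each group, so looking at the listing first is advisable.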

> -finddup- also does an un-Stata thing: it automatically creates a
> variable called -dupval-. -duplicates- forces you to name the new
> variable. I like -dupval- because I always remember its name, sort of
> like the -_merge- variable that Stata creates automatically.

Indeed. Although there are Stata commands which use 
special names, that behaviour is not indulged in 
without a strong case. 

Nick 
n.j.cox@durham.ac.uk 

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP