As the author of -finddup-, I have to agree with Nick that -duplicates- has more bells and whistles and seems to do everything that -finddup- does.

Joe J., however, finds that -finddup- is useful when one has to decide which of the duplicates to include and which to exclude. I agree. I haven't used -duplicates- much and I may be mistaken about its capabilities in tagging duplicates. I believe that -duplicates- tags each duplicated observation with a number that represents the number of duplicates. -finddup- tags the duplicates with a sequential number based on a sorted list, so that if there are 3 duplicates they will be numbered 1, 2, 3 (for example).
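To make the difference concrete, here is a minimal sketch on made-up data. It does not call -finddup- itself; the sequential numbering is done by hand with -bysort- and _n, on the assumption that this mirrors what -finddup- produces. The variable names (id, visit, ndup, seq) are invented for the illustration.

```stata
* Toy data: id 1 appears three times, id 2 once, id 3 twice
clear
input id visit
1 1
1 2
1 3
2 1
3 1
3 2
end

* -duplicates tag- records, for each observation, how many other
* observations share its key (0 means the key is unique)
duplicates tag id, generate(ndup)

* Sequential numbering within each key, done by hand
* (an assumed stand-in for the numbering -finddup- creates)
bysort id (visit): generate seq = _n

list id visit ndup seq, sepby(id)
```

With the count in ndup, every observation in a group carries the same value; with the sequence in seq, each observation in a group can be picked out individually.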

I find that feature to be very useful. In situations where there are duplicated keys but not duplicated observations, one may need to decide which of the duplicates to retain and which to drop. Being able to tag them with a sequential number facilitates that task. For example, -drop if inrange(dupval, 2, 99)- drops every observation numbered 2 through 99 and so keeps the first one in each group.
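A sketch of that workflow, again on invented data and with the sequence number built by hand rather than by -finddup- (so the contents of dupval here are an assumption about what -finddup- would store; id, survey and score are hypothetical names):

```stata
* Toy survey data: persons 101 and 103 each completed two surveys
clear
input id survey score
101 1 12
101 2 15
102 1  9
103 1 20
103 2 18
end

* Number the observations 1, 2, ... within each id, in survey order
bysort id (survey): generate dupval = _n

* Look at the groups before deciding which observation to keep
list, sepby(id)

* Rule chosen here: keep the first survey, drop any later ones
drop if inrange(dupval, 2, 99)

* Confirm that id now uniquely identifies the observations
isid id
```

To keep the last survey instead, one could number within id in the same way and then use -by id: keep if _n == _N-.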

Here is an example. We survey people with arthritis. Inexplicably, some persons complete 2 surveys (!) and are assigned duplicate values of the main data set keys. The question arises: which observation should be deleted (or retained), given that they are not true duplicates? One might want to make a rule to delete the first observation or the second, or might want to look at the data before making such a choice. For me, -finddup- is a little easier to use in that circumstance. Nick will correct me if I have misread -duplicates-. Perhaps sequential numbering of the duplicates could be added to -duplicates-.

-finddup- also does an un-Stata thing: it automatically creates a variable called -dupval-. -duplicates- forces you to name the new variable. I like -dupval- because I always remember its name, much like the -_merge- variable that Stata creates automatically.


