
RE: st: RE: testing -duplicates tag-

From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: RE: testing -duplicates tag-
Date   Thu, 4 Sep 2008 13:23:31 +0100

Michael's specific questions and various helpful answers from Martin and
others continue, but the general question here merits further comment. 

-duplicates- looks for observations that are duplicates on a varlist. If
you don't name a varlist, the varlist is all variables. If you do name a
varlist, that is the varlist used. 

Duplicates means that two or more observations have identical values on
all the variables concerned. 

So given 

duplicates <whatever> headroom trunk

there is absolutely no question about it. -duplicates- does _not_ look
for duplicates on either -headroom- or -trunk-. It only looks for
duplicates on _both_ variables. 

If you want the OR interpretation, you have to run -duplicates-
separately for each variable and combine the results. 

Here is a sketch. 

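gen byte isanydup = 0 

* tag duplicates on each variable separately; work > 0 marks a duplicate
foreach v of var <varlist> { 
	duplicates tag `v', gen(work) 
	* | returns 1 if either side is nonzero, so isanydup stays 0/1
	replace isanydup = isanydup | work 
	drop work 
}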

Then look at -isanydup-. 
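
As a concrete check of the AND versus OR distinction, here is a minimal
sketch applied to the auto data from Michael's example. The names
-dupboth- and -work- are made up for illustration. 

sysuse auto, clear

* AND: observations duplicated on headroom and trunk jointly
duplicates tag headroom trunk, generate(dupboth)

* OR: observations duplicated on headroom or on trunk
gen byte isanydup = 0
foreach v of varlist headroom trunk {
	duplicates tag `v', generate(work)
	replace isanydup = isanydup | work
	drop work
}

* a joint duplicate is necessarily a duplicate on each variable alone
assert isanydup if dupboth > 0

* compare how many observations each interpretation flags
count if dupboth > 0
count if isanydup

The -assert- should pass silently, and the second count should be at
least as large as the first, because the OR condition is the weaker one. 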

[email protected] 

Michael McCulloch

Thanks Martin. Am I correct in understanding that, in this revised 
example immediately below, the command:

	. duplicates tag headroom trunk, generate(dup)

would tag as dup>0 all sets of observations for which there are
duplicates of:
	headroom *AND* trunk
and not just those for which there are duplicates of:
	headroom *OR* trunk
It looks that way on visual inspection of this example's output, but 
I want to make sure before applying it to my much larger dataset.

sysuse auto
list foreign headroom trunk
duplicates tag headroom trunk, generate(dup)
sort headroom trunk
list foreign headroom trunk dup if dup>0, clean
