Statalist



Re: st: RE: testing -duplicates tag-


From   Michael McCulloch <mm@pinest.org>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: testing -duplicates tag-
Date   Wed, 3 Sep 2008 21:29:59 -0700

The code suggested by Martin gets me closer, but the pattern is still not exclusive. I'm trying to identify observations in DOMESTIC that are duplicates (in headroom & trunk) of observations in FOREIGN. Here are two sets of those duplicates. Note how 20 is a duplicate of 57, where the missing values and 0s in dupfor and dupdom seem to form a pattern; that pattern, however, is contradicted in the next set, where 53, 71 and 72 are duplicates of 32.

Any ideas would be appreciated!

id   foreign    headroom   trunk   dupall   dupfor   dupdom
20   Domestic   2          8       1        .        0
57   Foreign    2          8       1        0        .
*    *          *          *       *        *        *
32   Domestic   3          15      3        .        0
53   Foreign    3          15      3        2        .
71   Foreign    3          15      3        2        .
72   Foreign    3          15      3        2        .
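
The tags above cannot separate the two cases on their own: dupfor counts matches among Foreign cars only and dupdom among Domestic cars only, so neither says whether a match exists in the other group. One possible approach, sketched here under the assumption that foreign is coded 0/1 as in auto.dta (n_for and n_dom are illustrative names, not from the thread), is to count Foreign and Domestic observations within each headroom-trunk combination:

```stata
sysuse auto, clear
* n_for and n_dom are illustrative names, not from the thread
* number of Foreign / Domestic cars sharing each headroom-trunk combination
bysort headroom trunk: egen n_for = total(foreign == 1)
bysort headroom trunk: egen n_dom = total(foreign == 0)
* Domestic cars whose combination also occurs among Foreign cars
list make foreign headroom trunk n_for if foreign == 0 & n_for > 0, clean noobs
* the reverse: Foreign cars matched by at least one Domestic car
list make foreign headroom trunk n_dom if foreign == 1 & n_dom > 0, clean noobs
```

Because the counts are taken per group, the condition no longer depends on how many duplicates a car has within its own group.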





Try this:

sysuse auto, clear
duplicates tag headroom trunk if foreign==1, generate(dupfor)
*duplicates tag headroom trunk if foreign==0, generate(dupdom)
duplicates tag headroom trunk, generate(dupall)
l if dupfor==0 & dupall>0


HTH
Martin


Quoting Michael McCulloch <mm@pinest.org>:

One other question, if I may:
How would I modify the list command as re-written below, to identify
only those duplicates where:
	headroom and trunk are duplicated, but
	foreign is not,
so that I could find only those Foreign cars that have duplicates in the
set of Domestic cars (in this case observations #7 and #8)?

clear
sysuse auto
list foreign headroom trunk
duplicates tag headroom trunk, generate(dup)
sort headroom trunk
list foreign headroom trunk dup if dup>0 & trunk==8, clean noobs




Well, as -help duplicates- shows, a -varlist- is allowed with all five of the commands. If the variables were combined with *OR*, this would be pointless. -duplicates tag- looks for combinations of the values of the variables in your -varlist- and tags each observation with the number of other observations sharing its combination.

sysuse auto, clear
duplicates tag head mpg, gen(dup)
duplicates report headroom mpg
ta dup

duplicates tag head mpg tru, gen(dup1)
duplicates report headroom mpg tru
ta dup1
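
For the larger dataset, the *AND* behaviour can also be checked mechanically rather than by eye. A minimal sketch, using the headroom and trunk variables from the original question (n_others is an illustrative name):

```stata
sysuse auto, clear
duplicates tag headroom trunk, generate(dup)
* n_others is an illustrative name: the number of OTHER observations
* with the same headroom AND the same trunk
bysort headroom trunk: generate n_others = _N - 1
* stops with an error if the tag ever differs from the joint count
assert dup == n_others
```

If -assert- returns silently, the tag matches the count of joint headroom-trunk matches for every observation.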


HTH
Martin

Quoting Michael McCulloch <mm@pinest.org>:


Thanks Martin. Am I correct in understanding that, in this revised
example immediately below, the command:

. duplicates tag headroom trunk, generate(dup)

would tag as dup>0 all sets of observations for which there are duplicates of:
headroom *AND* trunk
and not just those for which there are duplicates of:
headroom *OR* trunk
?
It looks that way on visual inspection of this example's output, but I
want to make sure before applying it to my much larger dataset.


clear
sysuse auto
list foreign headroom trunk
duplicates tag headroom trunk, generate(dup)
sort headroom trunk
list foreign headroom trunk dup if dup>0, clean

Michael


Well, the question is not much clearer now, at least to me. I suspect you want something like

count if duptag > 0

after your commands. Just replace duptag with the name of your tag variable, and be aware that two observations sharing the same covariate pattern are each counted, so the pair adds two to the total (58 and 59 would both count under this rule). If that is not what you want, clarify!
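
If the number of duplicated combinations is wanted rather than the number of observations in them, one option is to count a single observation per combination. A sketch, assuming the tag was created on headroom and trunk (pickone is an illustrative name):

```stata
sysuse auto, clear
duplicates tag headroom trunk, generate(dup)
* observations that belong to some duplicated combination
count if dup > 0
* pickone is an illustrative name: flags one observation per combination
bysort headroom trunk: generate byte pickone = (_n == 1)
* number of distinct headroom-trunk combinations that occur more than once
count if pickone & dup > 0
```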


HTH
Martin

Quoting Michael McCulloch <mm@pinest.org>:


Apologies, I wasn't clear in my question. What I want to do is find
records for which *both* trunk and headroom are duplicates. So
following the command suggested by Martin and Nick, I get:


. list foreign headroom trunk if trunk==8, clean

        foreign   headroom   trunk
 20.   Domestic        2.0       8
 45.   Domestic        1.5       8
 57.    Foreign        2.0       8
 58.    Foreign        2.5       8
 59.    Foreign        2.5       8

Note that:
	observations 20 and 57 both have headroom==2.0, trunk==8
	observations 58 and 59 both have headroom==2.5, trunk==8

Since I'm developing this command for use in a large dataset, how would
I follow up -duplicates tag- to identify those unique sets of records,
where two variables are duplicates simultaneously, without having to
search manually?

I cannot see your point. Stata does tag these observations with tag 1. Just
-list- after -duplicates tag-.

**********
clear
sysuse auto
list foreign headroom trunk if trunk==8
duplicates tag headroom trunk, generate(dup_admission_id)
* Let's see...
list dup_* foreign headroom trunk if trunk==8
**********
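
To see the duplicated sets without restricting to trunk==8 or searching by hand, each headroom-trunk combination can be given an identifier and the listing separated by it. A sketch along those lines (combo_id is an illustrative name):

```stata
sysuse auto, clear
duplicates tag headroom trunk, generate(dup)
* combo_id is an illustrative name: one integer per headroom-trunk combination
egen combo_id = group(headroom trunk)
sort combo_id foreign
* list only the duplicated combinations, with a separator line between sets
list combo_id foreign headroom trunk dup if dup > 0, sepby(combo_id) noobs
```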

HTH
Martin

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Michael McCulloch
Sent: Wednesday, September 03, 2008 6:29 PM
To: Statalist
Subject: st: testing -duplicates tag-

Hello,
I'm testing -duplicates tag-, and puzzled as to why it won't show the
two observations where headroom==2.0 and trunk==8.

clear
sysuse auto
list foreign headroom trunk if trunk==8
duplicates tag headroom trunk, generate(dup_admission_id)

--

Best wishes,
Michael McCulloch



Pine Street Foundation
124 Pine St., San Anselmo, CA 94960-2674
Tel: (415) 407-1357
Fax: (415) 485-1065
mcculloch@pinestreetfoundation.org
www.pinestreetfoundation.org
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP