Statalist



Re: st: RE: testing -duplicates tag-


From   "Eva Poen" <eva.poen@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: testing -duplicates tag-
Date   Wed, 3 Sep 2008 18:59:40 +0100

Michael,

I'm not quite sure I understand what your definition of duplicates is.
If you take a look at this:

. sysuse auto
(1978 Automobile Data)

. duplicates list head trunk if trunk < 10

Duplicates in terms of headroom trunk

  +----------------------------------+
  | group:   obs:   headroom   trunk |
  |----------------------------------|
  |      1     18        2.0       7 |
  |      1     52        2.0       7 |
  |      2     20        2.0       8 |
  |      2     57        2.0       8 |
  |      3     58        2.5       8 |
  |----------------------------------|
  |      3     59        2.5       8 |
  |      4     29        3.0       9 |
  |      4     68        3.0       9 |
  +----------------------------------+

(The -if- qualifier was only added to shorten the list.)
There are 4 combinations of headroom and trunk for which more than one
observation exists. Group one, for example, consists of observations
18 and 52, which both have headroom equal to 2.0 and trunk equal to 7.
In Stata terminology, each of these two observations has one duplicate
in terms of headroom and trunk, so -duplicates tag- gives both of them
the value 1.
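A minimal do-file sketch of that definition, assuming the auto data shipped with Stata. It checks that the tag equals the number of observations sharing the headroom/trunk combination, minus one. The names tag and n_dups are arbitrary choices for this example, not anything -duplicates- requires:

```stata
* Sketch: -duplicates tag- stores (size of the headroom/trunk group) - 1.
* tag and n_dups are illustrative variable names.
sysuse auto, clear
duplicates tag headroom trunk, generate(tag)

* Count the members of each headroom/trunk combination by hand
bysort headroom trunk: generate n_dups = _N - 1

* Every observation should agree with the -duplicates tag- result
assert tag == n_dups

* Group one from the listing above: headroom 2.0, trunk 7
list make headroom trunk tag if headroom == 2 & trunk == 7
```

Note that -bysort- re-sorts the data, so observation numbers after this point no longer match the listing above.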

Your notion of "simultaneously" is implemented in Stata by default: as
soon as you type -duplicates list x y- or -duplicates tag x y-,
Stata looks for observations that have the same value of x _and_ the
same value of y. Observations are allowed to differ in any of the
other variables in the dataset.
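To see the difference between tagging on the pair and tagging on one variable, a small sketch (again assuming the auto data; tag_both and tag_trunk are illustrative names) compares the two for the cars with trunk == 8 from your listing:

```stata
* Sketch: joint tag on (headroom, trunk) versus a tag on trunk alone.
* tag_both and tag_trunk are illustrative variable names.
sysuse auto, clear
duplicates tag headroom trunk, generate(tag_both)
duplicates tag trunk, generate(tag_trunk)

* All five cars with trunk == 8 share trunk, so tag_trunk is 4 for each;
* tag_both only counts cars that also share headroom. Within a tagged
* pair, other variables such as foreign may still differ.
list foreign headroom trunk tag_both tag_trunk if trunk == 8
```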

Thus, to achieve what you want, you could simply type -duplicates list
head trunk-, or -duplicates tag head trunk, gen(tag)- followed by -list
head trunk tag if tag > 0-. There is no need to search manually.
Generally, once you have created your tag variable, you can type
-... if tag == 1- to find all observations that have exactly one
duplicate (pairs), -... if tag == 2- for combinations shared by three
observations (two duplicates each), and so on.
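If you also want each set of matching records numbered so that the sets can be listed as blocks, one possible sketch (assuming the auto data; dupgrp is a hypothetical variable name) combines the tag with -egen, group()-:

```stata
* Sketch: give each set of records sharing headroom and trunk its own ID.
* dupgrp is a hypothetical variable name chosen for this example.
sysuse auto, clear
duplicates tag headroom trunk, generate(tag)

* One ID per distinct headroom/trunk combination, only for tagged records;
* untagged records get a missing dupgrp
egen dupgrp = group(headroom trunk) if tag > 0

* Show each duplicate set as a block, separated by a line
sort dupgrp make
list dupgrp make headroom trunk tag if tag > 0, sepby(dupgrp)

* Overview of how many records have 0, 1, 2, ... duplicates
duplicates report headroom trunk
```

On a large dataset, -duplicates report- is a quick first look before listing anything.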

Hope this helps,
Eva

2008/9/3 Michael McCulloch <mm@pinest.org>:
> Apologies, I wasn't clear in my question. What I want to do is find records
> for which *both* trunk and headroom are duplicates. So following the command
> suggested by Martin and Nick, I get:
>
>
> . list foreign headroom trunk if trunk==8, clean
>
>         foreign   headroom   trunk
>  20.   Domestic        2.0       8
>  45.   Domestic        1.5       8
>  57.    Foreign        2.0       8
>  58.    Foreign        2.5       8
>  59.    Foreign        2.5       8
>
> Note that:
>        observations 20 and 57 both have headroom==2.0, trunk==8
>        observations 58 and 59 both have headroom==2.5, trunk==8
>
> Since I'm developing this command for use in a large dataset, how would I
> follow up -duplicates tag- to identify those unique sets of records, where
> two variables are duplicates simultaneously, without having to search
> manually?
>
>
>
>> I cannot see your point. Stata does tag these observations with tag 1.
>> Just -list- after -duplicates tag-.
>>
>> **********
>> clear
>> sysuse auto
>> list foreign headroom trunk if trunk==8
>> duplicates tag headroom trunk, generate(dup_admission_id)
>> * Let's see...
>> list dup_* foreign headroom trunk if trunk==8
>> **********
>>
>> HTH
>> Martin
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu
>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Michael
>> McCulloch
>> Sent: Wednesday, September 03, 2008 6:29 PM
>> To: Statalist
>> Subject: st: testing -duplicates tag-
>>
>> Hello,
>> I'm testing -duplicates tag-, and puzzled as to why it won't show the
>> two observations where headroom==2.0 and trunk==8.
>>
>> clear
>> sysuse auto
>> list foreign headroom trunk if trunk==8
>> duplicates tag headroom trunk, generate(dup_admission_id)
>>
>> --
>>
>> Best wishes,
>> Michael McCulloch
>>
>>
>>
>> Pine Street Foundation
>> 124 Pine St., San Anselmo, CA 94960-2674
>> Tel:    (415) 407-1357
>> Fax:    (415) 485-1065
>> mcculloch@pinestreetfoundation.org
>> www.pinestreetfoundation.org


