
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: RE: Merge issues - m:m not returning all matches


From   Nick Cox <n.j.cox@durham.ac.uk>
To   "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Merge issues - m:m not returning all matches
Date   Fri, 20 Jan 2012 15:35:40 +0000

On m:m merges: see the thread last week starting with

http://www.stata.com/statalist/archive/2012-01/msg00370.html

However, please ignore my post in that thread: it missed the point, which is well explained by others. 

Nick 
n.j.cox@durham.ac.uk 


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Aaron Legler
Sent: 20 January 2012 15:25
To: statalist@hsphsun2.harvard.edu
Subject: st: Merge issues - m:m not returning all matches

I am having an issue with -merge-.

I have one dataset with patient_id and censustract, and another file with
censustract and the distance to each of 16 locations.

When I perform the merge, I am not getting all the possible matches.

This is the original patient, with 2 records:

    patiennum         geoid    svc_date
       12345   25009205500   01 Aug 09
       12345   25009205500   05 Sep 10

After running

    merge m:m geoid using chc.censustract.dist.dta

I should get 32 records (2 patient records x 16 locations), but I'm only
getting 16:

    patien~m         geoid    svc_date   km_to_~c   hosp        _merge
       12345   25009205500   01 Aug 09     13.701      2   matched (3)
       12345   25009205500   05 Sep 10     15.144      1   matched (3)
       12345   25009205500   05 Sep 10     15.144      5   matched (3)
       12345   25009205500   05 Sep 10     15.144     13   matched (3)
       12345   25009205500   05 Sep 10     15.144     14   matched (3)
       12345   25009205500   05 Sep 10     19.156     12   matched (3)
       12345   25009205500   05 Sep 10     19.156     16   matched (3)
       12345   25009205500   05 Sep 10     20.407      3   matched (3)
       12345   25009205500   05 Sep 10     20.407      4   matched (3)
       12345   25009205500   05 Sep 10     20.407      6   matched (3)
       12345   25009205500   05 Sep 10     20.407      8   matched (3)
       12345   25009205500   05 Sep 10     20.407     11   matched (3)
       12345   25009205500   05 Sep 10     20.407     15   matched (3)
       12345   25009205500   05 Sep 10     25.031      9   matched (3)
       12345   25009205500   05 Sep 10     25.038      7   matched (3)
       12345   25009205500   05 Sep 10     25.583     10   matched (3)

It seems that Stata isn't recognizing the difference in svc_date and is
making only one set of matches.
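The 16-row result is what -merge m:m- is documented to do: within each key group it pairs observations by position (first with first, second with second, and so on), and when one side has fewer observations, the last observation of the shorter side is reused for the rest. That is why 01 Aug 09 is paired with one distance row and 05 Sep 10 with the other 15. To get every pairwise combination within geoid, Stata's -joinby- command is the usual tool. The Python sketch below only models the two pairing rules; the function names merge_mm and joinby_pairs and the 16-row distance data are hypothetical illustrations, not Stata code, and the sketch keeps only keys found in both datasets.

```python
from collections import defaultdict
from itertools import product


def group_by(rows, key):
    """Collect rows (dicts) into lists keyed by row[key], preserving order."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return groups


def merge_mm(master, using, key):
    """Model of the documented m:m pairing within each key group:
    1st with 1st, 2nd with 2nd, ...; the last row of the shorter group
    is reused for the remaining rows of the longer group.
    Simplification: only keys present in both datasets are kept."""
    m_groups, u_groups = group_by(master, key), group_by(using, key)
    out = []
    for k in sorted(m_groups.keys() & u_groups.keys()):
        ms, us = m_groups[k], u_groups[k]
        for i in range(max(len(ms), len(us))):
            m = ms[min(i, len(ms) - 1)]
            u = us[min(i, len(us) - 1)]
            out.append({**u, **m})  # master values win on shared names
    return out


def joinby_pairs(master, using, key):
    """All pairwise combinations within each key group (joinby-style)."""
    m_groups, u_groups = group_by(master, key), group_by(using, key)
    out = []
    for k in sorted(m_groups.keys() & u_groups.keys()):
        for m, u in product(m_groups[k], u_groups[k]):
            out.append({**u, **m})
    return out


# The two patient records from the post.
patients = [
    {"patiennum": 12345, "geoid": 25009205500, "svc_date": "01 Aug 09"},
    {"patiennum": 12345, "geoid": 25009205500, "svc_date": "05 Sep 10"},
]
# Hypothetical distance file: 16 locations for the same tract.
dists = [{"geoid": 25009205500, "hosp": h} for h in range(1, 17)]

mm = merge_mm(patients, dists, "geoid")
jb = joinby_pairs(patients, dists, "geoid")
print(len(mm), sum(r["svc_date"] == "01 Aug 09" for r in mm))  # 16 1
print(len(jb), sum(r["svc_date"] == "01 Aug 09" for r in jb))  # 32 16
```

In Stata itself, the corresponding change would be along the lines of replacing the -merge m:m- line with -joinby geoid using chc.censustract.dist.dta-; -help joinby- describes options such as unmatched().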

I checked to ensure the geoids are the same:

. tab geoid
      geoid |      Freq.     Percent        Cum.
------------+-----------------------------------
   2.50e+10 |         16      100.00      100.00
------------+-----------------------------------
      Total |         16      100.00
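One caveat on this check: -tab- shows a single stored value, but the 2.50e+10 display hides its digits. If geoid happens to be stored as a Stata float (an assumption; the post does not say), it cannot hold an 11-digit tract number exactly, because a 4-byte IEEE float has only a 24-bit significand, so neighbouring tracts can collapse to the same stored value. Storing such IDs as double or as strings avoids this. The Python sketch below rounds to single precision with the standard struct module; the second tract number is hypothetical.

```python
import struct


def as_float32(x):
    """Round a number to the nearest IEEE single-precision value,
    the storage used by Stata's float type."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


tract = 25009205500      # geoid from the post
neighbour = 25009205600  # hypothetical different tract

print(as_float32(tract))                           # 25009205248.0
print(as_float32(tract) == as_float32(neighbour))  # True: two tracts collide
print(float(tract) == float(neighbour))            # False: doubles keep them apart
```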


