Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: matching within groups


From   Svend Juul <SJ@SOCI.AU.DK>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: matching within groups
Date   Thu, 20 Mar 2008 16:48:35 +0100

Jeph wrote:
 
Though I see a long way to solve this, I'm hoping
someone out there sees a more efficient solution.
 
My data consist of different reports of hospital
admissions - one from the medical record (mr) and
one from the patient (pat).  I have an admission
date (admdt) & hospital code (hospcd) for each
admission. Here -id- identifies patients, and I've
added some separators manually for clarity:
 
id     mr_admdt   mr_hospcd    pt_admdt   pt_hospcd
---------------------------------------------------
1       1 Jan 07     35         6 Sep 06    35
1       6 Sep 06     35         1 Feb 07    36
1              .      .        23 Jun 06    35
---------------------------------------------------
2      11 Oct 07     34        21 Dec 06    34
2      21 Dec 06     34                .     .
---------------------------------------------------
3       1 Jan 07     33         1 Jan 07    33
---------------------------------------------------
...
 
The problem is to determine discrepancies. For instance,
patient [3] has no problems - the two sources identify
the same admissions (same date and place). However patient
[1] has one match (both report 6 sep 06 at hosp #35) and
then three discrepancies.
 
Ideally I would end up with:
 
id   mr_admdt  mr_hospcd  mr_dis pt_admdt pt_hospcd pat_dis
-----------------------------------------------------------
1    1 Jan 07     35         1    6 Sep 06    35      0
1    6 Sep 06     35         0    1 Feb 07    36      1
1                            .   23 Jun 06    35      1
-----------------------------------------------------------
2   11 Oct 07     34         1   21 Dec 06    34      0
2   21 Dec 06     34         0                        .
-----------------------------------------------------------
3    1 Jan 07     33         0    1 Jan 07    33      0
-----------------------------------------------------------
...
 
My solution would be to create two datasets and merge them
on (hospdt hospcd), using _merge to identify which
records don't match. However, I it gets messy trying to
get the results back into the original file.
 
Is there any command or routine out there that will let
me match up two variables (I can always group the date
& code variables into one) without disturbing the original
data? I don't mind sorting the admissions with patid
(the current order is the arbitrary order in which they
are reported).

=========================================================
 
I don't quite understand what you want to end up with. I think
I would want something like:
 
     +-------------------------------------------------------+
     | id   hospcd      hospdt   patdata   hospdata   _merge |
     |-------------------------------------------------------|
  1. |  1       35   23jun2006         1          .        1 |
  2. |  1       35   06sep2006         1          1        3 |
  3. |  1       35   01jan2007         .          1        2 |
     |-------------------------------------------------------|
  4. |  2       34   21dec2006         1          1        3 |
  5. |  2       34   11oct2007         .          1        2 |
     |-------------------------------------------------------|
  6. |  3       33   01jan2007         1          1        3 |
     +-------------------------------------------------------+
 
I obtained it with this do-file:
 
      -----------------------------
      clear
      input id d m y hospcd
      1 1 1 2007 35
      1 6 9 2006 35
      2 11 10 2007 34
      2 21 12 2006 34
      3 1 1 2007 33
      end
      gen hospdt = mdy(m,d,y)
      format hospdt %td
      drop d m y
      generate hospdata=1
      sort id hospdt hospcd
      save hospdata.dta, replace
 
      clear
      input id d m y hospcd
      1 6 9 2006 35
      1 2 2007 36
      1 23 6 2006 35
      2 21 12 2006 34
      3 1 1 2007 33
      end
      gen hospdt = mdy(m,d,y)
      format hospdt %td
      drop d m y
      sort id hospdt hospcd
      generate patdata=1
      save patdata.dta, replace
 
      merge id hospdt hospcd using hospdata.dta
      sort id hospdt hospcd
      list, sepby(id)
      -----------------------------
 
Hope this helps
Svend

________________________________________________________ 
 
Svend Juul
Institut for Folkesundhed, Afdeling for Epidemiologi
(Institute of Public Health, Department of Epidemiology)
Vennelyst Boulevard 6 
DK-8000 Aarhus C,  Denmark 
Phone, work:   +45 8942 6090 
Phone, home:   +45 8693 7796 
Fax:           +45 8613 1580 
E-mail:        sj@soci.au.dk 
_________________________________________________________ 


*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index