Statalist



st: matching within groups


From   Jeph Herrin <junk@spandrel.net>
To   statalist@hsphsun2.harvard.edu
Subject   st: matching within groups
Date   Thu, 20 Mar 2008 10:57:40 -0400

Though I see a long way to solve this, I'm hoping
someone out there sees a more efficient solution.

My data consist of different reports of hospital
admissions - one from the medical record (mr) and
one from the patient (pat).  I have an admission
date (admdt) & hospital code (hospcd) for each
admission. Here -id- identifies patients, and I've
added some separators manually for clarity:

id     mr_admdt   mr_hospcd    pt_admdt   pt_hospcd
---------------------------------------------------
1       1 Jan 07     35         6 Sep 06    35
1       6 Sep 06     35         1 Feb 07    36
1              .      .        23 Jun 06    35
---------------------------------------------------
2      11 Oct 07     34        21 Dec 06    34
2      21 Dec 06     34                .     .
---------------------------------------------------
3       1 Jan 07     33         1 Jan 07    33
---------------------------------------------------
...

The problem is to identify discrepancies. For instance,
patient [3] has none - the two sources identify the same
admission (same date and place). However, patient [1] has
one match (both report 6 Sep 06 at hosp #35) and three
discrepancies: 1 Jan 07 at #35 appears only in the medical
record, while 1 Feb 07 at #36 and 23 Jun 06 at #35 appear
only in the patient report.

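Ideally I would end up with the following, where mr_dis
and pat_dis are 1 for an admission that the other source
does not report and 0 for a match: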

id   mr_admdt  mr_hospcd  mr_dis pt_admdt pt_hospcd pat_dis
-----------------------------------------------------------
1    1 Jan 07     35         1    6 Sep 06    35      0
1    6 Sep 06     35         0    1 Feb 07    36      1
1                            .   23 Jun 06    35      1
-----------------------------------------------------------
2   11 Oct 07     34         1   21 Dec 06    34      0
2   21 Dec 06     34         0                        .
-----------------------------------------------------------
3    1 Jan 07     33         0    1 Jan 07    33      0
-----------------------------------------------------------
...

My solution would be to create two datasets and merge them
on (id admdt hospcd), using _merge to identify which
records don't match. However, it gets messy trying to
get the results back into the original file.
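
A rough sketch of that merge, for concreteness. It assumes
the dates are numeric Stata dates, that neither source
lists the same (id, date, hospital) twice, and that the
-merge 1:1- syntax of Stata 11 or later is available; the
names -pat-, -admdt- and -hospcd- are made up here.

```stata
* Sketch only, under the assumptions stated above.
* Step 1: save the patient-reported admissions on their own.
preserve
keep id pt_admdt pt_hospcd
drop if missing(pt_admdt)
rename pt_admdt  admdt
rename pt_hospcd hospcd
tempfile pat
save `pat'
restore

* Step 2: reduce the data in memory to the medical-record admissions.
keep id mr_admdt mr_hospcd
drop if missing(mr_admdt)
rename mr_admdt  admdt
rename mr_hospcd hospcd

* Step 3: match on patient, date and hospital.
merge 1:1 id admdt hospcd using `pat'
* _merge == 1: medical record only
* _merge == 2: patient report only
* _merge == 3: both sources agree
```

This leaves one row per distinct admission rather than the
original side-by-side layout, which is where the messy
part starts.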

Is there any command or routine out there that will let
me match up two variables (I can always combine the date
& code variables into one) without disturbing the original
data? I don't mind sorting the admissions within id
(the current order is the arbitrary order in which they
were reported).
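
One possibility that keeps the original rows, offered only
as an untested sketch: stack the two sources with -reshape
long-, flag any admission that occurs once within a
patient, and reshape back. It assumes numeric Stata dates
and that neither source lists the same (id, date,
hospital) twice; -row-, -src-, -dis- and the numbered stub
names are invented for the illustration.

```stata
* Sketch only, under the assumptions stated above.
gen long row = _n                  // remember the original row
rename mr_admdt  admdt1
rename mr_hospcd hospcd1
rename pt_admdt  admdt2
rename pt_hospcd hospcd2
reshape long admdt hospcd, i(row) j(src)   // src: 1 = mr, 2 = patient

* An admission that occurs only once within a patient was reported
* by one source only; empty slots keep a missing flag.
bysort id admdt hospcd: gen byte dis = (_N == 1) if !missing(admdt)

* Back to the original layout, one flag per source.
reshape wide admdt hospcd dis, i(row) j(src)
rename admdt1  mr_admdt
rename hospcd1 mr_hospcd
rename dis1    mr_dis
rename admdt2  pt_admdt
rename hospcd2 pt_hospcd
rename dis2    pat_dis
drop row
```

If a source can repeat an admission, the _N == 1 test
would need to count distinct sources instead.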

Thanks for any tips,
Jeph
