[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: Query regarding removing duplicates

From   Maarten buis <>
Subject   Re: st: Query regarding removing duplicates
Date   Thu, 8 May 2008 13:19:55 +0100 (BST)

--- Katrina Roper <> wrote:
> In my dataset on hospitalisation data there are people who have been 
> admitted to Hospital A then transferred (usually within days) to  
> Hospital B. This appears as two separations or instances of service  
> for the same person. However, it should only be counted once. The  
> usual way of dealing with this is to drop all the second instances
> and  use only the data for the primary admission (apparently).
> However, this isn't possible (at this stage) with my dataset.
> At discharge, people are discharged "home", "deceased" or
> "transferred  to another hospital'' etc.
> However, at admission, they are sometimes "via ward" but are mostly  
> "via accident and emergency". All the people who are discharge "to  
> another hospital" have been readmitted for continuation of care in  
> Hospital B through "accident and emergency".
> Thus, I cannot identify the records for those people being
> transferred in, which are the records I want to drop. Unless I can
> devise some tricky syntax to do this.
> Information that could be useful is the Hospital Record Number (HRN)
> - a unique person identifier; the date of admission to Hospital B -  
> should be the same day or next day to discharge from Hospital A; the 
> fact that their first record is "discharged to another hospital".
> Is there some way of writing code to achieve this:
> drop second record if (the person was discharged to another hospital 
> from the first admission) noting that (the HRN is the same for both  
> instances) & (date of admission at Hospital B will be same day or
> next day to that of admission to Hospital A).

In the example below I correct the admission variable, and create a
incident id variable. The trick is to use -by-, see -help by-, and to
know that _n - 1 refers to the prvious row.

*--------------------- begin example --------------------------
/* create some example data */
input hrn adm str8 adm_date disch str8 disch_date 
1 2 01022008 1 05012008
2 1 07012008 3 08012008
2 1 08012008 1 10012008
3 1 02012008 1 05012008
3 1 05012008 2 08012008

gen adm_date2 = date(adm_date, "DMY")
gen disch_date2 = date(disch_date, "DMY")
format adm_date2 disch_date2 %td
drop adm_date disch_date
rename adm_date2 adm_date
rename disch_date2 disch_date

label define adm 1 "accident" 2 "ward" 3 "transferred"
label value adm adm
label define disch 1 "home" 2 "deceased" 3 "transferred"
label value disch disch

/*Use example data for illustration*/


bys hrn (adm_date disch_date) : /// 
    replace adm = 3 if  ///
    adm_date == disch_date[_n-1] & disch[_n-1]==3

bys hrn (adm_date disch_date) : gen inc_id = _n
bys hrn (adm_date disch_date) : replace inc_id = inc_id[_n-1] if adm==3

*---------------------- end example -------------------------
(For more on how to use examples I sent to the Statalist, see )

Hope this helps,

Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434

+31 20 5986715

Sent from Yahoo! Mail.
A Smarter Email
*   For searches and help try:

© Copyright 1996–2017 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index