Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: RE: Tidying up a New and Old ID mapping dataset


From   Robert Picard <[email protected]>
To   [email protected]
Subject   Re: st: RE: Tidying up a New and Old ID mapping dataset
Date   Thu, 10 Mar 2011 13:52:44 +0100

This looks to me like a six-degrees-of-separation type of problem. If two
or more practice IDs merge, they should all be grouped under a
common overriding ID. The same applies if a practice ID is split. Grouping
identifiers in cases like this is tricky to do by hand, but it is
straightforward with a program I wrote called -group_id-. It is available
from SSC by typing:

ssc install group_id

In the example below, I show how to create an overriding practice ID
that takes into account all merge and split events. Note that if you
append data containing the practice ID (pid) before grouping identifiers,
the overriding identifier will correctly identify all practices, even
those that have not been part of any merge/split event.

Hope this helps, Robert

/* ------------------ begin example ------------------*/
version 11

* Input merge and split events. Order does not matter,
* as all practice IDs involved in the same merge or
* split event are considered related to each other.
* This means that they will all end up under a
* common overriding practice ID.
clear
input pid1 pid2 pid3 pid4 str7 eventdate
100033 10066 10077 .     "10/2003"
10066  10022 10088 .     "04/2008"
20044  20111 20055 20567 "05/2009"
20567  20113 20053 .     "07/2009"
end
gen event = _n
list, noobs

* Reshape events to long format
reshape long pid, i(event) j(npid)
drop if mi(pid)
list , noobs sepby(event)

* Generate a new overriding ID.
* You can append here other dataset(s) that include
* a practice ID in each observation (make sure the
* variable is also named pid). There can be more than
* one record per pid.
egen overid = group(pid)
group_id overid, matchby(event)

sort overid event pid
list, noobs sepby(overid)
/* ------------------ end example --------------------*/
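
Once each pid carries an overriding ID, the remaining step mentioned in the question is to attach that mapping to the analysis data sets. The sketch below is one possible way to do it with built-in commands only. The crosswalk rows, the analysis variables (year, patients), and the practice 30001 are made up for illustration; in practice the crosswalk would come from keeping pid and overid after running -group_id-.

```stata
version 11

* Crosswalk: one row per practice ID. Values are typed in by hand
* here for illustration only.
clear
input long pid overid
100033 1
10066  1
10077  1
10022  1
10088  1
end
duplicates drop pid overid, force
isid pid                      // each pid must map to exactly one overid
tempfile crosswalk
save `crosswalk'

* Hypothetical analysis data: one or more records per practice ID.
clear
input long pid year patients
100033 2002 5100
10066  2005 2600
10077  2005 2500
10088  2009 4800
30001  2009 3000
end

* Attach the overriding ID; keep every analysis record.
merge m:1 pid using `crosswalk', keep(master match) nogenerate

* Practices never involved in an event are not in the crosswalk.
* Give each of them its own overriding ID above the existing ones.
summarize overid, meanonly
local top = r(max)
egen long newgrp = group(pid) if mi(overid)
replace overid = `top' + newgrp if mi(overid)
drop newgrp

sort overid year pid
list, noobs sepby(overid)
```

The -isid- check stops the run if a pid maps to more than one overriding ID, which would otherwise make the m:1 merge fail later with a less obvious message.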

On Wed, Mar 9, 2011 at 9:23 PM, Ada Ma <[email protected]> wrote:
> Hi Nick,
>
> You are right about the trumping rule.   I have over 100 lines of
> these mapping rules, I need to sort out this list of rules, because I
> need to create a mapping list so that I can merge it to the data sets
> I'll be using for analyses.
>
> The data is GP practices.  There are around 1000 of them in Scotland.
> They merge / demerge / new GP joins / old GP leaves etc., every time
> such an action takes place a new practice ID is given to the practice.
> To follow a practice through the years and through its transformations,
> I have to bundle several practice IDs together and treat them as one
> overriding practice.
>
> Here are two examples of those statements (not real practice numbers):
> 100033 SPLIT AND BECAME 10066 AND 10077 ON 10/2003
> 10066 MERGED WITH 10022 AND BECAME 10088 04/2008
>
> 10066 MERGED WITH 10022 AND BECAME 10088
>
> I have stripped out all the practice IDs but not sure how to make it
> clean, so that I get the mapping right.
>
> Ada
>
>
>
> On Wed, Mar 9, 2011 at 5:01 PM, Nick Cox <[email protected]> wrote:
>> I don't know whether I understand this. The issue appears to be that according to one rule C should be mapped to D and according to another rule D should be mapped to E and that trumps the first rule. And presumably there are other examples of this kind. And the example is not to be taken literally, but is schematic.
>>
>> If that is so, all I can suggest is that the trumping rule is applied last, so that this sounds like one -replace- followed by another. I don't know why a loop is thought necessary if there are at most two steps.
>>
>> Nick
>> [email protected]
>>
>> Ada Ma
>>
>> I have this dataset which has two series of number IDs.  Say it looks like this:
>>
>> OriginalID    NewID
>> A                E
>> B                E
>> D                E
>> C                D
>>
>>
>> I need to map this information to existing data sets, so that all the
>> observations A, B, C, D, are mapped to become E.
>>
>> As you can see it's rather straightforward for the first three
>> observations, but for the fourth observation, C is mapped to D.  I
>> need to correct this information so that when the NewID is found
>> amongst the OriginalID, it is updated to contain the correct NewID.
>>
>> I need to write a few lines of commands that would pick up the fourth
>> observation because its NewID appears as the OriginalID in the third
>> observation, and replace the fourth obs's NewID with the third obs's
>> NewID, so that the corrected dataset looks like this.
>>
>> OriginalID    NewID
>> A                E
>> B                E
>> D                E
>> C                E
>>
>>
>> I can write a loop to compare the NewID against every OriginalID in
>> the data, but then it will take a few rounds of looping to get the
>> whole thing tidied up. Is there a better method?
>>
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
>>
>
>
>
> --
> Ada Ma
> Research Fellow
> Health Economics Research Unit
> University of Aberdeen, UK.
> http://www.abdn.ac.uk/heru/
> Tel: +44 (0) 1224 555189
> Fax: +44 (0) 1224 550926
>

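For the simpler schematic case in the original question quoted above (OriginalID/NewID, with C mapped to D and D mapped to E), built-in commands may be enough. The sketch below repeatedly looks up each NewID among the OriginalIDs and follows the chain until nothing changes. It assumes that every OriginalID appears only once and that the mapping has no cycles; the variable names origid, newid and nextid are chosen here for illustration. It does not handle splits, which is where -group_id- is needed.

```stata
version 11

clear
input str1 origid str1 newid
A E
B E
D E
C D
end
isid origid                   // assumption: one rule per original ID

tempfile map
local changed = 1
local pass = 0
while `changed' > 0 {
    * Guard against cycles such as A -> B and B -> A
    local ++pass
    if `pass' > _N {
        display as error "mapping appears to contain a cycle"
        exit 459
    }

    * Lookup table: original ID (renamed to newid so that it can be
    * matched against the current newid) and where it points next
    preserve
    rename newid nextid
    rename origid newid
    save `map', replace
    restore

    * If the current newid is itself an original ID, move one step on
    merge m:1 newid using `map', keep(master match) nogenerate
    count if nextid != ""
    local changed = r(N)
    replace newid = nextid if nextid != ""
    drop nextid
}

sort origid
list, noobs
```

Because the lookup table is rebuilt from the updated data on every pass, long chains are shortened quickly, and the loop stops on the first pass in which no NewID is found among the OriginalIDs.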