
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: extension of countmatch


From   Dalhia <[email protected]>
To   [email protected]
Subject   Re: st: extension of countmatch
Date   Tue, 20 Apr 2010 11:33:50 -0700 (PDT)

Robert and Nick,

Oh WOW!! That is amazing! I just tried it, and it works great. All this time, I've been doing this process manually (I've done this reconciliation by hand for three datasets so far).

You've just significantly improved my quality of life.

Thank you so much. I hope I can pass on the favor to someone else (probably not on this list, though, since I am not at all confident of my Stata knowledge).

best
dalhia
--- On Tue, 4/20/10, Robert Picard <[email protected]> wrote:

> From: Robert Picard <[email protected]>
> Subject: Re: st: extension of countmatch
> To: [email protected]
> Date: Tuesday, April 20, 2010, 8:10 PM
> Dalhia,
> 
> This looks like a problem I have had to handle before. I created a
> program to group identifiers when values match for specified
> variables. You can get it by typing, in Stata:
> 
> net from http://robertpicard.com/stata
> 
> I have prepared an example, starting from your example data, and I have
> added a few extra lines that show other id combinations. The last id,
> "aa9", is the same as "aa7" because both share a previous name ("bb4").
> 
> To get this to work, an initial newid variable is created to uniquely
> identify each observation. Each observation is then duplicated, once per
> non-empty name, so that the variable an_id contains all name variations
> for each value of newid.
> 
> Then -group_id- does its magic and groups all your initial ids
> together. Of the 4 extra observations that I added, those that share
> "bb4" have been grouped together.
> 
> Hope this helps,
> 
> Robert
> 
> *--------------------------- begin example -----------------------
> 
> version 11
> 
> clear
> input str32(final_id id1 id2 id3)
> aaa aa1 aa2 aa3
> aa3 bb1 bb2
> bb1 ll1
> aa4 aa5 aa6
> aa7 bb3 bb4
> aa8 bb5 bb6 bb7
> aa9 a10 bb4
> end
> 
> list, noobs sep(0)
> 
> // create a new identifier
> gen newid = _n
> rename final_id an_id
> tempfile f
> qui save "`f'"
> 
> // create duplicate observations for each newid
> foreach v in id1 id2 id3 {
>     keep newid `v'
>     rename `v' an_id
>     keep if an_id != ""
>     append using "`f'"
>     qui save "`f'", replace
> }
> sort newid an_id
> list , sepby(newid)
> 
> // create a final merged_id, starting from newid
> gen merged_id = newid
> 
> // type net from http://robertpicard.com/stata to get -group_id-
> group_id merged_id, matchby(an_id)
> 
> sort merged_id newid an_id
> list, sepby(merged_id)
> 
> *--------------------- end example --------------------------
> 
> 
> On Tue, Apr 20, 2010 at 12:08 PM, Dalhia <[email protected]> wrote:
> > hi,
> > I need to do a particular data manipulation to reconcile multiple ids
> > created over time. There are many rows (15,345 rows require
> > reconciliation), so I would be really grateful if this could somehow be
> > automated.
> >
> > here is how the data looks
> >
> > final_id, id1, id2, id3
> > aaa, aa1, aa2, aa3
> > aa3, bb1, bb2
> > bb1, ll1
> >
> > In this example, all the ids actually refer to the same entity, since
> > aa3 is also bb1 and bb2, and bb1 is also ll1. Here is how I am trying
> > to get the data to look, so that I know they are all the same entity:
> > final_id, id1, id2, id3, id4, id5, id6
> > aaa, aa1, aa2, aa3, bb1, bb2, ll1
> >
> > I was playing with somehow extending countmatch (which tells me when
> > the same cell appears in other rows in other variables) so that it can
> > identify these duplicates in other variables, and then also pull them
> > out. But so far no luck. I am horrible at figuring out code. Any help
> > will be appreciated.
> >
> > best
> > dalhia mani
> >
> >
> >
> > *
> > *   For searches and help try:
> > *   http://www.stata.com/help.cgi?search
> > *   http://www.stata.com/support/statalist/faq
> > *   http://www.ats.ucla.edu/stat/stata/
> >
> 
> 
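
For readers who want to see what the grouping step in Robert's example involves without installing -group_id-, here is a minimal sketch in plain Stata. It is not Robert's code and says nothing about how -group_id- works internally. It assumes a simple label-propagation approach: repeatedly take the smallest label among observations that share an an_id, then spread it across all rows with the same newid, until nothing changes. The names slot, by_name and by_row are invented for this illustration, and -reshape long- stands in for the append loop in the original example.

```stata
* Hypothetical sketch, NOT the source of -group_id-
version 11
clear
input str32(final_id id1 id2 id3)
aaa aa1 aa2 aa3
aa3 bb1 bb2 ""
bb1 ll1 ""  ""
aa7 bb3 bb4 ""
aa9 a10 bb4 ""
end

* one row per (newid, an_id) pair
gen long newid = _n
rename final_id id0
reshape long id, i(newid) j(slot)
drop if id == ""
rename id an_id

* start with one label per original observation
gen long merged_id = newid

* propagate the smallest label until no label changes
local changed = 1
while `changed' {
    * smallest label among all rows that share a name
    bysort an_id: egen long by_name = min(merged_id)
    * spread that label to every row of the same original observation
    bysort newid: egen long by_row = min(by_name)
    quietly count if by_row != merged_id
    local changed = r(N) > 0
    quietly replace merged_id = by_row
    drop by_name by_row
}

sort merged_id newid an_id
list newid an_id merged_id, sepby(merged_id)
```

With these five rows, the first three observations end up with merged_id 1 (linked through "aa3" and "bb1") and the last two with merged_id 4 (linked through "bb4"). The labels need not match the values -group_id- assigns, and long chains of linked ids can take many passes, so treat this as an illustration of the idea, not a replacement for the program.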


      


