The Stata listserver

st: Re: Finding and tagging overlapping groups


From   "Michael Blasnik" <[email protected]>
To   <[email protected]>
Subject   st: Re: Finding and tagging overlapping groups
Date   Fri, 29 Jul 2005 19:39:31 -0400

I have needed to solve similar twisted data management puzzles, and the program below might do the trick. It selects one value of the first variable (x), assigns it a group number, and gives that number to every record with that value of x. It then copies the group number to all records whose y value occurs with that x, copies it back to all records whose x matches those y values, and repeats the y and x steps once more. The records that received a group number are flagged as done, so they move to the back of the sort order for the next round of the loop. The group number is then incremented, and the loop continues until every observation has been assigned a group number. I think it should work, but you should definitely test it.

program define formgrp, sortpreserve
version 9.0
syntax varlist(min=2 max=2), gen(name)
tempvar done
tokenize `varlist'
local v1 "`1'"
local v2 "`2'"
quietly {
    gen byte `done'=0
    gen long `gen'=.
    local group=1
    count if `done'==0
    while r(N)>0 {
        * seed the next group: unassigned records with the first unassigned value of v1
        bysort `done' (`v1' `v2'): replace `gen'=`group' if `done'==0 & `v1'==`v1'[1]
        * spread the group number through shared values of v2, then v1, twice over
        bysort `done' `v2' (`gen'): replace `gen'=`gen'[1]
        bysort `done' `v1' (`gen'): replace `gen'=`gen'[1]
        bysort `done' `v2' (`gen'): replace `gen'=`gen'[1]
        bysort `done' `v1' (`gen'): replace `gen'=`gen'[1]
        * set the labelled records aside for the next round
        replace `done'=1 if `done'==0 & `gen'<.
        local group=`group'+1
        count if `done'==0
    }
}
end


You could then use it for your example like this:

formgrp Z E, gen(group)
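Because formgrp copies the group number back and forth a fixed number of times, a chain of links that is longer than that, for example the records (a,x), (b,x), (b,y), (c,y), (c,z), (d,z), may not be fully reached in one round, and the unreached records can end up in a separate group. A minimal sketch of one way around this, assuming the same two-variable input: keep repeating the v2/v1 copying until a full pass changes nothing, and only then set the group aside. The program name formgrp2 is hypothetical, and like the original it should be tested on real data.

```stata
* formgrp2: hypothetical variant of formgrp that keeps copying group
* numbers between the two variables until nothing changes
program define formgrp2, sortpreserve
    version 9.0
    syntax varlist(min=2 max=2), gen(name)
    confirm new variable `gen'
    tokenize `varlist'
    local v1 "`1'"
    local v2 "`2'"
    tempvar done old
    quietly {
        gen byte `done' = 0
        gen long `gen' = .
        gen long `old' = .
        local group = 1
        count if `done' == 0
        while r(N) > 0 {
            * seed: unassigned records sharing the first unassigned value of v1
            bysort `done' (`v1' `v2'): replace `gen' = `group' if `done' == 0 & `v1' == `v1'[1]
            * copy the group number across shared v2 values, then shared v1
            * values, repeating until a full pass leaves every record unchanged
            local changed = 1
            while `changed' > 0 {
                replace `old' = `gen'
                bysort `done' `v2' (`gen'): replace `gen' = `gen'[1] if `done' == 0
                bysort `done' `v1' (`gen'): replace `gen' = `gen'[1] if `done' == 0
                count if `gen' != `old'
                local changed = r(N)
            }
            * the whole connected set is now labelled; set it aside
            replace `done' = 1 if `done' == 0 & `gen' < .
            local group = `group' + 1
            count if `done' == 0
        }
    }
end
```

It is called the same way as the original, e.g. formgrp2 Z E, gen(group). Each inner pass can only fill in missing group numbers, so the inner loop always stops, at the cost of more sorts on large files.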

Michael Blasnik
[email protected]


----- Original Message ----- From: "Fredrik Wallenberg" <[email protected]>
To: <[email protected]>
Sent: Friday, July 29, 2005 7:09 PM
Subject: st: Finding and tagging overlapping groups



This is simply a reformulation of a question I sent out yesterday (and
didn't get any responses to :) ). I have data sets that, when merged,
produce a table with many-to-many relationships. The table below
contains the IDs from each table (Z and E).

    +----------+
    | Z     E  |
    |----------|
 1. |   a    x |
 2. |   b    x |
 3. |   b    z |
 4. |   c    y |
 5. |   d    z |
    |----------|
 6. |   e    q |
 7. |   e    z |
    +----------+

As a base for further calculations, I've created variables showing
duplicates and overlap between groups:
    +----------------------------------+
    | Z     E    zdup   edup   overlap |
    |----------------------------------|
 1. |   a    x      0      1         0 |
 2. |   b    x      1      1         1 |
 3. |   b    z      1      2         1 |
 4. |   c    y      0      0         0 |
 5. |   d    z      0      2         0 |
    |----------------------------------|
 6. |   e    q      1      0         0 |
 7. |   e    z      1      2         1 |
    +----------------------------------+
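The post does not say how zdup, edup and overlap were built. One way that reproduces the values in the table, assuming zdup and edup count the other records that share the same Z or E value and overlap flags records where both counts are positive, is Stata's duplicates tag command:

```stata
clear
input str1 Z str1 E
a x
b x
b z
c y
d z
e q
e z
end

* number of other records with the same Z value, and the same E value
duplicates tag Z, gen(zdup)
duplicates tag E, gen(edup)

* records whose Z value and E value are both shared with other records
gen byte overlap = zdup > 0 & edup > 0
list, sep(5)
```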



What I need to do is create a group variable for all records that
are linked to each other, directly or through a chain of shared Z or E
values. In the example above I would like to end up with something like:

    +------------------+
    | Z     E    group |
    |------------------|
 1. |   a    x       1 |
 2. |   b    x       1 |
 3. |   b    z       1 |
 4. |   c    y       2 |
 5. |   d    z       1 |
    |------------------|
 6. |   e    q       1 |
 7. |   e    z       1 |
    +------------------+
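A grouping like this one can be checked mechanically: every Z value and every E value should map to exactly one group. A minimal check, assuming the result is stored in a variable named group as in the table above, run here on the desired output itself:

```stata
clear
input str1 Z str1 E group
a x 1
b x 1
b z 1
c y 2
d z 1
e q 1
e z 1
end

* within each Z value, and within each E value, the group must not vary
bysort Z (group): assert group[1] == group[_N]
bysort E (group): assert group[1] == group[_N]
```

This catches a linked set that was split across groups, but not the opposite mistake of giving unconnected records the same group number.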


I've spent several days now trying to figure out how to do that in
Stata/Filemaker/Excel and haven't solved it yet. Any help would be
most welcome!!!!

Fredrik
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


