
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: matching cases by a transitive relation


From   "Lacy,Michael" <Michael.Lacy@colostate.edu>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: matching cases by a transitive relation
Date   Sun, 13 Jan 2013 21:47:33 +0000

On Fri, 11 Jan 2013 12:03:29 +0000,  Robert De Vries <robert.devries@sociology.ox.ac.uk> wrote:


>Dear Statalisters,
>
>I have a problem with attempting to match cases by a transitive relation (A is related to B, B is
>related to C, so C must be related to A).
>
>Specifically, I am working with the longitudinal British Household Panel Study (BHPS), and I am
>attempting to match siblings across time. I can straightforwardly create a dataset which includes the
>ID number of all sibling pairs in the dataset in the following format:
>
>ID            |              SIBLING ID
>A             |              B
>B             |              A
>B             |              C
>
>However, this dataset does not reflect the additional relationship A-C. This occurs when A and C are
>siblings but have never actually lived together. For example, in Wave 1, A and B are siblings living
>together. By Wave 2, A has moved out, and B has gained a new sibling, C (this might be a step-sibling,
>for example, or a new birth). My dataset reflects the fact that A and B are siblings, and that B and
>C are siblings, but because A and C have never been coded as siblings, my dataset does not reflect
>that they are.
>
>By their transitive relation through B, we know that A and C are siblings. My question is: what code
>could I write to get the dataset to reflect this? I need to somehow tell Stata that if A is related to
>B AND B is related to C, you need to create a new case which reflects that A is related to C.
>
>Hope you can help!
>

My idea would be to create a file of pairs (edges), where each observation is an individual 
and a person to whom s/he is linked either directly (AB,BC), or indirectly (AC).
In offering a solution,  I'm going to presume a simplification that your example
above does not display, namely that 1) every person of interest shows up in the file as
an "ID," (you don't have "C" as an ID) and 2) that permuted pairs are included (no AB and
BA).  It should not be a problem to get that from what you have, but I omitted that
step.
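
For what it's worth, the omitted step might look something like the sketch below. It assumes the goal is for every link to appear in both directions and that IDs are numeric. The variable names ego and alter match the example data further down; the three input pairs are a made-up stand-in for A-B, B-A, B-C (A=1, B=2, C=3).

```stata
// Hypothetical stand-in for the pairs in the question: A-B, B-A, B-C
clear
input ego alter
1 2
2 1
2 3
end
// Save a copy with the two columns swapped, then stack it on the original
preserve
rename (ego alter) (alter ego)
tempfile swapped
save `swapped'
restore
append using `swapped'
// Pairs that were already present both ways are now doubled; keep one of each
duplicates drop ego alter, force
sort ego alter
list, noobs
```

After this, C (3) shows up as an ego with B (2) as its alter, so every person has at least one record of their own.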

I'm conceptualizing the problem in network terms, as an instance of a file of "ego/alter" edges, 
where you have a list of "direct" ego/alter observations, and you want a file that includes 
both the direct and the indirect links, where an "indirect" link to ego is someone directly 
linked to someone directly linked to ego.

My approach is not especially elegant, but it seems to scale up to large N without any problem.
Suggestions regarding efficiency or elegance are welcome.
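
Before the full code, the two-step idea itself can be illustrated with -joinby-. This is only a sketch on the same example data, not the approach used below; the variable name indirect and the tempfile names are made up for the illustration.

```stata
// Same example edges as in the code that follows
clear
input ego alter
1 4
1 5
2 3
2 4
3 5
4 5
4 6
5 4
5 6
6 3
end
tempfile edges step2
save `edges'
// Second copy of the edges, keyed by alter: "alter's own alters"
rename (ego alter) (alter indirect)
save `step2'
// Pair every ego-alter edge with every edge leaving that alter
use `edges', clear
joinby alter using `step2'
// A path that leads back to ego is not a link to anyone new
drop if indirect == ego
sort ego alter indirect
list, sepby(ego) noobs
```

Each remaining observation is a path ego -> alter -> indirect. For ego 1, for instance, person 6 is reached through both 4 and 5. The code that follows builds the same links by reshaping and merging, and keeps direct and indirect links together in one list.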

// Example data
clear
input ego alter
1 4 
1 5 
2 3 
2 4 
3 5 
4 5 
4 6 
5 4 
5 6 
6 3 
end
bys ego: gen nalter = _N
//
// Make a wide file indexed by ego
sort ego
by ego: gen int n = _n
qui reshape wide alter, i(ego) j(n)
sort ego
preserve
rename ego matchalter
rename alter* link*
tempfile ego
save `ego'
restore
//
// For each of ego's direct alters, merge on all of that alter's alters 
// This will occasion redundancies, and self-matches, but we can clean that up later.
quiet ds alter*           
gen int matchalter = .
foreach s in `r(varlist)' {  // foreach possible alter
   replace matchalter = `s'
   sort matchalter
   qui merge m:1 matchalter using `ego', keep(3 1)
   drop _merge
   rename link* fromalter`s'_*
}
drop matchalter
//
// Go back to long format, and drop redundant records and self-matches.
// It's easier to treat all "direct" alters and indirect alters as just 
// "links"; the first nalter links are the original "direct" alters.  
local i = 1
foreach v of varlist alter* fromalter* {
   rename `v' link`i'
   local ++i
}
reshape long link, i(ego) j(linknum)
// Clean up
duplicates drop ego link, force
drop if (link == ego) | (link == .)
gen direct_alter = (linknum <= nalter) // we may want to know which alters were direct
// nicer display
sort ego link linknum
by ego: replace linknum = _n
order ego link nalter 
list
//
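
Note that the code above picks up people at most two steps away from ego. If chains can be longer (A-B, B-C, C-D), one alternative is to give everyone in a connected set a common group ID by repeatedly passing along the smallest ID. This is only a sketch and not part of the approach above. It assumes links are listed in both directions and that every alter also appears as an ego. The variable names group, altergroup and newgroup, and the sample pairs, are made up for illustration.

```stata
// Hypothetical symmetric edge list: a chain 1-2-3 and a separate pair 7-8
clear
input ego alter
1 2
2 1
2 3
3 2
7 8
8 7
end
// Start with each person's own ID as the group label
gen long group = ego
tempfile lab
local changed = 1
while `changed' {
    // Look-up file: one record per person with the current label
    preserve
    keep ego group
    quietly duplicates drop
    rename (ego group) (alter altergroup)
    quietly save `lab', replace
    restore
    // Attach each alter's current label to the edge
    quietly merge m:1 alter using `lab', keep(master match) nogenerate
    // New label for ego: smallest of own label and all alters' labels
    quietly bysort ego: egen long newgroup = min(min(group, altergroup))
    quietly count if newgroup != group
    local changed = r(N) > 0
    quietly replace group = newgroup
    drop altergroup newgroup
}
sort group ego alter
list ego alter group, sepby(group) noobs
```

When the loop stops, everyone who can be reached from one another through any number of links shares the same value of group (here the smallest ID in the set), so 1, 2 and 3 end up together even though 1 and 3 were never paired directly.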

Hope this helps.

Regards, 

Mike Lacy
Dept. of Sociology
Colorado State University
Fort Collins CO 80523-1784
