
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



RE: st: matching cases by a transitive relation


From   Robert De Vries <robert.devries@sociology.ox.ac.uk>
To   "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject   RE: st: matching cases by a transitive relation
Date   Wed, 16 Jan 2013 14:08:49 +0000

Thanks Mike, that method seems to work perfectly!

Best,
Rob

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Lacy,Michael
Sent: 13 January 2013 21:48
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: matching cases by a transitive relation

On Fri, 11 Jan 2013 12:03:29 +0000,  Robert De Vries <robert.devries@sociology.ox.ac.uk> wrote:


>Dear Statalisters,
>
>I have a problem with attempting to match cases by a transitive 
>relation (A is related to B, B is related to C, so C must be related to A).
>
>Specifically, I am working with the longitudinal British Household 
>Panel Study (BHPS), and I am attempting to match siblings across time. 
>I can straightforwardly create a dataset which includes the ID number of all sibling pairs in the dataset in the following format:
>
>ID            |              SIBLING ID
>A             |              B
>B             |              A
>B             |              C
>
>However, this dataset does not reflect the additional relationship A-C. 
>This occurs when A and C are siblings but have never actually lived 
>together. For example, in Wave 1, A and B are siblings living together. 
>By Wave 2, A has moved out, and B has gained a new sibling; C (this 
>might be a step-sibling, for example, or a new birth). My dataset 
>reflects the fact that A and B are siblings, and that B and C are siblings, but because A and C have never been coded as siblings, my dataset does not reflect that they are.
>
>By their transitive relation through B, we know that A and C are 
>siblings. My question is: what code could I write to get the dataset to 
>reflect this? I need to somehow tell Stata that if A is related to B AND B is related to C, you need to create a new case which reflects that A is related to C.
>
>Hope you can help!
>

My idea would be to create a file of pairs (edges), where each observation is an individual and a person to whom s/he is linked either directly (AB,BC), or indirectly (AC).
In offering a solution, I'm going to presume a simplification that your example above does not display, namely that 1) every person of interest shows up in the file as an "ID" (you don't have "C" as an ID), and 2) that permuted pairs are included (no AB and BA). It should not be a problem to get that from what you have, but I omitted that step.
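A minimal sketch of what that omitted step might look like, under one reading of the simplification: each pair should appear in both directions, so that every person shows up as an ego. It assumes the data are still in long format with only the variables ego and alter (that is, it runs before nalter is generated), and the tempfile name reversed is made up for illustration.

```stata
// Hypothetical preprocessing: make the edge list symmetric,
// so that every person appears at least once as ego.
// Assumes the data in memory hold only the variables ego and alter.
preserve
rename (ego alter) (alter ego)   // swap the two ID columns
tempfile reversed
save `reversed'
restore
append using `reversed'          // original pairs plus reversed pairs
duplicates drop ego alter, force // pairs already present in both directions
sort ego alter
```

After this, any person who appeared only as an alter also has a record as an ego, which is what the merge step further down relies on.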

I'm conceptualizing the problem in network terms, as an instance of a file of "ego/alter" edges, where you have a list of "direct" ego/alter observations, and you want a file that includes both the direct and the indirect links, where an "indirect" link to ego is someone directly linked to someone directly linked to ego.

My approach is not especially elegant, but it seems to scale up to large N without any problem.
Suggestions regarding efficiency/elegance are welcome.

// Example data
input ego alter
1 4
1 5
2 3
2 4
3 5
4 5
4 6
5 4
5 6
6 3
end
bys ego: gen nalter = _N
//
// Make a wide file indexed by ego
sort ego
by ego: gen int n = _n
qui reshape wide alter, i(ego) j(n)
sort ego
preserve
rename ego matchalter
rename alter* link*
tempfile ego
save `ego'
restore
//
// For each of ego's direct alters, merge on all of that alter's alters
// This will occasion redundancies, and self-matches, but we can clean that up later.
quiet ds alter*           
gen int matchalter = .
foreach s in `r(varlist)' {  // foreach possible alter
   replace matchalter = `s'
   sort matchalter
   qui merge m:1 matchalter using `ego', keep(3 1)
   drop _merge
   rename link* fromalter`s'_*
}
drop matchalter
//
// Go back to long format, and drop redundant records and self-matches.
// It's easier to treat all "direct" alters and indirect alters as just
// "links"; the first nalter links are the original "direct" alters.
local i = 1
foreach v of varlist alter* fromalter* {
   rename `v' link`i'
   local ++i
}
reshape long link, i(ego) j(linknum)
// Clean up
duplicates drop ego link, force
drop if (link == ego) | (link == .)
gen direct_alter = (linknum <= nalter) // we may want to know which alters were direct
// nicer display
sort ego link linknum
by ego: replace linknum = _n
order ego link nalter
list
//
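
Note that the code above adds links that are two steps away from ego. If a chain can be longer (A-B, B-C, C-D), one alternative is to label connected groups rather than enumerate pairs. The following is only a sketch of that different technique (repeated minimum-label propagation), not the method above. It assumes a fresh long-format file with the variables ego and alter in which every pair appears in both directions; the names comp, m_ego, m_alt and newcomp are made up for illustration.

```stata
// Hypothetical alternative: give every edge the smallest ID in its
// connected group by repeatedly propagating the minimum label.
// Assumes a symmetric long-format edge list (variables ego and alter).
gen long comp = min(ego, alter)
local changed = 1
while `changed' {
    // smallest label currently attached to this ego / to this alter
    bysort ego:   egen long m_ego = min(comp)
    bysort alter: egen long m_alt = min(comp)
    gen long newcomp = min(m_ego, m_alt)
    quietly count if newcomp != comp
    local changed = r(N)          // stop once no label changes
    quietly replace comp = newcomp
    drop m_ego m_alt newcomp
}
// One record per person with a group identifier
keep ego comp
duplicates drop
sort comp ego
```

Persons sharing the same value of comp are linked directly or through any chain of intermediate links. The loop needs roughly as many passes as the longest chain, so it should stay cheap for sibling groups.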

Hope this helps.

Regards, 

Mike Lacy
Dept. of Sociology
Colorado State University
Fort Collins CO 80523-1784

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

