


Re: st: matching cases by a transitive relation

From: Robert Picard
Subject: Re: st: matching cases by a transitive relation
Date: Sun, 13 Jan 2013 18:15:59 -0500

If I understand the problem correctly, I think that this can be solved
easily using -group_id- (available from SSC). Here's an example of how
I would proceed:

*------------------------------ sample code -------------------

clear
input sibling1 sibling2
1 2
2 1
2 3
4 5
5 4
4 8
7 9
9 7
10 3
end

gen pairid = _n

* reshape the identifiers from wide to long (one record per person per pair)
expand 2
sort pairid
by pairid: gen id = sibling1 if _n == 1
by pairid: replace id = sibling2 if _n == 2

* start with one group per pair, then merge groups that share an id
gen sibling_group = pairid
group_id sibling_group, matchby(id)

* pick one record per id within a sibling_group
sort sibling_group id pairid
by sibling_group id: gen pick = _n == 1
list sibling_group id if pick, noobs sepby(sibling_group)
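
* sketch, not part of the original post: list every sibling pair implied by
* the groups, including pairs such as 1-3 that were never coded directly.
* Assumes sibling_group, id and pick exist as created above; sibling_id is
* a hypothetical variable name. Run as part of a do-file so that the
* temporary file name stays defined.
preserve
keep if pick
keep sibling_group id
tempfile members
save `members'
rename id sibling_id
* joinby forms all pairwise combinations within each sibling_group
joinby sibling_group using `members'
drop if id == sibling_id
sort id sibling_id
list id sibling_id sibling_group, noobs sepby(id)
restore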

*------------------------------ end sample code ---------------
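
For readers who cannot install -group_id-, the grouping step can probably be approximated with built-in commands alone. The following is only a minimal sketch and is not the implementation used by -group_id-: it repeatedly pushes the smallest group label across records that share an id, and across the two records of each pair, until a full pass changes nothing. It assumes pairid and id exist as created in the example above; grp, by_id and by_pair are hypothetical variable names.

```stata
* sketch only: assumes pairid and id exist as created in the sample code
gen long grp = pairid
local changed = 1
while `changed' {
    * smallest label among all records that share an id
    bysort id: egen long by_id = min(grp)
    * smallest label among the two records of the same pair
    bysort pairid: egen long by_pair = min(by_id)
    * stop once a full pass changes no label
    quietly count if by_pair != grp
    local changed = r(N)
    quietly replace grp = by_pair
    drop by_id by_pair
}
sort grp id pairid
list grp id pairid, noobs sepby(grp)
```

On the example data this should produce the same partition into sibling groups, although the group labels themselves need not match those assigned by -group_id-. The loop makes one pass per step of the longest chain of relations, so it may be slow on very large datasets.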

On Fri, Jan 11, 2013 at 7:03 AM, Robert De Vries wrote:
> Dear Statalisters,
> I have a problem with attempting to match cases by a transitive relation (A is related to B, B is related to C, so C must be related to A).
> Specifically, I am working with the longitudinal British Household Panel Study (BHPS), and I am attempting to match siblings across time. I can straightforwardly create a dataset which includes the ID number of all sibling pairs in the dataset in the following format:
> ID            |              SIBLING ID
> A             |              B
> B             |              A
> B             |              C
> However, this dataset does not reflect the additional relationship A-C. This occurs when A and C are siblings but have never actually lived together. For example, in Wave 1, A and B are siblings living together. By Wave 2, A has moved out, and B has gained a new sibling, C (this might be a step-sibling, for example, or a new birth). My dataset reflects the fact that A and B are siblings, and that B and C are siblings, but because A and C have never been coded as siblings, it does not reflect that they are.
> By their transitive relation through B, we know that A and C are siblings. My question is: what code could I write to get the dataset to reflect this? I need to somehow tell Stata that if A is related to B AND B is related to C, you need to create a new case which reflects that A is related to C.
> Hope you can help!
> Robert de Vries