The Stata listserver

Re: st: Data Question

From   David Kantor
Subject   Re: st: Data Question
Date   Wed, 29 Sep 2004 18:01:22 -0400

At 12:58 PM 9/29/2004 -0700, Jules Elkins wrote:

I have a dataset where I am trying to match parents with their
children. The data is structured as follows:

Person                                  Who is your parent?     Family ID
1 Head of Household                             -               1
2 Spouse of household head                      -               1
3 Male adult child of head/spouse               1, 2            2
4 Female adult child of head/spouse             1, 2            3
5 Underage child of head/spouse                 1, 2            1
6 Daughter-in-law of head/spouse                -               2
7 Grandchild of head/spouse                     3, 5            2
8 Grandchild of head/spouse                     -, 4            3

I want to generate unique family IDs, where family is defined as
parent-child, not grandparents, etc.
        Family 1: 1, 2, 5
        Family 2: 3, 6, 7
        Family 3: 4, 8
But since I am concerned about children, I identify the families by
children who are younger than 21, hence the family ID of family 1 does not
include their adult children.

The only way it lists the parent-child relationship is by the line number
of the person. I am able to correctly match head and spouse with their
children, but in the above structure, how do I generate a unique ID for
family? The only way I seem to be able to do it confuses families 2 & 3.

What is this Family ID in the table above? Is it part of the data, or is it the value you are trying to compute (with the values shown as examples of what your goal would be)?

I guess it is the latter.

I think this is a difficult problem. It would be better if you had more information such as a household id and a who_is_spouse id for those who are spouse-of-head (or anyone who is married).

Anyway, I would start by assigning a unique value to each household head. From there, you need to collect the people who are related to that person by some suitably close relation; here, that means the head's underage children. But then you also want the spouse, and there is no direct link (as stated) for spouse. You could try to connect spouses through common underage children (or any children). But in general, this can lead to ambiguity, as some people have children with multiple partners, and these can create overlapping "families". If you had a spouse identifier, it would help. If you limit such relations to within-household (requiring a household identifier), that, too, would help (and probably eliminate overlapping families -- probably, not certainly).
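
To make the spouse-linking idea concrete, here is a minimal sketch in Python rather than Stata. It pairs people who share a child and flags anyone who ends up with more than one partner, which is where overlapping families would arise. The data and the function names are hypothetical. The toy data follows the table in the question, with one assumption: person 7's parents are taken to be 3 and 6, as the stated Family 2 implies, although the table as posted lists 3, 5.

```python
from collections import defaultdict

# Hypothetical toy data: child line number -> (parent, parent).
# None means that parent is not listed.
# Assumption: person 7's parents are 3 and 6 (the posted table says 3, 5).
parents = {3: (1, 2), 4: (1, 2), 5: (1, 2), 7: (3, 6), 8: (None, 4)}


def partners_by_common_children(parents):
    """Link two people as partners whenever they are both listed as parents of a child."""
    partners = defaultdict(set)
    for child, (p1, p2) in parents.items():
        if p1 is not None and p2 is not None:
            partners[p1].add(p2)
            partners[p2].add(p1)
    return dict(partners)


def ambiguous_people(partners):
    """People with more than one partner; their families could overlap."""
    return sorted(p for p, others in partners.items() if len(others) > 1)


partners = partners_by_common_children(parents)
print(partners)
print(ambiguous_people(partners))
```

On this toy data nobody has more than one partner, so the linking is unambiguous. Person 4 gets no partner at all because the other parent of person 8 is not listed. A case such as {10: (1, 2), 11: (1, 3)} would flag person 1.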

(Perhaps you know that people can be partitioned into households by using parental relationships and common children. Actually it is true in general that you can partition the people into some set of non-overlapping subsets using those relationships. The question is whether the classes so formed have exactly one head in each.)
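
The partitioning can be sketched as follows, again in Python and only as an illustration under assumptions, not a worked-out solution. The function names, the link lists, the choice of who counts as underage, and the set of heads are all hypothetical. People are joined whenever a link connects them, and each resulting class is then checked for how many heads it contains.

```python
def partition(people, links):
    """Group people into non-overlapping classes joined by links (union-find)."""
    root = {p: p for p in people}

    def find(p):
        while root[p] != p:
            root[p] = root[root[p]]  # shorten the path as we walk up
            p = root[p]
        return p

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            root[max(ra, rb)] = min(ra, rb)  # lowest line number labels the class

    classes = {}
    for p in people:
        classes.setdefault(find(p), []).append(p)
    return sorted(classes.values())


def heads_per_class(classes, heads):
    """Count how many designated heads fall in each class."""
    return [sum(p in heads for p in members) for members in classes]


people = range(1, 9)
# Hypothetical links (child, parent); person 7 is assumed to have parents 3 and 6.
all_links = [(3, 1), (3, 2), (4, 1), (4, 2), (5, 1), (5, 2), (7, 3), (7, 6), (8, 4)]
# Assumption: only persons 5, 7 and 8 are underage, so only their links are kept.
underage_links = [(5, 1), (5, 2), (7, 3), (7, 6), (8, 4)]

print(partition(people, all_links))
print(partition(people, underage_links))
print(heads_per_class(partition(people, underage_links), heads={1}))
```

With every parent-child link included, all eight people fall into a single class, which is the household. Keeping only the links from underage children splits them into the three families the question asks for, but two of those classes contain no recorded head, which is the hurdle described in the next paragraph.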

The second step is to look at the people left out of the first step. Where to start (finding "heads" of these subfamilies) is not clear, but after that hurdle you still have the same problem of potential ambiguity in linking spouses. (The in-law relations may help here.)

I haven't solved the problem -- just told you that it is indeed difficult. More information would help (more relationship data and more understanding of how households and the relationships are defined).

Depending on the data, it may indeed be impossible to assign a family id unambiguously. But if it is possible, then, again depending on the data, it may be a more complex problem than it seemed at first. You might say that the problem is that we are all related.

One more thought: this looks like a graph-tracing problem, and I have found that Stata is not particularly suitable for these kinds of problems. This is not to say it isn't possible.

Sorry to give you a pessimistic response. Maybe some of the other listers have some other ideas.
Good luck.
-- David

David Kantor
Institute for Policy Studies
Johns Hopkins University
