



st: RE: identify family members using -egen (group)


From   Nick Cox <n.j.cox@durham.ac.uk>
To   "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject   st: RE: identify family members using -egen (group)
Date   Fri, 11 Nov 2011 10:15:14 +0000

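-egen, group()- cannot help you here without some prior work. The premise of -egen, group()- is that it groups observations with identical values on one or more variables. Your data do not satisfy that: members of the same family list the identifiers in different orders, so no two observations are identical.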

You could apply -rowsort- first: 

. l

     +-------------------------------------+
     | id   fam~1_id   fam~2_id   fam~3_id |
     |-------------------------------------|
  1. |  1    missing    missing    missing |
  2. |  2          3    missing    missing |
  3. |  3          2    missing    missing |
  4. |  4          5          6    missing |
  5. |  5          4          6    missing |
     |-------------------------------------|
  6. |  6          4          5    missing |
     +-------------------------------------+

. rowsort id f*id , gen(s1-s4)

. l

     +------------------------------------------------------------------------+
     | id   fam~1_id   fam~2_id   fam~3_id   s1        s2        s3        s4 |
     |------------------------------------------------------------------------|
  1. |  1    missing    missing    missing    1   missing   missing   missing |
  2. |  2          3    missing    missing    2         3   missing   missing |
  3. |  3          2    missing    missing    2         3   missing   missing |
  4. |  4          5          6    missing    4         5         6   missing |
  5. |  5          4          6    missing    4         5         6   missing |
     |------------------------------------------------------------------------|
  6. |  6          4          5    missing    4         5         6   missing |
     +------------------------------------------------------------------------+

. egen group = group(s*)

. l

     +--------------------------------------------------------------------------------+
     | id   fam~1_id   fam~2_id   fam~3_id   s1        s2        s3        s4   group |
     |--------------------------------------------------------------------------------|
  1. |  1    missing    missing    missing    1   missing   missing   missing       1 |
  2. |  2          3    missing    missing    2         3   missing   missing       2 |
  3. |  3          2    missing    missing    2         3   missing   missing       2 |
  4. |  4          5          6    missing    4         5         6   missing       3 |
  5. |  5          4          6    missing    4         5         6   missing       3 |
     |--------------------------------------------------------------------------------|
  6. |  6          4          5    missing    4         5         6   missing       3 |
     +--------------------------------------------------------------------------------+


You must install -rowsort- first. -rowsort- is described in 

SJ-9-1  pr0046  . . . . . . . . . . . . . . . . . . .  Speaking Stata: Rowwise
        (help rowsort, rowranks if installed) . . . . . . . . . . .  N. J. Cox
        Q1/09   SJ 9(1):137--157
        shows how to exploit functions, egen functions, and Mata
        for working rowwise; rowsort and rowranks are introduced

and may be downloaded from the Stata Journal files regardless of whether you subscribe to the Stata Journal. The article is apparently a good one anyway. 

Note that I copied your example as if you had strings, but -rowsort- works with numeric variables too. 
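
If the identifiers really are numeric, a rough sketch that needs no extra installation is to use the smallest identifier in each row as a family key. This swaps in -egen, rowmin()- for -rowsort-, so it is not the method shown above. It assumes numeric variables and that every member of a family lists all the other members, as in the example. The variable name lowest is made up for illustration.

```stata
* Example data from the question, entered as numeric (an assumption)
clear
input id fam_member1_id fam_member2_id fam_member3_id
1 . . .
2 3 . .
3 2 . .
4 5 6 .
5 4 6 .
6 4 5 .
end

* Smallest identifier in each row; rowmin() ignores missing values
egen lowest = rowmin(id fam_member1_id fam_member2_id fam_member3_id)

* Map the distinct minima to consecutive integers 1, 2, 3, ...
egen famid = group(lowest)

list, sepby(famid)
```

As with the -rowsort- route, an individual with no listed relatives ends up in a group of one. If family members do not all list one another, this sketch will not be enough.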

Nick 
n.j.cox@durham.ac.uk 

Amanda Fu

I know there have been earlier discussions on Statalist about how to
identify siblings. Those solutions rely on the mother's and father's IDs.
But I still have not figured out how to identify family members in my
data set, since it contains no parents' IDs.

The data set looks as follows:
---------------------------------------
id   fam_member1_id   fam_member2_id   fam_member3_id

1    missing          missing          missing
2    3                missing          missing
3    2                missing          missing
4    5                6                missing
5    4                6                missing
6    4                5                missing
............
---------------------------------------
That is, IDs 2 and 3 are in one family, and IDs 4, 5, and 6 are in another.
I tried to use
. egen famid = group(id fam_member1_id fam_member2_id fam_member3_id), missing
but the famid I got is not the same within a family.
The last column is what I want to get:
---------------------------------------
id   fam_member1_id   fam_member2_id   fam_member3_id   famid

1    missing          missing          missing          missing
2    3                missing          missing          1
3    2                missing          missing          1
4    5                6                missing          2
5    4                6                missing          2
6    4                5                missing          2
............
---------------------------------------


