



Re: st: how to generate parent variables matched to their children in household level data set?


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: how to generate parent variables matched to their children in household level data set?
Date   Sat, 23 Feb 2013 01:50:08 +0000

That's an allusion to my FAQ

FAQ     . . Creating variables recording prop. of the other members of a group
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        4/05    How do I create variables summarizing for each
                individual properties of the other members of a
                group?

http://www.stata.com/support/faqs/data-management/creating-variables-recording-properties/

I don't know why you report problems. The code suggested there works
as intended. Here it is again run on your example data:

. by ID_fam (ID), sort: gen pid = _n

. gen byte fid = .
(7 missing values generated)

. gen byte mid = .
(7 missing values generated)

. summarize pid, meanonly

. forval i = 1 / `r(max)' {
  2.                 by ID_fam: replace fid = `i' if ID_F == ID[`i'] & !missing(ID_F)
  3.                 by ID_fam: replace mid = `i' if ID_M == ID[`i'] & !missing(ID_M)
  4. }
(3 real changes made)
(0 real changes made)
(0 real changes made)
(3 real changes made)
(0 real changes made)
(0 real changes made)
(0 real changes made)
(0 real changes made)

. l

     +----------------------------------------------------------------------------------+
     |       ID_F         ID_M      BMI           ID     ID_fam   Emp   pid   fid   mid |
     |----------------------------------------------------------------------------------|
  1. |                           26.501   A901963701   A9019637     1     1     .     . |
  2. |                           20.483   A901963702   A9019637     1     2     .     . |
  3. | A901963701   A901963702   20.924   A901963703   A9019637     .     3     1     2 |
  4. |                           27.209   A901963801   A9019638     1     1     .     . |
  5. |                           31.733   A901963802   A9019638     .     2     .     . |
     |----------------------------------------------------------------------------------|
  6. | A901963801   A901963802   18.018   A901963803   A9019638     .     3     1     2 |
  7. | A901963801   A901963802   19.054   A901963804   A9019638     .     4     1     2 |
     +----------------------------------------------------------------------------------+
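
The second question, an indicator for being a mother, can be approached with the same loop turned around: instead of asking whether a child's ID_M matches the i-th person, ask whether a person's own ID matches the i-th person's ID_M. What follows is a minimal sketch under assumptions, not tested beyond the example data: the variable name is_mother is made up for illustration, and "mother" here means "named as ID_M by someone in the same household".

```stata
* Sketch only: rebuild the example data so the block runs on its own
clear
input str10 ID_F str10 ID_M double BMI str10 ID str8 ID_fam byte Emp
""           ""           26.501 "A901963701" "A9019637" 1
""           ""           20.483 "A901963702" "A9019637" 1
"A901963701" "A901963702" 20.924 "A901963703" "A9019637" .
""           ""           27.209 "A901963801" "A9019638" 1
""           ""           31.733 "A901963802" "A9019638" .
"A901963801" "A901963802" 18.018 "A901963803" "A9019638" .
"A901963801" "A901963802" 19.054 "A901963804" "A9019638" .
end

by ID_fam (ID), sort: gen pid = _n

* is_mother is a hypothetical name; start everyone at 0
gen byte is_mother = 0

summarize pid, meanonly
forval i = 1 / `r(max)' {
    * ID_M[`i'] is the mother ID recorded for the i-th person in each family;
    * it is empty when that person has no recorded mother or does not exist
    by ID_fam: replace is_mother = 1 if ID == ID_M[`i'] & !missing(ID_M[`i'])
}

list ID ID_fam is_mother, sepby(ID_fam)
```

A woman whose children are not in the household would stay at 0 under this definition, so whether that is what you want depends on your data.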

Using the same logic, we copy parents' employment and mothers' BMI as desired:

. gen BMI_M = .
(7 missing values generated)

. gen Emp_M = .
(7 missing values generated)

. gen Emp_F = .
(7 missing values generated)

. summarize pid, meanonly

. forval i = 1 / `r(max)' {
  2.     by ID_fam: replace BMI_M = BMI[`i'] if ID_M == ID[`i'] & !missing(ID_M)
  3.     by ID_fam: replace Emp_M = Emp[`i'] if ID_M == ID[`i'] & !missing(ID_M)
  4.     by ID_fam: replace Emp_F = Emp[`i'] if ID_F == ID[`i'] & !missing(ID_F)
  5. }
(0 real changes made)
(0 real changes made)
(3 real changes made)
(3 real changes made)
(1 real change made)
(0 real changes made)
(0 real changes made)
(0 real changes made)
(0 real changes made)
(0 real changes made)
(0 real changes made)
(0 real changes made)


Here are the results:

. l

     +-----------------------------------------------------------------------------------------------+
     |       ID_F         ID_M      BMI           ID     ID_fam   Emp   pid    BMI_M   Emp_M   Emp_F |
     |-----------------------------------------------------------------------------------------------|
  1. |                           26.501   A901963701   A9019637     1     1        .       .       . |
  2. |                           20.483   A901963702   A9019637     1     2        .       .       . |
  3. | A901963701   A901963702   20.924   A901963703   A9019637     .     3   20.483       1       1 |
  4. |                           27.209   A901963801   A9019638     1     1        .       .       . |
  5. |                           31.733   A901963802   A9019638     .     2        .       .       . |
     |-----------------------------------------------------------------------------------------------|
  6. | A901963801   A901963802   18.018   A901963803   A9019638     .     3   31.733       .       1 |
  7. | A901963801   A901963802   19.054   A901963804   A9019638     .     4   31.733       .       1 |
     +-----------------------------------------------------------------------------------------------+
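
To get from here to the children-only layout asked for in the question, one further step is to drop the parents and the helper variables. This is a rough sketch, assuming that a child is anyone with a nonmissing ID_M or ID_F; that rule is my assumption, not something the data guarantee. The block rebuilds the example data and repeats the loop so that it runs on its own.

```stata
* Sketch only: rebuild the example data so the block runs on its own
clear
input str10 ID_F str10 ID_M double BMI str10 ID str8 ID_fam byte Emp
""           ""           26.501 "A901963701" "A9019637" 1
""           ""           20.483 "A901963702" "A9019637" 1
"A901963701" "A901963702" 20.924 "A901963703" "A9019637" .
""           ""           27.209 "A901963801" "A9019638" 1
""           ""           31.733 "A901963802" "A9019638" .
"A901963801" "A901963802" 18.018 "A901963803" "A9019638" .
"A901963801" "A901963802" 19.054 "A901963804" "A9019638" .
end

by ID_fam (ID), sort: gen pid = _n

gen BMI_M = .
gen Emp_M = .
gen Emp_F = .

* Same loop as in the log: copy each parent's values to the child
summarize pid, meanonly
forval i = 1 / `r(max)' {
    by ID_fam: replace BMI_M = BMI[`i'] if ID_M == ID[`i'] & !missing(ID_M)
    by ID_fam: replace Emp_M = Emp[`i'] if ID_M == ID[`i'] & !missing(ID_M)
    by ID_fam: replace Emp_F = Emp[`i'] if ID_F == ID[`i'] & !missing(ID_F)
}

* Assumption: children are the observations with a recorded parent ID
keep if !missing(ID_M) | !missing(ID_F)
keep ID ID_fam BMI BMI_M Emp_M Emp_F
order ID ID_fam BMI BMI_M Emp_M Emp_F

list, sepby(ID_fam)
```

Because keep discards observations for good, it may be safer to work on a copy (or wrap this in preserve and restore) if the parents' own records are needed later.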

Nick

On Fri, Feb 22, 2013 at 10:45 PM, Haena Lee <hannahlee419@gmail.com> wrote:

> I am investigating the relationship between maternal
> employment status and the prevalence of childhood obesity using
> nationally representative data (KNHANES). Suppose I have ID (all
> observations, including both children and parents), ID_fam (household
> indicator), ID_F (father's ID), ID_M (mother's ID), BMI (body mass
> index) and Emp (employment status: 1 if employed, 0 if not
> employed), as follows:
>
> ID_F         ID_M         BMI      ID           ID_fam     Emp
>                           26.501   A901963701   A9019637   1
>                           20.483   A901963702   A9019637   1
> A901963701   A901963702   20.924   A901963703   A9019637   .
>                           27.209   A901963801   A9019638   1
>                           31.733   A901963802   A9019638   .
> A901963801   A901963802   18.018   A901963803   A9019638   .
> A901963801   A901963802   19.054   A901963804   A9019638   .
>
> Ultimately, I would like to have a data set like the following:
>
> ID (children)   ID_fam     BMI      Mom's BMI   Mom's Emp   Dad's Emp
> A901963703      A9019637   20.924   20.483      1           1
> A901963803      A9019638   18.018   31.733      .           1
> A901963804      A9019638   19.054   31.733      .           1
>
> Given this, my question is 1) how to map the properties of other
> family members to children within each household, using a loop, or 2)
> how to generate an indicator of mother (1 if ID == ID_M; 0 otherwise).
> I found Nick Cox's helpful example and imitated it as follows:
>
> by ID_fam (ID), sort: gen pid = _n
> gen byte fid = .
> gen byte mid = .
> summarize pid, meanonly
> forval i = 1 / `r(max)' {
>                 by ID_fam: replace fid = `i'
>                 if ID_F == ID[`i'] & !missing(ID_F)
>                 by ID_fam: replace mid = `i'
>                 if ID_M == ID[`i'] & !missing(ID_M)
> }
>
> It produced nothing but missing values. Please
> advise. Thank you in advance for any help.