
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org, is already up and running.



Re: st: how to generate parent variables matched to their children in household level data set?


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: how to generate parent variables matched to their children in household level data set?
Date   Sat, 23 Feb 2013 01:54:16 +0000

Note that I wrote that FAQ some years ago. Now I wonder why I didn't
approach that as a -merge- problem: create a dataset with fathers'
data, one with mothers' data, and -merge- using those. There is still
some fiddling around. This all goes with the simple idea that we have
favourite tools.
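
A minimal sketch of that -merge- idea follows. It assumes the variable names of the example data quoted below (ID, ID_F, ID_M, BMI, Emp), that ID uniquely identifies each person, and a Stata version with the -merge m:1- syntax and grouped -rename- (11 and 12 respectively). It has not been run against the full KNHANES data, so treat it as a starting point rather than a tested solution.

```stata
* Sketch only; run as one do-file so the tempfile macros stay defined.

* Mothers' data: every person, keyed under the name ID_M.
preserve
keep ID BMI Emp
rename (ID BMI Emp) (ID_M BMI_M Emp_M)
tempfile mothers
save `mothers'
restore

* Fathers' data: every person, keyed under the name ID_F.
preserve
keep ID Emp
rename (ID Emp) (ID_F Emp_F)
tempfile fathers
save `fathers'
restore

* Attach each parent's values to the child; people who are nobody's
* parent exist only in the using data and are dropped by keep().
merge m:1 ID_M using `mothers', keep(master match) nogenerate
merge m:1 ID_F using `fathers', keep(master match) nogenerate

* Optional: keep children only (observations with a parent ID recorded).
keep if !missing(ID_M) | !missing(ID_F)
```

Observations with an empty ID_M or ID_F simply find no match and keep missing values in the new variables, which is the behaviour wanted for the parents' own rows.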

Nick

On Sat, Feb 23, 2013 at 1:50 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> That's an allusion to my FAQ
>
> FAQ     . . Creating variables recording prop. of the other members of a group
>         . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>         4/05    How do I create variables summarizing for each
>                 individual properties of the other members of a
>                 group?
>
> http://www.stata.com/support/faqs/data-management/creating-variables-recording-properties/
>
> I don't know why you report problems. The code suggested there works
> as intended. Here it is again run on your example data:
>
> . by ID_fam (ID), sort: gen pid = _n
>
> . gen byte fid = .
> (7 missing values generated)
>
> . gen byte mid = .
> (7 missing values generated)
>
> . summarize pid, meanonly
>
> . forval i = 1 / `r(max)' {
>   2.     by ID_fam: replace fid = `i' if ID_F == ID[`i'] & !missing(ID_F)
>   3.     by ID_fam: replace mid = `i' if ID_M == ID[`i'] & !missing(ID_M)
>   4. }
> (3 real changes made)
> (0 real changes made)
> (0 real changes made)
> (3 real changes made)
> (0 real changes made)
> (0 real changes made)
> (0 real changes made)
> (0 real changes made)
>
> . l
>
>      +----------------------------------------------------------------------------------+
>      |       ID_F         ID_M      BMI           ID     ID_fam   Emp   pid   fid   mid |
>      |----------------------------------------------------------------------------------|
>   1. |                           26.501   A901963701   A9019637     1     1     .     . |
>   2. |                           20.483   A901963702   A9019637     1     2     .     . |
>   3. | A901963701   A901963702   20.924   A901963703   A9019637     .     3     1     2 |
>   4. |                           27.209   A901963801   A9019638     1     1     .     . |
>   5. |                           31.733   A901963802   A9019638     .     2     .     . |
>      |----------------------------------------------------------------------------------|
>   6. | A901963801   A901963802   18.018   A901963803   A9019638     .     3     1     2 |
>   7. | A901963801   A901963802   19.054   A901963804   A9019638     .     4     1     2 |
>      +----------------------------------------------------------------------------------+
>
> Using the same logic, we copy parents' employment and mothers' BMI as desired:
>
> . gen BMI_M = .
> (7 missing values generated)
>
> . gen Emp_M = .
> (7 missing values generated)
>
> . gen Emp_F = .
> (7 missing values generated)
>
> . summarize pid, meanonly
>
> . forval i = 1 / `r(max)' {
>   2.     by ID_fam: replace BMI_M = BMI[`i'] if ID_M == ID[`i'] & !missing(ID_M)
>   3.     by ID_fam: replace Emp_M = Emp[`i'] if ID_M == ID[`i'] & !missing(ID_M)
>   4.     by ID_fam: replace Emp_F = Emp[`i'] if ID_F == ID[`i'] & !missing(ID_F)
>   5. }
> (0 real changes made)
> (0 real changes made)
> (3 real changes made)
> (3 real changes made)
> (1 real change made)
> (0 real changes made)
> (0 real changes made)
> (0 real changes made)
> (0 real changes made)
> (0 real changes made)
> (0 real changes made)
> (0 real changes made)
>
>
> Here are the results:
>
> . l
>
>      +-----------------------------------------------------------------------------------------------+
>      |       ID_F         ID_M      BMI           ID     ID_fam   Emp   pid    BMI_M   Emp_M   Emp_F |
>      |-----------------------------------------------------------------------------------------------|
>   1. |                           26.501   A901963701   A9019637     1     1        .       .       . |
>   2. |                           20.483   A901963702   A9019637     1     2        .       .       . |
>   3. | A901963701   A901963702   20.924   A901963703   A9019637     .     3   20.483       1       1 |
>   4. |                           27.209   A901963801   A9019638     1     1        .       .       . |
>   5. |                           31.733   A901963802   A9019638     .     2        .       .       . |
>      |-----------------------------------------------------------------------------------------------|
>   6. | A901963801   A901963802   18.018   A901963803   A9019638     .     3   31.733       .       1 |
>   7. | A901963801   A901963802   19.054   A901963804   A9019638     .     4   31.733       .       1 |
>      +-----------------------------------------------------------------------------------------------+
>
> Nick
>
> On Fri, Feb 22, 2013 at 10:45 PM, Haena Lee <hannahlee419@gmail.com> wrote:
>
>> I am investigating the relationship between maternal
>> employment status and prevalence of childhood obesity using
>> nationally representative data (KNHANES). Suppose I have ID (all
>> observations, including both children and parents), ID_fam (household
>> indicator), ID_F (father's ID), ID_M (mother's ID), BMI (body mass
>> index) and finally Emp (employment status: 1 if employed; 0 if not
>> employed), as follows:
>>
>> ID_F         ID_M         BMI      ID           ID_fam     Emp
>>                           26.501   A901963701   A9019637   1
>>                           20.483   A901963702   A9019637   1
>> A901963701   A901963702   20.924   A901963703   A9019637   .
>>                           27.209   A901963801   A9019638   1
>>                           31.733   A901963802   A9019638   .
>> A901963801   A901963802   18.018   A901963803   A9019638   .
>> A901963801   A901963802   19.054   A901963804   A9019638   .
>>
>> Ultimately, I would like to have a data set like the following:
>>
>> ID (children)   ID_fam     BMI      Mom's BMI   Mom's Emp   Dad's Emp
>> A901963703      A9019637   20.924   20.483      1           1
>> A901963803      A9019638   18.018   31.733      .           1
>> A901963804      A9019638   19.054   31.733      .           1
>>
>> Given this, my question is 1) how to map the properties of other
>> family members to children within each household, using a loop, or 2)
>> how to generate an indicator of mother (1 if ID == ID_M; 0 otherwise).
>> I found Nick Cox's helpful example and imitated it as follows:
>>
>> by ID_fam (ID), sort: gen pid = _n
>> gen byte fid = .
>> gen byte mid = .
>> summarize pid, meanonly
>> forval i = 1 / `r(max)' {
>>     by ID_fam: replace fid = `i' if ID_F == ID[`i'] & !missing(ID_F)
>>     by ID_fam: replace mid = `i' if ID_M == ID[`i'] & !missing(ID_M)
>> }
>>
>> It produced nothing but missing values. Please
>> advise. Thank you in advance for any help.