



st: how to generate parent variables matched to their children in household level data set?


From   Haena Lee <hannahlee419@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: how to generate parent variables matched to their children in household level data set?
Date   Sat, 23 Feb 2013 02:36:50 -0600

Nick,

I would love to merge the fathers' and mothers' data with the
children's. That was my first choice.
As you may have noticed, however, my data have no single clear
indicator variable for who is a mother, father, child or grandparent.
There are ID_F and ID_M, but what confuses me is that ID_F and ID_M
sit on the children's rows. I see that "fid" and "mid" from your
previous answer are also located on the children's rows. So how do I
tell Stata to generate a new indicator for mothers and to treat it as
a property of the mothers, not of the children? Then I could
eventually extract the moms from this raw data
(e.g., keep ID BMI_M EMP_M if mom==1) and merge them (1:many) with
the children's data, using ID_fam as the key variable.
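
A minimal sketch of the merge route, under stated assumptions. No
separate "mom" indicator is strictly needed: every person can serve as
a potential parent record keyed by their own ID, so a copy of the data
with ID renamed to ID_M (or ID_F) can be merged m:1 onto the children's
rows. The example data are retyped from this thread, the tempfile names
(all, moms, dads) are illustrative, children are assumed to be the rows
with a nonmissing ID_M, and the code is meant to be run as a do-file.

```stata
* Sketch only: example data retyped from this thread
clear
input str10 ID_F str10 ID_M double BMI str10 ID str8 ID_fam byte Emp
""           ""           26.501 "A901963701" "A9019637" 1
""           ""           20.483 "A901963702" "A9019637" 1
"A901963701" "A901963702" 20.924 "A901963703" "A9019637" .
""           ""           27.209 "A901963801" "A9019638" 1
""           ""           31.733 "A901963802" "A9019638" .
"A901963801" "A901963802" 18.018 "A901963803" "A9019638" .
"A901963801" "A901963802" 19.054 "A901963804" "A9019638" .
end

* Illustrative tempfile names (assumptions, not from the thread)
tempfile all moms dads
save `all'

* Lookup of potential mothers: each person's own ID becomes the key ID_M
use `all', clear
keep ID BMI Emp
rename (ID BMI Emp) (ID_M BMI_M Emp_M)
save `moms'

* Lookup of potential fathers: each person's own ID becomes the key ID_F
use `all', clear
keep ID Emp
rename (ID Emp) (ID_F Emp_F)
save `dads'

* Assumption: children are the rows that record a mother's ID
use `all', clear
keep if !missing(ID_M)

* Many children may share one parent, hence m:1
merge m:1 ID_M using `moms', keep(master match) nogenerate
merge m:1 ID_F using `dads', keep(master match) nogenerate

sort ID
list ID ID_fam BMI BMI_M Emp_M Emp_F
```

A child with a father's ID but no mother's ID would be dropped by the
-keep if- line; widening the condition to
!missing(ID_M) | !missing(ID_F) would retain such rows.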

Assuming looping would do this work,

gen mom=.
unab Y: ID
unab Z: ID_M
forevar x of newlist mom
        replace `x' ==1 if Y==Z
 }

Please note that I am not familiar with the concept of looping. I only
taught myself a little today, so I am not sure whether the commands
above make sense. If not, let me know. I'd be happy to explain it
again.
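
For reference, the attempt above would likely not run as written:
-forevar- is not a Stata command (the looping command is -foreach-),
-replace- assigns with = rather than ==, and a comparison of ID with
ID_M on the same row can never be true, because nobody is their own
mother. A minimal sketch of one way to flag mothers instead, borrowing
the by-group subscripting from the FAQ code quoted below. The variable
name mom and the retyped example data are assumptions for illustration.

```stata
* Sketch only: example data retyped from this thread
clear
input str10 ID_F str10 ID_M double BMI str10 ID str8 ID_fam byte Emp
""           ""           26.501 "A901963701" "A9019637" 1
""           ""           20.483 "A901963702" "A9019637" 1
"A901963701" "A901963702" 20.924 "A901963703" "A9019637" .
""           ""           27.209 "A901963801" "A9019638" 1
""           ""           31.733 "A901963802" "A9019638" .
"A901963801" "A901963802" 18.018 "A901963803" "A9019638" .
"A901963801" "A901963802" 19.054 "A901963804" "A9019638" .
end

* Number the members of each family 1, 2, ...
by ID_fam (ID), sort: gen pid = _n

* Hypothetical indicator: 1 if this person is named as a mother
gen byte mom = 0

summarize pid, meanonly
forval i = 1/`r(max)' {
    * Take the mother ID recorded for the i-th family member and
    * flag whichever member of the same family carries that ID
    by ID_fam: replace mom = 1 if ID == ID_M[`i'] & !missing(ID_M[`i'])
}

list ID ID_fam mom
```

Under -by-, ID_M[`i'] refers to the i-th observation within each
family, and it is missing when a family has fewer than i members, so
the !missing() guard keeps small families from being flagged by
accident.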

Haena

On Fri, Feb 22, 2013 at 7:54 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> Note that I wrote that FAQ some years ago. Now I think: why didn't I
> approach that as a -merge- problem?  Create a dataset with fathers'
> data, one with mothers' data, and -merge- using those. There is still
> some fiddling around. This all goes with the simple idea that we have
> favourite tools.
>
> Nick
>
> On Sat, Feb 23, 2013 at 1:50 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>> That's an allusion to my FAQ
>>
>> FAQ     . . Creating variables recording prop. of the other members of a group
>>         . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>>         4/05    How do I create variables summarizing for each
>>                 individual properties of the other members of a
>>                 group?
>>
>> http://www.stata.com/support/faqs/data-management/creating-variables-recording-properties/
>>
>> I don't know why you report problems. The code suggested there works
>> as intended. Here it is again run on your example data:
>>
>> . by ID_fam (ID), sort: gen pid = _n
>>
>> . gen byte fid = .
>> (7 missing values generated)
>>
>> . gen byte mid = .
>> (7 missing values generated)
>>
>> . summarize pid, meanonly
>>
>> . forval i = 1 / `r(max)' {
>>   2.     by ID_fam: replace fid = `i' if ID_F == ID[`i'] & !missing(ID_F)
>>   3.     by ID_fam: replace mid = `i' if ID_M == ID[`i'] & !missing(ID_M)
>>   4. }
>> (3 real changes made)
>> (0 real changes made)
>> (0 real changes made)
>> (3 real changes made)
>> (0 real changes made)
>> (0 real changes made)
>> (0 real changes made)
>> (0 real changes made)
>>
>> . l
>>
>>      +----------------------------------------------------------------------------------+
>>      |       ID_F         ID_M      BMI           ID     ID_fam   Emp   pid   fid   mid |
>>      |----------------------------------------------------------------------------------|
>>   1. |                           26.501   A901963701   A9019637     1     1     .     . |
>>   2. |                           20.483   A901963702   A9019637     1     2     .     . |
>>   3. | A901963701   A901963702   20.924   A901963703   A9019637     .     3     1     2 |
>>   4. |                           27.209   A901963801   A9019638     1     1     .     . |
>>   5. |                           31.733   A901963802   A9019638     .     2     .     . |
>>      |----------------------------------------------------------------------------------|
>>   6. | A901963801   A901963802   18.018   A901963803   A9019638     .     3     1     2 |
>>   7. | A901963801   A901963802   19.054   A901963804   A9019638     .     4     1     2 |
>>      +----------------------------------------------------------------------------------+
>>
>> Using the same logic, we copy parents' employment and mothers' BMI as desired:
>>
>> . gen BMI_M = .
>> (7 missing values generated)
>>
>> . gen Emp_M = .
>> (7 missing values generated)
>>
>> . gen Emp_F = .
>> (7 missing values generated)
>>
>> . summarize pid, meanonly
>>
>> . forval i = 1 / `r(max)' {
>>   2.     by ID_fam: replace BMI_M = BMI[`i'] if ID_M == ID[`i'] & !missing(ID_M)
>>   3.     by ID_fam: replace Emp_M = Emp[`i'] if ID_M == ID[`i'] & !missing(ID_M)
>>   4.     by ID_fam: replace Emp_F = Emp[`i'] if ID_F == ID[`i'] & !missing(ID_F)
>>   5. }
>> (0 real changes made)
>> (0 real changes made)
>> (3 real changes made)
>> (3 real changes made)
>> (1 real change made)
>> (0 real changes made)
>> (0 real changes made)
>> (0 real changes made)
>> (0 real changes made)
>> (0 real changes made)
>> (0 real changes made)
>> (0 real changes made)
>>
>>
>> Here are the results:
>>
>> . l
>>
>>      +-----------------------------------------------------------------------------------------------+
>>      |       ID_F         ID_M      BMI           ID     ID_fam   Emp   pid    BMI_M   Emp_M   Emp_F |
>>      |-----------------------------------------------------------------------------------------------|
>>   1. |                           26.501   A901963701   A9019637     1     1        .       .       . |
>>   2. |                           20.483   A901963702   A9019637     1     2        .       .       . |
>>   3. | A901963701   A901963702   20.924   A901963703   A9019637     .     3   20.483       1       1 |
>>   4. |                           27.209   A901963801   A9019638     1     1        .       .       . |
>>   5. |                           31.733   A901963802   A9019638     .     2        .       .       . |
>>      |-----------------------------------------------------------------------------------------------|
>>   6. | A901963801   A901963802   18.018   A901963803   A9019638     .     3   31.733       .       1 |
>>   7. | A901963801   A901963802   19.054   A901963804   A9019638     .     4   31.733       .       1 |
>>      +-----------------------------------------------------------------------------------------------+
>>
>> Nick
>>
>> On Fri, Feb 22, 2013 at 10:45 PM, Haena Lee <hannahlee419@gmail.com> wrote:
>>
>>> I am investigating the relationship between maternal employment
>>> status and the prevalence of childhood obesity using nationally
>>> representative data (KNHANES). Suppose I have ID (all observations,
>>> including both children and parents), ID_fam (household indicator),
>>> ID_F (father's ID), ID_M (mother's ID), BMI (body mass index) and
>>> Emp (employment status: 1 if employed, 0 if not employed), as
>>> follows:
>>>
>>> ID_F         ID_M         BMI      ID           ID_fam     Emp
>>>                           26.501   A901963701   A9019637   1
>>>                           20.483   A901963702   A9019637   1
>>> A901963701   A901963702   20.924   A901963703   A9019637   .
>>>                           27.209   A901963801   A9019638   1
>>>                           31.733   A901963802   A9019638   .
>>> A901963801   A901963802   18.018   A901963803   A9019638   .
>>> A901963801   A901963802   19.054   A901963804   A9019638   .
>>>
>>> Ultimately, I would like to have a data set like the following:
>>>
>>> ID (children)   ID_fam     BMI      Mom's BMI   Mom's Emp   Dad's Emp
>>> A901963703      A9019637   20.924   20.483      1           1
>>> A901963803      A9019638   18.018   31.733      .           1
>>> A901963804      A9019638   19.054   31.733      .           1
>>>
>>> Given this, my question is 1) how to map the properties of other
>>> family members to children within each household using a loop, or 2)
>>> how to generate an indicator for mothers (1 if ID == ID_M; 0 otherwise).
>>> I found Nick Cox's helpful example and imitated it as follows:
>>>
>>> by ID_fam (ID), sort: gen pid = _n
>>> gen byte fid = .
>>> gen byte mid = .
>>> summarize pid, meanonly
>>> forval i = 1 / `r(max)' {
>>>     by ID_fam: replace fid = `i' if ID_F == ID[`i'] & !missing(ID_F)
>>>     by ID_fam: replace mid = `i' if ID_M == ID[`i'] & !missing(ID_M)
>>> }
>>>
>>> It produced nothing but missing values. Please advise. Thank you
>>> in advance for any help.
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/



--
--------------------------------------
Haena Lee
Ph.D Student
Sociology Department
The University of Chicago
312 - 405 - 3223



