
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: how to generate parent variables matched to their children in household level data set?


From   Nick Cox <[email protected]>
To   [email protected]
Subject   Re: st: how to generate parent variables matched to their children in household level data set?
Date   Sun, 24 Feb 2013 15:01:07 +0000

As a small correction here that does not affect the major points:

Haena said "indicator variable" and I echoed her, but we both meant
"identifier variable".

Nick

On Sat, Feb 23, 2013 at 9:33 AM, Nick Cox <[email protected]> wrote:
> Haena:
>
> I am at a loss to understand what you are asking. My previous posts
> showed that with your sample data the code I used does work. It
> remains a mystery why you first reported otherwise, and also why you
> imply that the problem you stated is still unsolved. I just did that
> for you. It seems that you have not studied my code and its results.
>
> The absence of a single clear indicator variable is immaterial here.
> You want to copy data from mothers' and fathers' observations to
> children's; for that, being able to link mother and father identifiers
> to children is necessary and sufficient, and done separately.
>
> My mention of -merge- just hints at a different method, but I have
> given a method that works. I was not stating or implying that you need
> to -merge-; that's merely a good alternative.
>
> If you want to know why my method works you need to study not only
> discussion of loops as in
>
> SJ-2-2  pr0005  . . . . . .  Speaking Stata:  How to face lists with fortitude
>         . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>         Q2/02   SJ 2(2):202--222                                 (no commands)
>         demonstrates the usefulness of for, foreach, forvalues, and
>         local macros for interactive (non programming) tasks
>
> but also the use of -by:- as in
>
> SJ-2-1  pr0004  . . . . . . . . . . Speaking Stata:  How to move step by: step
>         . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>         Q1/02   SJ 2(1):86--102                                  (no commands)
>         explains the use of the by varlist : construct to tackle
>         a variety of problems with group structure, ranging from
>         simple calculations for each of several groups to more
>
> My code relies on the fact that under the aegis of -by:- subscripts
> (42 in -foo[42]- is a subscript) are numbered within groups, so the
> subscript [1] refers to the first observation in each group.
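The point about subscripts under -by:- can be seen in a small sketch with made-up data. The variables g and x, and the new variables first, last and pos, are hypothetical names chosen for illustration only.

```stata
* Hypothetical toy data: two groups (g) with values x
clear
input byte g double x
1 10
1 20
1 30
2 5
2 7
end

* Sort by x within g. Under by:, subscripts restart in each group,
* so x[1] is the group's first value, x[_N] its last, and _n the position
bysort g (x): generate first = x[1]
by g: generate last = x[_N]
by g: generate pos = _n

list, sepby(g)
```

Without the -by:- prefix, x[1] would instead refer to the first observation of the whole dataset.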
>
> As said, I don't see that you need any further code, so I have not
> studied your code beyond noticing that -forevar- is not a Stata
> command.
>
> Nick
>
> On Sat, Feb 23, 2013 at 8:36 AM, Haena Lee <[email protected]> wrote:
>> Nick,
>>
>> I would love to merge father's and mother's data with children. That
>> was my first choice.
>> As you may have noticed, however, my data doesn't have one clear
>> indicator variable of who is mother/father/child/grandparent. Although
>> there are ID_F and ID_M, what confuses me is that ID_F and ID_M are
>> on the children's rows. I see that "fid" and "mid" from your previous
>> answer are also on the children's rows. So how do I tell Stata to
>> generate a new indicator of "mothers" and to treat it as a property of
>> mothers, not children, so that eventually I could extract moms from
>> this raw data (e.g., keep ID BMI_M EMP_M if mom==1) and merge (1:many)
>> them on the key variable (ID_fam) with the children's data?
>>
>> Assuming looping would do this work,
>>
>> gen mom=.
>> unab Y: ID
>> unab Z: ID_M
>> forevar x of newlist mom
>>         replace `x' ==1 if Y==Z
>>  }
>>
>> Please note that I am not familiar with the concept of looping. I just
>> taught myself a little today, so I am not sure whether the
>> commands above make sense. If not, let me know. I'd be happy to
>> explain it again.
>>
>> Haena
>>
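A working version of the mother indicator attempted above (1 for anyone whose ID appears as ID_M for some member of the same household, 0 otherwise) can be written as a loop in the same style as the FAQ code. This is a minimal sketch, assuming ID, ID_M and ID_fam are string variables as in the example data; mom and pid are new names chosen for illustration. It rests on the same behaviour Nick's code uses: under -by:-, a subscript beyond the size of the household returns missing.

```stata
* Example data (only the identifier variables are needed here)
clear
input str10 ID_M str10 ID str8 ID_fam
""           "A901963701" "A9019637"
""           "A901963702" "A9019637"
"A901963702" "A901963703" "A9019637"
""           "A901963801" "A9019638"
""           "A901963802" "A9019638"
"A901963802" "A901963803" "A9019638"
"A901963802" "A901963804" "A9019638"
end

* Number members within each household
bysort ID_fam (ID): generate pid = _n

* mom = 1 if this person's ID is the ID_M recorded for some member
* of the same household
generate byte mom = 0
summarize pid, meanonly
forvalues i = 1/`r(max)' {
    * ID_M[`i'] is the mother ID stored on member i of this household
    by ID_fam: replace mom = 1 if ID == ID_M[`i'] & !missing(ID_M[`i'])
}

list ID ID_M ID_fam mom, sepby(ID_fam)
```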
>> On Fri, Feb 22, 2013 at 7:54 PM, Nick Cox <[email protected]> wrote:
>>> Note that I wrote that FAQ some years ago. Now I wonder why I didn't
>>> approach that as a -merge- problem: create a dataset with fathers'
>>> data, one with mothers' data, and -merge- using those. There is still
>>> some fiddling around. This all goes with the simple idea that we have
>>> favourite tools.
>>>
>>> Nick
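The -merge- approach described above might look like the following sketch. It is an illustration under assumptions, not code from the thread: it assumes ID uniquely identifies each person in the whole file, and the tempfile names and new variable names (BMI_M, Emp_M, Emp_F) are chosen here for illustration. Because the parents' files are keyed on ID_M and ID_F directly, no separate mother indicator and no 1:many merge on ID_fam is needed.

```stata
* Example data from the thread
clear
input str10 ID_F str10 ID_M double BMI str10 ID str8 ID_fam byte Emp
""           ""           26.501 "A901963701" "A9019637" 1
""           ""           20.483 "A901963702" "A9019637" 1
"A901963701" "A901963702" 20.924 "A901963703" "A9019637" .
""           ""           27.209 "A901963801" "A9019638" 1
""           ""           31.733 "A901963802" "A9019638" .
"A901963801" "A901963802" 18.018 "A901963803" "A9019638" .
"A901963801" "A901963802" 19.054 "A901963804" "A9019638" .
end

tempfile full moms dads
save `full'

* Mothers' file: each person's data, keyed by the ID a child stores in ID_M
keep ID BMI Emp
rename (ID BMI Emp) (ID_M BMI_M Emp_M)
save `moms'

* Fathers' file, keyed by the ID a child stores in ID_F
use ID Emp using `full', clear
rename (ID Emp) (ID_F Emp_F)
save `dads'

* Keep children (anyone with a recorded parent) and attach parents' data
use `full', clear
keep if !missing(ID_M) | !missing(ID_F)
merge m:1 ID_M using `moms', keep(master match) nogenerate
merge m:1 ID_F using `dads', keep(master match) nogenerate

list ID ID_fam BMI BMI_M Emp_M Emp_F
```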
>>>
>>> On Sat, Feb 23, 2013 at 1:50 AM, Nick Cox <[email protected]> wrote:
>>>> That's an allusion to my FAQ
>>>>
>>>> FAQ     . . Creating variables recording prop. of the other members of a group
>>>>         . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>>>>         4/05    How do I create variables summarizing for each
>>>>                 individual properties of the other members of a
>>>>                 group?
>>>>
>>>> http://www.stata.com/support/faqs/data-management/creating-variables-recording-properties/
>>>>
>>>> I don't know why you report problems. The code suggested there works
>>>> as intended. Here it is again run on your example data:
>>>>
>>>> . by ID_fam (ID), sort: gen pid = _n
>>>>
>>>> . gen byte fid = .
>>>> (7 missing values generated)
>>>>
>>>> . gen byte mid = .
>>>> (7 missing values generated)
>>>>
>>>> . summarize pid, meanonly
>>>>
>>>> . forval i = 1 / `r(max)' {
>>>>   2.                 by ID_fam: replace fid = `i' if ID_F == ID[`i'] & !missing(ID_F)
>>>>   3.                 by ID_fam: replace mid = `i' if ID_M == ID[`i'] & !missing(ID_M)
>>>>   4. }
>>>> (3 real changes made)
>>>> (0 real changes made)
>>>> (0 real changes made)
>>>> (3 real changes made)
>>>> (0 real changes made)
>>>> (0 real changes made)
>>>> (0 real changes made)
>>>> (0 real changes made)
>>>>
>>>> . l
>>>>
>>>>      +----------------------------------------------------------------------------------+
>>>>      |       ID_F         ID_M      BMI           ID     ID_fam   Emp   pid   fid   mid |
>>>>      |----------------------------------------------------------------------------------|
>>>>   1. |                           26.501   A901963701   A9019637     1     1     .     . |
>>>>   2. |                           20.483   A901963702   A9019637     1     2     .     . |
>>>>   3. | A901963701   A901963702   20.924   A901963703   A9019637     .     3     1     2 |
>>>>   4. |                           27.209   A901963801   A9019638     1     1     .     . |
>>>>   5. |                           31.733   A901963802   A9019638     .     2     .     . |
>>>>      |----------------------------------------------------------------------------------|
>>>>   6. | A901963801   A901963802   18.018   A901963803   A9019638     .     3     1     2 |
>>>>   7. | A901963801   A901963802   19.054   A901963804   A9019638     .     4     1     2 |
>>>>      +----------------------------------------------------------------------------------+
>>>>
>>>> Using the same logic, we copy parents' employment and mothers' BMI as desired:
>>>>
>>>> . gen BMI_M = .
>>>> (7 missing values generated)
>>>>
>>>> . gen Emp_M = .
>>>> (7 missing values generated)
>>>>
>>>> . gen Emp_F = .
>>>> (7 missing values generated)
>>>>
>>>> . summarize pid, meanonly
>>>>
>>>> . forval i = 1 / `r(max)' {
>>>>   2.     by ID_fam: replace BMI_M = BMI[`i'] if ID_M == ID[`i'] & !missing(ID_M)
>>>>   3.     by ID_fam: replace Emp_M = Emp[`i'] if ID_M == ID[`i'] & !missing(ID_M)
>>>>   4.     by ID_fam: replace Emp_F = Emp[`i'] if ID_F == ID[`i'] & !missing(ID_F)
>>>>   5. }
>>>> (0 real changes made)
>>>> (0 real changes made)
>>>> (3 real changes made)
>>>> (3 real changes made)
>>>> (1 real change made)
>>>> (0 real changes made)
>>>> (0 real changes made)
>>>> (0 real changes made)
>>>> (0 real changes made)
>>>> (0 real changes made)
>>>> (0 real changes made)
>>>> (0 real changes made)
>>>>
>>>>
>>>> Here are the results:
>>>>
>>>> . l
>>>>
>>>>      +-----------------------------------------------------------------------------------------------+
>>>>      |       ID_F         ID_M      BMI           ID     ID_fam   Emp   pid    BMI_M   Emp_M   Emp_F |
>>>>      |-----------------------------------------------------------------------------------------------|
>>>>   1. |                           26.501   A901963701   A9019637     1     1        .       .       . |
>>>>   2. |                           20.483   A901963702   A9019637     1     2        .       .       . |
>>>>   3. | A901963701   A901963702   20.924   A901963703   A9019637     .     3   20.483       1       1 |
>>>>   4. |                           27.209   A901963801   A9019638     1     1        .       .       . |
>>>>   5. |                           31.733   A901963802   A9019638     .     2        .       .       . |
>>>>      |-----------------------------------------------------------------------------------------------|
>>>>   6. | A901963801   A901963802   18.018   A901963803   A9019638     .     3   31.733       .       1 |
>>>>   7. | A901963801   A901963802   19.054   A901963804   A9019638     .     4   31.733       .       1 |
>>>>      +-----------------------------------------------------------------------------------------------+
>>>>
>>>> Nick
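The two copying steps above can be folded into one loop over a list of parental variables by nesting -foreach- over variable names around -forvalues- over household members; keeping only children's rows then gives the layout requested in the original question. This is a minimal sketch on the example data: the _M and _F suffixes follow the names used above, the variables are created as double so copied BMI values keep full precision, and BMI_F is computed and then dropped because it was not requested.

```stata
* Example data from the thread
clear
input str10 ID_F str10 ID_M double BMI str10 ID str8 ID_fam byte Emp
""           ""           26.501 "A901963701" "A9019637" 1
""           ""           20.483 "A901963702" "A9019637" 1
"A901963701" "A901963702" 20.924 "A901963703" "A9019637" .
""           ""           27.209 "A901963801" "A9019638" 1
""           ""           31.733 "A901963802" "A9019638" .
"A901963801" "A901963802" 18.018 "A901963803" "A9019638" .
"A901963801" "A901963802" 19.054 "A901963804" "A9019638" .
end

bysort ID_fam (ID): generate pid = _n
summarize pid, meanonly
local nmax = r(max)

* For each variable, copy the mother's and father's values to the child
foreach v of varlist BMI Emp {
    generate double `v'_M = .
    generate double `v'_F = .
    forvalues i = 1/`nmax' {
        by ID_fam: replace `v'_M = `v'[`i'] if ID_M == ID[`i'] & !missing(ID_M)
        by ID_fam: replace `v'_F = `v'[`i'] if ID_F == ID[`i'] & !missing(ID_F)
    }
}

* Keep children only (rows that record a mother or a father)
keep if !missing(ID_M) | !missing(ID_F)
keep ID ID_fam BMI BMI_M Emp_M Emp_F
list
```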
>>>>
>>>> On Fri, Feb 22, 2013 at 10:45 PM, Haena Lee <[email protected]> wrote:
>>>>
>>>>> I am investigating the relationship between maternal
>>>>> employment status and the prevalence of childhood obesity using a
>>>>> nationally representative dataset (KNHANES). Suppose I have ID (all
>>>>> observations, including both children and parents), ID_fam (household
>>>>> identifier), ID_F (father's ID), ID_M (mother's ID), BMI (body mass
>>>>> index) and finally Emp (employment status: 1 if employed; 0 if not
>>>>> employed), as follows:
>>>>>
>>>>> ID_F         ID_M         BMI     ID           ID_fam     Emp
>>>>>                           26.501  A901963701   A9019637   1
>>>>>                           20.483  A901963702   A9019637   1
>>>>> A901963701   A901963702   20.924  A901963703   A9019637   .
>>>>>                           27.209  A901963801   A9019638   1
>>>>>                           31.733  A901963802   A9019638   .
>>>>> A901963801   A901963802   18.018  A901963803   A9019638   .
>>>>> A901963801   A901963802   19.054  A901963804   A9019638   .
>>>>>
>>>>> Ultimately, I would like to have a dataset like the following:
>>>>>
>>>>> ID (children)   ID_fam     BMI     Mom's BMI   Mom's Emp   Dad's Emp
>>>>> A901963703      A9019637   20.924  20.483      1           1
>>>>> A901963803      A9019638   18.018  31.733      .           1
>>>>> A901963804      A9019638   19.054  31.733      .           1
>>>>>
>>>>> Given this, my questions are: 1) how to map the properties of other
>>>>> family members to children within each household, using a loop; or 2)
>>>>> how to generate an indicator of mother (1 if ID == ID_M; 0 otherwise).
>>>>> I found Nick Cox's helpful example and adapted it as follows:
>>>>>
>>>>> by ID_fam (ID), sort: gen pid = _n
>>>>> gen byte fid = .
>>>>> gen byte mid = .
>>>>> summarize pid, meanonly
>>>>> forval i = 1 / `r(max)' {
>>>>>                 by ID_fam: replace fid = `i'
>>>>>                 if ID_F == ID[`i'] & !missing(ID_F)
>>>>>                 by ID_fam: replace mid = `i'
>>>>>                 if ID_M == ID[`i'] & !missing(ID_M)
>>>>> }
>>>>>
>>>>> It produced no meaningful values, only missing ones. Please
>>>>> advise. Thank you so much in advance for any help.
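In the question's version of the loop, each if qualifier sits on a line of its own. In a do-file, a line break ends a command unless the line is continued with ///, so if those breaks were in the do-file itself rather than introduced by the email, the conditions would not belong to the -replace- commands. A minimal sketch of the same loop with /// continuations, assuming it is run from a do-file and using only the identifier variables from the example data:

```stata
* Identifier variables from the example data
clear
input str10 ID_F str10 ID_M str10 ID str8 ID_fam
""           ""           "A901963701" "A9019637"
""           ""           "A901963702" "A9019637"
"A901963701" "A901963702" "A901963703" "A9019637"
""           ""           "A901963801" "A9019638"
""           ""           "A901963802" "A9019638"
"A901963801" "A901963802" "A901963803" "A9019638"
"A901963801" "A901963802" "A901963804" "A9019638"
end

by ID_fam (ID), sort: gen pid = _n
gen byte fid = .
gen byte mid = .
summarize pid, meanonly
forval i = 1 / `r(max)' {
    * /// joins the next line to this command, so each if qualifier
    * stays attached to its replace
    by ID_fam: replace fid = `i' ///
        if ID_F == ID[`i'] & !missing(ID_F)
    by ID_fam: replace mid = `i' ///
        if ID_M == ID[`i'] & !missing(ID_M)
}

list ID ID_F ID_M pid fid mid, sepby(ID_fam)
```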


© Copyright 1996–2018 StataCorp LLC