
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Problems with expand og reverting to original dataset


From   Nick Cox <[email protected]>
To   [email protected]
Subject   Re: st: Problems with expand og reverting to original dataset
Date   Mon, 24 Jan 2011 09:41:45 +0000

Sorry, but not so. The groups defined by -x y- jointly are precisely
those described by -y x- jointly. Here (as typically) the observations
will come in a different order, but that's immaterial once birth date
is also looked at. (I had the same mother and father as my brother, and
precisely the same siblings are identified if my father is put first.)

Otherwise put, if children share just one parent, they won't be in the
same -by:- group as defined by pairs of parents.
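
A minimal sketch of the point above, using made-up toy data (the values, -child_id- and the variables g_mf, g_fm, n_mf, n_fm are assumptions for illustration, not from the original dataset). It checks that grouping on mother then father gives the same families as grouping on father then mother, and that a half sibling ends up in a group alone:

```stata
* Toy data: children 1 and 2 are full siblings; child 3 shares only the mother.
clear
input child_id mother_id father_id
1 10 20
2 10 20
3 10 21
4 11 22
end

* Group identifiers built with the parent variables in either order.
egen g_mf = group(mother_id father_id)
egen g_fm = group(father_id mother_id)

* Every group under one ordering maps to exactly one group under the other.
bysort g_mf (g_fm) : assert g_fm[1] == g_fm[_N]
bysort g_fm (g_mf) : assert g_mf[1] == g_mf[_N]

* Family size is identical either way; only the sort order differs.
bysort mother_id father_id : gen n_mf = _N
bysort father_id mother_id : gen n_fm = _N
assert n_mf == n_fm

list child_id mother_id father_id n_mf
```

If the assertions pass silently, the two orderings define the same partition of the data.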

Nick

On Mon, Jan 24, 2011 at 9:30 AM, Ada Ma <[email protected]> wrote:
> Hi Nick,
>
> Thanks for pointing out my mistake. I'm thinking that OP's dataset might have some half siblings, which is why putting the mother's ID in front of the father's ID solved the problem.
>
> Ada
>
> Sent using BlackBerry®
>
> -----Original Message-----
> From: Nick Cox <[email protected]>
> Sender: [email protected]
> Date: Mon, 24 Jan 2011 09:23:46
> To: <[email protected]>
> Reply-To: [email protected]
> Subject: Re: st: Problems with expand og reverting to original dataset
>
> You wrote that the error message disappeared on using
>
> by mother_id father_id, sort
>
> instead of
>
> bysort mother_id father_id
>
> These two are equivalent. Whatever removed your error was some other
> change, I believe.
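
The equivalence can be checked on toy data. This is only a sketch: the values and the variable names n_by and n_bysort are assumptions made up for illustration.

```stata
* Toy data, deliberately entered out of order.
clear
input mother_id father_id birth_date
2 2 300
1 1 100
2 2 200
1 1 465
end

* Form 1: -by- with the -sort- option.
by mother_id father_id, sort : gen n_by = _N

* Break the sort order on purpose before trying the second form.
sort birth_date

* Form 2: -bysort-.
bysort mother_id father_id : gen n_bysort = _N

* Both forms see the same groups.
assert n_by == n_bysort
```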
>
> Nick
>
> On Mon, Jan 24, 2011 at 9:11 AM, Grethe Søndergaard
> <[email protected]> wrote:
>> Thanks a lot to both of you for your explanations of how to handle my data.
>> I am using cox-regression and the bysort command is so much easier
>> than using expand as I intended to do. The error message disappeared
>> when I wrote by mother_id father_id, sort (instead of bysort mother_id
>> father_id).
>> I am aware that choosing only two siblings from each family might be
>> problematic and I will consider using reshape to include more
>> siblings.
>>
>>
>>
>>
>> 2011/1/20 Nick Cox <[email protected]>:
>>> Let me explain why this suggestion is wrong and neither equivalent to, nor an improvement on, what I wrote.
>>>
>>> My code was
>>>
>>> bysort mother_id father_id (birth_date) : gen diff = birth_date[2] - birth_date[1]
>>>
>>> Within groups defined by the same mother and father, two siblings define two observations. Given sorting within same parents by -birth_date-, the first observation within each group is that with the lower birth_date and the second is that with the higher birth_date. With twins, defined precisely here as those born on the same day, the ordering is arbitrary but that is immaterial as the difference is 0 either way.
>>>
>>> Ada wants to correct this to
>>>
>>> bysort mother_id father_id (birth_date) : gen diff = birth_date[_n+1] -birth_date
>>>
>>> For observation 1, Ada's code reduces to mine. But for observation 2, it reduces to
>>>
>>> ... birth_date[3] - birth_date[2]
>>>
>>> As birth_date[3] refers to an observation outside each group, it will be evaluated as missing, and the value for the new variable will also be missing.
>>>
>>> Hence this correction is incorrect. The literal subscripts [2] and [1] were precisely what was intended and what are needed to make this work.
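
A small sketch comparing the two versions on made-up data with exactly two siblings per family (the birth_date values and the names diff_fixed and diff_lead are assumptions for illustration):

```stata
* Toy data: family 1 has siblings born 365 days apart; family 2 has twins.
clear
input mother_id father_id birth_date
1 1 100
1 1 465
2 2 200
2 2 200
end

* Literal subscripts: the same difference is stored in both observations.
bysort mother_id father_id (birth_date) : gen diff_fixed = birth_date[2] - birth_date[1]

* Relative subscript: the last observation in each group looks past the group.
bysort mother_id father_id (birth_date) : gen diff_lead = birth_date[_n+1] - birth_date

assert diff_fixed == 365 if mother_id == 1
assert diff_fixed == 0 if mother_id == 2

* With [_n+1] the last sibling in every family gets a missing value.
bysort mother_id father_id (birth_date) : assert missing(diff_lead) if _n == _N

list, sepby(mother_id)
```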
>>>
>>> Nick
>>> [email protected]
>>>
>>> Ada Ma
>>>
>>> WRT your Q to Nick the command you should write is:
>>>
>>> bysort mother_id father_id (birth_date) : gen diff = birth_date[_n+1] - birth_date
>>>
>>> [...]
>>>
>>> On Thu, Jan 20, 2011 at 2:12 PM, Grethe Søndergaard
>>> <[email protected]> wrote:
>>>> Thank you for your answers
>>>>
>>>> @ Nick Cox: I have tried to run bysort mother_id father_id
>>>> (birth_date) : gen diff = birth_date[2] -birth_date[1]. However, an
>>>> error message appears: "factor variables and time-series operators not
>>>> allowed". Can I solve this problem - by somehow changing the type of
>>>> variable that birth_date is?
>>>>
>>>> @ Ada Ma: My dataset consists of more than two siblings per family
>>>> (one line for each person). I am not sure how to decide which
>>>> siblings should be included in the dataset if more than two siblings
>>>> are being compared. E.g. if a family consists of children aged 1, 4,
>>>> and 8, who should stay in the dataset? That is why I chose to include
>>>> only persons with one sibling.


