



Re: st: Problems with expand and reverting to original dataset


From   Grethe Søndergaard <[email protected]>
To   [email protected]
Subject   Re: st: Problems with expand and reverting to original dataset
Date   Mon, 24 Jan 2011 10:11:12 +0100

Thanks a lot to both of you for your explanations of how to handle my data.
I am using Cox regression, and the bysort command is so much easier
than using expand, as I had intended to do. The error message disappeared
when I wrote by mother_id father_id, sort (instead of bysort mother_id
father_id).
I am aware that choosing only two siblings from each family might be
problematic, and I will consider using reshape to include more
siblings.
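
For the restriction to two-sibling families mentioned in this thread, a minimal sketch follows. It assumes one observation per person, that full siblings share both mother_id and father_id, and that the data are already in memory; the variable name n_sibs is made up for illustration.

```stata
* Count the children recorded for each pair of parents
* (n_sibs is a hypothetical variable name)
bysort mother_id father_id: gen n_sibs = _N

* Keep only families with exactly two children in the data
keep if n_sibs == 2
```

Tabulating n_sibs before the keep shows how many families, and how many persons, the restriction would drop.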




2011/1/20 Nick Cox <[email protected]>:
> Let me explain why this suggestion is wrong and neither equivalent to, nor an improvement on, what I wrote.
>
> My code was
>
> bysort mother_id father_id (birth_date) : gen diff = birth_date[2] - birth_date[1]
>
> Within groups defined by the same mother and father, two siblings define two observations. Given sorting within same parents by -birth_date-, the first observation within each group is that with the lower birth_date and the second is that with the higher birth_date. With twins, defined precisely here as those born on the same day, the ordering is arbitrary but that is immaterial as the difference is 0 either way.
>
> Ada wants to correct this to
>
> bysort mother_id father_id (birth_date) : gen diff = birth_date[_n+1] -birth_date
>
> For observation 1, Ada's code reduces to mine. But for observation 2, it reduces to
>
> ... birth_date[3] - birth_date[2]
>
> As birth_date[3] refers to an observation outside each group, it will be evaluated as missing, and the value for the new variable will also be missing.
>
> Hence this correction is incorrect. The literal subscripts [2] and [1] were precisely what was intended and what is needed to make this work.
>
> Nick
> [email protected]
>
> Ada Ma
>
> WRT your Q to Nick the command you should write is:
>
> bysort mother_id father_id (birth_date) : gen diff = birth_date[_n+1] - birth_date
>
> [...]
>
> On Thu, Jan 20, 2011 at 2:12 PM, Grethe Søndergaard
> <[email protected]> wrote:
>> Thank you for your answers
>>
>> @ Nick Cox: I have tried to run bysort mother_id father_id
>> (birth_date) : gen diff = birth_date[2] - birth_date[1]. However, an
>> error message appears: "factor variables and time-series operators not
>> allowed". Can I solve this problem by somehow changing the type of
>> variable that birth_date is?
>>
>> @ Ada Ma: My dataset contains more than two siblings per family
>> (one line for each person). I am not sure how to decide which
>> siblings to include in the dataset if more than two siblings are
>> being compared. E.g. a family consists of children aged 1, 4, and 8 (so
>> who should stay in the dataset?). That is why I chose to
>> include only persons with one sibling.
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
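
The difference between the literal subscripts [2] and [1] and the relative subscript [_n+1], as explained in the quoted message from Nick Cox, can be checked on a toy dataset. The following is only a sketch: the IDs and birth dates are invented for illustration, and diff_next is a hypothetical variable name.

```stata
* Toy data: two families of two siblings each (values are made up)
clear
input mother_id father_id month day year
1 1 3  1 1990
1 1 7 15 1994
2 2 1 10 1988
2 2 1 10 1988
end
gen birth_date = mdy(month, day, year)
format birth_date %td

* Literal subscripts: both observations in a family get
* (later birth date) - (earlier birth date); for the twins this is 0
bysort mother_id father_id (birth_date): gen diff = birth_date[2] - birth_date[1]

* Relative subscript: in the last observation of each family,
* birth_date[_n+1] lies outside the by-group, so the result is missing
bysort mother_id father_id (birth_date): gen diff_next = birth_date[_n+1] - birth_date

list mother_id father_id birth_date diff diff_next, sepby(mother_id)
```

In the listing, diff is filled in for every sibling, while diff_next is missing for the younger sibling in each family.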


