Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: "Ada Ma" <heu034@googlemail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Problems with expand og reverting to original dataset
Date: Mon, 24 Jan 2011 09:30:37 +0000

Hi Nick,

Thanks for pointing out my mistake. I'm thinking that OP's dataset might have some half siblings, which is why putting the mother's ID in front of the father's ID solved the problem.

Ada

Sent using BlackBerry®

-----Original Message-----
From: Nick Cox <njcoxstata@gmail.com>
Sender: owner-statalist@hsphsun2.harvard.edu
Date: Mon, 24 Jan 2011 09:23:46
To: <statalist@hsphsun2.harvard.edu>
Reply-To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Problems with expand og reverting to original dataset

You wrote that the error message disappeared on using

by mother_id father_id, sort

instead of

bysort mother_id father_id

These two are equivalent. Whatever removed your error was some other change, I believe.

Nick

On Mon, Jan 24, 2011 at 9:11 AM, Grethe Søndergaard <dkstatstata@gmail.com> wrote:
> Thanks a lot to both of you for your explanations of how to handle my data.
> I am using Cox regression and the bysort command is so much easier
> than using expand as I intended to do. The error message disappeared
> when I wrote by mother_id father_id, sort (instead of bysort mother_id
> father_id).
> I am aware that choosing only two siblings from each family might be
> problematic and I will consider using reshape to include more
> siblings.
>
> 2011/1/20 Nick Cox <n.j.cox@durham.ac.uk>:
>> Let me explain why this suggestion is wrong and neither equivalent to, nor an improvement on, what I wrote.
>>
>> My code was
>>
>> bysort mother_id father_id (birth_date) : gen diff = birth_date[2] - birth_date[1]
>>
>> Within groups defined by the same mother and father, two siblings define two observations. Given sorting within same parents by -birth_date-, the first observation within each group is that with the lower birth_date and the second is that with the higher birth_date. With twins, defined precisely here as those born on the same day, the ordering is arbitrary but that is immaterial as the difference is 0 either way.
>>
>> Ada wants to correct this to
>>
>> bysort mother_id father_id (birth_date) : gen diff = birth_date[_n+1] -birth_date
>>
>> For observation 1, Ada's code reduces to mine. But for observation 2, it reduces to
>>
>> ... birth_date[3] - birth_date[2]
>>
>> As birth_date[3] refers to an observation outside each group, it will be evaluated as missing, and the value for the new variable will also be missing.
>>
>> Hence this correction is incorrect. The literal subscripts [2] and [1] were precisely what was intended and what are needed to make this work.
>>
>> Nick
>> n.j.cox@durham.ac.uk
>>
>> Ada Ma
>>
>> WRT your Q to Nick the command you should write is:
>>
>> bysort mother_id father_id (birth_date) : gen diff = birth_date[_n+1] -birth_date
>>
>> [...]
>>
>> On Thu, Jan 20, 2011 at 2:12 PM, Grethe Søndergaard
>> <dkstatstata@gmail.com> wrote:
>>> Thank you for your answers
>>>
>>> @ Nick Cox: I have tried to run bysort mother_id father_id
>>> (birth_date) : gen diff = birth_date[2] -birth_date[1]. However, an
>>> error message appears: "factor variables and time-series operators not
>>> allowed". Can I solve this problem by somehow changing the type of
>>> variable that birth_date is?
>>>
>>> @ Ada Ma: My dataset includes families with more than two siblings
>>> (one line for each person). I am not sure how to decide which
>>> siblings should be included in the dataset if more than two siblings are
>>> being compared. E.g. a family consists of children aged 1, 4, and 8 (so
>>> who should stay in the dataset?). That is why I chose to
>>> include only persons with one sibling.

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
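
The difference between the two subscripting styles discussed in this thread can be checked on a toy dataset. The sketch below is only an illustration: the four observations are made up, and birth_date is assumed to be a plain numeric variable (for example a daily date), which may not match the original poster's data. The two -generate- lines are the ones quoted in the thread; the names diff_lead and diff_by are hypothetical, chosen so that all three results can sit side by side.

```stata
* Made-up data: two families of two siblings each; family 2 is a pair of twins.
* birth_date is assumed numeric here (an assumption, not known from the thread).
clear
input mother_id father_id birth_date
1 10 100
1 10 830
2 20 400
2 20 400
end

* Literal subscripts: the same spacing is stored in both observations of a family
bysort mother_id father_id (birth_date) : gen diff = birth_date[2] - birth_date[1]

* Lead subscript: the last observation of each family refers past the group
bysort mother_id father_id (birth_date) : gen diff_lead = birth_date[_n+1] - birth_date

* by ..., sort is the same prefix as bysort, so this reproduces diff
by mother_id father_id (birth_date), sort : gen diff_by = birth_date[2] - birth_date[1]

list, sepby(mother_id father_id)

* Checks: the two prefix spellings agree, and the lead version is missing
* in the last observation of every family
assert diff == diff_by
by mother_id father_id : assert missing(diff_lead) if _n == _N
```

Under -by:-, subscripts are interpreted within each group, which is why birth_date[_n+1] in the last observation of a family evaluates to missing rather than picking up the first sibling of the next family.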