Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Problems with expand og reverting to original dataset


From   "Ada Ma" <heu034@googlemail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Problems with expand og reverting to original dataset
Date   Mon, 24 Jan 2011 09:34:46 +0000

Oops, I made another mistake. Don't mind me. Please ignore my last email.

Ada


Sent using BlackBerry®

-----Original Message-----
From: "Ada Ma" <heu034@googlemail.com>
Date: Mon, 24 Jan 2011 09:30:37 
To: <statalist@hsphsun2.harvard.edu>
Reply-To: heu034@googlemail.com
Subject: Re: st: Problems with expand og reverting to original dataset

Hi Nick, 

Thanks for pointing out my mistake. I think the OP's dataset might include some half siblings, which would explain why putting the mother ID in front of the father ID solved the problem.

Ada

Sent using BlackBerry®

-----Original Message-----
From: Nick Cox <njcoxstata@gmail.com>
Sender: owner-statalist@hsphsun2.harvard.edu
Date: Mon, 24 Jan 2011 09:23:46 
To: <statalist@hsphsun2.harvard.edu>
Reply-To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Problems with expand og reverting to original dataset

You wrote that the error message disappeared on using

by mother_id father_id, sort

instead of

bysort mother_id father_id

These two are equivalent. Whatever removed your error was some other
change, I believe.

Nick
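
The equivalence stated above can be checked on a small example. This is only a sketch: the four observations below are made up for illustration, and (birth_date) is added to both prefixes, as in the earlier code in this thread, so that the order within each family is well defined.

```stata
* Toy data: hypothetical IDs and dates, for illustration only
clear
input mother_id father_id birth_date
2 20 300
1 10 100
2 20 250
1 10 160
end

* Form 1: by with the sort option
by mother_id father_id (birth_date), sort : gen diff1 = birth_date[2] - birth_date[1]

* Form 2: bysort
bysort mother_id father_id (birth_date) : gen diff2 = birth_date[2] - birth_date[1]

* Stops with an error if the two forms disagree in any observation
assert diff1 == diff2
```

If the assert passes silently, the two forms gave the same result on these data, which is consistent with the error having gone away for some other reason.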

On Mon, Jan 24, 2011 at 9:11 AM, Grethe Søndergaard
<dkstatstata@gmail.com> wrote:
> Thanks a lot to both of you for your explanations of how to handle my data.
> I am using Cox regression, and the bysort command is so much easier
> than using expand as I intended to do. The error message disappeared
> when I wrote by mother_id father_id, sort (instead of bysort mother_id
> father_id).
> I am aware that choosing only two siblings from each family might be
> problematic and I will consider using reshape to include more
> siblings.
>
>
>
>
> 2011/1/20 Nick Cox <n.j.cox@durham.ac.uk>:
>> Let me explain why this suggestion is wrong and neither equivalent to, nor an improvement on, what I wrote.
>>
>> My code was
>>
>> bysort mother_id father_id (birth_date) : gen diff = birth_date[2] - birth_date[1]
>>
>> Within groups defined by the same mother and father, two siblings define two observations. Given sorting within same parents by -birth_date-, the first observation within each group is that with the lower birth_date and the second is that with the higher birth_date. With twins, defined precisely here as those born on the same day, the ordering is arbitrary but that is immaterial as the difference is 0 either way.
>>
>> Ada wants to correct this to
>>
>> bysort mother_id father_id (birth_date) : gen diff = birth_date[_n+1] -birth_date
>>
>> For observation 1, Ada's code reduces to mine. But for observation 2, it reduces to
>>
>> ... birth_date[3] - birth_date[2]
>>
>> As birth_date[3] refers to an observation outside each group, it will be evaluated as missing, and the value for the new variable will also be missing.
>>
>> Hence this correction is incorrect. The literal subscripts [2] and [1] were precisely what was intended and what are needed to make this work.
>>
>> Nick
>> n.j.cox@durham.ac.uk
>>
>> Ada Ma
>>
>> WRT your Q to Nick the command you should write is:
>>
>> bysort mother_id father_id (birth_date) : gen diff = birth_date[_n+1] - birth_date
>>
>> [...]
>>
>> On Thu, Jan 20, 2011 at 2:12 PM, Grethe Søndergaard
>> <dkstatstata@gmail.com> wrote:
>>> Thank you for your answers
>>>
>>> @ Nick Cox: I have tried to run bysort mother_id father_id
>>> (birth_date) : gen diff = birth_date[2] -birth_date[1]. However, an
>>> error message appears: "factor variables and time-series operators not
>>> allowed". Can I solve this problem by somehow changing the type of
>>> variable that birth_date is?
>>>
>>> @ Ada Ma: My dataset consists of more than two siblings per family
>>> (one line for each person). I am not sure how to decide which
>>> siblings should be included in the dataset if more than two siblings are
>>> being compared. E.g. a family consists of children aged 1, 4, and 8 (so
>>> who should stay in the dataset?). That is why I chose to
>>> include only persons with one sibling.
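
The difference between the fixed subscripts [2] and [1] and the running subscript [_n+1] discussed in this thread can be seen on a toy dataset. This is a minimal sketch, assuming exactly two siblings per family and a numeric birth_date; the data values and the names diff_fixed and diff_lead are made up for illustration.

```stata
* Toy data: two families with two siblings each (hypothetical values)
clear
input mother_id father_id birth_date
1 10 100
1 10 160
2 20 250
2 20 300
end

* Fixed subscripts: both observations in a family get the same difference
bysort mother_id father_id (birth_date) : gen diff_fixed = birth_date[2] - birth_date[1]

* Running subscript: for the last sibling, [_n+1] points outside the group
bysort mother_id father_id (birth_date) : gen diff_lead = birth_date[_n+1] - birth_date

list, sepby(mother_id father_id)

* diff_fixed is never missing with two siblings per family
assert !missing(diff_fixed)

* diff_lead is missing in the last observation of every family
bysort mother_id father_id (birth_date) : assert missing(diff_lead) if _n == _N
```

In the listing, diff_lead should agree with diff_fixed only in the first observation of each family, which is the point made about birth_date[3] - birth_date[2] evaluating to missing.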

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


