RE: st: Problems with expand og reverting to original dataset
From: Nick Cox <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: RE: st: Problems with expand og reverting to original dataset
Date: Thu, 20 Jan 2011 18:41:12 +0000
Let me explain why this suggestion is wrong and neither equivalent to, nor an improvement on, what I wrote.
My code was
bysort mother_id father_id (birth_date) : gen diff = birth_date[2] - birth_date[1]
Within groups defined by the same mother and father, two siblings define two observations. Given sorting within same parents by -birth_date-, the first observation within each group is that with the lower birth_date and the second is that with the higher birth_date. With twins, defined precisely here as those born on the same day, the ordering is arbitrary but that is immaterial as the difference is 0 either way.
Ada wants to correct this to
bysort mother_id father_id (birth_date) : gen diff = birth_date[_n+1] - birth_date
For observation 1, Ada's code reduces to mine. But for observation 2, it reduces to
... birth_date[3] - birth_date[2]
Under -by:-, subscripts are interpreted within each group. As each group here contains only two observations, birth_date[3] refers to an observation outside the group, so it is evaluated as missing, and the value of the new variable is also missing.
Hence this correction is incorrect. The literal subscripts [2] and [1] were precisely what was intended and what are needed to make this work.
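A minimal sketch of the difference, using a small made-up dataset (the values and the variable names diff_fixed and diff_rel are hypothetical, not from the original thread; birth_date is assumed to be a numeric daily date):

```stata
* Hypothetical data: two families, two siblings each.
* Family 2 is a pair of twins (same birth_date).
clear
input mother_id father_id birth_date
1 1 100
1 1 465
2 2 200
2 2 200
end

* Literal subscripts: every observation in a group gets the same difference.
bysort mother_id father_id (birth_date) : gen diff_fixed = birth_date[2] - birth_date[1]

* Relative subscript: the second observation looks at birth_date[3],
* which lies outside a group of two, so the result is missing there.
bysort mother_id father_id (birth_date) : gen diff_rel = birth_date[_n+1] - birth_date

list, sepby(mother_id father_id)
* diff_fixed is 365, 365, 0, 0
* diff_rel   is 365,   ., 0, .
```

With the literal subscripts both siblings carry the age difference; with [_n+1] only the older sibling does.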
Nick
[email protected]
Ada Ma
WRT your Q to Nick the command you should write is:
bysort mother_id father_id (birth_date) : gen diff = birth_date[_n+1] - birth_date
[...]
On Thu, Jan 20, 2011 at 2:12 PM, Grethe Søndergaard
<[email protected]> wrote:
> Thank you for your answers
>
> @ Nick Cox: I have tried to run bysort mother_id father_id
> (birth_date) : gen diff = birth_date[2] -birth_date[1]. However, an
> error message appears: "factor variables and time-series operators not
> allowed". Can I solve this problem by somehow changing the type of
> variable that birth_date is?
>
> @ Ada Ma: My dataset includes families with more than two siblings
> (one line for each person). I am not sure how to decide which
> siblings to include in the dataset if more than two siblings are
> being compared. E.g. a family consists of children aged 1, 4, and 8 (so
> who should stay in the dataset?). That is why I chose to include
> only persons with one sibling.