
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Problems with expand and reverting to original dataset


From   Ada Ma <[email protected]>
To   [email protected]
Subject   Re: st: Problems with expand and reverting to original dataset
Date   Thu, 20 Jan 2011 13:28:09 +0000

What does your original dataset look like?

One line of data per person or one line of data per pair of siblings?
Do you always have 2 siblings or do you have more than 2 siblings?

If you started out with one line per person, I would use -reshape- to
go from long to wide format, with (mother_id father_id) identifying the
family - this creates a family-level dataset.  Each line will then
contain the data for all siblings, including everyone's birth date.

This way you can work out the birth date differences directly.  After
you have kept the families that you need, you can -reshape- back to
long format.  If you want to create a pairwise dataset instead, you
should pick an index sibling, apply -expand-, and then -reshape-.
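
As a rough sketch of the -reshape- route described above (not tested on
your data): it assumes one line per person, exactly two siblings per
family, and a birth date stored as a Stata daily date.  The toy data and
the names person_id, birthdate, sib and agedif are made up for
illustration, so substitute your own.

```stata
* Toy data (hypothetical): one line per person, two siblings per family
clear
input person_id mother_id father_id str10 bdate_str
1 101 201 "1990-03-15"
2 101 201 "1993-07-01"
3 102 202 "1985-01-20"
4 102 202 "1994-11-05"
end
gen birthdate = date(bdate_str, "YMD")
format birthdate %td
drop bdate_str

* Number the siblings within each family (1 = eldest)
bysort mother_id father_id (birthdate person_id): gen sib = _n

* Long -> wide: one line per family, with person_id1/2 and birthdate1/2
reshape wide person_id birthdate, i(mother_id father_id) j(sib)

* Age difference in years (approximate: days / 365.25); keep 5 years or less
gen agedif = abs(birthdate2 - birthdate1) / 365.25
keep if agedif <= 5

* Wide -> long: back to one line per person
drop agedif
reshape long person_id birthdate, i(mother_id father_id) j(sib)
list, sepby(mother_id father_id)
```

With more than two siblings per family the wide data would have
birthdate3, birthdate4, and so on, and the age-difference rule would
need to be spelled out for each pair.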




On Thu, Jan 20, 2011 at 1:05 PM, Grethe Søndergaard
<[email protected]> wrote:
> Hello
>
> My dataset consists of siblings (two siblings from each family).
>
> person_id1
> mother_id
> father_id
> sibling_id
> birth date
>
>
> I only want to keep siblings in my dataset with a certain age
> difference, let's say 5 years or less.
> I have tried to do this using expand, but I am not sure how to
> revert to my original dataset. This is what I have done:
>
> ** duplicating each observation [no] times
> expand no, gen(dub)
>
> ** creating an (arbitrary) id to keep track of the duplicates
> sort mother_id father_id person_id1
> by mother_id father_id person_id1: gen copyid = _n
>
> ** copying
> sort mother_id father_id
> by mother_id father_id: gen person_id2 = person_id1[no*copyid]
>
> ** adding age difference variable
> by mother_id father_id: gen birthdate_2  = birthdate_1[no*copyid]
>
> ** deleting redundant copies
> drop if person_id1 >= person_id2
>
> * creating age differences
> gen agedif = birthdate_1 - birthdate_2
> replace agedif=agedif *-1 if agedif<0
> keep if agedif<6
>
> So far so good. My dataset now consists of pairs of siblings with an
> age difference of less than 6 years. There are no duplicates. But how
> do I revert to my original dataset?
> (I have tried to keep if dub==0, but that deleted a lot of observations
> - and that shouldn't happen since there were no duplicates)
>
> I hope somebody can help since I am stuck with this problem.
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>



-- 
Ada Ma
Research Fellow
Health Economics Research Unit
University of Aberdeen, UK.
http://www.abdn.ac.uk/heru/
Tel: +44 (0) 1224 555189
Fax: +44 (0) 1224 550926


