From
Kenisha Russell <kenisha.russell@framtidsstudier.se>

To
"statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>

Subject
st: re:re: data creation for hazard regression

Date
Mon, 11 Jun 2012 09:58:08 +0000

Hi Austin,

I am using Stata 10.1. Thank you for taking the time to answer my question. I apologise if I was not clear; it was my first time posting.

Research goal: Using a discrete-time competing-risks hazard model, I would like to analyse entry into first union (i.e. marriage or cohabitation), and pregnancy is one of the explanatory variables. The data have been transformed into person-months (i.e. what I previously referred to as century-months data).

Because I had the year and month in which each child was born, I then executed the steps outlined below.

Step 1: I created childbearing histories.

    /* Create century months for the birth of each child (here the maximum
       number of children is 3), using a loop that runs the code first for
       the 1st, then the 2nd, then the 3rd child */
    forval x = 1/3 {
        gen CMchild`x' = ym(childy`x', childm`x')
        recode CMchild`x' (. = 999999)
    }

Step 2: Then, in order to create a variable for pregnancy, I ran:

    /* Pregnancy is taken to start 7 months before each birth;
       999999 marks "no such child" */
    forval x = 1/3 {
        gen CMpregnancy`x' = CMchild`x' - 7 if CMchild`x' != 999999
        replace CMpregnancy`x' = 999999 if CMpregnancy`x' == .
    }

After stset, and after running the above commands, my data currently look like this:

    id   _t0    _t   _d   _st   _origin   CMchild1   CMchild2   CMchild3
     3     0    68    0     1    1997m6        583     999999     999999
     4    75   278    0     1   1985m10     999999     999999     999999
    11         476    0     1    1969m4        248        338     999999
    12         258    0     1    1987m6        401        424        509
    13    27   230    0     1   1989m10        421     999999     999999
    14     0   198    0     1    1992m6     999999     999999     999999
    15    68    86    1     1    1986m5        476     999999     999999

I have checked the math and the outcomes seem to be correct. For example, for id 11, where CMchild2 == 338, CMpregnancy2 (calculated as 7 months before the birth) is 331.

So my question is: if the above is correct, do I now need to stsplit the dataset so that there is one data row per person per month at risk of pregnancy? If I do split the records, I assume I would need to stop each pregnancy at the point where each child is born. Is that correct? If so, how would I do that?
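To make the question concrete, the following is a minimal sketch of what such a split might look like. It is not taken from the thread. It runs on hypothetical toy data for one woman and assumes that analysis time is measured in months since the origin and that pregnancy covers the 7 months before each birth. The names CMstart, CMend, union, interval, cm, and pregnant are invented for illustration.

```stata
* Hypothetical toy data: one woman observed from month 500 to month 520
* (Stata monthly dates), entering a union at 520, first birth in month 510
clear
input id CMstart CMend union CMchild1 CMchild2 CMchild3
1 500 520 1 510 999999 999999
end

* Assumption: analysis time is months since CMstart
stset CMend, id(id) failure(union) origin(time CMstart)

* Split into one record per person-month
stsplit interval, every(1)

* Calendar month at the start of each record
generate cm = _origin + _t0

* Time-varying pregnancy indicator: 1 from 7 months before a birth up to
* the month before the birth, 0 otherwise (999999 means no such child)
generate byte pregnant = 0
forvalues x = 1/3 {
    replace pregnant = 1 if CMchild`x' != 999999 & inrange(cm, CMchild`x' - 7, CMchild`x' - 1)
}

list id _t0 _t cm pregnant _d, sepby(id)
```

Whether the birth month itself should count as pregnant is a modelling choice; the upper bound in inrange() is where that would change.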
You suggested that I create a contemporaneous time variable; can you explain how?

Also, with regard to your earlier questions, Austin:

Are you sure every child is a biological child?
Yes, I am sure all the children are biological.

Are there women with more than 3 children in the data?
There are no women with more than 3 children in this data.

Do you have any information on gestational age at birth?
I have no information about gestational age at birth.

I hope that this time my goal and question are much clearer.

Best,
Kenisha

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:

**Re: st: re:re: data creation for hazard regression** *From:* Austin Nichols <austinnichols@gmail.com>

