
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


st: re:re: data creation for hazard regression

From   Kenisha Russell <>
To   "" <>
Subject   st: re:re: data creation for hazard regression
Date   Mon, 11 Jun 2012 09:58:08 +0000

Hi Austin,

I am using Stata 10.1.
Thank you for taking the time to answer my question. I apologise if I was not clear; this was my first time posting.

Research goal: Using a discrete-time competing-risks hazard model, I would like to analyse entry into first union (i.e., marriage or cohabitation), with pregnancy as one of the explanatory variables.
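A discrete-time competing-risks hazard model of this kind is commonly estimated as a multinomial logit on person-month records, with "no union this month" as the base outcome. The sketch below assumes the data are already split into one row per person-month and uses hypothetical variable names: uniontype (1 = marriage, 2 = cohabitation, recorded for women who enter a union), pregnant (a 0/1 time-varying indicator) and month (months at risk). The linear duration term is only a placeholder for a proper baseline-hazard specification; Stata 10 has no factor-variable notation, so duration-group dummies would need xi: or tabulate, generate().

```stata
* Hypothetical names: uniontype, pregnant, month (see text above)
* Outcome for each person-month: 0 = no union yet; on the record where the
* union begins (_d == 1) it takes the union type (1 marriage, 2 cohabitation)
gen byte outcome = 0
replace outcome = uniontype if _d == 1

* Multinomial logit on person-months = discrete-time competing-risks model
mlogit outcome pregnant month, baseoutcome(0)
```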

The data have been transformed into person-months (i.e., what I previously referred to as century-month data).
Because I have the year and month in which each child was born, I then carried out the steps below:
Step 1: I created childbearing histories.
/* Create century months for the birth of each child (the maximum number of
   children is 3), looping over the 1st, then 2nd, then 3rd child */
forvalues x = 1/3 {
    gen CMchild`x' = ym(childy`x', childm`x')
    recode CMchild`x' (. = 999999)
}
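A note on the dates in Step 1: Stata's ym() returns a Stata monthly date, i.e. the number of months since January 1960 (1960m1 = 0), not the demographers' century-month code (CMC, where January 1900 = 1). Either scale works here as long as every date uses the same one. The lines below show the offset between the two and add a simple birth-order sanity check on CMchild1 to CMchild3; >= rather than > is used to allow for twins born in the same month.

```stata
* ym() counts months from January 1960
display ym(1960, 1)          // shows 0
* Adding 721 converts a Stata monthly date to a century-month code (CMC)
display ym(1900, 1) + 721    // shows 1, the CMC of January 1900

* Sanity check: a later birth should never precede an earlier one
assert CMchild2 >= CMchild1 if CMchild2 != 999999
assert CMchild3 >= CMchild2 if CMchild3 != 999999
```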

Step 2: Then, to create a pregnancy variable for each child, I ran:
forvalues x = 1/3 {
    gen CMpregnancy`x' = CMchild`x' - 7 if CMchild`x' != 999999
    replace CMpregnancy`x' = 999999 if missing(CMpregnancy`x')
}

After running the commands above and stset, my data currently look like this:

id	_t0	_t	_d	_st	_origin	CMchild1	CMchild2	CMchild3
3	0	68	0	1	1997m6	583	999999	999999
4	75	278	0	1	1985m10	999999	999999	999999
11	476	0	1	1969m4	248	338	999999
12	258	0	1	1987m6	401	424	509
13	27	230	0	1	1989m10	421	999999	999999
14	0	198	0	1	1992m6	999999	999999	999999
15	68	86	1	1	1986m5	476	999999	999999

I have checked the arithmetic and the results seem correct. For example, for id 11, where CMchild2 == 338, CMpregnancy2, calculated as 7 months before the birth, is 331.
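The hand check above can be made systematic with assert, which stops with an error if any observation breaks the rule. This sketch assumes CMpregnancy1 to CMpregnancy3 were created as in Step 2, with 999999 marking births that did not occur.

```stata
* Every observed birth should have its pregnancy month exactly 7 months earlier,
* and every missing birth should carry the 999999 code for pregnancy too
forvalues x = 1/3 {
    assert CMpregnancy`x' == CMchild`x' - 7 if CMchild`x' != 999999
    assert CMpregnancy`x' == 999999 if CMchild`x' == 999999
}

* Inspect a single case, e.g. id 11
list id CMchild1 CMchild2 CMpregnancy1 CMpregnancy2 if id == 11
```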
So my question is: if the above is correct, do I now need to stsplit the dataset so that there is one row per person per month at risk of pregnancy?
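A discrete-time model typically needs one record per person per month at risk, and stsplit with every(1) can produce that from stset data. A minimal sketch, assuming the data are already stset with id(id) and analysis time measured in months; month is a hypothetical name for the new variable, which stsplit bottom-codes to the start of each one-month interval.

```stata
* Assumes: data already stset with id(id), analysis time in months
* Split each record into one-month episodes; month = start of each interval
stsplit month, every(1)

* Each woman now has one row per month at risk
sort id _t0
list id _t0 _t _d month in 1/10, sepby(id)
```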

If I do split the data, I assume I would need to end each pregnancy episode in the month the child is born. Is that correct? If so, how would I do that? You suggested that I create a contemporaneous time variable; can you explain how?
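One way to build a contemporaneous (calendar-time) variable is to add the elapsed analysis time to each woman's origin date, then flag the months that fall inside a pregnancy. This is a sketch under assumptions: the data are already split into one-month episodes, _origin holds the origin as a Stata monthly date (as in the listing above), analysis time is in months, and cm_now and pregnant are hypothetical names. The indicator starts at CMpregnancy`x' and stops in the month of the birth, so the birth month itself is coded 0.

```stata
* Calendar month at the start of each one-month episode
gen cm_now = _origin + _t0
format cm_now %tm

* Pregnancy indicator: 1 from the pregnancy month up to the month
* before the birth, 0 otherwise (999999 codes never match a real date)
gen byte pregnant = 0
forvalues x = 1/3 {
    replace pregnant = 1 if CMchild`x' != 999999 & ///
        cm_now >= CMpregnancy`x' & cm_now < CMchild`x'
}
```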

Also, with regard to your earlier questions, Austin:
Are you sure every child is a biological child? Yes, all the children are biological.
Are there women with more than 3 children in the data? No, no woman in the data has more than 3 children.
Do you have any information on gestational age at birth? No, I have no information on gestational age at birth.

I hope my goal and question are clearer this time.


© Copyright 1996–2017 StataCorp LLC