Re: st: re:re: data creation for hazard regression


From   Austin Nichols <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: re:re: data creation for hazard regression
Date   Mon, 11 Jun 2012 12:24:47 -0400

Kenisha Russell <kenisha.russell@framtidsstudier.se>:
The first link I gave in
http://www.stata.com/statalist/archive/2012-06/msg00472.html
has very detailed advice on dealing with discrete time models:
https://www.iser.essex.ac.uk/resources/survival-analysis-with-stata-module-ec968

I still don't see why you replace missing birthdates with 999999.

If you -expand- as detailed in the web course linked above, you will wind
up with a very large dataset, so you may prefer to ignore the
discreteness of the data.

It is still not clear exactly what your analysis consists of--what are
the competing risks?
Marriage and cohabitation?

When is the onset of risk? Earliest possible age of marriage/cohab?
If you have each month from age 12 to age 50, you will have many observations,
so you will need to streamline the variables in order to keep the dataset size
small enough to work with.  Then you can define variables in terms of what
was true in that month for that person.

If you are using the discrete time models as above, you do not need to
-stset- etc.; you need each person to have one observation for each month
at risk. However, the data snippet you gave does not have that structure.
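
As a rough illustration of the person-month structure described above, here is a minimal sketch on toy data. All variable names (risk_cm, end_cm, event, outcome, pregnant, dur) and the toy values are assumptions made for the example, not taken from your data; the 7-month window is copied from your own code. Note that a missing birth month simply makes the pregnancy condition false, so no recode to 999999 is needed.

```stata
* Toy data; every variable name and value below is hypothetical.
* risk_cm = first month at risk, end_cm = month of union or censoring
* event   = 0 censored, 1 marriage, 2 cohabitation
clear
input id risk_cm end_cm event CMchild1
1 300 303 1 302
2 300 302 0 .
end

* one row per person per month at risk
gen nmonths = end_cm - risk_cm + 1
expand nmonths
bysort id: gen cm = risk_cm + _n - 1

* outcome is 0 in every month except the last one observed
gen byte outcome = 0
bysort id (cm): replace outcome = event if _n == _N

* pregnant = 1 in the 7 months before a birth (7 taken from the quoted code);
* if CMchild1 is missing, cm >= CMchild1 - 7 is false, so pregnant = 0
gen byte pregnant = cm >= CMchild1 - 7 & cm < CMchild1

* months since onset of risk, to model the baseline hazard
gen dur = cm - risk_cm + 1

list id cm outcome pregnant dur, sepby(id)
```

On real data, one common way to fit a discrete time competing risks model to this layout is a multinomial logit such as -mlogit outcome dur pregnant, baseoutcome(0)-, but the toy data here are far too small to estimate anything, and the right specification of duration dependence is a modelling choice.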

If you had Stata 11, you could use -stcrreg-; see also -findit stcompet- or
http://www.statajournal.com/sjpdf.html?articlenum=st0059
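
For reference, a sketch of what the -stcrreg- route looks like in Stata 11 or later, using the hypoxia example dataset that ships with the documentation rather than your data. This is a continuous-time (Fine and Gray) competing-risks regression, so it is an alternative to the discrete time approach, not the same model; the choice of covariates is only for illustration.

```stata
* Requires Stata 11 or later; uses a documentation dataset, not the poster's data
webuse hypoxia, clear

* failtype == 1 is the event of interest; failtype == 2 is the competing event
stset dftime, failure(failtype == 1)

* competing-risks regression with the competing event named in compete()
stcrreg ifp tumsize pelnode, compete(failtype == 2)
```

In your setting, the analogue would be to -stset- with one union type as the failure event and name the other in compete().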

On Mon, Jun 11, 2012 at 5:58 AM, Kenisha Russell
<kenisha.russell@framtidsstudier.se> wrote:
> Hi Austin,
>
> I am using Stata 10.1.
> Thank you for taking the time to answer my question. I do apologise if I was not clear; it was my first time posting.
>
> Research goal: Using a discrete time competing risks hazard model, I would like to analyse entry into first union (i.e., marriage or cohabitation), and pregnancy is one of the explanatory variables.
>
> The data has been transformed into person-months (i.e., what I previously referred to as century-months data).
> Because I had the year and month in which each child was born, I then executed the steps outlined below:
> Step 1: I created childbearing histories
> /* Create century months for birth of each child (here maximum # of children is 3).
> using a loop running the code first for 1st, then 2nd, then 3rd child */
> forval x = 1/3 {
> gen CMchild`x' = ym(childy`x', childm`x')
> recode CMchild`x' (. = 999999)
> }
>
>
> Step 2: Then, in order to create a variable for pregnancy, I ran:
> forval x = 1/3 {
> gen CMpregnancy`x' = CMchild`x' - 7 if CMchild`x' != 999999
> replace CMpregnancy`x' = 999999 if CMpregnancy`x' == .
> }
>
> After stset, and after running the above commands, my data currently looks like this:
>
> id      _t0     _t      _d      _st     _origin CMchild1        CMchild2        CMchild3
> 3       0       68      0       1       1997m6  583     999999  999999
> 4       75      278     0       1       1985m10 999999  999999  999999
> 11      476     0       1       1969m4  248     338     999999
> 12      258     0       1       1987m6  401     424     509
> 13      27      230     0       1       1989m10 421     999999  999999
> 14      0       198     0       1       1992m6  999999  999999  999999
> 15      68      86      1       1       1986m5  476     999999  999999
>
> I have checked the math and the outcomes seem to be correct. For example, for id 11, where CMchild==338, CMpregnancy1, calculated as 7 months before, was at time 331.
> So my question is: if the above is correct, do I now need to stsplit the dataset so that there is one data row per person per month at risk of pregnancy?
>
> If I do split the data, I assume I would need to stop each pregnancy at the point where each child is born.
> Is that correct? If so, how would I do that? You suggested that I create a contemporaneous time variable; can you explain how?
>
> Also, with regard to your earlier questions, Austin:
> Are you sure every child is a biological child? Yes, I am sure all the children are biological.
> Are there women with more than 3 children in the data? There are no women with more than 3 children in this data.
> Do you have any information on gestational age at birth? I have no information about gestational age at birth.
>
>
> I hope that this time my goal and question are much clearer.
> Best,
> Kenisha
