Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Austin Nichols <austinnichols@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: re: data creation for hazard regression
Date: Fri, 8 Jun 2012 11:16:17 -0400

Kenisha Russell <kenisha.russell@framtidsstudier.se>:
You will get better answers if you describe your data better. Do you have monthly observations on women? What transitions are you observing? Labor market? Education? If you have data measured once monthly, you are probably better off turning the data into person-month observations and using a discrete-time hazard model; see e.g.
https://www.iser.essex.ac.uk/resources/survival-analysis-with-stata-module-ec968
http://fmwww.bc.edu/repec/bocode/h/hshaz.html
http://fmwww.bc.edu/repec/bocode/p/pgmhaz8.html

In any case, before you do any data work, you should replace out-of-range dates with missing:

  replace CMchild1=. if CMchild1==999999

Next consider what subtracting 7 from a date in the format 201003 might mean. Is that what you mean by "century month format" perhaps? 200996 is not the answer you want, I assume! But perhaps you have a proper date variable and you mean you have applied a display format such as

  format d %tm_CCYY_Mon

Just make sure you know what values are encoded in the variable, and not just how they display.

If you arrange your data as person-month observations, and create a date variable "now" measuring contemporaneous time, and 3 date variables "born1,born2,born3" for months of birth, then you can generate a pregnant dummy like so:

  g pregnant=0
  forv i=1/3 {
    replace pregnant=1 if inrange(now,born`i'-7,born`i') & !mi(born`i')
  }

The !mi() condition matters because inrange() treats missing bounds as open-ended, so a missing born`i' would otherwise flag every month as pregnant. Bear in mind there will be some considerable measurement error in the pregnant variable. Are you sure every child is a biological child? Are there women with more than 3 children in the data? Do you have any information on gestational age at birth?

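To make the date arithmetic concrete, here is a minimal sketch on toy data. It assumes the birth dates are stored as plain YYYYMM numbers (the original post does not say how they are actually encoded), and the id, the birth months, and the 2009-2012 observation window are all hypothetical. The !missing() guard is an addition to the loop in the reply, needed because inrange() treats missing bounds as open-ended.

```stata
* Toy data: one woman, births stored as YYYYMM numbers (hypothetical values)
clear
input id CMchild1 CMchild2 CMchild3
1 201003 201206 999999
end

* Recode the 999999 sentinel to missing, then convert YYYYMM to a Stata monthly date
forvalues i = 1/3 {
    replace CMchild`i' = . if CMchild`i' == 999999
    generate born`i' = ym(floor(CMchild`i'/100), mod(CMchild`i', 100))
    format born`i' %tm
}

* Subtracting 7 from the raw number is not month arithmetic; on a %tm value it is
display 201003 - 7
display %tm ym(2010, 3) - 7

* One row per person-month; the 48-month window starting 2009m1 is an assumption
expand 48
bysort id: generate now = ym(2009, 1) + _n - 1
format now %tm

* Pregnancy dummy: 1 from 7 months before each birth through the birth month
generate pregnant = 0
forvalues i = 1/3 {
    * skip missing births: inrange() would otherwise return 1 for every month
    replace pregnant = 1 if inrange(now, born`i' - 7, born`i') & !missing(born`i')
}
tabulate pregnant
```

If the variable is already a true %tm date, the ym() conversion step is unnecessary and only the sentinel recode and the dummy loop apply.
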
If you rewrite your question, please take some time to make it clearer; phrases like "the likelihood of pregnancy is also 3" just confuse the reader and lower the probability of your getting a useful answer:
http://blog.stata.com/2010/12/14/how-to-successfully-ask-a-question-on-statalist/

On Fri, Jun 8, 2012 at 4:50 AM, Kenisha Russell <kenisha.russell@framtidsstudier.se> wrote:
> Hi Statalisters,
> I am trying to create a data set for which I will use hazard regression (events history analysis to demographers).
> I am currently restructuring my data into person-period format, in order to use hazard regression to examine the propensity of an individual to transition from state x to state y.
> and one of the variables that I want to use is pregnancy.
>
> Because I have the day and month each child was born, after making this date into century month format, I have simply subtracted the 7 months previous to the birth of each child to obtain a variable called pregnancy. In this particular data set the highest recorded parity is 3. See the syntax I have used below.
>
> gen CMpregnancy1=.
> replace CMpregnancy1=CMchild1-7 if CMchild1!=999999
> CMchild is the birthdate of the each child is in century month format.
>
>
> After this I then split the data:
> stsplit pregnancy1, after(CMpregnancy1) at(0)
>
> /* We replace values for pregnancy1 so that 0 represents time before that
> the woman was pregnant and 1 for after the pregnancy*/
> replace pregnancy1= pregnancy1+1
> replace pregnancy1=0 if CMpregnancy1==.
> list pid-_st CMpregnancy* pregnancy* in 1/60
>
> This is repeated three times because given the fact that highest parity is 3, the likelihood of pregnancy is also 3 and all should be taken into account.
>
> Although I have written a syntax here and have split the data, my issue is that I am not sure it is correct. Am I required to split the data with each pregnancy? i.e to create a time before the event (i.e the pregnancy).
>
> If I do split the event, if my reasoning is correct I assume I would need to stop each pregnancy at the point where each child is born. Is that correct? If so, How would I do that?
>
> Best,
> Kenisha

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

