
Re: st: RE: stset and the NLSY97

From   Marcello Pagano
Subject   Re: st: RE: stset and the NLSY97
Date   Sun, 16 Oct 2005 21:46:55 -0400

I do not understand your dilemma.  Assuming everyone is telling the truth,
what you seem to have is time to first sex as your outcome of interest, with
the very Victorian identification of "death" as that time.  If someone is 17
at the time of the survey without having had sex, then that is a censored
observation.  So your "time" variable is firstsex_yr if sa==1
and age if sa==0.  So you need to generate a variable:

gen time = age
replace time = firstsex_yr if sa==1
stset time , failure(sa)
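
For what it is worth, here is a minimal sketch of how those three lines might be applied to long data like the example quoted further down. It rests on assumptions that are not in the original posts: that a missing firstsex_yr means no first sex has been reported yet, that the last interview per person is the right record to keep, and that respondent 4 (a made-up, censored case added purely for illustration) behaves as shown.

```stata
clear
input id year age firstsex_yr
1 1997 15 .
1 1998 16 16
1 1999 17 16
1 2000 18 16
2 1997 12 .
2 1998 13 .
2 1999 14 11
2 2000 15 11
3 1997 16 12
3 1998 17 12
3 1999 18 12
3 2000 19 12
4 1997 13 .
4 1998 14 .
4 1999 15 .
4 2000 16 .
end
* id 4 is hypothetical: never reports first sex, so it ends up censored

* Keep one record per person: the last interview
bysort id (year): keep if _n == _N

* Assumption: missing firstsex_yr means the event has not been reported
gen byte sa = !missing(firstsex_yr)

* Analysis time: age at first sex if it happened, else age at last interview
gen time = age
replace time = firstsex_yr if sa == 1

stset time, failure(sa) id(id)
list id age firstsex_yr sa time _t _d
```

Collapsing to one record per person is only one way to do it; it is used here because the time variable above is a single number per respondent.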

Your hazard should be zero for time <= 10, but that
depends on your data. You actually do have information back then, assuming
you have done a decent job of sampling and things have not changed
that much over the years. (By that I mean that if everyone you question
is over 12, say, then their experience in the 0 to 12 time period is
still representative of what is going on in those years today.)

Hope this helps,


Scott Cunningham wrote:

On Oct 16, 2005, at 9:01 PM, Nick Cox wrote:

Not my field, but your dummy calculation can
be put more succinctly:

gen sa = firstsex_yr <= age

However, safer would be to trap missings:

gen sa = cond(mi(firstsex_yr, age), ., firstsex_yr <= age)
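
To see why trapping missings matters, here is a small sketch comparing the two versions on three made-up rows (the values and the names sa_plain and sa_safe are illustrative only). Stata treats numeric missing as larger than any nonmissing number, so a comparison against a missing age can come out true.

```stata
clear
input firstsex_yr age
16 17
.  15
12 .
end

* Plain comparison: missing counts as a very large number
gen sa_plain = firstsex_yr <= age

* Trapped version: result is missing if either input is missing
gen sa_safe = cond(mi(firstsex_yr, age), ., firstsex_yr <= age)

list
```

Note that the trapped version also returns missing when firstsex_yr alone is missing. If a missing firstsex_yr in your data means "has not had sex yet" rather than "unknown", you may want those observations coded 0 (censored) instead, which is an assumption you would need to check against the NLSY97 codebook.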



Thanks for helping make the dummies more succinct.

Do you think, though, that it is correct to use "age" as the actual duration variable? So, for instance, I have a long dataset like this:

id year age firstsex_yr
1 1997 15 .
1 1998 16 16
1 1999 17 16
1 2000 18 16
2 1997 12 .
2 1998 13 .
2 1999 14 11
2 2000 15 11
3 1997 16 12
3 1998 17 12
3 1999 18 12
3 2000 19 12

So, by stsetting the data as so:

. stset age, failure(sa)

where "sa" is an indicator equalling "1" if the person has become sexually active (signalling "death" in this context) and 0 otherwise. If I stset the data such that "age" is the duration, have I really made the right decision? Or should I use "year", or should I create some other variable to correspond to the time that has passed? Because I really want to look at ten periods, initially: from 10 years to 19 years of age. It's a short duration, relatively speaking, and most "exits" occur at 15-17. So I don't actually have data for respondents for those early, pre-survey ages, i.e., 10-12. So what's the best solution here? Do I create a variable, maybe "time" or "virgin_time", that takes on a value of 1 to 10, and that variable matches up to the years that are covered in the data, and the years not covered?

Is this post making sense? I'm mainly just not sure of the proper way to execute this stset command to make use of the information I have in the form I currently have it in.
