I do not understand your dilemma. Assuming everyone is telling the truth, what you seem to have is time to first sex as your outcome of interest, with the very Victorian identification of "death" as that time. If someone is 17 at the time of the survey without having had sex, then that is a censored observation. So your "time" variable is firstsex_yr if sa==1 and age if sa==0. So you need to generate a variable:

. gen time = age
. replace time = firstsex_yr if sa==1
. stset time, failure(sa)

Your hazard should be zero for time <= 10, but that depends on your data. You actually do have information back then, assuming you have done a decent job of sampling and things have not changed that much over the years. (By that I mean that if everyone you question is over 12, say, then their experience in the 0 to 12 time period is still representative of what is going on in those years today.)

Hope this helps,

m.p.

Scott Cunningham wrote:

On Oct 16, 2005, at 9:01 PM, Nick Cox wrote:

Not my field, but your dummy calculation can be put more succinctly:

. gen sa = firstsex_yr <= age

However, safer would be to trap missings:

. gen sa = cond(mi(firstsex_yr, age), ., firstsex_yr <= age)

Nick

Nick,

Thanks for helping make the dummies more succinct.

Do you think, though, that it is correct to use "age" as the actual duration variable? So, for instance, I have a long dataset like this:

id   year   age   firstsex_yr
 1   1997    15     .
 1   1998    16    16
 1   1999    17    16
 1   2000    18    16
 2   1997    12     .
 2   1998    13     .
 2   1999    14    11
 2   2000    15    11
 3   1997    16    12
 3   1998    17    12
 3   1999    18    12
 3   2000    19    12

So, by stsetting the data as so:

. stset age, failure(sa)

where "sa" is an indicator equal to 1 if the person has become sexually active (signalling "death" in this context) and 0 otherwise. If I stset the data such that "age" is the duration, have I really made the right decision? Or should I use "year", or some other variable that I create to correspond to the time that has passed?

I ask because I really want to look at ten periods, initially - from 10 years to 19 years of age. It's a short duration, relatively speaking, and most "exits" occur at 15-17. But I don't actually have data for respondents at the early, pre-survey ages, i.e., 10-12. So what's the best solution here? Do I create a variable, maybe "time" or "virgin_time", that takes on a value of 1 to 10 and matches up to both the years that are covered in the data and the years that are not?

Is this post making sense? I'm mainly just not sure of the proper way to execute this stset command to make use of the information I have in the form I currently have it in.

scott
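
For readers following the thread, here is a minimal sketch of how the recipe in the reply at the top might be applied to the long-format listing above. It rests on several assumptions that are not in the original messages: that firstsex_yr records the age at first sex (as the listing suggests), that one record per person is wanted, and that the last interview is the one to keep. Person 4 is a hypothetical respondent added only so the example contains a censored case.

```stata
* Sketch only. Persons 1-3 come from the listing in the thread;
* person 4 is HYPOTHETICAL, added to show a censored observation.
clear
input id year age firstsex_yr
1 1997 15 .
1 1998 16 16
1 1999 17 16
1 2000 18 16
2 1997 12 .
2 1998 13 .
2 1999 14 11
2 2000 15 11
3 1997 16 12
3 1998 17 12
3 1999 18 12
3 2000 19 12
4 2000 16 .
end

* Assumption: keep one record per person, the last interview
bysort id (year): keep if _n == _N

* Failure indicator: 1 if first sex is reported at or before current age.
* A missing firstsex_yr compares as larger than any age, so it yields 0;
* the if-qualifier leaves sa missing when age itself is missing.
gen sa = firstsex_yr <= age if !missing(age)

* Duration: age at first sex if it happened, else age at last interview
gen time = age
replace time = firstsex_yr if sa == 1

* Declare the survival-time data and inspect the result
stset time, failure(sa)
list id time sa
```

With this setup, persons 1 to 3 enter as failures at ages 16, 11 and 12, and the hypothetical person 4 is censored at age 16. Keeping only the last interview is one possible design choice; it uses the most recent report for each respondent and sidesteps the multiple-record form of stset.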

