# st: RE: stset and the NLSY97

From: "Nick Cox" | Subject: st: RE: stset and the NLSY97 | Date: Mon, 17 Oct 2005 02:01:47 +0100

```
Not my field, but your dummy calculation can
be put more succinctly:

gen sa = firstsex_yr <= age

However, safer would be to trap missings:

gen sa = cond(mi(firstsex_yr, age), ., firstsex_yr <= age)

Nick
n.j.cox@durham.ac.uk

Scott Cunningham

> I'm estimating a hazard model and had some basic questions.  The
> dataset I'm using is the NLSY97.  It's a panel consisting of six
> waves, and each year roughly 5500 individuals (after eliminating
> various observations).  The outcome that I'm interested in is the
> until they are 14, but when they are asked, they are asked at what
> age they first experienced vaginal intercourse, and that age
> oftentimes is prior to the year in which they were first asked about
> their sexuality (ie, earlier than 14).  So, I have, for all
> individuals, an integer corresponding to their age, in years, when
> they lost their virginity, or missing data for those who are still
> virgins.  After pulling the variables, I reshaped the data into a
> long panel.
>
> Thinking about the "stset" command, I decided to follow this route.
>
> * generate sexually active dummy equalling 1 if sexually active, and 0 otherwise
> gen sa=.
> replace sa=0 if firstsex_yr>age
> replace sa=1 if firstsex_yr==age
> replace sa=1 if firstsex_yr<age
>
> * stset the data
> stset age, failure(sa) id(id)
>
> where "age" is the age of the individual in any given year, and
> "firstsex_yr" is the age at which the individual first experienced
> vaginal intercourse.
>
> What I've basically done, though, is made the person's age to be my
> duration variable, but I don't think this is correct.  Ideally, I'd
> like to simply have some sort of year variable to be the duration
> variable, but the problem I'm imagining is how to handle events that
> happened prior to the survey.  For instance, I know that some lost
> their virginity when they were 10, a year that is at best 2 years prior
> to the survey for some people, and 4 years prior to the survey for
> others.  So, it would seem that making "age" the duration variable is
> not the appropriate strategy, but I'm not sure of a better solution
> at this point.  Can someone provide me some suggestions on getting
> this data together?

```
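
The two one-line forms in the reply differ only in how they treat missing values. Here is a minimal sketch on made-up toy data. The variable names `sa_short` and `sa_safe` and all values are hypothetical, used only to set the two results side by side; none of it comes from the NLSY97.

```stata
* Toy data: values are invented for illustration only
clear
input id age firstsex_yr
1 15 14
2 15 15
3 15 16
4 15 .
5 .  14
end

* Compact form: numeric missing sorts above every number in Stata,
* so a missing firstsex_yr yields 0 and a missing age yields 1
gen sa_short = firstsex_yr <= age

* Guarded form: a missing value in either variable yields missing
gen sa_safe = cond(mi(firstsex_yr, age), ., firstsex_yr <= age)

list, clean noobs
```

The two variables agree for ids 1 to 3 and differ only where a value is missing. Whether 0 or missing is the right code for a respondent with no reported age at first intercourse depends on the analysis; the guarded form just makes that choice explicit.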