The Stata listserver

st: stset and the NLSY97


From   Scott Cunningham <scunning@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: stset and the NLSY97
Date   Sun, 16 Oct 2005 20:39:44 -0400

I'm estimating a hazard model and have some basic questions. The dataset I'm using is the NLSY97. It's a panel consisting of six waves, with roughly 5500 individuals in each year (after eliminating various observations). The outcome I'm interested in is the exit from virginity. Individuals are not asked questions about sex until they are 14, but when they are asked, they are asked at what age they first experienced vaginal intercourse, and that age is often earlier than the age at which they were first asked about their sexuality (i.e., earlier than 14). So, for all individuals, I have an integer corresponding to their age, in years, when they lost their virginity, or missing data for those who are still virgins. After pulling the variables, I reshaped the data into a long panel.

Thinking about the "stset" command, I decided to follow this route.

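* generate sexually active dummy equal to 1 if sexually active, and 0 otherwise
gen sa=.
* not yet active at this age; a missing firstsex_yr (still a virgin) also
* compares as greater than age, because Stata treats missing as larger than any number
replace sa=0 if firstsex_yr>age
* first sex occurred at this age
replace sa=1 if firstsex_yr==age
* first sex occurred at an earlier age
replace sa=1 if firstsex_yr<age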

* stset the data
stset age, failure(sa) id(id)

where "age" is the age of the individual in any given year, and "firstsex_yr" is the age at which the individual first experienced vaginal intercourse.
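One way to check that the dummy lines up with the comparison it is built from, before relying on the stset output, is sketched below. It assumes the variable names used above (id, age, firstsex_yr, sa); the helper variable "rel" is a hypothetical name introduced only for this check.

```stata
* rel is -1 if first sex was at an earlier age, 0 if at this age,
* 1 if at a later age, and missing if either variable is missing
gen byte rel = sign(firstsex_yr - age)

* cross-tabulate the comparison against the dummy, showing missing values
tab rel sa, missing

* look at the first few respondents' records in age order
sort id age
list id age firstsex_yr sa in 1/12, sepby(id)

* after stset: summary of records, subjects, and failures
stdescribe
```

The cross-tabulation shows which value of sa each case (earlier, same age, later, never) actually received, so it can be compared against the intent stated in the comment.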

What I've basically done, though, is make the person's age my duration variable, and I don't think this is correct. Ideally, I'd like to simply have some sort of year variable as the duration variable, but the problem I'm imagining is how to handle events that happened prior to the survey. For instance, I know that some lost their virginity when they were 10, which is at best 2 years prior to the survey for some people, and 4 years prior to the survey for others. So it would seem that making "age" the duration variable is not the appropriate strategy, but I'm not sure of a better solution at this point. Can someone give me some suggestions on getting this data together?

thanks ahead of time,


scott cunningham
univ. of georgia
dept. of economics
athens, ga