st: RE: stset and the NLSY97

Mon, 17 Oct 2005 02:01:47 +0100

Not my field, but your dummy calculation can be put more
succinctly:

gen sa = firstsex_yr <= age

However, safer would be to trap missings:

gen sa = cond(mi(firstsex_yr, age), ., firstsex_yr <= age)

Nick
n.j.cox@durham.ac.uk

Scott Cunningham

> I'm estimating a hazard model and had some basic questions. The
> dataset I'm using is the NLSY97. It's a panel consisting of six
> waves, and each year roughly 5500 individuals (after eliminating
> various observations). The outcome that I'm interested in is the
> exit from virginity. Individuals are not asked questions about sex
> until they are 14, but when they are asked, they are asked at what
> age they first experienced vaginal intercourse, and that age
> oftentimes is prior to the year in which they were first asked about
> their sexuality (ie, earlier than 14). So, I have, for all
> individuals, an integer corresponding to their age, in years, when
> they lost their virginity, or missing data for those who are still
> virgins. After pulling the variables, I reshaped the data into a
> long panel.
>
> Thinking about the "stset" command, I decided to follow this route.
>
> * generate sexually active dummy equalling 1 if sexually active, and 0 otherwise
> gen sa=.
> replace sa=0 if firstsex_yr<age
> replace sa=1 if firstsex_yr==age
> replace sa=1 if firstsex_yr>age
>
> * stset the data
> stset age, failure(sa) id(id)
>
> where "age" is the age of the individual in any given year, and
> "firstsex_yr" is the age at which the individual first experienced
> vaginal intercourse.
>
> What I've basically done, though, is made the person's age to be my
> duration variable, but I don't think this is correct. Ideally, I'd
> like to simply have some sort of year variable to be the duration
> variable, but the problem I'm imagining is how to handle events that
> happened prior to the survey.
> For instance, I know that some lost their virginity when they were
> 10, year that is at best 2 years prior to the survey for some
> people, and 4 years prior to the survey for others. So, it would
> seem that making "age" the duration variable is not the appropriate
> strategy, but I'm not sure of a better solution at this point. Can
> someone provide me some suggestions on getting this data together?
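
For illustration only, here is a minimal sketch of why trapping missings matters in the comparisons discussed above. Stata treats numeric missing as larger than any number, so a test such as firstsex_yr > age is true whenever firstsex_yr is missing, and firstsex_yr <= age is false. The toy values below are made up (they are not NLSY97 data), and the names gt, sa1 and sa2 are hypothetical.

```stata
* Toy long-format data: hypothetical values, not from the NLSY97.
* id 2 has missing firstsex_yr (no first-sex age reported).
clear
input id age firstsex_yr
1 14 13
1 15 13
2 14 .
2 15 .
3 14 15
3 15 15
end

* Missing counts as larger than any number, so gt is 1 for id 2
gen byte gt = firstsex_yr > age

* Succinct dummy: 0 for id 2, because (missing <= age) is false
gen byte sa1 = firstsex_yr <= age

* Trapped dummy: missing for id 2, since mi() is 1 if any argument is missing
gen sa2 = cond(mi(firstsex_yr, age), ., firstsex_yr <= age)

list, sepby(id)
```

Comparing gt, sa1 and sa2 in the listing shows which rows depend on how missing values are handled; whether 0 or missing is the right coding for those rows is a decision about the analysis, not something the sketch settles.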

