st: basic programming tips
respondents questions about sexual behavior every year.  The survey  
uses extensive skipping and branching requiring the researcher to  
search over the various branches and collect the information.  I am  
trying to determine the proportion of people by sex, race and age in  
the survey who reportedly were sexually active ("sa") at any point in  
their life prior to the survey.  Because people sometimes do not  
receive a given question, depending on how they answered earlier  
questions or the same question in a previous year, there are holes  
in my data.  I have three questions.
1.  I am occasionally worried that I am replacing variables with  
values that are incorrect.  In this example, it is easy to find  
contradictions, though.  If someone is sexually active in an earlier  
wave (say 1997) but then later reports that they are no longer  
sexually active (say 2002), then it would mean the person reported he  
was not a virgin in 1997 but is a virgin in 2002.  How do others  
check that they do not have mistakes like this, for instance once  
the data have already been reshaped into a panel?  I do not think I  
have enough of these checks in my programming, and I suspect I am  
making many mistakes along the way that I'm not catching.
2.  The NLSY97 has a very difficult skip structure, and for many of  
the questions I am interested in, I must comb over the questions  
carefully and make sure that I am accounting for every one.  For  
those of you who work frequently with surveys that have elaborate  
skip and branching patterns, how do you efficiently manage the code  
so that you can be sure you have not lost people along the way or  
accidentally overwritten values?
3.  Finally, sexual activity has holes, as I said.  If there are no  
contradictions (like going from 1 to 0 over time), these can be  
corrected by filling all missing observations with a 0 or 1, assuming  
the first time a 1 appears is truly the first year the person made  
their sexual debut.  What is the best way to fill in a missing value  
in the context of this type of duration modeling?  I need to tell  
Stata to make all missing observations a 0, unless a 1 had appeared  
at some point earlier, in which case replace with a 1.
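
The contradiction check in question 1 could be sketched roughly as  
follows.  This is only a sketch, assuming the data are in long form  
with the variables id, year, and sa shown in the listing below; the  
names ever_sa and contradiction are hypothetical.

```stata
* Sketch only: assumes long-form data with id, year, and sa.
* ever_sa and contradiction are hypothetical variable names.

* 1 from the first wave in which sa == 1 onward, 0 before that.
* (sa == 1) evaluates to 0 when sa is missing, so gaps do not count.
bysort id (year): generate byte ever_sa = sum(sa == 1) > 0

* A reported 0 after an earlier 1 is a contradiction.
generate byte contradiction = ever_sa & sa == 0

* Inspect the offending person-years.
list id year sa if contradiction, sepby(id)

* Stops with an error if any respondent goes from 1 back to 0.
assert !contradiction
```

Running an assert like this after every reshape or replace step would  
stop the do-file as soon as a contradiction appears, rather than  
letting it pass unnoticed.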
I've attached a copy of the code, in case my description is not  
clear.  The variables in the listing below are id (person  
identification number), year (year of survey), sa (sexually active),  
age (age of respondent at date of interview), race, rp (number of  
partners reported that year), and ms (marital status).
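
The fill rule in question 3 (missing becomes 0 unless a 1 appeared in  
an earlier wave, in which case it becomes 1) might look something like  
the sketch below.  It assumes there are no contradictions in sa and  
that the first observed 1 really is the first year of activity; seen1  
and sa_fill are hypothetical names.

```stata
* Sketch only: seen1 and sa_fill are hypothetical names.  sa itself is
* left untouched so the original values can still be compared.

* 1 once a 1 has been observed for this person, 0 before that.
bysort id (year): generate byte seen1 = sum(sa == 1) > 0

* Copy sa, then fill only the holes.
generate byte sa_fill = sa
replace sa_fill = seen1 if missing(sa)

* Observed values must be unchanged by the fill.
assert sa_fill == sa if !missing(sa)
```

In the listing below, this would set the 1999 gap for id 5 to 1 and  
the 1998 and 1999 gaps for id 9 to 0.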
sc
       +-----------------------------------------+
       |   id   year   sa   age   race   rp   ms |
       |-----------------------------------------|
    1. |    5   1997    1    15      1    2    0 |
    2. |    5   1998    1    16      1    3    0 |
    3. |    5   1999    .    17      1    0    0 |
    4. |    5   2000    1    18      1    0    0 |
    5. |    5   2001    1    19      1    .    0 |
       |-----------------------------------------|
    6. |    5   2002    1    20      1    4    0 |
    7. |    9   1997    0    15      1    0    0 |
    8. |    9   1998    .    16      1    0    0 |
    9. |    9   1999    .    17      1    0    0 |
   10. |    9   2000    0    18      1    0    0 |
       |-----------------------------------------|
   11. |    9   2001    0    19      1    0    0 |
   12. |    9   2002    1    20      1    1    0 |
   13. |   10   1997    .    14      1    0    0 |
   14. |   10   1998    .    15      1    0    0 |
   15. |   10   1999    0    16      1    0    0 |
       |-----------------------------------------|
   16. |   10   2000    0    17      1    0    0 |
   17. |   10   2001    0    18      1    0    0 |
   18. |   10   2002    0    19      1    0    0 |
   19. |   18   1997    1    15      2    1    0 |
   20. |   18   1998    1    16      2   99    0 |
       |-----------------------------------------|
   21. |   18   1999    1    17      2    3    0 |
   22. |   18   2000    1    18      2    5    0 |
   23. |   18   2001    .    19      2    .    0 |
   24. |   18   2002    1    20      2   10    0 |
   25. |   19   1997    .    12      2    0    0 |
       |-----------------------------------------|
   26. |   19   1998    .    13      2    0    0 |
   27. |   19   1999    0    14      2    0    0 |
   28. |   19   2000    1    15      2    4    0 |