Statalist The Stata Listserver



st: RE: basic programming tips


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: basic programming tips
Date   Wed, 11 Oct 2006 19:57:48 +0100

The attachment of code promised here 
didn't make it, fortunately, so you are saved a sermon on 
the iniquity of attachments. 

Of your questions, I have answers to two. 

Nick 
n.j.cox@durham.ac.uk 

Scott Cunningham
 
> 1.  I am occasionally worried that I am replacing variables with
> values that are incorrect.  In this example, it is easy to find
> contradictions, though.  If someone is sexually active in an earlier
> wave (say 1997) but then later reports that they are no longer
> sexually active (say 2002), then it would mean the person reported he
> was not a virgin in 1997 but is a virgin in 2002.  How do others of
> you check to make sure you do not have mistakes like this - once you
> have already reshaped the data into a panel, for instance?  I think I
> do not possess enough of these checks in my programming, in fact, and
> am making many mistakes along the way that I'm not catching.

I don't want to start a discussion on Statalist about quite what
virginity is, but unfortunately you do need to define exactly
what _you_ understand by it. I don't regard your example here
as contradictory at all so long as virgin here means "not
currently sexually active". Alternatively, if a person was ever
previously sexually active, I do not see how they can revert to
being a virgin (barring some legalistic redefinition).

More generally, you can check for correctness if you independently
have correct answers or have some rule that guesses correct
answers for you (e.g. a majority vote). I don't see either here. 
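
If the rule adopted is that -sa- should never return to 0 once it
has been 1, then violations can at least be listed. What follows is
a minimal sketch under that assumption only; the variable name
-reverted- is made up for illustration.

* flag a 0 reported after a 1 in any earlier wave for the same person
bysort id (year) : gen byte reverted = sa == 0 & sum(sa == 1) > 0
list id year sa if reverted

Here -sum()- is a running sum within each -id-, so it counts the 1s
seen up to and including the current wave. Whether a flagged
observation is a mistake or a genuine change depends on the
definition chosen.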
 
> 3.  Finally, sexual activity has holes, as I said, which if there are
> no contradictions (like going from 0 to 1 over time), can be
> corrected by filling all missing observations with a 0 or 1, assuming
> the first time a 1 appears is truly the first year the person made
> their sexual debut.  What is the best way to fill in a missing value
> in the context of this type of duration modeling?  I need to tell
> Stata to make all missing observations a 0, unless a 1 had appeared
> at some point earlier, in which case replace with a 1.

Again, going from 0 to 1 over time does not seem contradictory to me. 

The maximum of -sa- seen so far is just 

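* start from missing, then carry the running maximum forward within each id
gen max_sa_sofar = . 
* -replace- works observation by observation, so [_n-1] is already updated
bysort id (year) : replace max_sa_sofar = max(sa, max_sa_sofar[_n-1]) 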

The way that the -max()- function works is that -max(0,.)- is 0, -max(1,.)- 
is 1, etc.: missing arguments are ignored unless every argument is 
missing, so that the usual rule that . is arbitrarily large
is set aside. (This is a feature not a bug.) 
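
This can be confirmed interactively. A minimal sketch, with
arbitrary values:

display max(0, .)
display max(1, .)
display max(., .)
display (. > 1)

The first two display 0 and 1, the third displays ., and the last
displays 1, since in a direct comparison . does count as larger
than any number.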

This principle is implemented in the -egen- function -record()- from
-egenmore- on SSC, attributable to Kit Baum and S.B. Else. 

Thus you just need to copy across from this -max_sa_sofar- variable 
whenever -sa- is missing. That still leaves open for discussion whether 
this method of imputation is socially or sexually valid, which I doubt. 
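
A minimal sketch of that copying step, assuming -max_sa_sofar- has
been created as above; the name -sa_filled- is made up for
illustration.

* keep the original -sa- intact and fill a copy
gen sa_filled = sa
replace sa_filled = max_sa_sofar if missing(sa)

With the data quoted below, id 5 gets 1 for 1999, and id 9 gets 0
for 1998 and 1999. A person whose first waves are missing has
nothing earlier to copy from, so -sa_filled- stays missing there.
Setting those to 0, as the question proposes, is a further
assumption:

replace sa_filled = 0 if missing(sa_filled)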
 
> I've attached a copy of the code, so that one can know what I'm  
> describing if it's not clear.  The variables are "person  
> identification number," "year of survey," "sexual active," "age of  
> respondent at date of interview," "race," "number of partners  
> reported that year," and "marital status."

>         +-----------------------------------------+
>         |   id   year   sa   age   race   rp   ms |
>         |-----------------------------------------|
>      1. |    5   1997    1    15      1    2    0 |
>      2. |    5   1998    1    16      1    3    0 |
>      3. |    5   1999    .    17      1    0    0 |
>      4. |    5   2000    1    18      1    0    0 |
>      5. |    5   2001    1    19      1    .    0 |
>         |-----------------------------------------|
>      6. |    5   2002    1    20      1    4    0 |
>      7. |    9   1997    0    15      1    0    0 |
>      8. |    9   1998    .    16      1    0    0 |
>      9. |    9   1999    .    17      1    0    0 |
>     10. |    9   2000    0    18      1    0    0 |
