Statalist The Stata Listserver


Re: st: RE: basic programming tips

From   Scott Cunningham
Subject   Re: st: RE: basic programming tips
Date   Wed, 11 Oct 2006 15:26:44 -0400


Thanks for the response. I'm going over it carefully, but I wanted to quickly clarify something. The contradictions that I'm worried about are not going from 0 to 1, but rather going from 1 to 0, which is impossible given the nature of the event I'm describing (e.g., did the person ever have vaginal intercourse with a member of the opposite sex). This would be straightforward if there were just one question to appeal to, but unfortunately, the way the NLSY97 is set up, that simple question is asked in a variety of different ways to the 9000 respondents, depending on their answers to many other questions.
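
A check for such 1-to-0 reversals could be sketched along the following lines. This is only an illustration under assumptions: -sa- is taken to be a 0/1 indicator in long (panel) form, person 5's rows come from the data listing quoted in this thread, person 7 is invented purely to show a reversal, and the names -ever_before- and -reversal- are hypothetical.

```stata
* Toy data: id 5 is from the listing in this thread; id 7 is invented
clear
input id year sa
5 1997 1
5 1998 1
5 1999 .
5 2000 1
7 1997 0
7 1998 1
7 1999 .
7 2000 0
end

* ever_before = 1 if any EARLIER wave for this person recorded sa == 1
* (max() ignores missing arguments unless all are missing)
gen ever_before = .
bysort id (year) : replace ever_before = max(sa[_n-1], ever_before[_n-1])

* Flag waves that report 0 after an earlier 1 (hypothetical variable name)
gen byte reversal = sa == 0 & ever_before == 1

list id year sa if reversal
count if reversal
```

In real data one might follow this with -assert reversal == 0- so that a do-file stops as soon as a reversal appears. Missing values of -sa- are never flagged, because the comparison -sa == 0- is false for them.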

I'm reading your recommendations more closely now. I just wanted to clarify that point about the contradiction.


On Oct 11, 2006, at 2:57 PM, Nick Cox wrote:

The attachment of code promised here
didn't make it, fortunately, so you are saved a sermon on
the iniquity of attachments.

Of your questions, I have answers to two.


Scott Cunningham

1.  I am occasionally worried that I am replacing variables with
values that are incorrect.  In this example, it is easy to find
contradictions, though.  If someone is sexually active in an earlier
wave (say 1997) but then later reports that they are no longer
sexually active (say 2002), then it would mean the person reported he
was not a virgin in 1997 but is a virgin in 2002.  How do others of
you check to make sure you do not have mistakes like this - once you
have already reshaped the data into a panel, for instance?  I think I
do not possess enough of these checks in my programming, in fact, and
am making many mistakes along the way that I'm not catching.

I don't want to start a discussion on Statalist on quite what
is virginity, but unfortunately you seem to need to define exactly
what _you_ understand by it. I don't regard your example here
as contradictory at all as long as virgin means here "not
sexually active". Alternatively, if a person was ever previously
sexually active, I do not see how they can revert to being
a virgin (barring some legalistic redefinition).

More generally, you can check for correctness if you independently
have correct answers or have some rule that guesses correct
answers for you (e.g. a majority vote). I don't see either here.

3.  Finally, sexual activity has holes, as I said, which if there are
no contradictions (like going from 0 to 1 over time), can be
corrected by filling all missing observations with a 0 or 1, so that
the first time a 1 appears is truly the first year the person made
their sexual debut.  What is the best way to fill in a missing value
in the context of this type of duration modeling?  I need to tell
Stata to make all missing observations a 0, unless a 1 had appeared
at some point earlier, in which case replace with a 1.

Again, going from 0 to 1 over time does not seem contradictory to me.

The maximum of -sa- seen so far is just

gen max_sa_sofar = .
bysort id (year) : replace max_sa_sofar = max(sa, max_sa_sofar[_n-1])

The way that the -max()- function works is that -max(0,.)- is 0,
-max(1,.)- is 1, etc., so that the usual rule that . is arbitrarily
large is set aside. (This is a feature not a bug.)
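
As a quick illustration of that behaviour of -max()-, the following lines can be typed interactively. They are a small sketch added for clarity, not part of the original exchange.

```stata
* max() ignores missing arguments unless every argument is missing
display max(0, .)
display max(1, .)
display max(., .)

* By contrast, a comparison treats . as larger than any number
display (. > 1)
```

These should display 0, 1, . and 1 in turn: the function sets missings aside, whereas a direct comparison does not.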

This principle is implemented in the -egen- function -record()- from
-egenmore- on SSC, attributable to Kit Baum and S.B. Else.

Thus you just need to copy across from this -max_sa_sofar- variable
whenever -sa- is missing. That still leaves open for discussion whether
this method of imputation is socially or sexually valid, as I doubt.
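
The copying-across step might be sketched as follows, using rows for persons 5 and 9 from the data listing in this thread. The name -sa_filled- is hypothetical, and the sketch leaves the original -sa- untouched so that the imputation can be inspected or undone.

```stata
* Toy data: the first four waves for ids 5 and 9 from the listing
clear
input id year sa
5 1997 1
5 1998 1
5 1999 .
5 2000 1
9 1997 0
9 1998 .
9 1999 .
9 2000 0
end

* Running maximum of -sa- within person, in year order
gen max_sa_sofar = .
bysort id (year) : replace max_sa_sofar = max(sa, max_sa_sofar[_n-1])

* Copy across only where -sa- is missing (sa_filled is a hypothetical name)
gen sa_filled = sa
replace sa_filled = max_sa_sofar if missing(sa)

list id year sa max_sa_sofar sa_filled, sepby(id)
```

Note that a missing -sa- in a person's first observed wave stays missing under this sketch, because no earlier value exists to carry forward. Setting those to 0 would be a further assumption.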

I've attached a copy of the code, so that one can know what I'm
describing if it's not clear.  The variables are "person
identification number," "year of survey," "sexual active," "age of
respondent at date of interview," "race," "number of partners
reported that year," and "marital status."

        |   id   year   sa   age   race   rp   ms |
     1. |    5   1997    1    15      1    2    0 |
     2. |    5   1998    1    16      1    3    0 |
     3. |    5   1999    .    17      1    0    0 |
     4. |    5   2000    1    18      1    0    0 |
     5. |    5   2001    1    19      1    .    0 |
     6. |    5   2002    1    20      1    4    0 |
     7. |    9   1997    0    15      1    0    0 |
     8. |    9   1998    .    16      1    0    0 |
     9. |    9   1999    .    17      1    0    0 |
    10. |    9   2000    0    18      1    0    0 |
*   For searches and help try:
