Statalist The Stata Listserver


Re: st: missing events in stset


From   wgould@stata.com (William Gould, Stata)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: missing events in stset
Date   Fri, 04 May 2007 10:45:43 -0500

Sara Mottram <s.mottram@cphc.keele.ac.uk> writes, 

> I am having some difficulty with -stset-. I'm almost certain that the 
> fault lies with my data, as this same command has worked before in a 
> similar dataset. However, I wonder if anyone could give me an idea as to 
> where I might start looking to find the problem.
> 
> [...]
>
> [...] I know from a tabulation of the data that there are 734 
> consultations, but when I use -stset- it identifies 730 events. One 
> person consults at time 0, so I think this person is being ignored - I 
> understand this. However, this still leaves three events that are 
> unidentified.
> 
> [...]

And Sara included the following output:

        -------------------------------------------------------------------
        . stset cons_dt, id(surveyid) fail(kcons_post_3yr==1) origin(time 
        > edateass) exit(time censor_date)

                      id:  surveyid
           failure event:  kcons_post_3yr == 1
        obs. time interval:  (cons_dt[_n-1], cons_dt]
        exit on or before:  time censor_date
          t for analysis:  (time-origin)
                  origin:  time edateass

        ---------------------------------------------------------- 

          16704  total obs.
             28  obs. end on or before enter()
        ---------------------------------------------------------- 

          16676  obs. remaining, representing
            742  subjects
            730  failures in multiple failure-per-subject data
         703420  total analysis time at risk, at risk from t =         0
                                   earliest observed entry t =         0
                                        last observed exit t =      1096
        -------------------------------------------------------------------


First, Sara, notice how Stata writes the time interval:

        obs. time interval:  (cons_dt[_n-1], cons_dt]

The ( means the interval is open on the left and the ] means it is closed 
on the right.  Hence, a subject with the interval (0,0] makes no sense: the 
interval is empty, so that subject failed before he or she entered.

Do you have other examples like this?  Do you, perhaps, have someone else 
with interval (12,12] or (20,20]?  That would be the same story.

Note that -stset- reported 

             28  obs. end on or before enter()

so Sara must have obs like (12,12] or (20,20], or she has more obvious 
errors such as (20,12].
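
One way to see exactly which records are involved is to look at the 
variables that -stset- leaves behind: _st (1 if the record is used, 0 if 
not), _d (1 if the record ends in a failure), and _t0 and _t (the 
analysis-time interval).  The following is only a sketch, assuming it is 
run from a do-file right after the -stset- command shown above; the choice 
of variables to display is a guess at what would be useful.

```stata
* records that -stset- excluded; these should be the 28 reported above
list surveyid edateass cons_dt censor_date if _st == 0, sepby(surveyid)

* consultations that -stset- did not count as failures
* (_d != 1 catches both _d == 0 and _d missing)
count if kcons_post_3yr == 1 & _d != 1
list surveyid edateass cons_dt censor_date _st _t0 _t _d ///
    if kcons_post_3yr == 1 & _d != 1, sepby(surveyid)
```

If the count comes back as 4, those records account for the gap between 
the 734 tabulated consultations and the 730 failures.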

Assuming the problems are all of the form (12,12] and (20,20], I would do 
the following:

        . replace censor_date = censor_date + .125

and try again.  I'm assuming that Sara's dates are all integers and so 
moving all the censoring dates forward just a little won't matter.
There's nothing magic about .125; Sara could use .0625 or .03125 or even, 
say, .00390625.  Or .1, .01, .001, etc.  The only reason I don't use nice 
numbers like .1 and .01 is that binary computers cannot store negative 
powers of 10 exactly, and so later, I cannot type things like 

        . list if censor_date==12.1

I have to type things like 

        . list if censor_date==float(12.1)

and I invariably forget, so I use negative powers of 2 to shift dates.
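
The difference is easy to demonstrate.  A minimal sketch, assuming 
censor_date is stored as a float, which is what the float() call above 
implies:

```stata
* 12.1 has no exact binary representation, so the double-precision
* literal and its float rounding differ; this displays 0
display 12.1 == float(12.1)

* 12.125 is 12 + 2^-3 and is stored exactly in both precisions;
* this displays 1
display 12.125 == float(12.125)
```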

Anyway, perhaps moving the end dates forward just a little will solve the
problem.

Or maybe not.  Sara has lots of dates in her files.  Quoting from the output 
again:

        obs. time interval:  (cons_dt[_n-1], cons_dt]
        exit on or before:  time censor_date
          t for analysis:  (time-origin)
                  origin:  time edateass

So we need to look at cons_dt as well.  And we need to look at censor_date 
and edateass carefully, because Sara has multiple records per subject.

I would do the following:

        . sort surveyid cons_dt

                                // make sure dates are growing
        . by surveyid: assert cons_dt > cons_dt[_n-1] if _n>1

                                // make sure censor_date is constant
        . by surveyid: assert censor_date == censor_date[1]

                                // make sure edateass is constant
        . by surveyid: assert edateass == edateass[1]

                                // make sure censor_date is after first cons_dt
        . by surveyid: assert censor_date > cons_dt[1]
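
-assert- stops at the first check that fails and reports only a count of 
contradictions.  To see the offending records themselves, the same four 
conditions can be turned into a flag and listed.  This is a sketch under 
the assumption that the variables are as in the -stset- command above; the 
variable name bad is made up for illustration.

```stata
sort surveyid cons_dt

* flag any record that violates one of the four checks
generate byte bad = 0

* dates not strictly increasing within subject
by surveyid: replace bad = 1 if _n > 1 & cons_dt <= cons_dt[_n-1]

* censor_date or edateass not constant within subject
by surveyid: replace bad = 1 if censor_date != censor_date[1]
by surveyid: replace bad = 1 if edateass != edateass[1]

* censor_date not after the subject's first cons_dt
by surveyid: replace bad = 1 if censor_date <= cons_dt[1]

list surveyid edateass cons_dt censor_date if bad, sepby(surveyid)
```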

-- Bill
wgould@stata.com


