Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Stset-ing Multiple Failure/Multiple Spell Data : Moving in and out of risk set


From   Steven Samuels <[email protected]>
To   [email protected]
Subject   Re: st: Stset-ing Multiple Failure/Multiple Spell Data : Moving in and out of risk set
Date   Mon, 14 Mar 2011 19:37:27 -0400

Kathleen-

In addition to Laura's good advice, I'd add:

To exclude periods when the person is not self-employed, add the option "if(SelfEmploy==1)" after the comma in your -stset- statement; -stset- then keeps only the records for which the expression is true.  There are probably other issues in the -stset- statement, but that will give you a start.
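
To see what that restriction does to the sample person, here is a minimal sketch of the bookkeeping, written in Python rather than Stata, so it shows the logic and not -stset- syntax. The tuple layout is my own assumption; the values are copied from the listing in the original question.

```python
# Kathleen's sample person, one tuple per yearly interval:
# (year0, year, self_employed, failure). The layout is illustrative only.
records = [
    (1989, 1990, 0, 0), (1990, 1991, 1, 0), (1991, 1992, 1, 0),
    (1992, 1993, 1, 0), (1993, 1994, 1, 0), (1994, 1995, 0, 1),
    (1995, 1996, 0, 0), (1996, 1997, 1, 0), (1997, 1998, 1, 0),
    (1998, 1999, 1, 0), (1999, 2000, 0, 1),
]

# Keep only the intervals spent in self-employment (the at-risk records).
at_risk = [r for r in records if r[2] == 1]

# Time at risk is the summed length of the retained intervals.
years_at_risk = sum(year - year0 for year0, year, _, _ in at_risk)

print(len(at_risk), years_at_risk)
```

Only the intervals spent in self-employment contribute time at risk; the gap between the two spells contributes nothing.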

I also noticed the following:

In your sample data, the two "failures" take place in intervals in which your sample person is _not_ self-employed.  If this is a real data point and you exclude intervals of non-self-employment, neither of these failures will show up!  You must assign "failure" to a time period when the person is self-employed. 
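
A minimal sketch of one way to move the failures, again in Python for the logic only. It assumes (my assumption, not something the data confirm) that the exit happens at the end of the last self-employed interval, that the rows belong to one person, and that they are sorted by year. The function name move_failures is hypothetical.

```python
# Kathleen's sample person: (year0, year, self_employed, failure).
records = [
    (1989, 1990, 0, 0), (1990, 1991, 1, 0), (1991, 1992, 1, 0),
    (1992, 1993, 1, 0), (1993, 1994, 1, 0), (1994, 1995, 0, 1),
    (1995, 1996, 0, 0), (1996, 1997, 1, 0), (1997, 1998, 1, 0),
    (1998, 1999, 1, 0), (1999, 2000, 0, 1),
]


def move_failures(rows):
    """Flag a failure on each self-employed interval that is
    immediately followed by a non-self-employed interval.

    The original failure column is ignored; the flag is rebuilt
    from the sequence of self-employment indicators.
    """
    out = []
    for i, (year0, year, se, _) in enumerate(rows):
        next_se = rows[i + 1][2] if i + 1 < len(rows) else None
        failed = int(se == 1 and next_se == 0)
        out.append((year0, year, se, failed))
    return out


shifted = move_failures(records)

# Failures that survive once non-self-employed intervals are dropped.
failures_before = sum(r[3] for r in records if r[2] == 1)
failures_after = sum(r[3] for r in shifted if r[2] == 1)
failing_intervals = [(r[0], r[1]) for r in shifted if r[3] == 1]

print(failures_before, failures_after, failing_intervals)
```

With the original coding no failure survives the restriction; after the shift both exits fall on self-employed intervals.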

I think that this is part of a larger issue: the year-intervals are wide. It's tempting to treat the data as grouped, because you know only that the change in status took place sometime in the year.  The problem with this is: what do you do with people who go from not-self-employed to self-employed during a year? Setting self-employed=1 for the whole year gives them too much exposure; setting self-employed=0 gives them too little.  The classification error is minimized if: 1) spells of self-employment are long compared to a year (but this doesn't seem to be the case here); or 2) more refined times of status change are available. Do you have dates of status change?  Months or even quarters would be better time units than years.
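
A small arithmetic sketch of why finer time units help, in Python. The switch time of 7.5 months into the year is a made-up value for illustration, and misclassified_months is a hypothetical helper.

```python
def misclassified_months(switch_month, unit_months):
    """Months of the switch unit whose status is recorded incorrectly.

    switch_month: months into the year at which the person enters
                  self-employment (hypothetical value).
    unit_months:  length of the time unit (12 = year, 3 = quarter, 1 = month).

    Returns (wrong_if_coded_1, wrong_if_coded_0) for the single unit
    that contains the switch; every other unit is coded correctly.
    """
    unit_start = (switch_month // unit_months) * unit_months
    unit_end = unit_start + unit_months
    wrong_if_coded_1 = switch_month - unit_start  # part spent not self-employed
    wrong_if_coded_0 = unit_end - switch_month    # part spent self-employed
    return wrong_if_coded_1, wrong_if_coded_0


for unit in (12, 3, 1):
    print(unit, misclassified_months(7.5, unit))
```

Whichever way the switch unit is coded, the misrecorded time can never exceed the unit length, so quarters or months bound the error far more tightly than years.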

Another question is your choice of time scale. Do you have a natural time zero? Such would be the case, for example, if your sample consisted of people just starting their full-time working lives. Or, if people in the sample are of similar ages in 1989, then that is a natural start date. Suppose, however, that some of them have a prior employment history. Then your assignment of time zero as the first observed time self-employed *while in the study* seems arbitrary.  It would be helpful to have covariates that summarize prior employment history.

Another choice is to restart the clock at t0=0 at the start of each spell of self-employment. See, for example, Section 3.2.4 of the FAQ that Laura referenced. http://www.stata.com/support/faqs/stat/stmfail.html.
The choice is really governed by your overall study question and sample. No matter what time scale you choose, others (current age, calendar time) can be considered as covariates.
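
For the restart-the-clock choice, here is a minimal sketch in Python of numbering the spells and computing a within-spell clock for the sample person. It assumes one person's rows, sorted by year, with no gaps between intervals; spell_clock is a hypothetical name, and this is only the bookkeeping, not -stset- syntax.

```python
# Kathleen's sample person: (year0, year, self_employed).
records = [
    (1989, 1990, 0), (1990, 1991, 1), (1991, 1992, 1), (1992, 1993, 1),
    (1993, 1994, 1), (1994, 1995, 0), (1995, 1996, 0), (1996, 1997, 1),
    (1997, 1998, 1), (1998, 1999, 1), (1999, 2000, 0),
]


def spell_clock(rows):
    """Return (spell, t0, t) for every self-employed interval.

    A new spell starts whenever a self-employed interval follows a
    non-self-employed one (or opens the record); t0 and t are measured
    from the start of the current spell, so each spell begins at 0.
    """
    out = []
    spell = 0
    spell_start = None
    prev_se = 0
    for year0, year, se in rows:
        if se == 1:
            if prev_se == 0:
                spell += 1
                spell_start = year0
            out.append((spell, year0 - spell_start, year - spell_start))
        prev_se = se
    return out


clock = spell_clock(records)
for row in clock:
    print(row)
```

The spell number can then serve as a stratifier or covariate, while the within-spell (t0, t) pair plays the role of entry and exit times on the reset time scale.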


Steve


Steven J. Samuels
Consulting Statistician
18 Cantine's Island
Saugerties, NY 12477 USA
Voice: 845-246-0774
Fax:   206-202-4783 
[email protected]





On Mar 13, 2011, at 11:31 AM, Kathleen Bui wrote:

My question is how to stset a multiple failure data set when an individual can 
move in and out of the risk set. 
 
I have read Cleves’s An Introduction to Survival Analysis Using Stata, Cleves’s
STB-49, and all previous posts concerning st-setting multiple failures. Others
have asked similar questions as mine, but I have yet to find a solution that 
works.  
 
I am analyzing the duration of an individual’s stay in Self-Employment. Failure 
will be exit from self-employment.  My question is how can I stset the data so 
that Stata recognizes that an individual can move into and out of the risk set 
(which is being Self-Employed).
 
To be more explicit, for each individual in my data set, I have information as 
to whether or not they are Self-Employed.  The issue arises when an individual 
has a self employment history as follows:
 
The individual is self-employed and therefore at risk of failure.  Then they 
fail (leave self employment) and enter waged employment. By entering waged 
employment, they are no longer at risk of failing, since they are no longer 
Self-Employed. However, after a period of time, they once again become Self 
Employed (thus re-enter the risk set) and fail once again (their second 
failure).
 
As a result, multiple failures are possible as individuals are moving in and out 
of different employment states. However, although I understand that Stata can 
recognize multiple failures, I am unsure of how stset can be used to recognize 
the multiple spells of Self-Employment, particularly the period of time between 
spells when the individual is no longer at risk.
 
Specifically, I am unable to set the analysis time back to 0 for when the 
individual begins a second period at risk after being not at risk.
 
For example, one individual in my data set of multiple individuals can look 
like:

     +------------------------------------------+
     | ID   Year0   Year   SelfEmploy   Failure |
     |------------------------------------------|
  1. |  1    1989   1990            0         0 |
  2. |  1    1990   1991            1         0 |
  3. |  1    1991   1992            1         0 |
  4. |  1    1992   1993            1         0 |
  5. |  1    1993   1994            1         0 |
  6. |  1    1994   1995            0         1 |
  7. |  1    1995   1996            0         0 |
  8. |  1    1996   1997            1         0 |
  9. |  1    1997   1998            1         0 |
 10. |  1    1998   1999            1         0 |
 11. |  1    1999   2000            0         1 |
     +------------------------------------------+
 
where “SelfEmploy” is the indicator variable denoting whether or not the 
individual is self employed, “Failed” is an indicator variable denoting if the 
individual has left self employment and year0 and year are the corresponding 
beginning and end of time period.
 
So between 1990 and 1994, the individual is at risk of failing, and fails 
between 1994 and 1995. But between 1995 and 1996, they are no longer at risk of 
failing (say they are employed in the waged sector). But then they enter self 
employment in 1996 and thus experience another failure between 1999 and 2000.
 
Is there a command in stset that allows Stata to “ignore” the periods when they 
are no longer at risk?
 
For example, when I stset my data as follows: stset year, origin(SelfEmploy==1) 
failure(Failed)  time0(Year0)  id(PersonID) exit(time .), the period when they 
are no longer at risk of failing is treated as if they are in self-employment as 
the output I receive is:
 
   
     +----------------------------------------------------------------+
     | ID   Year0   Year   SelfEmploy   Failure   _st   _d   _t0   _t |
     |----------------------------------------------------------------|
  1. |  1    1989   1990            0         0     0    0     .    . |
  2. |  1    1990   1991            1         0     0    0     .    . |
  3. |  1    1991   1992            1         0     1    0     0    1 |
  4. |  1    1992   1993            1         0     1    0     1    2 |
  5. |  1    1993   1994            1         0     1    0     2    3 |
  6. |  1    1994   1995            0         1     1    1     3    4 |
  7. |  1    1995   1996            0         0     1    0     4    5 |
  8. |  1    1996   1997            1         0     1    0     5    6 |
  9. |  1    1997   1998            1         0     1    0     6    7 |
 10. |  1    1998   1999            1         0     1    0     7    8 |
 11. |  1    1999   2000            0         1     1    1     8    9 |
     +----------------------------------------------------------------+

 
Stata seems to count the period from 1995-1996 as a time when the individual is 
at risk of failing, when he is not. 

Therefore, I am unsure as to how to st-set the data so that from 1995-1996, Stata 
recognizes that the individual is no longer at risk of failing and that my 
analysis time can be “Reset” to 0 for when the individual begins a second period 
at risk after being not at risk.
 
Any suggestions?
 
Any help would be appreciated! 
 
Thanks!!
 
Kathleen Bui




*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

