Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: R: Stset-ing Multiple Failure/Multiple Spell Data : Moving in and out of risk set

From	Steven Samuels <[email protected]>
To	[email protected]
Subject	Re: st: R: Stset-ing Multiple Failure/Multiple Spell Data : Moving in and out of risk set
Date	Wed, 23 Mar 2011 00:35:32 -0400
Kathleen, 

You do lose the ability to set the cluster() option, but in non-survey -stcox-, the cluster() option changes only standard errors, not coefficients.  With survey data, standard errors are governed  by the survey design, so the cluster() option is irrelevant.  You still need the multiple-record data structure for multiple-failure models, but the  information and history for a single person must be contained in the covariates. 

When you do the survey version of -stcox-, you also lose the ability to estimate frailty models (shared() option), and this option *can* change coefficients. However, if your data set contained jackknife or bootstrap replicate weights (yours apparently does not), you could estimate even these models.

Note that if analysis time "0" is the first year in self-employed status, then you can only analyze failures in the second and subsequent years.
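
A minimal sketch of the survey version, assuming the data have already been -stset- with one record per interval. PSU, Weight and strata come from Kathleen's -svyset- line; x1, x2 and SpellNo are hypothetical names used only for illustration:

```stata
* Declare the design once; standard errors then follow the survey design
svyset PSU [pweight=Weight], strata(strata)

* Hypothetical copy of the spell counter, so that a person's history
* enters the model as a covariate rather than through cluster()
generate byte SpellNo = _spell

* Survey Cox model; cluster() and shared() are not allowed here
svy: stcox x1 x2 i.SpellNo
```

The spell number is only one possible history covariate; the number of prior failures or time since the last spell would serve the same purpose.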

Steve

Steven J. Samuels
Consulting Statistician
18 Cantine's Island
Saugerties, NY 12477 USA
Voice: 845-246-0774
Fax:   206-202-4783 
[email protected]



On Mar 22, 2011, at 11:01 PM, Kathleen Bui wrote:

Thank you Steve and Nick,
 
 
Yes, I will mention the large bias and measurement error present, and I have 
stset the data so that the first year an individual states he was in 
self-employment is recorded as starting at "analysis time" 0. 

But I am running into a problem with svy-setting my data. 

I was going to proceed with the method in 3.2.4 of 
http://www.stata.com/support/faqs/stat/stmfail.html

For my survey, I have both strata, cluster and weights, so I svy-set my data 
accordingly: svyset PSU [pw=Weight], stra(strata)

However, as seen in 3.2.4, I again need to cluster on my Person ID variable 
since, with the multiple failures and resetting my time to zero, I have made it 
seem as though each spell of self-employment came from a different 
individual (when in reality it did not).

However, I am unable to use the cluster option with the svy option. 

I am not sure how to solve this issue. Any suggestions?
 
Thank you for all the help!
 
 
Kathleen--




----- Original Message ----
From: Steven Samuels <[email protected]>
To: [email protected]
Sent: Sat, March 19, 2011 5:02:47 PM
Subject: Re: st: R: Stset-ing Multiple Failure/Multiple Spell Data : Moving in 
and out of risk set

Kathleen--

With your data, you are obligated to report that measurement error of *at least* 
±1 year is possible in recorded "times" of employment because dates that 
self-employment started or stopped in a year are unknown.  Also, report that 
there is a positive bias in estimates of probabilities that a person stayed 
self-employed for at least k years. The bias arises because the data don't 
record instances where people left and returned to self-employment between 
interviews. So, for example, four consecutive "years" (i.e. interviews) of 
reported self-employment could be made up of a number of shorter spells.

Status at interview apparently was the only observation actually made, so I 
suggest that you model that status directly instead of a questionable time 
variable. Such an analysis would be based on the same data as you'd feed into 
-stset-.  Model the probability that if a person was self-employed at the year K 
interview, they were also self-employed  at the year K+1 interview.  In this 
analysis the zero is the first interview in a spell of self-employment, and 
you index all the subsequent interviews as Nick suggested.  


If your data are based on a complex survey sample, -svyset- your data and use 
-svy: logistic-.  Failure to do so would invalidate your standard errors and 
hypothesis tests.
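
A sketch of that status model, assuming one record per person per yearly interview and the variable names from Kathleen's listing. AtRisk, Stay, x1 and x2 are hypothetical names introduced here for illustration:

```stata
* At risk: self-employed at this interview and a later interview exists
bysort PersonID (Year): generate byte AtRisk = SelfEmploy == 1 & _n < _N

* Outcome: still self-employed at the next interview (0 outside AtRisk)
bysort PersonID (Year): generate byte Stay = AtRisk & SelfEmploy[_n+1] == 1

* Declare the design and fit the model on the at-risk subpopulation
svyset PSU [pweight=Weight], strata(strata)
svy, subpop(AtRisk): logistic Stay x1 x2
```

Using subpop() rather than dropping records keeps the full design in view when the standard errors are computed. The sketch also assumes that consecutive records are one year apart; gaps between interviews would need separate handling.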

Steve

Steven J. Samuels
Consulting Statistician
18 Cantine's Island
Saugerties, NY 12477 USA
Voice: 845-246-0774
Fax:  206-202-4783 
[email protected]




On Mar 19, 2011, at 5:46 AM, Nick Cox wrote:

I don't understand what you are trying to do, but given a
classification of spells by a variable -_spell- then time in each
spell has a minimum

egen Start = min(Year) if _spell, by(PersonID _spell)

so that you just need to subtract that from Year to get a time
variable that starts at 0 in each spell.

Another way to do it is

bysort PersonID _spell (Year) : gen Time = Year - Year[1] if _spell
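
Putting the pieces together for a gap-time -stset-, here is a sketch under the assumption that each record covers the interval (Year0, Year] and that _spell is 0 outside self-employment. Time0, Time and SpellID are hypothetical names. Measuring from the first Year0 of the spell, rather than the first Year, makes the first interval run from 0 to 1:

```stata
* Spell-relative start and end of each interval (missing outside spells)
bysort PersonID _spell (Year): generate Time0 = Year0 - Year0[1] if _spell > 0
bysort PersonID _spell (Year): generate Time  = Year  - Year0[1] if _spell > 0

* Treat each spell as its own "subject" so the clock restarts at 0
egen SpellID = group(PersonID _spell) if _spell > 0

* Only records inside a spell enter the risk set
stset Time if _spell > 0, id(SpellID) failure(Failed) time0(Time0)
stdes
```

Because each spell becomes a separate subject, the link between spells of the same person has to be restored later, for example through cluster(PersonID) in a non-survey model or through covariates that carry the person's history.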

Nick

On Sat, Mar 19, 2011 at 12:13 AM, Kathleen Bui <[email protected]> wrote:

> Thanks for all the help!
> 
> I do understand that smaller time intervals would be much better, but I
> don't have access to any time frame smaller than a year.
> 
> On another note, I was wondering how to go about "resetting" the time to zero
> for each spell of self-employment, since I have multiple observations for each
> spell of self-employment (if I wanted to employ the PWP gap time model
> approach).
> 
> 
> 
> For example, following my example before, if I had something that looked like:
> 
> (where the _spell, just indicates what spell of self-employment (first second
> etc)),
> 
> 
> How can I stset the data so the time is "reset" to zero for each new spell?
> 
> 
>      +---------------------------------------------------+
>      | PersonID  Year0  Year  Failed  SelfEmploy  _spell |
>      |---------------------------------------------------|
>   1. |        1      .  1990       0           0       0 |
>   2. |        1   1990  1991       0           1       1 |
>   3. |        1   1991  1992       0           1       1 |
>   4. |        1   1992  1993       0           1       1 |
>   5. |        1   1993  1994       1           1       1 |
>   6. |        1   1994  1995       0           0       0 |
>   7. |        1   1995  1996       0           0       0 |
>   8. |        1   1996  1997       0           1       2 |
>   9. |        1   1997  1998       0           1       2 |
>  10. |        1   1998  1999       1           1       2 |
>  11. |        1   1999  2000       0           0       0 |
>  12. |        2      .  1993       0           0       0 |
>  13. |        2   1993  1994       0           1       1 |
>  14. |        2   1994  1995       0           1       1 |
>  15. |        2   1995  1996       0           1       1 |
>  16. |        2   1996  1997       1           1       1 |
>  17. |        2   1997  1998       0           0       0 |
>      +---------------------------------------------------+
> 
> If I do:
> 
> stset Year, origin(SelfEmploy==1) failure(Failed) time0(Year0) id(PersonID)
> exit(time .) if(_spell!=0)
> 
> this doesn't reset the time at the beginning of each spell; rather, it
> continues (with time gaps) from the time of the first spell.
> 
> Thanks again! Appreciate the help!
> -Kathleen
> 
> 
> The following example (performed in Stata 9.2/SE) considers this issue:
> --------------- example begins ------------------------------------
> set obs 6
> g id = 1 in 1/2
> replace id=2 in 3/4
> replace id=3 in 5/6
> g In=0
> replace In=6 in 2
> replace In=3 in 4
> replace In=4 in 6
> g Out=1
> replace Out=7 in 2
> replace Out=8 in 4
> replace Out=5 in 6
> g No_Self_Employed=1
> replace No_Self_Employed=0 in 4
> stset Out, id(id) failure(No_Self_Employed==1) time0(In) ///
>     exit(No_Self_Employed==2) origin(time In)
> stdes
> --------------- example ends ------------------------------------
> 
> In the previous code, subjects do not leave the survival analysis at the
> first failure (i.e. No_Self_Employed==1), since that would conflict with the
> assumption of multiple failures. They leave only when the event
> No_Self_Employed==2 occurs (and this event will never occur).
> 
> As I can see from your thread and previous replies, your subjects do show
> gaps. You can check whether gaps are consistent with your methodological
> expectations using - stdes -.
> 
> For more on this topic, I would refer you to:
> MA Cleves, WW Gould, RG Gutierrez. An Introduction to Survival Analysis Using
> Stata. Revised edition. College Station: Stata Press, 2004: 59-62. The same
> textbook (147-156) also offers interesting insights on the Cox model with
> shared frailty, which may fit your data;
> the already referenced http://www.stata.com/support/faqs/stat/stmfail.html.
> 
> HTH and Kind Regards,
> Carlo
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On behalf of Kathleen Bui
> Sent: Sunday, 13 March 2011 16:31
> To: [email protected]
> Subject: st: Stset-ing Multiple Failure/Multiple Spell Data : Moving in and
> out of risk set
> 
> My question is how to stset a multiple failure data set when an individual
> can move in and out of the risk set.
> 
> I have read Cleves’s An Introduction to Survival Analysis Using Stata,
> Cleves’s STB-49, and all previous posts concerning st-setting multiple
> failures. Others have asked questions similar to mine, but I have yet to find
> a solution that works.
> 
> I am analyzing the duration of an individual’s stay in Self-Employment.
> Failure will be exit from self-employment. My question is how can I stset the
> data so that Stata recognizes that an individual can move into and out of the
> risk set (which is being Self-Employed).
> 
> To be more explicit, for each individual in my data set, I have information
> as to whether or not they are Self-Employed. The issue arises when an
> individual has a self-employment history as follows:
> 
> The individual is self-employed and therefore at risk of failure. Then they
> fail (leave self-employment) and enter waged employment. By entering waged
> employment, they are no longer at risk of failing, since they are no longer
> Self-Employed. However, after a period of time, they once again become
> Self-Employed (thus re-enter the risk set) and fail once again (their second
> failure).
> 
> As a result, multiple failures are possible as individuals are moving in and
> out of different employment states. However, although I understand that Stata
> can recognize multiple failures, I am unsure of how stset can be used to
> recognize the multiple spells of Self-Employment, particularly the period of
> time between spells when the individual is no longer at risk.
> 
> Specifically, I am unable to set the analysis time back to 0 for when the
> individual begins a second period at risk after being not at risk.
> 
> For example, one individual in my data set of multiple individuals can look
> like:
> 
>      +--------------------------------------+
>      | ID  Year0  Year  SelfEmploy  Failure |
>      |--------------------------------------|
>   1. |  1   1989  1990           0        0 |
>   2. |  1   1990  1991           1        0 |
>   3. |  1   1991  1992           1        0 |
>   4. |  1   1992  1993           1        0 |
>   5. |  1   1993  1994           1        0 |
>   6. |  1   1994  1995           0        1 |
>   7. |  1   1995  1996           0        0 |
>   8. |  1   1996  1997           1        0 |
>   9. |  1   1997  1998           1        0 |
>  10. |  1   1998  1999           1        0 |
>  11. |  1   1999  2000           0        1 |
>      +--------------------------------------+
> 
> where “SelfEmploy” is the indicator variable denoting whether or not the
> individual is self-employed, “Failed” is an indicator variable denoting if
> the individual has left self-employment, and Year0 and Year are the
> corresponding beginning and end of the time period.
> 
> So between 1990 and 1994, the individual is at risk of failing, and fails
> between 1994 and 1995. But between 1995 and 1996, they are no longer at risk
> of failing (say they are employed in the waged sector). But then they enter
> self-employment in 1996 and thus experience another failure in 1999-2000.
> 
> Is there a command in stset that allows Stata to “ignore” the periods when
> they are no longer at risk?
> 
> For example, when I stset my data as follows:
> 
> stset Year, origin(SelfEmploy==1) failure(Failed) time0(Year0) id(PersonID)
> exit(time .)
> 
> the period when they are no longer at risk of failing is treated as if they
> are in self-employment, as the output I receive is:
> 
> 
>      +-------------------------------------------------------+
>      | ID  Year0  Year  SelfEmploy  Failure  _s  _d  _t0  _t |
>      |-------------------------------------------------------|
>   1. |  1   1989  1990           0        0   0   0    .   . |
>   2. |  1   1990  1991           1        0   0   0    .   . |
>   3. |  1   1991  1992           1        0   1   0    0   1 |
>   4. |  1   1992  1993           1        0   1   0    1   2 |
>   5. |  1   1993  1994           1        0   1   0    2   3 |
>   6. |  1   1994  1995           0        1   1   1    3   4 |
>   7. |  1   1995  1996           0        0   1   0    4   5 |
>   8. |  1   1996  1997           1        0   1   0    5   6 |
>   9. |  1   1997  1998           1        0   1   0    6   7 |
>  10. |  1   1998  1999           1        0   1   0    7   8 |
>  11. |  1   1999  2000           0        1   1   1    8   9 |
>      +-------------------------------------------------------+
> 
> 
> Stata seems to count the period from 1995-1996 as a time when the individual
> is at risk of failing, when he is not.
> 
> Therefore, I am unsure as to how to st-set the data so that from 1995-1996,
> Stata recognizes that the individual is no longer at risk of failing and that
> my analysis time can be “Reset” to 0 for when the individual begins a second
> period at risk after being not at risk.

*
*  For searches and help try:
*  http://www.stata.com/help.cgi?search
*  http://www.stata.com/support/statalist/faq
*  http://www.ats.ucla.edu/stat/stata/

