
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Artificial censoring in survival analysis

From	Steven Samuels <[email protected]>
To	[email protected]
Subject	Re: st: Artificial censoring in survival analysis
Date	Sun, 7 Aug 2011 15:52:00 -0400

Melaku sent me the following privately.

My answers:

q1. Starting the clock for college graduates at graduation is OK, but then college graduation should be included as a covariate.
q2. Usually more data is better.  Note that you are now asking about -pgmhaz-, not -hshaz-.  Both are based on -ml-, as the -help- for -hshaz- makes clear, so you have other options to help the maximization, such as "difficult".

I don't have much experience with these, but the usual idea is to start with simpler (non-heterogeneity) models and work up.


Steve

Thanks Steve. I will include all observations as you suggested.

A few more questions:

1. Do you think the way I treat college graduates is reasonable? I analyze them from the moment of their college graduation and not before, provided that they were unemployed when they completed college.

2. If I use all the months over the 12-year span (144 months), I will have a huge dataset. Even with fewer months, -pgmhaz- has difficulty converging. Do you think I should analyze a smaller sample of my data?
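To make the size concern in question 2 concrete, here is a minimal sketch (in Python rather than Stata, purely for illustration) of the person-month layout that discrete-time hazard models are usually fitted to: one row per person per month at risk, ending in the month of exit from unemployment or, for those who never exit, at the end of the data. The names Spell, person_months and STUDY_END are hypothetical, and the month numbering (January 1995 = 1, December 2006 = 144) is an assumption, not something taken from the actual data set.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

STUDY_END = 144  # assumed index of December 2006 when January 1995 is month 1


@dataclass
class Spell:
    """One unemployment spell per person (hypothetical layout)."""
    person_id: int
    entry_month: int           # calendar month of entry into the analysis
    exit_month: Optional[int]  # calendar month of first exit; None if never observed


def person_months(spell: Spell) -> List[Tuple[int, int, int]]:
    """Return one (person_id, spell_month, exited) row per month at risk."""
    # Observation ends at the exit month, or at the end of the data if censored.
    last = spell.exit_month if spell.exit_month is not None else STUDY_END
    rows = []
    for spell_month, cal_month in enumerate(
        range(spell.entry_month, last + 1), start=1
    ):
        # exited is 1 only in the month the person leaves unemployment.
        exited = int(cal_month == spell.exit_month)
        rows.append((spell.person_id, spell_month, exited))
    return rows


# A non-graduate entering in January 1995 who exits in month 5, and a
# graduate entering in January 1999 (month 49) who never exits.
non_graduate = Spell(person_id=1, entry_month=1, exit_month=5)
graduate = Spell(person_id=2, entry_month=49, exit_month=None)

for spell in (non_graduate, graduate):
    rows = person_months(spell)
    print(spell.person_id, len(rows), rows[-1])
```

Under these assumptions the first person contributes 5 rows and the second 96, so the expanded file grows with the total number of months at risk, not with the number of people.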

Thanks,
Melaku


On 6 Aug 2011, at 15:48, Steven Samuels <[email protected]> wrote:

> 
> 
> Melaku, here is a copy of what I sent to Statalist.
> 
> --
> 
> Thanks for your detailed answer.  This looks like an excellent data set.  I suggest that you use all the observations that you have, following everyone until they first leave unemployment and censoring at the end of 2006 those who never became employed.  There is no advantage that I can see in ending observation after 24 months for everybody. On the contrary, there is a loss of information.
> 
> 
> Steve
> 
> 
> On Aug 5, 2011, at 3:29 PM, Melaku Fekadu wrote:
> 
> Steven,
> 
> 
> Thanks for taking the time to answer me. Here are some details
> about my data.
> 
> 
> What kind of study generated the data? A prospective cohort? A
> cross-section with retrospective recall?
> 
> 
> It is administrative panel data.
> 
> 
> • Was the study a complex sample, so that there are weights and
> clusters (PSUs)?
> 
> 
> It has no weights.
> 
> 
> • What is the purpose of YOUR analysis?
> 
> 
> I am analyzing determinants of reemployment of unemployed individuals.
> 
> 
> • What was the larger data set, if any, from which you took your
> specific data.  What criteria did you use for inclusions?
> 
> 
> I have data on one age cohort: those who were born in 1977 and were
> unemployed in 1995 (at age 18). I have their monthly employment data
> from 1995 to 2006, that is, from age 18 to 29. Some were studying in
> college during these years and some were not. For those who did not go
> to college, I use employment data from January 1995 for the
> analysis. For those who went to college, I use their employment
> data from the month of their college completion. The date of college
> completion differs across individuals, so the date of entry into the
> analysis differs as well. The length of the observation period
> therefore also differs: some individuals are observed for longer and
> some for less. Remember that my data are restricted to 1995 to 2006.
> To overcome this problem I decided to censor everyone at a given
> length of observation, say 24 months, because those who went to
> college are "too young" to experience entry to employment compared
> with those who went to look for work directly.
> 
> 
> • What is month "1"?   a calendar month, a month of an interview?  The
> first month of unemployment?
> 
> 
> Month "1" is the first month of unemployment. Month "1" could be
> different for each observation.
> 
> 
> • Did unemployment start before month "1" for everybody or some
> people?  After month 1?
> 
> 
> Month "1" is the start of unemployment for all. Some have just
> finished school; others have just left an earlier job and started
> looking for work in month "1".
> 
> 
> • For those who started before month "1", do you know how long they
> had been unemployed?
> 
> 
> No unemployment before month "1".
> 
> On Fri, Aug 5, 2011 at 12:22 AM, Steven Samuels <[email protected]> wrote:
>> 
>> I am answering your second question about -hshaz-.  There are examples of two and three mass points at the end of the -help-.  The mixture model for heterogeneity means that the unobserved log hazard is at one of those points, with locations and probabilities to be estimated.
>> 
>> 
>> 
>> For your earlier question.
>> 
>> I don't see a good reason for censoring individuals at 12 months because of problems in observing other individuals.  However, until you describe your data more fully, I really can't say.
>> 
>> 
>> • What kind of study generated the data? A prospective cohort? A cross-section with retrospective recall?
>> 
>> • Was the study a complex sample, so that there are weights and clusters (PSUs)?
>> 
>> • What is the purpose of YOUR analysis?
>> 
>> • What was the larger data set, if any, from which you took your specific data.  What criteria did you use for inclusions?
>> 
>> • What is month "1"?   a calendar month, a month of an interview?  The first month of unemployment?
>> 
>> • Did unemployment start before month "1" for everybody or some people?  After month 1?
>> 
>> • For those who started before month "1", do you know how long they had been unemployed?
>> 
>> • What do you mean people were "younger" to experience the event?  Did you mean "too young" to qualify as unemployed at the start?
>> 
>> • Why do you have information on some people for more than 12 months but not for others?  How did observation end?
>> 
>> • Have you information on people who were employed but became unemployed during the study period (perhaps not in the data set you describe below)?
>> 
>> 
>> In short, we need a complete description of the study design and of the beginning and ending of observation.
>> 
>> 
>> 
>> Dear statalisters,
>> 
>> I am doing a project on duration of unemployment. I want to compare models with and without unobserved heterogeneity. I want to use the -hshaz- module to estimate a mixture model, but I couldn't find an example of how to do that. I would appreciate any pointers to examples.
>> 
>> Thanks,
>> Melaku
>> 
>> 
>> On Aug 2, 2011, at 3:25 AM, [email protected] wrote:
>> 
>> Hello statalisters,
>> 
>> I analyze employment data using survival methods over a period of 12 months. I decided to do so because some of my observations are younger to experience the event (in this case, exiting unemployment) for more than 12 months; that is, I observe them for only 12 months. To overcome this problem I imposed a 12-month period of analysis on all of my observations, so that every observation has the same 12 months in which to experience the event. I did so by artificially censoring those observations for which I have more than 12 months of data and which did not experience the event within 12 months. These are older individuals. I censored them even though I see that some of them experience the event later, after the 12-month period.
>> 
>> My questions:
>> 1. Should I include in the analysis those observations that I censored?
>> 2. Is the sample data presented below appropriate for survival analysis? Note that all observations experience the event except those I censored at month 12.
>> 
>> Below is a small representation of my data. The failure variable 'Failure' is cross-tabulated with the variable 'studytime' which is the number of months until experiencing the event.
>> 
>>           |       Failure
>> studytime |      0        1
>> ----------+-----------------
>>         1 |      0      200
>>         2 |      0       89
>>         3 |      0       70
>>         5 |      0       68
>>         6 |      0       58
>>         7 |      0       50
>>         8 |      0       51
>>        10 |      0       45
>>        11 |      0       30
>>        12 |    150        0
>> 
>> Thanks,
>> Melaku
>> 
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
>> 
>> 
>> 
> 
> 


