# Re: st: event history analysis with years clustered in individuals

 From Austin Nichols To statalist@hsphsun2.harvard.edu Subject Re: st: event history analysis with years clustered in individuals Date Sun, 15 Feb 2009 09:22:26 -0500

```Hilde Karlsen <Hilde.Karlsen@hio.no>:
If you have to use a mixed model as an exercise, and you have no
compelling reason to choose a particular research question, you should
ask a different research question where a mixed model is a more
appropriate model, not apply it blindly to data you know is better
suited to a survival model.  Why not use the attrition dummy you have
variables do you have on the data?

On Sun, Feb 15, 2009 at 8:26 AM, Hilde Karlsen <Hilde.Karlsen@hio.no> wrote:
> Thank you both for the advice. However, I don't think I can do as you
> suggest because I have to use a multilevel approach for this essay (it is an
> essay for a multilevel course I followed a while ago). I should probably
> have been more clear on this issue, and on what my problem really is. What I
> am wondering is not which method/command I should use, but how I am going to
> interpret the sigma_u estimate when my level 1 variable is years and my
> level 2 variable is individuals.
>
> As mentioned, I find it more intuitive to grasp the point of separate
> variance estimates when the levels are schools, classes etc, but for some
> reason I have a hard time understanding how I should interpret the
> variance estimate sigma_u when the years are clustered in individuals. How
> should I interpret sigma_u when years are clustered in individuals?
>
> I asked the professor who was leading the course which command I should use,
> and he told me I should use xtmelogit (my advisor told me the same thing).
> As he is the one who is going to judge whether I pass or not on this essay,
>
> I agree that it is a survival model, and I have designed my data for this
> type of analysis (i.e. all individuals in the file start out with 0 on the
> dependent variable, and when/if they drop out of the nursing occupation,
> they receive 1 on the dependent variable. I have no info on which date/month
> people drop out; I only have information on which year they drop out).
>
> Regards,
> Hilde
>
>
>
>>
>> Hilde, I agree with Austin's approach. Even if you have only months, not
>> days, of starting and quitting, use that time unit in a survival or discrete
>> survival model. I recommend Stephen Jenkins's -hshaz- (get it from SSC);
>> his "model 1" (the "Prentice-Gloeckler model") is the same as that fit by
>> -cloglog-. His model 2 adds unobserved heterogeneity and so may be more
>> realistic (Heckman and Singer, 1984).
>>
>> I would not be surprised if prediction equations for early and later
>> quitting differed. If so, time-dependent covariates or separate models for
>> early and later quitting would be informative.
>>
>> -Steve
>>
>> Prentice, R. and Gloeckler, L. (1978). Regression analysis of grouped
>> survival data with application to breast cancer data. Biometrics 34 (1):
>> 57-67.
>>
>> Heckman, J.J. and Singer, B. (1984). A method for minimizing the impact of
>> distributional assumptions in econometric models for duration data.
>> Econometrica 52 (2): 271-320.
>>
>>
>>
>>> Hilde Karlsen <Hilde.Karlsen@hio.no>:
>>> Attrition from nursing sounds like a survival model, probably in
>>> discrete time, using -logit- or -cloglog- with time dummies, not
>>> -xtmelogit- (see
>>> http://www.iser.essex.ac.uk/iser/teaching/module-ec968 for a textbook
>>> and self-guided course on discrete time survival models).  If you have
>>> T years of data on each individual, all of whom are first-year nurses
>>> in period 1, and some of whom quit nursing in each of the subsequent
>>> years, with a variable nurse==1 when a nurse (and zero otherwise), an
>>> individual identifier id, a year variable year, and a bunch of
>>> explanatory variables x*, you can just:
>>>
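>>> * declare the panel structure so the lag operator (l.) works within id
>>> tsset id year
>>> * quit==1 in the year a person who was a nurse last year no longer is one
>>> bys id (year): g quit=(l.nurse==1 & nurse==0)
>>> * intended to take person-years after a quit out of the risk set
>>> by id: replace quit=. if l.quit==1 | (mi(l.quit)&_n>1)
>>> * year dummies for the baseline hazard; drop one as the reference year
>>> tab year, gen(_t)
>>> drop _t1
>>> * discrete-time hazard model
>>> logit quit _t* x*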
>>>
>>> and then work up to more complicated models with heterogeneous
>>> frailty, etc. The main issues are that someone who quit nursing last
>>> year cannot quit nursing again this year, and people who never quit
>>> nursing might at some future point that you don't observe, which is
>>> why you use survival models...
>>>
>>> If you know the day they started work and the day they quit, you might
>>> prefer a continuous-time model (help st).
>>>
>>> I've been assuming you had data on people working as nurses, but
>>> perhaps you mean breastfeeding. I suppose the same considerations
>>> apply (though with multiple years of data on breastfeeding mothers,
>>> there is probably no censoring).
>>>
>>> On Fri, Feb 13, 2009 at 9:19 AM, Hilde Karlsen <Hilde.Karlsen@hio.no> wrote:
>>>>
>>>> Dear statalisters,
>>>>
>>>> This is probably a stupid question, but I've been searching around the
>>>> net and in books and articles, and I've still not grasped the concept:
>>>> When I'm performing a multilevel analysis of attrition from nursing
>>>> using xtmelogit, and time (year) is the level 1 variable and individuals
>>>> (id) is the level 2 variable (i.e. years are clustered within
>>>> individuals; I have a person-year file), how do I formulate the
>>>> expectation related to this model? Why is it important to distinguish
>>>> between these two levels?
>>>>
>>>> I find it more intuitive to grasp the fact that individuals are
>>>> clustered within schools, and that variables on the school level - as
>>>> well as variables on the individual level - may influence e.g. which
>>>> grades a student gets.
>>>>
>>>> I understand (at least I hope I understand) the point that when the same
>>>> individuals are followed over a period of time, the individual's
>>>> responses are probably highly correlated, and that this implies a
>>>> violation of the assumption of independent error terms. As I see it, I
>>>> could have used the cluster() option (cluster(id)) to 'avoid' this
>>>> violation; however, I have to write an essay using multilevel analysis,
>>>> so this is not an option.
>>>>
>>>> I don't know if I'm being clear enough about what my problem is, but any
>>>> information regarding this topic (how to grasp the concept of years
>>>> clustered in individuals) will be greatly appreciated.
>>>> I'm really sorry for having to ask you such an infantile question. My
>>>> colleagues and friends are not familiar with multilevel analyses, so I
>>>> don't know who to turn to.
>>>>
>>>> Best regards,
>>>> Hilde
```
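
The question about how to read sigma_u when years are clustered in individuals can be made concrete by writing out the kind of model that a two-level logit on a person-year file describes. What follows is only a minimal sketch, assuming a random-intercept logit with a normally distributed person effect; the symbols $y_{it}$, $\alpha_t$, $x_{it}$, $\beta$, $u_i$ and $\rho$ are notation introduced here for illustration and do not appear in the thread.

```latex
\[
\begin{aligned}
\operatorname{logit}\,\Pr(y_{it}=1 \mid x_{it}, u_i)
  &= \alpha_t + x_{it}'\beta + u_i,
  \qquad u_i \sim N(0, \sigma_u^2), \\[4pt]
\rho &= \frac{\sigma_u^2}{\sigma_u^2 + \pi^2/3}.
\end{aligned}
\]
```

Here $y_{it}=1$ in the year that person $i$ quits, $\alpha_t$ stands for the year effects (the role played by the year dummies in the quoted -logit- example), and $u_i$ is an intercept specific to person $i$ that stays constant across that person's years. Under these assumptions, $\sigma_u$ is the standard deviation across individuals of the unobserved, time-constant propensity to quit, measured on the log-odds scale: a person whose $u_i$ is one standard deviation above zero has odds of quitting in any given year that are $\exp(\sigma_u)$ times those of an otherwise identical person with $u_i=0$. The ratio $\rho$ is the share of the latent residual variance that lies between persons, with $\pi^2/3$ being the variance of the standard logistic distribution. In survival terms, $u_i$ plays the part of the unobserved heterogeneity, or frailty, mentioned in the replies above.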