# Re: st: event history analysis with years clustered in individuals

From: Steven Samuels
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: event history analysis with years clustered in individuals
Date: Sun, 15 Feb 2009 11:41:09 -0500

I agree with Austin. Just to be clear: sigma_u is a parameter that is meaningless for this problem. No interpretation is possible.

On Feb 15, 2009, at 9:22 AM, Austin Nichols wrote:

Hilde Karlsen <Hilde.Karlsen@hio.no>:
If you have to use a mixed model as an exercise, and you have no
compelling reason to choose a particular research question, you should
ask a different research question where a mixed model is a more
appropriate model, not apply it blindly to data you know is better
suited to a survival model.  Why not use the attrition dummy you have
variables do you have on the data?

On Sun, Feb 15, 2009 at 8:26 AM, Hilde Karlsen <Hilde.Karlsen@hio.no> wrote:

Thank you both for the advice. However, I don't think I can do as you suggest, because I have to use a multilevel approach for this essay (it is an essay for a multilevel course I took a while ago). I should probably have been clearer on this issue, and on what my problem really is. What I am wondering is not which method or command I should use, but how I should interpret the sigma_u estimate when my level-1 units are years and my level-2 units are individuals.

As mentioned, I find it more intuitive to grasp the point of separate variance estimates when the levels are schools, classes, etc., but for some reason I have a hard time understanding how I should interpret the variance estimate sigma_u when the years are clustered in individuals.

I asked the professor who led the course which command I should use, and he told me I should use xtmelogit (my advisor told me the same thing). As he is the one who is going to judge whether I pass this essay, it is probably best to follow his advice.

I agree that it is a survival model, and I have designed my data for this type of analysis (i.e., all individuals in the file start out with 0 on the dependent variable, and when/if they drop out of the nursing occupation, they receive 1 on the dependent variable). I have no information on the date or month people drop out; I only know the year they drop out.
Regards,
Hilde

Hilde, I agree with Austin's approach. Even if you have only months, not days, of starting and quitting, use that time unit in a survival or discrete-time survival model. I recommend Stephen Jenkins's -hshaz- (get it from SSC); his "model 1" (the "Prentice-Gloeckler model") is the same as that fit by -cloglog-. His model 2 adds unobserved heterogeneity and so may be more realistic (Heckman and Singer, 1984).
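For reference, here is a sketch of the discrete-time proportional hazards (Prentice-Gloeckler) model in the notation of this thread, assuming person-year data where quit_{it} = 1 in the year individual i quits and x_{it} are covariates. The hazard h_{it} is the probability of quitting in year t given that i was still nursing at the start of t; the year-specific intercepts gamma_t play the role of the time dummies, and the complementary log-log link is what -cloglog- fits:

$$
h_{it} = \Pr(\text{quit}_{it} = 1 \mid \text{not yet quit},\, x_{it}) = 1 - \exp\{-\exp(\gamma_t + x_{it}'\beta)\}
\quad\Longleftrightarrow\quad
\log\{-\log(1 - h_{it})\} = \gamma_t + x_{it}'\beta
$$

Roughly speaking, adding unobserved heterogeneity (as in model 2) means adding an individual-specific term to the linear index, $\gamma_t + x_{it}'\beta + m_i$, where in the Heckman-Singer approach $m_i$ takes one of a small number of estimated mass-point values rather than following an assumed parametric distribution.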

I would not be surprised if the prediction equations for early and later quitting differed. If so, time-dependent covariates or separate models for early and later quitting would be informative.
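One way to write the "separate equations for early and later quitting" idea as a single model is to interact the covariates with an indicator for the early years. In the sketch below, $h_{it}$ is the probability that person $i$ quits in year $t$ given she has not quit before, $\gamma_t$ are year-specific intercepts, and $e_t$ is a hypothetical indicator equal to 1 in the first few years of the career (the cut-off is an assumption to be chosen from the data). Covariate effects are then $\beta + \delta$ early on and $\beta$ later:

$$\log\{-\log(1 - h_{it})\} = \gamma_t + x_{it}'\beta + e_t\, x_{it}'\delta$$

A test of $\delta = 0$ then asks whether the prediction equations differ between the two periods.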

-Steve

Prentice, R. and Gloeckler, L. (1978). Regression analysis of grouped survival data with application to breast cancer data. Biometrics 34(1): 57-67.

Heckman, J. J. and Singer, B. (1984). A method for minimizing the impact of distributional assumptions in econometric models for duration data. Econometrica 52(2): 271-320.

Hilde Karlsen <Hilde.Karlsen@hio.no>:
Attrition from nursing sounds like a survival model, probably in discrete time, using -logit- or -cloglog- with time dummies, not -xtmelogit- (see http://www.iser.essex.ac.uk/iser/teaching/module-ec968 for a textbook and self-guided course on discrete-time survival models). If you have T years of data on each individual, all of whom are first-year nurses in period 1 and some of whom quit nursing in each of the subsequent years, with a variable nurse==1 when a nurse (and zero otherwise), an individual identifier id, a year variable year, and a bunch of explanatory variables x*, you can just:

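```
* declare the person-year panel
tsset id year
* quit = 1 in the year a nurse stops nursing
bys id (year): g quit=(l.nurse==1 & nurse==0)
* years after a quit are no longer at risk: set quit to missing
by id: replace quit=. if l.quit==1 | (mi(l.quit)&_n>1)
* year dummies, with the first year as the reference category
tab year, gen(_t)
drop _t1
logit quit _t* x*
```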

and then work up to more complicated models with heterogeneous frailty, etc. The main issues are that someone who quit nursing last year cannot quit nursing again this year, and that people who never quit nursing might quit at some future point that you don't observe, which is why you use survival models...
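The data steps above and the two issues just mentioned can be illustrated outside Stata with a small Python sketch on hypothetical toy data. It assumes consecutive years with no gaps and that everyone is nursing in their first observed year. It builds the risk set (a person leaves it after the year she quits), makes year dummies with the first year as the reference, and computes the year-specific empirical hazard and the implied survival curve $S(t) = \prod_{k \le t} (1 - h_k)$. Person 2, who is never seen to quit, is censored: she counts in the denominator of every year she is observed, but never as a quit.

```python
from collections import Counter, defaultdict


def build_risk_set(rows):
    """Mimic the Stata steps on (id, year, nurse) person-years.

    Assumes consecutive years with no gaps and nurse == 1 in each
    person's first year. quit = 1 in the year nurse drops from 1 to 0;
    later years are no longer at risk, so they are dropped here (the
    Stata code sets quit to missing instead).
    """
    by_id = defaultdict(list)
    for pid, year, nurse in rows:
        by_id[pid].append((year, nurse))
    records = []
    for pid in sorted(by_id):
        prev_nurse = None  # like l.nurse: missing in the first year
        for year, nurse in sorted(by_id[pid]):
            quit_ = int(prev_nurse == 1 and nurse == 0)
            records.append((pid, year, quit_))
            if quit_:
                break  # someone who has quit cannot quit again
            prev_nurse = nurse
    return records


def year_dummies(records):
    """One 0/1 dummy per year, omitting the first year as reference
    (like -tab year, gen(_t)- followed by -drop _t1-)."""
    years = sorted({year for _, year, _ in records})
    return [(pid, year, quit_, tuple(int(year == y) for y in years[1:]))
            for pid, year, quit_ in records]


def empirical_hazard(records):
    """Discrete-time hazard per year: quits / persons still at risk.
    People never seen to quit (censored) stay in the denominator for
    every year they are observed."""
    at_risk = Counter(year for _, year, _ in records)
    quits = Counter(year for _, year, q in records if q == 1)
    return {year: quits[year] / at_risk[year] for year in sorted(at_risk)}


def survival(hazard):
    """S(t) = product over years k <= t of (1 - h_k)."""
    s, out = 1.0, {}
    for year in sorted(hazard):
        s *= 1.0 - hazard[year]
        out[year] = s
    return out


if __name__ == "__main__":
    # Hypothetical toy data: person 1 quits in 2003, person 2 is censored.
    rows = [(1, 2001, 1), (1, 2002, 1), (1, 2003, 0), (1, 2004, 0),
            (2, 2001, 1), (2, 2002, 1), (2, 2003, 1)]
    risk = build_risk_set(rows)
    print(risk)
    print(year_dummies(risk))
    print(survival(empirical_hazard(risk)))
```

With no covariates, a -logit- (or -cloglog-) of quit on a constant and a full set of year dummies reproduces these year-specific proportions, provided each lies strictly between 0 and 1.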

If you know the day they started work and the day they quit, you might prefer a continuous-time model (help st).

I've been assuming you had data on people working as nurses, though I suppose the same considerations apply if "nursing" means breastfeeding (though with multiple years of data on breastfeeding mothers, there is probably no censoring).

On Fri, Feb 13, 2009 at 9:19 AM, Hilde Karlsen <Hilde.Karlsen@hio.no> wrote:
Dear statalisters,

This is probably a stupid question, but I've been searching around the net and in books and articles, and I still haven't grasped the concept. When I'm performing a multilevel analysis of attrition from nursing using xtmelogit, and time (year) is the level-1 variable and individuals (id) is the level-2 variable (i.e., years are clustered within individuals; I have a person-year file), how do I formulate the expectation related to this model? Why is it important to distinguish between these two levels?
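As a sketch of the model being asked about, assuming a random-intercept specification: with $y_{ti}$ the 0/1 outcome in year $t$ (level 1) for individual $i$ (level 2) and $x_{ti}$ the covariates, a command of the form -xtmelogit y x* || id:- fits

$$\operatorname{logit} \Pr(y_{ti} = 1 \mid x_{ti}, u_i) = \beta_0 + x_{ti}'\beta + u_i, \qquad u_i \sim N(0, \sigma_u^2)$$

so sigma_u is the standard deviation, on the log-odds scale, of the person-specific intercepts $u_i$, which are shared by all of a person's years and induce the correlation between them.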

I find it more intuitive to grasp the fact that individuals are clustered within schools, and that variables on the school level, as well as variables on the individual level, may influence, e.g., which grades a student gets.

I understand (at least I hope I understand) the point that when the same individuals are followed over a period of time, each individual's responses are probably highly correlated, and that this violates the assumption of independent error terms. As I see it, I could have used the cluster() option (cluster(id)) to 'avoid' this violation; however, I have to write an essay using multilevel analysis, so this is not an option.

I don't know if I'm being clear enough about what my problem is, but any information on this topic (how to grasp the concept of years clustered in individuals) will be greatly appreciated.

I'm really sorry for having to ask such a basic question. My colleagues and friends are not familiar with multilevel analysis, so I don't know who to turn to.

Best regards,
Hilde
```
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```