Regression analysis with episode length as outcome, when only incomplete episode durations are available.
Speakers: Mohamed Ali, Tom Marshall, London School of Hygiene and Tropical Medicine, and Abdel Babiker, MRC HIV Clinical Trials Centre
The subject of this talk is regression with duration of episodes as outcome,
when the data are cross-sectional and give information only about episodes
still in progress. Such data will typically record the time of start of the
current episode and the values of the various covariates to be taken as
independent variables in the regression analysis. We use survey data on
duration of current contraceptive use as an illustration. Interest lies
in how the covariates relate to the overall length of the
episodes (overall duration of use), rather than to the truncated period given
by the duration of the current episode.
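The link between the two durations can be written down explicitly. As a sketch only, assume (this is not stated in the abstract) that episodes start at a constant rate over calendar time, so that the cross-section is a stationary sample. Write T for a completed episode length with survivor function S and finite mean mu, and U for the duration of the current episode at the survey date; these symbols are introduced here for illustration. A standard renewal-theory result then gives the density of U:

```latex
g(u) = \frac{S(u)}{\mu}, \qquad
\mu = E(T) = \int_0^\infty S(t)\,dt, \qquad u > 0,
\qquad\text{so that}\qquad
E(U) = \frac{E(T^2)}{2\,\mu}.
```

Under this assumption the current-duration density depends only on the survivor function of the completed lengths.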
It is necessary to assume a parametric form for the (unobserved) distribution
of completed episode lengths. An expression for the distribution of the
truncated episode lengths can then be derived and used in regression
modelling. The parameters of the incomplete length distribution can be
matched to those of the complete distribution and thus relationships with
completed episode length inferred. We illustrate this procedure using the
Weibull and log-logistic distributions for the completed episode lengths,
demonstrating the fitting of regression models by maximum likelihood using
Stata do-files. These models can also be fitted by quasi-likelihood using
the glm or xtgee commands; the latter can allow for clustering
if the cross-sectional data come from a survey.
We show simulation results to confirm the validity of this approach, and also
demonstrate the use of further do-files that provide a visual check on
the distributional assumptions by means of cumulative plots.
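The procedure can be illustrated with a small simulation. The following is a minimal sketch in Python, not the authors' do-files, and it omits covariates. It assumes a stationary cross-section, so that with Weibull completed lengths (survivor function exp(-(t/scale)^shape)) the current duration has density exp(-(u/scale)^shape) / (scale * Gamma(1 + 1/shape)). The function names, the parameter values (scale 24, shape 1.5), the sample size and the search interval for the shape are all illustrative assumptions.

```python
import math
import random


def simulate_current_durations(n, scale, shape, rng):
    """Simulate n current (incomplete) durations for Weibull completed lengths,
    assuming a stationary cross-sectional sample (an assumption of this sketch)."""
    durations = []
    for _ in range(n):
        # Episodes in progress at the survey date are length-biased:
        # (T/scale)**shape is Gamma(1 + 1/shape, 1) instead of Exponential(1).
        t_full = scale * rng.gammavariate(1.0 + 1.0 / shape, 1.0) ** (1.0 / shape)
        # The survey date falls uniformly within the sampled episode.
        durations.append(t_full * rng.random())
    return durations


def profile_loglik(shape, durations):
    """Log-likelihood of the current-duration density, maximised over scale.
    For fixed shape the scale has the closed form scale**shape = shape * mean(u**shape)."""
    n = len(durations)
    mean_pow = sum(u ** shape for u in durations) / n
    scale = (shape * mean_pow) ** (1.0 / shape)
    # At this scale, sum((u/scale)**shape) equals n / shape.
    loglik = -n * math.log(scale) - n / shape - n * math.lgamma(1.0 + 1.0 / shape)
    return loglik, scale


def fit_weibull_from_current(durations, lo=0.3, hi=5.0, tol=1e-4):
    """Golden-section search for the shape that maximises the profile likelihood.
    Assumes the profile is unimodal on [lo, hi] (hypothetical search bounds)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc = profile_loglik(c, durations)[0]
    fd = profile_loglik(d, durations)[0]
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = profile_loglik(c, durations)[0]
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = profile_loglik(d, durations)[0]
    shape = 0.5 * (a + b)
    scale = profile_loglik(shape, durations)[1]
    return scale, shape


# Illustrative values only: completed lengths in months, Weibull(scale=24, shape=1.5).
rng = random.Random(12345)
true_scale, true_shape = 24.0, 1.5
current = simulate_current_durations(20000, true_scale, true_shape, rng)

# Mean current duration, empirical and theoretical: E(U) = scale * Gamma(2/p) / Gamma(1/p).
mean_current = sum(current) / len(current)
expected_mean_current = (
    true_scale * math.gamma(2.0 / true_shape) / math.gamma(1.0 / true_shape)
)

# Fit the completed-length parameters from the incomplete durations alone.
scale_hat, shape_hat = fit_weibull_from_current(current)
# Matching step: mean completed length implied by the fitted parameters.
mean_completed_hat = scale_hat * math.gamma(1.0 + 1.0 / shape_hat)

print("mean current duration:", mean_current, "theory:", expected_mean_current)
print("fitted scale and shape:", scale_hat, shape_hat)
print("implied mean completed length:", mean_completed_hat)
```

Because the fitted scale and shape are those of the completed-length Weibull, the implied mean completed length can be read off directly, even though no completed episode was observed. Extending the sketch to regression would mean letting the scale depend on covariates, for example through a log link, which is an assumption about the model form rather than something stated here.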