Regression analysis with episode length as outcome, when only incomplete episode durations are available.

Speakers: Mohamed Ali, Tom Marshall, London School of Hygiene and Tropical Medicine, and Abdel Babiker, MRC HIV Clinical Trials Centre

The subject of this talk is regression with duration of episodes as outcome, when the data are cross-sectional and give information only about episodes still in progress. Such data will typically record the time of start of the current episode and the values of the various covariates to be taken as independent variables in the regression analysis. We use survey data on duration of current contraceptive use as an illustration. The interest lies in the relationships of the covariates with the overall length of the episodes (overall duration of use), rather than the truncated period indicated by the duration of the current episode.

It is necessary to assume a parametric form for the (unobserved) distribution of completed episode lengths. An expression for the distribution of the truncated episode lengths can then be derived and used in regression modelling. The parameters of the incomplete-length distribution can be matched to those of the complete-length distribution, so that relationships with completed episode length can be inferred. We illustrate this procedure using the Weibull and log-logistic distributions for the completed episode lengths, and demonstrate the fitting of regression models by maximum likelihood using do-files. These models can also be fitted by quasi-likelihood using the glm or xtgee commands; xtgee makes it possible to allow for clustering when the cross-sectional data come from a survey. We show simulation results to confirm the validity of this approach, and also demonstrate further do-files that provide a visual check on the distributional assumptions by means of cumulative plots.
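
As a rough illustration of the kind of simulation check described above, here is a minimal sketch in Python rather than in the do-files used in the talk. It rests on an assumption that the abstract does not state: episodes start at a constant rate, so that the density of the current (incomplete) duration is g(t) = S(t)/mu, where S is the survivor function and mu the mean of the completed lengths. For a Weibull with S(t) = exp(-(t/scale)^shape) this gives g(t) = exp(-(t/scale)^shape) / (scale * Gamma(1 + 1/shape)). The function names, the single binary covariate, and the parameter values are all hypothetical choices made for the example.

```python
import math
import random


def simulate_current_durations(n, scale, shape, rng):
    # If completed lengths are Weibull(scale, shape) and g(t) = S(t)/mu,
    # then (t/scale)**shape follows a Gamma(1/shape, 1) distribution.
    return [scale * rng.gammavariate(1.0 / shape, 1.0) ** (1.0 / shape)
            for _ in range(n)]


def log_scale_mle(t, shape):
    # For a fixed shape, the scale MLE has the closed form
    # scale**shape = shape * sum(t**shape) / n.
    return math.log(shape * sum(x ** shape for x in t) / len(t)) / shape


def profile_loglik(shape, groups):
    # Log-likelihood of g(t), with each group's scale replaced by its MLE;
    # at the MLE, sum((t/scale)**shape) equals n/shape.
    total = 0.0
    for t in groups:
        n = len(t)
        total -= (n / shape + n * log_scale_mle(t, shape)
                  + n * math.lgamma(1.0 + 1.0 / shape))
    return total


def fit(groups, lo=0.2, hi=10.0, tol=1e-6):
    # Golden-section search for the common shape parameter,
    # followed by the closed-form log scale for each group.
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if profile_loglik(c, groups) > profile_loglik(d, groups):
            b = d
        else:
            a = c
    shape = (a + b) / 2.0
    return shape, [log_scale_mle(t, shape) for t in groups]


# Hypothetical settings: log(scale) = b0 + b1 * x for a binary covariate x.
rng = random.Random(12345)
true_shape, b0, b1 = 1.5, math.log(24.0), 0.5
group0 = simulate_current_durations(5000, math.exp(b0), true_shape, rng)
group1 = simulate_current_durations(5000, math.exp(b0 + b1), true_shape, rng)

shape_hat, (b0_hat, log_scale1_hat) = fit([group0, group1])
b1_hat = log_scale1_hat - b0_hat
print("shape estimate:", round(shape_hat, 3), "true:", true_shape)
print("b1 estimate:", round(b1_hat, 3), "true:", b1)
```

Because the shape is common to both groups, b1 is the log ratio of mean completed durations (the mean is scale * Gamma(1 + 1/shape)), even though only incomplete durations enter the likelihood. With continuous covariates the scale no longer has a closed form, and a general-purpose optimiser would be needed instead of the one-dimensional search used here.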