Regression analysis with episode length as outcome, when only incomplete episode durations are available
Speakers: Mohamed Ali, Tom Marshall, London School of Hygiene and Tropical Medicine, and Abdel Babiker, MRC HIV Clinical Trials Centre
This talk concerns regression with episode duration as the outcome when the
data are cross-sectional and give information only about episodes still in
progress. Such data typically record the start time of the current episode
and the values of the covariates to be used as independent variables in the
regression analysis. We use survey data on the duration of current
contraceptive use as an illustration. Interest lies in how the covariates
relate to the overall length of the episodes (overall duration of use),
rather than to the truncated period indicated by the duration of the
current episode.
It is necessary to assume a parametric form for the (unobserved)
distribution of completed episode lengths. An expression for the
distribution of the truncated episode lengths can then be derived and used
in regression modelling. The parameters of the incomplete-length
distribution can then be matched to those of the completed-length
distribution, so that relationships between the covariates and completed
episode length can be inferred. We illustrate this
procedure using the Weibull and log-logistic distributions for the completed
episode lengths, demonstrating how to fit the regression models by maximum
likelihood using Stata do-files. These models can also be fitted by
quasi-likelihood using the glm or xtgee commands; the latter allows for
clustering if the cross-sectional data come from a survey. We show
simulation results that support the validity of this approach, and also
demonstrate further do-files that provide a visual check on the
distributional assumptions by means of cumulative plots.
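
As a rough illustration of the idea (not the speakers' do-files), the sketch below assumes a stationary process, under which the density of a current (in-progress) duration t is g(t) = S(t)/mu, where S is the survivor function and mu the mean of the completed episode lengths. It takes the Weibull case without covariates; the function names, the parameter values and the simulation scheme are hypothetical choices made for this example. In a regression setting the scale would presumably be modelled as a function of the covariates, which is not shown here.

```python
import math
import random


def current_duration_logpdf(t, shape, scale):
    """Log density of a current duration t, g(t) = S(t) / mu,
    for Weibull completed lengths (stationarity is assumed)."""
    log_survival = -((t / scale) ** shape)                       # log S(t)
    log_mean = math.log(scale) + math.lgamma(1.0 + 1.0 / shape)  # log mu
    return log_survival - log_mean


def log_likelihood(durations, shape, scale):
    """Log-likelihood of a sample of current durations."""
    return sum(current_duration_logpdf(t, shape, scale) for t in durations)


def simulate_current_durations(n, shape, scale, rng):
    """Simulate what a cross-sectional survey would see: episodes are
    caught with probability proportional to their length, at a uniformly
    distributed point within the episode."""
    out = []
    for _ in range(n):
        # Length-biased Weibull draw: (X / scale) ** shape ~ Gamma(1 + 1/shape, 1)
        y = rng.gammavariate(1.0 + 1.0 / shape, 1.0)
        length = scale * y ** (1.0 / shape)
        # Elapsed time at the survey date is uniform on (0, length)
        out.append(rng.random() * length)
    return out


# Hypothetical parameter values (for instance, durations in months)
rng = random.Random(12345)
sample = simulate_current_durations(5000, shape=1.5, scale=24.0, rng=rng)

ll_true = log_likelihood(sample, 1.5, 24.0)   # at the generating parameters
ll_wrong = log_likelihood(sample, 1.5, 60.0)  # at a badly mis-specified scale
print(ll_true > ll_wrong)
```

Maximising `log_likelihood` over the shape and scale would give estimates for the completed-length distribution even though only truncated durations are observed, which is the matching of parameters described above.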