RE: st: Bootstrap to compare ROC area on imputed dataset

From   Cameron McIntosh
Subject   RE: st: Bootstrap to compare ROC area on imputed dataset
Date   Thu, 17 Nov 2011 15:36:22 -0500


You're asking for both specific Stata code and more general methodological guidance; I can take a crack at the latter. Bootstrapping in conjunction with imputation is computationally intensive, but it can certainly be done (after all, the two techniques are similar in a number of ways):

Efron, B. (1994). Missing Data, imputation, and the bootstrap. Journal of the American Statistical Association, 89(426), 463-475.

Heymans, M.W., van Buuren, S., Knol, D.K., van Mechelen, M., & de Vet, H.C.W. (2007). Variable selection under multiple imputation using the bootstrap in a prognostic study. BMC Medical Research Methodology, 7:33.

Kim, J.K., Brick, J.M., Fuller, W.A., & Kalton, G. (2006). On the bias of the multiple-imputation variance estimator in survey sampling. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(3), 509–521.

Kim, J.K., & Rao, J.N.K. (2009). A unified approach to linearization variance estimation from survey data after imputation for item nonresponse. Biometrika, 96(4), 917-932. 

Davison, A.C., & Sardy, S. (2007). Resampling Variance Estimation in Surveys with Missing Data. Journal of Official Statistics, 23(3), 371–386.

Eltinge, J.L. (1996). On Variance Estimation With Imputed Survey Data: Comment. Journal of the American Statistical Association, 91(434), 513-515.

Saigo, H., Shao, J., & Sitter, R.R. (2001). A Repeated Half-Sample Bootstrap and Balanced Repeated Replications for Randomly Imputed Data. Survey Methodology, 27(2), 189-196.

Shao, J., & Sitter, R.R. (1996). Bootstrap for imputed survey data. Journal of the American Statistical Association, 91, 1278-1288.

Chen, J., Rao, J.N.K., & Sitter, R.R. (2000). Adjusted imputation for missing data in complex surveys. Statistica Sinica, 10, 1153-1169.

Essentially, what you want is D = (AUC1 - AUC2) / SE_b,

where AUC1 and AUC2 are the original AUCs from the two models being compared, and SE_b is the standard error of the bootstrapped AUC differences (i.e., the standard deviation of AUC1 - AUC2 across the bootstrap samples). I don't imagine this would be very hard to program, in R if not in Stata. I think you would just bootstrap from each imputed data set, which multiplies the number of replications: k imputations * b bootstrap samples in total. You also definitely need to see the following (and you might want to try the empirical likelihood approach):

Long, Q., Zhang, X., & Hsu, C.-H. (2011). Nonparametric multiple imputation for receiver operating characteristics analysis when some biomarker values are missing at random. Statistics in Medicine, Early View.

Liu, D., & Zhou, X.-H. (January 21, 2011). Semiparametric Estimation of the Covariate-Specific ROC Curve in Presence of Ignorable Verification Bias. UW Biostatistics Working Paper Series. Working Paper 374. Seattle, WA: University of Washington - Seattle Campus.

An, Y. (2011). Empirical Likelihood Confidence Intervals for ROC Curves with Missing Data. Mathematics Theses. Paper 95.

Liu, X. (2010). Semi-Empirical Likelihood Confidence Intervals for the ROC Curve with Missing Data. Mathematics Theses. Paper 89.

Janssen, K.J.M., Vergouwe, Y., Donders, A.R.T., Harrell, F.E., Jr., Chen, Q., Grobbee, D.E., & Moons, K.G.M. (2009). Dealing with Missing Predictor Values When Applying Clinical Prediction Models. Clinical Chemistry, 55, 994-1001.

Liu, D., & Zhou, X.-H. (2010). A model for adjusting for nonignorable verification bias in estimation of the ROC curve and its area with likelihood-based approach. Biometrics, 66(4), 1119-1128.
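
For what it's worth, the D statistic described earlier could be sketched roughly as follows. This is only an illustration under stated assumptions, not tested Stata or R code: it is written in plain Python so that it is self-contained, the function names (auc, auc_difference, bootstrap_d) and the toy data are hypothetical, the AUC is computed as the Mann-Whitney statistic, the "original" difference is taken as the mean of AUC1 - AUC2 over the k imputed data sets, and SE_b is taken as the standard deviation of the pooled k * b bootstrapped differences. That pooling is just one reading of the suggestion above; it is not Rubin's rules.

```python
import random
import statistics


def auc(labels, scores):
    """Mann-Whitney estimate of the ROC area; tied scores count one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_difference(rows):
    """rows: list of (diagnosis, score1, score2); returns AUC1 - AUC2."""
    y = [r[0] for r in rows]
    return auc(y, [r[1] for r in rows]) - auc(y, [r[2] for r in rows])


def bootstrap_d(imputed_datasets, b=200, seed=12345):
    """Return (difference, SE_b, D) from k imputed data sets, b resamples each."""
    rng = random.Random(seed)
    # Assumption: the "original" difference is averaged over the imputations.
    observed = [auc_difference(rows) for rows in imputed_datasets]
    boot = []
    for rows in imputed_datasets:
        n = len(rows)
        drawn = 0
        while drawn < b:
            # Resample whole subjects so both scores stay paired.
            sample = [rows[rng.randrange(n)] for _ in range(n)]
            try:
                boot.append(auc_difference(sample))
            except ValueError:
                continue  # resample had no cases or no non-cases; redraw
            drawn += 1
    diff = statistics.mean(observed)
    se_b = statistics.stdev(boot)  # SD of the pooled k * b differences
    return diff, se_b, diff / se_b


# Hypothetical toy data: two "imputed" copies of six subjects.
imp1 = [(0, 1, 3), (0, 2, 1), (0, 3, 4), (1, 4, 2), (1, 5, 5), (1, 6, 6)]
imp2 = [(0, 1, 3), (0, 2, 1), (0, 3, 2), (1, 4, 4), (1, 5, 5), (1, 6, 6)]
imputed = [imp1, imp2]
diff, se_b, d = bootstrap_d(imputed, b=50)
print(diff, se_b, d)
```

Resampling whole subjects (rather than each score separately) keeps the correlation between the two scores in every bootstrap sample, which matters because both AUCs are estimated on the same patients. D would then typically be referred to a standard normal distribution, though whether that is adequate after imputation is exactly what the references above discuss.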

Hope this helps,


> Date: Thu, 17 Nov 2011 18:04:56 +0100
> Subject: st: Bootstrap to compare ROC area on imputed dataset
> From:
> To:
>
> We are analysing the discriminating capacity of a clinical score. Because
> of some missing values we had to use an imputed dataset. We have now
> constructed a new clinical score and want to compare the new with the
> old, using bootstrap.
> We have used
>   mim, category(combine) est(r(area)) se(r(se)) : roctab diagnosis score1, summary
> to analyse the combined ROC area of the imputed datasets. However we want
> to compare two different models and would normally use roctab for this,
> but this does not work with mim, category(combine).
> We also want to make a bootstrapped analysis of the diagnostic
> properties of a new clinical score on the imputed dataset.
> We would appreciate any help on how to do the bootstrapping of the ROC
> areas and comparing two areas on the imputed dataset.
> Regards
> Roland Andersson