RE: st: Bootstrap to compare ROC area on imputed dataset

From	Cameron McIntosh <[email protected]>
To	STATA LIST <[email protected]>
Subject	RE: st: Bootstrap to compare ROC area on imputed dataset
Date	Thu, 17 Nov 2011 15:36:22 -0500

Roland,

You're asking for both specific Stata code and more general methodological guidance; I can take a crack at the latter. Bootstrapping in conjunction with imputation is computationally intensive, although it can of course be done (after all, the two are similar in a number of ways):

Efron, B. (1994). Missing Data, imputation, and the bootstrap. Journal of the American Statistical Association, 89(426), 463-475.

Heymans, M.W., van Buuren, S., Knol, D.K., van Mechelen, M., & de Vet, H.C.W. (2007). Variable selection under multiple imputation using the bootstrap in a prognostic study. BMC Medical Research Methodology, 7:33. http://www.biomedcentral.com/content/pdf/1471-2288-7-33.pdf

Kim, J.K., Brick, J.M., Fuller, W.A., & Kalton, G. (2006). On the bias of the multiple-imputation variance estimator in survey sampling. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(3), 509–521.

Kim, J.K., & Rao, J.N.K. (2009). A unified approach to linearization variance estimation from survey data after imputation for item nonresponse. Biometrika, 96(4), 917-932. 

Davison, A.C., & Sardy, S. (2007). Resampling Variance Estimation in Surveys with Missing Data. Journal of Official Statistics, 23(3), 371–386.

Eltinge, J.L. (1996). On Variance Estimation With Imputed Survey Data: Comment. Journal of the American Statistical Association, 91(434), 513-515.

Saigo, H., Shao, J., & Sitter, R.R. (2001). A Repeated Half-Sample Bootstrap and Balanced Repeated Replications for Randomly Imputed Data. Survey Methodology, 27(2), 189-196. http://www.statcan.gc.ca/ads-annonces/12-001-x/6095-eng.pdf

Shao, J., & Sitter, R.R. (1996). Bootstrap for imputed survey data. Journal of the American Statistical Association, 91, 1278-1288.

Chen, J., Rao, J.N.K., & Sitter, R.R. (2000). Adjusted imputation for missing data in complex surveys. Statistica Sinica, 10, 1153-1169.

Essentially, what you want is

D = (AUC1 - AUC2) / SE_b,

where AUC1 and AUC2 are the original AUCs from the two models being compared, and SE_b is the standard error of the bootstrapped AUC differences (that is, the standard deviation of AUC1 - AUC2 across the bootstrap samples). I don't imagine this would be very hard to program, perhaps in R if not in Stata. I think you would just bootstrap from each imputed data set, so the total number of replications becomes k imputations x b bootstrap samples. You should also definitely see the following (and you might want to try the empirical likelihood approach):

Long, Q., Zhang, X., & Hsu, C.-H. (2011). Nonparametric multiple imputation for receiver operating characteristics analysis when some biomarker values are missing at random. Statistics in Medicine, Early View. http://onlinelibrary.wiley.com/doi/10.1002/sim.4338/abstract;jsessionid=63E100FD9A64CCB7B6C8E6D57CA08581.d01t02

Liu, D., & Zhou, X.-H. (January 21, 2011). Semiparametric Estimation of the Covariate-Specific ROC Curve in Presence of Ignorable Verification Bias. UW Biostatistics Working Paper Series. Working Paper 374. Seattle, WA: University of Washington - Seattle Campus. http://www.bepress.com/cgi/viewcontent.cgi?article=1213&context=uwbiostat

An, Y. (2011). Empirical Likelihood Confidence Intervals for ROC Curves with Missing Data. Mathematics Theses. Paper 95. http://digitalarchive.gsu.edu/math_theses/95

Liu, X. (2010). Semi-Empirical Likelihood Confidence Intervals for the ROC Curve with Missing Data. Mathematics Theses. Paper 89. http://digitalarchive.gsu.edu/math_theses/89

Janssen, K.J.M., Vergouwe, Y., Donders, A.R.T., Harrell, F.E., Jr., Chen, Q., Grobbee, D.E., & Moons, K.G.M. (2009). Dealing with Missing Predictor Values When Applying Clinical Prediction Models. Clinical Chemistry, 55, 994-1001. http://www.clinchem.org/cgi/reprint/55/5/994 http://www.clinchem.org/cgi/content/full/clinchem.2008.115345/DC1

Liu, D., & Zhou, X.-H. (2010). A model for adjusting for nonignorable verification bias in estimation of the ROC curve and its area with likelihood-based approach. Biometrics, 66(4), 1119-1128.
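
A minimal sketch of the D statistic described above, written in Python rather than Stata and using only the standard library. It is an illustration under stated assumptions, not a tested analysis: each imputed data set is assumed to be a list of (diagnosis, score1, score2) rows with diagnosis coded 0/1, and the function names auc, auc_difference and bootstrap_d are hypothetical. Averaging the k observed differences for the numerator and taking one standard deviation over all k x b bootstrap differences is only one reading of "bootstrap from each imputed data set"; combining per-imputation estimates with Rubin's rules would be an alternative.

```python
import random
import statistics


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_difference(rows):
    """rows: list of (diagnosis, score1, score2) tuples; returns AUC1 - AUC2."""
    labels = [r[0] for r in rows]
    return auc(labels, [r[1] for r in rows]) - auc(labels, [r[2] for r in rows])


def bootstrap_d(imputed_datasets, b=200, seed=12345):
    """Return (mean observed difference, SE_b, D) using k x b resamples.

    Assumption: SE_b is the standard deviation of all bootstrap
    differences pooled across the k imputed data sets.
    """
    rng = random.Random(seed)
    observed = [auc_difference(rows) for rows in imputed_datasets]
    boot = []
    for rows in imputed_datasets:
        n = len(rows)
        done = 0
        while done < b:
            # Resample whole rows so the two scores stay paired per patient.
            sample = [rows[rng.randrange(n)] for _ in range(n)]
            try:
                boot.append(auc_difference(sample))
            except ValueError:
                continue  # resample lacked one outcome class; draw again
            done += 1
    diff = statistics.mean(observed)
    se_b = statistics.stdev(boot)
    return diff, se_b, diff / se_b
```

Calling bootstrap_d(datasets, b=1000), with datasets a list of the k imputed data sets, returns the pooled difference, SE_b and D. Rows are resampled whole, so both scores for a patient stay together in every bootstrap sample, which matters because the two AUCs are correlated.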

Hope this helps,

Cam

> Date: Thu, 17 Nov 2011 18:04:56 +0100
> Subject: st: Bootstrap to compare ROC area on imputed dataset
> From: [email protected]
> To: [email protected]
> 
> We are analysing the discriminating capacity of a clinical score. Because
> of some missing values we had to use an imputed dataset. We have now
> constructed a new clinical score and want to compare the new score with
> an old one, using the bootstrap.
> 
> We have used mim, category(combine) est(r(area)) se(r(se)) : roctab
> diagnosis score1, summary to analyse the combined ROC area of the
> imputed datasets. However we want to compare two different models and
> would normally use roctab for this, but this does not work with mim,
> category(combine).
> 
> We also want to make a bootstrapped analysis of the diagnostic
> properties of a new clinical score on the imputed dataset.
> 
> We would appreciate any help on how to do the bootstrapping of the ROC
> areas and comparing two areas on the imputed dataset.
> 
> Regards
> 
> Roland Andersson
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
 		 	   		  
