
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



RE: st: Bootstrapping Harrell's C - problem with freezing model to establish optimism - stepwise etc.


From   "Jon Kroll Bjerregaard" <jkbjerregaard@email.dk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Bootstrapping Harrell's C - problem with freezing model to establish optimism - stepwise etc.
Date   Wed, 23 Feb 2011 23:34:47 +0100

I think I may have had some luck writing my own bootstrapping program:

******************************************
capture program drop myboot
program define myboot, rclass
    preserve
    * Draw a bootstrap sample of the same size, with replacement
    bsample
    * Stepwise selection on the bootstrap sample; c1 is its apparent C
    xi: stepwise, pr(.15) lockterm1: stcox (i.AJCC i.inf_PS) zalder ///
        i.gender vol_GTV i.forb_regime i.hem_LNL zwbc zthromb i.LDH_UNL ///
        i.ALAT_UNL i.BASP_UNL sero_bili i.resection_perf, efron
    estat concordance
    return scalar c1 = r(C)
    * Final model refitted on the same bootstrap sample; c2 is its C
    xi: stcox i.AJCC i.inf_PS i.BASP_UNL vol_GTV i.resection_perf ///
        i.forb_regime, efron
    estat concordance
    return scalar c2 = r(C)
    restore
end
simulate c1=r(c1) c2=r(c2), reps(200) seed(1234567): myboot
gen optimism = c1-c2
sum optimism
*****************************************************
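Note that -myboot- refits the final model on the bootstrap sample, so c2 is not quite Harrell's Dorig, which comes from evaluating the frozen bootstrap model on the original data (step 5 of the procedure quoted further down this thread). A minimal sketch of a version that does freeze the model might look like the following. It is not Harrell's or Roger's code, and it rests on several assumptions: the data are -stset- with one record per patient and no id(); -estat concordance, all- evaluates the estimates currently in e() on every observation of the data in memory; and the _I* indicators left in the original data by the step-1 -xi:- fit have the same names as those that -xi:- re-creates on each bootstrap sample. The program name -harrell_opt- and the scalar -capp- are hypothetical.

```stata
* Hypothetical sketch of Harrell's optimism bootstrap using Harrell's C.
* Assumes: data are -stset- (one record per patient, no id()).

* Step 1: apparent C (Capp) of the stepwise model fitted on all n subjects.
* The -xi:- prefix also leaves the _I* indicators in the original data.
xi: stepwise, pr(.15) lockterm1: stcox (i.AJCC i.inf_PS) zalder ///
    i.gender vol_GTV i.forb_regime i.hem_LNL zwbc zthromb i.LDH_UNL ///
    i.ALAT_UNL i.BASP_UNL sero_bili i.resection_perf, efron
estat concordance
scalar capp = r(C)

capture program drop harrell_opt
program define harrell_opt, rclass
    preserve
    * Step 2: bootstrap sample of size n, drawn with replacement
    bsample
    * Steps 3-4: same selection rule; apparent C on the bootstrap sample
    xi: stepwise, pr(.15) lockterm1: stcox (i.AJCC i.inf_PS) zalder ///
        i.gender vol_GTV i.forb_regime i.hem_LNL zwbc zthromb i.LDH_UNL ///
        i.ALAT_UNL i.BASP_UNL sero_bili i.resection_perf, efron
    estat concordance
    return scalar cboot = r(C)
    * Step 5: keep the fitted coefficients (they stay in e()), go back to
    * the original data, and evaluate the frozen model on all observations
    restore
    estat concordance, all
    return scalar corig = r(C)
end

* Steps 6-8: optimism per replicate, then its average
* (simulate replaces the data in memory, so save the original data first)
simulate cboot=r(cboot) corig=r(corig), reps(200) seed(1234567): harrell_opt
generate optimism = cboot - corig
summarize optimism

* Step 9: bias-corrected C = Capp minus the average optimism
display "bias-corrected C = " scalar(capp) - r(mean)
```

The design choice here is to compute Cboot and Corig inside the same replicate, so each optimism estimate pairs a bootstrap fit with its own frozen evaluation, rather than subtracting two separately bootstrapped quantities.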

Sincerely

Jon Kroll Bjerregaard


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Jon Kroll
Bjerregaard
Sent: 23. februar 2011 21:49
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: Bootstrapping Harrell's C - problem with freezing model to
establish optimism - stepwise etc.

Thank you for your answer. I have read your publications (but perhaps not
fully understood them).

I am attempting to reproduce the method described by Harrell et al. in
Statistics in Medicine, vol. 15, pp. 361-387 (1996).

I quote from the article:
*********************************************
1. Develop the model using all n subjects and whatever stepwise testing is
deemed necessary. Let Dapp denote the apparent D from this model, i.e., the
rank correlation computed on the same sample used to derive the fit.
2. Generate a sample of size n with replacement from the original sample
(for both predictors and the response).
3. Fit the full or possibly stepwise model, using the same stopping rule as
was used to derive Dapp.
4. Compute the apparent D for this model on the bootstrap sample with
replacement. Call it Dboot.
5. 'Freeze' this reduced model, and evaluate its performance on the
original dataset. Let Dorig denote the D.
6. The optimism in the fit from the bootstrap sample is Dboot - Dorig.
7. Repeat steps 2 to 6 100 to 200 times.
8. Average the optimism estimates to arrive at Ô.
9. The bootstrap corrected performance of the original stepwise model is
Dapp - Ô. This difference is a nearly unbiased estimate of the expected
value of the external predictive discrimination of the process which
generated Dapp. In other words, Dapp - Ô is an honest estimate of internal
validity, penalizing for overfitting.
******************************
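Steps 6 to 9 can be written compactly as follows, where B is the number of bootstrap replicates (100 to 200 in step 7). Because Somers' D and Harrell's C are related by D = 2C - 1, working with C instead of D, as in this thread, gives an optimism exactly half as large as the optimism in D.

```latex
\hat{O} = \frac{1}{B}\sum_{b=1}^{B}\left(D_{\mathrm{boot},b} - D_{\mathrm{orig},b}\right),
\qquad
D_{\mathrm{corrected}} = D_{\mathrm{app}} - \hat{O}

D = 2C - 1
\;\Longrightarrow\;
D_{\mathrm{boot},b} - D_{\mathrm{orig},b} = 2\left(C_{\mathrm{boot},b} - C_{\mathrm{orig},b}\right)
```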
As I read this, I am not supposed to split my dataset into training and test
sets, but rather to use my whole dataset for the stepwise selection. Then,
once that model has been fitted, I repeat my final model on the same sample
and calculate the difference.

As I read section 3 of your 2010 publication, you split the dataset and then
perform the tests.

What I am presently doing (with some difficulty) is running my subprogram
twice and then subtracting the results, since I bootstrap the sample from my
full dataset.

I absolutely agree that this is not a "regular" bootstrap but rather my
simple attempt to get at the optimism; however, I run into problems running
this subprogram:

*********************************************
capture program drop b_conc
program define b_conc, rclass
    * Apparent Harrell's C of the stepwise model on each bootstrap sample
    xi: stepwise, pr(.15) lockterm1: stcox (i.AJCC i.inf_PS) zalder ///
        i.gender vol_GTV i.forb_regime i.hem_LNL zwbc zthromb i.LDH_UNL ///
        i.ALAT_UNL i.BASP_UNL sero_bili i.resection_perf, efron
    estat concordance
    return scalar c = r(C)
end
bs c=r(c), reps(200) seed(1234567) saving(myfile_full, replace): b_conc
**********************

My plan was simply to run this, then run one with my final model, and then
subtract the two D's (or C's) to get the optimism. Sadly, the stepwise
function fails for reasons I do not fully understand.

I do hope I haven't misunderstood the Harrell method, and I would appreciate
any advice.

Sincerely 

Jon K. Bjerregaard

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Roger Newson
Sent: 21. februar 2011 15:51
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Bootstrapping Harrell's C - problem with freezing model to
establish optimism - stepwise etc.

I don't know who "Cboot" and "Corig" are, although I think I know which
two papers by Harrell et al. you are referring to. You should include
references, because not everybody on the list will know the papers to which
you refer.

However, I have written a paper on the use of Harrell's c (and Somers' D)
with models in general and survival models in particular (Newson, 2010).
This stresses the importance of training sets and test sets, and discusses
the Harrell et al. methods. The Harrell et al. methods are bootstrap-like,
but are not the bootstrap. Instead, the user must divide the data into
multiple pairs of a training set and a test set, and, for each training
set-test set pair, estimate the optimism, and then calculate the confidence
limits using methods similar to the bootstrap. 
The -bs- command does not do this for you. You will probably have to write
your own program for defining multiple test sets and multiple training sets.

I hope this helps. Let me know if you have any more queries.

Best wishes

Roger


References

Newson RB. 2010. Comparing the predictive powers of survival models using
Harrell's C or Somers' D. The Stata Journal 10(3): 339-358.
Purchase from
http://www.stata-journal.com/article.html?article=st0198
or download a pre-publication draft from
http://www.imperial.ac.uk/nhli/r.newson/papers.htm#papers_in_journals


Roger B Newson BSc MSc DPhil
Lecturer in Medical Statistics
Respiratory Epidemiology and Public Health Group National Heart and Lung
Institute Imperial College London Royal Brompton Campus Room 33, Emmanuel
Kaye Building 1B Manresa Road London SW3 6LR UNITED KINGDOM
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Email: r.newson@imperial.ac.uk
Web page: http://www.imperial.ac.uk/nhli/r.newson/
Departmental Web page:
http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/

Opinions expressed are those of the author, not of the institution.

On 19/02/2011 08:06, Jon Kroll Bjerregaard wrote:
> Hello
>
> I'm trying to determine the optimism, as described by Harrell et al., for a
> Cox model established for pancreatic cancer (using Harrell's C instead of
> Somers' D).
>
> I have made a model including clinical (forced-in) and clinical
> (stepwise-selected) variables; I have 150 events in 178 patients.
>
> Selection statement:
> xi: stepwise, pr(.15) lockterm1: stcox (i.AJCC i.inf_PS) zalder ///
>     i.gender vol_GTV i.forb_regime i.hem_LNL zwbc zthromb i.LDH_UNL ///
>     i.ALAT_UNL i.BASP_UNL sero_bili i.resection_perf
> Ending with the final model:
> xi: stcox i.AJCC i.inf_PS i.BASP_UNL vol_GTV i.resection_perf i.forb_regime
> This is a mixture of continuous and categorical variables.
>
> This is what I'm trying to do (as Harrell describes, 1996/2001):
> 1. Bootstrap Harrell's C from the full model, including the stepwise
>    selection (Cboot).
> 2. "Freeze" the bootstrapped model, apply it to the original dataset and
>    calculate Harrell's C (Corig).
> 3. Calculate the optimism as Cboot - Corig.
> 4. Repeat this for 200 bootstrap samples.
>
> So this is where my problems (or my lack of skills) start.
>
> I use this program, adapted from another Statalist post:
> ****************************************************
> capture program drop b_conc
> program define b_conc, rclass
>     xi: stepwise, pr(.15) lockterm1: stcox (i.AJCC i.inf_PS) zalder ///
>         i.gender vol_GTV i.forb_regime i.hem_LNL zwbc zthromb ///
>         i.LDH_UNL i.ALAT_UNL i.BASP_UNL sero_bili i.resection_perf, efron
>     estat concordance
>     return scalar c = r(C)
> end
> bs d=r(c), reps(200) seed(123456) saving(myfile, replace): b_conc
> ***************************************************
> Then I do another one with the final model and subtract them, but this is
> not really the plan.
>
> I have several problems with this, since it refuses to perform the
> bootstrap (I get a lot of x's), which is most likely due to not using
> temporary variables; I haven't figured out exactly what is wrong yet.
> I also need to put in the "frozen" model and apply it to the original
> dataset, which I'm not sure how to get into a bootstrap routine.
>
> Thanks in advance
>
> Jon K. Bjerregaard, MD.
> Dep. of Oncology, Odense University Hospital
>
>
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC