[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: dropped observations in bootstrap

From   Michael McCulloch <>
Subject   Re: st: dropped observations in bootstrap
Date   Sun, 7 Oct 2007 10:59:49 -0700

Thank you Maarten for illustrating how to set temporary variables within a program. This has solved the problem of dropped observations, and the same bootstrap program now runs successfully. My temporary elation is tempered by the nagging discomfort of not knowing why the use of temporary variables resolved the issue. I'll proceed with your suggested -trace- approach. Many thanks, Michael

--- Michael McCulloch <> wrote:
 I'm running a weighted Cox regression within a bootstrap program.
 If I conduct the analysis program, step by step outside the bootstrap

 it runs just fine.
 However, when I conduct the exact same analysis within the bootstrap
 program, I get the error message:

 "Bootstrap replications (50)
 ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx    50
 insufficient observations to compute bootstrap standard errors
 no results will be saved

 What I'd like to learn is: how can I discover what happened to those
 observations during the bootstrap program? Note that these
 observations are not dropped during the manual analysis outside the
 bootstrap program.

 I've illustrated my code below, using an adapted version of Stata's
 system data file cancer.dta.
One step you can take is to call the program -msm- you created, outside
-bootstrap- and see if it works, you can call it a second time and you
will see that it no longer works because several variables are
-generate-ed which already exist (where created during the first call
to -msm-). How did I know to call -msm- twice? I made the same mistake
you made many many times... One solution is to put the variables and
weights in temporary variables, as is shown in the example below.

Another usefull debuging tool is to first set trace on before calling
-msm-, and it is probably useful to first set the tracedepth to 2 or 3,
to avoid being swamped with output.

*---------------- begin example that works ------------
sysuse cancer, replace
ren died failed
tab drug
gen newdrug=0
replace newdrug=1 if drug==2|drug==3
drop drug
rename newdrug drug
tab drug

gen yeartreated=2007
gen monthtreated=10
gen daytreated=05

gen addyear=0
replace addyear=1 if studytime>=12 & studytime<24
replace addyear=2 if studytime>=24 & studytime<36
replace addyear=3 if studytime>=36

gen addmonths=0
replace addmonths=studytime-12 if studytime>=12 & studytime<24
replace addmonths=studytime-24 if studytime>=24 & studytime<36
replace addmonths=studytime-36 if studytime>36

gen yeardied=2007
replace yeardied= yeartreated + addyear if addyear!=0
list yeartreated yeardied addyear

gen monthdied=10
replace monthdied = monthtreated + addmonths if addmonths!=0
list monthtreated monthdied addmonths
replace yeardied=yeardied+1 if monthdied>=12 & monthdied<24
gen newmonthdied=monthdied
replace newmonthdied=monthdied-12 if monthdied>=12 & monthdied<24
replace newmonthdied=monthdied-24 if monthdied>=24
drop monthdied
gen monthdied = newmonthdied
drop newmonthdied

gen daydied=05

replace monthdied=monthdied+studytime if studytime<12
replace yeardied=yeardied+1 if monthdied>12
replace monthdied =monthdied-12 if monthdied>12

gen datedx = mdy(monthtreated, daytreated, yeartreated)
format datedx %d

gen datedied = mdy(monthdied, daydied, yeardied)
format datedied %d

sort datedied

list datedx datedied studytime

* PART 2
capture program drop msm
program define msm, rclass
tempname p_tcm_ALL p_notcm_ALL p_ALL p_noALL wt_stab_ALL

* fit the full model used in Traditional Cox
logit drug age

*estimate probability of treatment
predict `p_tcm_ALL'
gen `p_notcm_ALL' = 1-`p_tcm_ALL' if drug==0
*want p(drug) for both users and nonusers
replace `p_tcm_ALL' =`p_notcm_ALL' if drug==0

*create stabilized weight==P(A)/P(A|W)
logit drug
predict `p_ALL'
gen `p_noALL'=1-`p_ALL' if drug==0
replace `p_ALL' = `p_noALL' if drug==0
gen `wt_stab_ALL'=`p_ALL'/`p_tcm_ALL'

* stset the data
stset datedied [iweight=`wt_stab_ALL'], failure(failed) ///
origin(datedx) scale(30.4375) /*wt: stabilized IPTW*/

* cox
stcox drug
indeplist, local
foreach var of varlist `X' {
return scalar `var' = _b[`var']

set seed 12358
bootstrap drug=r(drug), eform reps(50): msm
*---------------------- end example that works -----------------
(For more on how to use examples I sent to the Statalist, see )

Hope this helps,

Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434

+31 20 5986715

Want ideas for reducing your carbon footprint? Visit Yahoo! For Good
* For searches and help try:
*   For searches and help try:

© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index