Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.

# st: bootstrapping two-part model and svy

 From Sarah Beth Link To statalist@hsphsun2.harvard.edu Subject st: bootstrapping two-part model and svy Date Fri, 17 Sep 2010 23:20:15 -0400

```Dear Stata users,

I am using a two-part model in stata 10.  The first part of the model
is a logit that predicts the probability of a positive outcome and the
second part of the model is a GLM estimated only on observations with
positive outcomes (in this case expenditures).  The final outcome I am
interested in is the average treatment effect of a dummy variable.  I
would like to bootstrap the entire program to get the 95% confidence
interval on the average treatment effect.  I was able to successfully
bootstrap the following:

bootstrap "two_part_glm_model" r(sate), reps(1000) dots noesample

where "two_part_glm_model" is an ado file with the following code:
program define two_part_glm_model,rclass
svy: logit \$posy \$x
predict phat0, p
replace tx=1
predict phat1, p
replace tx=txtrue
svy:glm \$y \$x if asumexp>0, family(gamma) link(log)
replace tx=0
predict exphat0, mu
replace tx=1
predict exphat1, mu
replace tx=txtrue
gen pred2pt0=phat0*exphat0
gen pred2pt1=phat1*exphat1
gen ate=pred2pt1-pred2pt0
sum ate
return scalar sate=r(mean)
drop phat* exphat* pred2pt* ate*
end

However, I would like to use the survey mean of the average treatment
effect (ate) in the program.  Which brings me to my first question --
does it make sense to use bootstrap to get the confidence interval on
the average treatment effect and use svy: mean to take into account
the survey weighting?  I understand that the bootstrap command
performs random sampling that does not take into account the survey
characteristics.  If not, then is my only other option ignoring the
survey characteristics?

I attempted the following code, which is very similar to the above:

bootstrap "svy_two_part" r(sate), reps(2) dots noesample

where "svy_two_part" is an ado file with the following code:
program define svy_two_part,rclass
svy: logit \$posy \$x
replace tx=0
predict phat0, p
replace tx=1
predict phat1, p
replace tx=txtrue
svy:glm \$y \$x if asumexp>0, family(gamma) link(log)
predict xbeta, mu
replace tx=0
predict exphat0, mu
replace tx=1
predict exphat1, mu
replace tx=txtrue
gen pred2pt0=phat0*exphat0
gen pred2pt1=phat1*exphat1
gen ate=pred2pt1-pred2pt0
svy: mean ate
estat sd
return scalar sate=r(mean)
drop phat* exphat* pred2pt* ate*
end

I get the following error message
command -> svy_two_part
an error occurred when command was executed on original dataset
please rerun bootstrap and specify the trace option to see a trace of the
commands bootstrap executed
r(110);"

This brings me to my second question -- do you know why I would be
getting the above error message?  I used trace and could not figure it
out.  I also ran the program separately within the do file and did not
find any errors.