Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

st: bootstrapping two-part model and svy


From   Sarah Beth Link <[email protected]>
To   [email protected]
Subject   st: bootstrapping two-part model and svy
Date   Fri, 17 Sep 2010 23:20:15 -0400

Dear Stata users,

I am using a two-part model in stata 10.  The first part of the model
is a logit that predicts the probability of a positive outcome and the
second part of the model is a GLM estimated only on observations with
positive outcomes (in this case expenditures).  The final outcome I am
interested in is the average treatment effect of a dummy variable.  I
would like to bootstrap the entire program to get the 95% confidence
interval on the average treatment effect.  I was able to successfully
bootstrap the following:

         bootstrap "two_part_glm_model" r(sate), reps(1000) dots noesample

where "two_part_glm_model" is an ado file with the following code:
         program define two_part_glm_model,rclass
         svy: logit $posy $x
         predict phat0, p
         replace tx=1
         predict phat1, p
         replace tx=txtrue
         svy:glm $y $x if asumexp>0, family(gamma) link(log)
         replace tx=0
         predict exphat0, mu
         replace tx=1
         predict exphat1, mu
         replace tx=txtrue
         gen pred2pt0=phat0*exphat0
         gen pred2pt1=phat1*exphat1
         gen ate=pred2pt1-pred2pt0
         sum ate
         return scalar sate=r(mean)
         drop phat* exphat* pred2pt* ate*
         end

However, I would like to use the survey mean of the average treatment
effect (ate) in the program.  Which brings me to my first question --
does it make sense to use bootstrap to get the confidence interval on
the average treatment effect and use svy: mean to take into account
the survey weighting?  I understand that the bootstrap command
performs random sampling that does not take into account the survey
characteristics.  If not, then is my only other option ignoring the
survey characteristics?

I attempted the following code, which is very similar to the above:

         bootstrap "svy_two_part" r(sate), reps(2) dots noesample

where "svy_two_part" is an ado file with the following code:
         program define svy_two_part,rclass
         svy: logit $posy $x
         replace tx=0
         predict phat0, p
         replace tx=1
         predict phat1, p
         replace tx=txtrue
         svy:glm $y $x if asumexp>0, family(gamma) link(log)
         predict xbeta, mu
         replace tx=0
         predict exphat0, mu
         replace tx=1
         predict exphat1, mu
         replace tx=txtrue
         gen pred2pt0=phat0*exphat0
         gen pred2pt1=phat1*exphat1
         gen ate=pred2pt1-pred2pt0
         svy: mean ate
         estat sd
         return scalar sate=r(mean)
         drop phat* exphat* pred2pt* ate*
         end

I get the following error message
"phat0 already defined
command -> svy_two_part
an error occurred when command was executed on original dataset
please rerun bootstrap and specify the trace option to see a trace of the
commands bootstrap executed
r(110);"

This brings me to my second question -- do you know why I would be
getting the above error message?  I used trace and could not figure it
out.  I also ran the program separately within the do file and did not
find any errors.

I will greatly appreciate your comments, suggestions, and advice.

Thank you very much,
Sarah Beth

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index