
st: RE: bootstrapping survey data

From	"Sayer, Bryan" <[email protected]>
To	"'doug levy'" <[email protected]>, "'[email protected] '" <[email protected]>
Subject	st: RE: bootstrapping survey data
Date	Fri, 11 Jul 2003 15:30:12 -0400

I'd have to think about this for a while, but it still doesn't seem to me
that you want to be doing both.  Each bootstrap draw can't have the full
population, and you need all observations to get correct variance estimates
from the survey estimates.  Plus, where do the variances come into play?
Are they calculated on each replication and then used somewhere?  And the
sampling weights would need to be adjusted on each draw to reflect the full
population (that's what replicate weights do).  My guess is you are better
off attempting to reflect the sample design in the bootstrap process, and
ignoring it in the initial estimation.
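
To make the point about adjusting weights on each draw concrete, here is a
minimal sketch of one common way replicate weights are built, the Rao-Wu
rescaled bootstrap. It is an illustration only, not something taken from
Stata or from the NHIS documentation: the function name and the row keys
'stratum', 'psu' and 'weight' are hypothetical, and Python is used just to
keep the sketch self-contained. Within each stratum holding n_h PSUs it
draws n_h - 1 PSUs with replacement and multiplies each observation's weight
by n_h / (n_h - 1) times the number of times its PSU was drawn.

```python
import random
from collections import Counter, defaultdict


def rao_wu_replicate_weights(rows, rng):
    """Return one set of bootstrap replicate weights, one per input row.

    rows: list of dicts with hypothetical keys 'stratum', 'psu', 'weight'.
    rng:  a random.Random instance, so the draw is reproducible.
    """
    # Collect the distinct PSUs observed in each stratum.
    psus_by_stratum = defaultdict(set)
    for r in rows:
        psus_by_stratum[r["stratum"]].add(r["psu"])

    factor = {}
    for stratum, psus in psus_by_stratum.items():
        psus = sorted(psus)
        n_h = len(psus)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} has fewer than 2 PSUs")
        # Draw n_h - 1 PSUs with replacement and count the multiplicities.
        draws = Counter(rng.choices(psus, k=n_h - 1))
        for p in psus:
            # PSUs that were not drawn get a factor (and weight) of zero.
            factor[(stratum, p)] = n_h / (n_h - 1) * draws[p]

    return [r["weight"] * factor[(r["stratum"], r["psu"])] for r in rows]


# Toy data: 2 strata x 3 PSUs x 2 observations, every weight equal to 10.
rows = [
    {"stratum": s, "psu": p, "weight": 10.0}
    for s in (1, 2)
    for p in (1, 2, 3)
    for _ in range(2)
]
replicate = rao_wu_replicate_weights(rows, random.Random(2003))
```

The rescaling factors in a stratum always add up to n_h, so in this toy data
(where every PSU carries the same total weight) each replicate reproduces
the original weighted total. With unequal PSU totals that holds only on
average over replicates. The model would then be re-estimated once per set
of replicate weights.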

Bryan Sayer
Statistician, SSS Inc.
[email protected]

-----Original Message-----
From: doug levy [mailto:[email protected]] 
Sent: Friday, July 11, 2003 10:30 AM
To: Sayer, Bryan; '[email protected] '
Subject: RE: bootstrapping survey data

The rationale for bootstrapping my coefficient estimates is to include some
sense of the uncertainty of the parameters in a Monte Carlo simulation of
the effect of a policy change. For the first-order Monte Carlo simulation I
am taking draws from a conditional distribution defined by the logit model
I estimate using the NHIS. A second-order Monte Carlo analysis reruns this
simulation some large number of times, each time using a conditional
distribution defined by the parameter estimates from one of the bootstrap
draws. Thus, the error from the parameter estimates is taken into account
in my simulation results.
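
The two-level scheme described above can be sketched roughly as follows.
This is only an illustration under stated assumptions: a logit with one
intercept and one slope, made-up coefficient draws standing in for the
bootstrap estimates (they are placeholders, not NHIS results), and
hypothetical names throughout. The outer loop takes one coefficient draw at
a time (second order); the inner loops draw binary outcomes from the logit
probabilities implied by that draw (first order).

```python
import math
import random


def simulate_policy(beta_draws, covariates, n_inner, rng):
    """Return one simulated outcome prevalence per coefficient draw.

    beta_draws: list of (intercept, slope) tuples, e.g. bootstrap estimates.
    covariates: list of x values for the simulated population.
    n_inner:    number of first-order simulation runs per coefficient draw.
    rng:        a random.Random instance, so the run is reproducible.
    """
    results = []
    for b0, b1 in beta_draws:              # second order: parameter uncertainty
        hits = 0
        for _ in range(n_inner):           # first order: outcome uncertainty
            for x in covariates:
                p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
                hits += rng.random() < p   # Bernoulli draw with probability p
        results.append(hits / (n_inner * len(covariates)))
    return results


# Placeholder coefficient draws, for illustration only.
beta_draws = [(-1.0, 0.5), (-0.9, 0.4), (-1.1, 0.6)]
covariates = [0.0, 1.0, 2.0, 3.0]
prevalences = simulate_policy(beta_draws, covariates, 200, random.Random(11))
```

The spread of the values in prevalences across coefficient draws is what
carries the parameter uncertainty into the simulation results.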

All of that said, bootstrapping assumes that the empirical distribution of
the sample is a reasonable approximation of the actual distribution in the
population. In order to have the resamples mimic the initial sample, I would
like to take account of the sampling design, to the extent that it is known
to me. While I don't have exact knowledge of the sampling design, it is my
sense that for the purposes of making my simulation plausible, I should
include what elements of the sampling design I can. Thus, my interest in
bootstrapping my estimates from the NHIS. I welcome any opinions on either
my analysis strategy in general or the Stata problem in particular.

Best,
Doug Levy

--- "Sayer, Bryan" <[email protected]> wrote:
> I don't understand the point of doing both a bootstrap and explicitly
> accounting for the sample design (I'm presuming you set PSU and strata
> since you specified svylogit).
> 
> Anyway, bootstrapping of complex survey design data is not especially well
> developed, outside of replication weights. Generally, setting up the
> process requires knowledge of the sampling variables that might not be
> public.
> 
> NHIS is properly handled in Stata simply by setting PSU, strata and pweight
> (absent a very small subpop).
> 
> Bryan Sayer
> Statistician, SSS Inc.
> 
> -----Original Message-----
> From: doug levy
> To: [email protected]
> Sent: 7/10/03 4:40 PM
> Subject: st: bootstrapping survey data
> 
> Dear Statalisters,
> I am trying to get bootstrapped estimates of coefficients from a complex
> survey (the National Health Interview Survey). To get standard estimates, I
> would type "svylogit Y X" after having set up the weights, strata, and
> psu's. To get bootstrap estimates, my first inclination was to run
> 'bootstrap "logit Y X [pw=weight]" _b, reps(1000) strata(stratum) psu(psu)'.
> However, Stata does not allow weights in bootstrap. I was able to trick
> Stata into including weights by using only the weights in the svyset
> command and running
> 'bootstrap "svylogit Y X" _b, reps(1000) strata(stratum) psu(psu)'.
> This gives me point estimates similar to the non-bootstrapped estimates,
> which is reassuring, but will I get reasonable standard errors? Is there a
> more orthodox way of coding this?
> Many thanks for any and all advice,
> Doug Levy
> 
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

