
st: Bootstrap variations

From	Constantine Daskalakis <[email protected]>
To	[email protected]
Subject	st: Bootstrap variations
Date	Thu, 20 Apr 2006 17:12:39 -0400

Hi all:

I have a question on stratified and/or cluster bootstrapping. I am using Stata 8.2, fully updated.

Suppose I have a survey where I sample possibly multiple persons within households (let's call the person-id variable SUBID and the household-id variable HOMID).

Suppose I have a total of 30 households (HOMID = 1, 2, ..., 30), with a total of 68 respondents (SUBID = 1, 2, ..., 68).

There are 1, 2, 3, 4, or 5 respondents per household. So, we can consider 5 strata, according to the number of respondents per household (let's call this stratification variable HOMSIZ):

Stratum 1 (1 respondent per house, HOMSIZ = 1): 10 houses, 10 respondents
Stratum 2 (2 respondents per house, HOMSIZ = 2): 10 houses, 20 respondents
Stratum 3 (3 respondents per house, HOMSIZ = 3): 5 houses, 15 respondents
Stratum 4 (4 respondents per house, HOMSIZ = 4): 2 houses, 8 respondents
Stratum 5 (5 respondents per house, HOMSIZ = 5): 3 houses, 15 respondents
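
As a quick sanity check on these counts, here is a minimal Python sketch (Python is used only for illustration; the analysis itself is in Stata). The name houses_per_stratum is made up for this example and is not a variable in the dataset.

```python
# Households per stratum, keyed by HOMSIZ (respondents per household).
houses_per_stratum = {1: 10, 2: 10, 3: 5, 4: 2, 5: 3}

# Total households and total respondents implied by the table above.
n_households = sum(houses_per_stratum.values())
n_respondents = sum(size * count for size, count in houses_per_stratum.items())

# Respondents contributed by each stratum.
respondents_per_stratum = {
    size: size * count for size, count in houses_per_stratum.items()
}

print(n_households, n_respondents)  # 30 68
print(respondents_per_stratum)      # {1: 10, 2: 20, 3: 15, 4: 8, 5: 15}
```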

I plan to use mixed-effects or GEE regression for the analysis, with homid as the clustering variable.

Now suppose I want to draw bootstrap samples from this setup.

I have the following alternatives:

(1)
. bsample

(2)
. bsample, strata(homsiz)

(3)
. bsample, cluster(homid)

(4)
. bsample, strata(homsiz) cluster(homid)

(1) will produce bootstrap samples with N=68 respondents (but will not preserve any other feature of the setup).

(2) will produce bootstrap samples with N=68 respondents and will also preserve the number of respondents in each of the 5 strata (10, 20, 15, 8, 15).

Neither (1) nor (2) will preserve my cluster setup (households), so I will not consider them further.

(3) will produce bootstrap samples with M=30 households, but the total number of respondents in each resample will vary (from a minimum of 30, if all 30 drawn households come from stratum 1, to a maximum of 150, if all 30 come from stratum 5).

(4) will produce bootstrap samples with both M=30 households and N=68 respondents (and will also preserve the number of respondents in each stratum as 10, 20, 15, 8, and 15).
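
The sample sizes claimed for schemes (3) and (4) can be checked with a small simulation. The Python sketch below only imitates the resampling logic as described above (households drawn with replacement, either from the whole sample or within each HOMSIZ stratum); it is not Stata's bsample implementation. The function names, the seed, and the number of replications are arbitrary choices for illustration.

```python
import random

# Households per stratum, keyed by HOMSIZ (respondents per household).
houses_per_stratum = {1: 10, 2: 10, 3: 5, 4: 2, 5: 3}

# One entry per household, holding that household's size.
household_sizes = [
    size for size, count in houses_per_stratum.items() for _ in range(count)
]

def cluster_resample_n(rng):
    """Scheme (3): draw 30 households with replacement, ignoring strata.

    Returns the number of respondents in the resample.
    """
    drawn = rng.choices(household_sizes, k=len(household_sizes))
    return sum(drawn)

def stratified_cluster_resample_n(rng):
    """Scheme (4): within each stratum, draw as many households as it holds.

    Returns the number of respondents in the resample.
    """
    total = 0
    for size, count in houses_per_stratum.items():
        stratum = [size] * count  # every household in a stratum has the same size
        total += sum(rng.choices(stratum, k=count))
    return total

rng = random.Random(2006)  # arbitrary seed, for reproducibility only
n3 = [cluster_resample_n(rng) for _ in range(2000)]
n4 = [stratified_cluster_resample_n(rng) for _ in range(2000)]

print(min(n3), max(n3))  # varies between resamples, always within [30, 150]
print(set(n4))           # {68}
```

Under scheme (4) the respondent count is fixed by construction, because every household in a stratum has the same size. Under scheme (3) it varies from resample to resample, although its expected value is still 68 (30 draws times the mean household size of 68/30).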

In the 3rd scheme, the number of households with 1, 2, 3, etc. respondents is not fixed (and that reflects the way the data were obtained). However, the resamples may have a variable number of observations (units of analysis), and I am worried that I may overestimate the variability.

With the 4th scheme, I am worried that I might underestimate the variability. For example, imagine that the strata are very sparse (e.g., only two clusters in each stratum). Then, with this scheme, I will be getting resamples that are more-or-less the original dataset over and over again.
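
To make the sparse-stratum concern concrete, the Python sketch below enumerates every possible resample of a stratum that holds just two clusters (as stratum 4 does in the setup above). The cluster labels "A" and "B" and the function name are hypothetical, chosen only for this illustration.

```python
from collections import Counter
from itertools import product

def stratum_resamples(clusters):
    """All equally likely with-replacement draws of len(clusters) clusters."""
    return list(product(clusters, repeat=len(clusters)))

draws = stratum_resamples(["A", "B"])

# The order of the draws does not change the resampled dataset,
# so sort each draw before counting distinct outcomes.
counts = Counter(tuple(sorted(d)) for d in draws)

for resample, c in sorted(counts.items()):
    print(resample, c / len(draws))
# ('A', 'A') 0.25
# ('A', 'B') 0.5
# ('B', 'B') 0.25
```

So a two-cluster stratum has only three distinct resamples, and half of the time it reproduces the original stratum exactly.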

Has anyone dealt with this kind of problem before? Any advice as to the choice between the 3rd and 4th schemes of bootstrapping?

Thank you in advance.
Constantine


________________________________________________________________
Constantine Daskalakis, ScD
Assistant Professor,
Thomas Jefferson University, Division of Biostatistics,
211 S. 9th St., Suite 602, Philadelphia, PA 19107
*** NEW ADDRESS (AS OF 4/17/06) ***
*** 1015 Chestnut St., Suite M100, Philadelphia, PA 19107 ***
Tel: 215-955-5695
Fax: 215-955-5681
Email: [email protected]
Webpage: http://www.jefferson.edu/clinpharm/bio/
