
From: "Ergo, Alex" <aergo@jhsph.edu>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: RE: st: The use of ice with data from surveys with complex design
Date: Tue, 28 Oct 2008 14:57:30 -0400

This is absolutely great, Stas. Thanks so much! Can't wait to try it out. It could indeed be a nice Stata Journal contribution.

Alex

________________________________________
From: owner-statalist@hsphsun2.harvard.edu [owner-statalist@hsphsun2.harvard.edu] On Behalf Of Stas Kolenikov [skolenik@gmail.com]
Sent: Tuesday, October 28, 2008 1:05 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: The use of ice with data from surveys with complex design

When you impute missing data for complex samples and use the bootstrap, you are working at the interface of three issues: complex survey designs, resampling, and missing data. Expect problems with any one of them to be magnified by the others; sometimes their interactions lead to situations that have no solution at all. The best-researched procedure is as follows (Shao and Sitter 1996, http://www.citeulike.org/user/ctacmo/article/1269394).

1. Select an appropriate bootstrap subsample that takes your complex survey design into account. Usually, if you have n_h PSUs in stratum h, you would want to select n_h-1 PSUs with replacement from those n_h; drawing n_h-1 rather than n_h amounts to a scaling factor that does not matter asymptotically for large samples, but does matter in typical survey settings where n_h is often as low as 2.

2. Run your imputation procedure on that bootstrap sample: estimate the models, produce (a single!) imputed set.

3. If you had any other non-response and post-stratification adjustments working on your weights, perform those and get modified weights for your current subsample.

4. Store that as a new data set, or run your estimation and store the results (in Stata, the mechanics are through the -post- command).

5. Repeat 1-4 sufficiently many times, whichever number you like better for the bootstrap. It is not the 3 or 5 of multiple imputation; it is the 200 or 500 of the bootstrap.

6. Combine the results -- in Stata, you might be able to trick the bootstrap postestimation commands into accepting the .dta file produced by those -post- commands as input.

Whether -ice- does all of that, I have no idea. I doubt it, though. While this may be a reasonably straightforward algorithm, it has enough subtle points (like redoing the pweights) that may prevent it from going into a canned routine.

If I were doing all of this, I would write my own resampling scheme for step 1, run -ice- for step 2, do the adjustments in step 3 (if you are the data provider, and if you do know all those corrective schemes -- if you are using public data, there may not be much you can do without access to the internal variables and the population counts that might have been used for post-stratification), and -post- the results in step 4. All of that is better organized through Jeff Pitblado's -bs4rw- and my -bsweights-: the former takes care of the bootstrap cycles, and the latter of the bootstrap subsampling and reweighting (and scaling and what not).

If you are happy with skipping the weight adjustment step, then you can have an outline like this. First, write your own wrapper to supply to -bs4rw- that allows weights as an input and has all other variable names hard coded (I am using -zip- as an arbitrary estimation command):

    program def myestim, eclass
        * `exp' holds the weight expression passed in by the caller
        syntax [pw iw/]
        * step 2: a single imputation using the current replicate weights
        ice whatever [pw=`exp'] , m(1) other options
        * estimation on the imputed data with the same weights
        zip whatever [pw=`exp'], inflate(whatever)
    end

Then, set up the replication weights:

    bsweights bsw* , rep(200) n(-1)

Then, run your bootstrap:

    bs4rw , rw(bsw*) : myestim [pw=original weight]

(Wow, with some formatting and a substantive example, it would make a neat Stata Journal contribution :))

On 10/28/08, Ergo, Alex <aergo@jhsph.edu> wrote:
> Dear All,
>
> What is the best way to account for the complex survey design when using the 'ice' command to impute missing values? Is it through the use of the 'boot' option combined with the use of weights? Or can it somehow be accounted for when specifying the cmd() option?

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
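Step 1 of the procedure in the message (draw n_h-1 PSUs with replacement from the n_h in each stratum, then rescale) can be illustrated with a small sketch. It is written in Python only to show the arithmetic; it is not Stata code and does not reproduce what -bsweights- does internally. The function name, the PSU-to-stratum dictionary layout, and the multiplier n_h/(n_h-1) times the number of draws are assumptions chosen for illustration; the message itself only says that a scaling factor is involved.

```python
import random
from collections import Counter, defaultdict


def rescaled_bootstrap_factors(psu_strata, rng):
    """Return {psu: weight multiplier} for one bootstrap replicate.

    psu_strata maps each PSU id to its stratum id (hypothetical layout).
    In a stratum with n_h PSUs, n_h - 1 PSUs are drawn with replacement and
    each PSU gets the multiplier (n_h / (n_h - 1)) * (times it was drawn).
    This rescaling is an assumption, not something stated in the message.
    """
    by_stratum = defaultdict(list)
    for psu, stratum in psu_strata.items():
        by_stratum[stratum].append(psu)

    factors = {}
    for stratum, psus in by_stratum.items():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} needs at least 2 PSUs")
        # Draw n_h - 1 PSUs with replacement and count how often each appears.
        draws = Counter(rng.choices(psus, k=n_h - 1))
        scale = n_h / (n_h - 1)
        for psu in psus:
            factors[psu] = scale * draws[psu]  # 0 for PSUs that were not drawn
    return factors


# Toy design: stratum A has 2 PSUs, stratum B has 3 (made-up ids).
design = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "b3": "B"}
rng = random.Random(2008)
replicates = [rescaled_bootstrap_factors(design, rng) for _ in range(200)]
```

Multiplying each observation's original pweight by its PSU's factor gives one set of replicate weights. With n_h = 2 every replicate keeps exactly one PSU of the stratum at double weight, which is why the choice between n_h-1 and n_h matters so much in small strata.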

