Statalist



RE: st: The use of ice with data from surveys with complex design


From   "Ergo, Alex" <aergo@jhsph.edu>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   RE: st: The use of ice with data from surveys with complex design
Date   Tue, 28 Oct 2008 14:57:30 -0400

This is absolutely great, Stas. Thanks so much! Can't wait to try it out.
It could indeed be a nice Stata Journal contribution.
Alex

________________________________________
From: owner-statalist@hsphsun2.harvard.edu [owner-statalist@hsphsun2.harvard.edu] On Behalf Of Stas Kolenikov [skolenik@gmail.com]
Sent: Tuesday, October 28, 2008 1:05 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: The use of ice with data from surveys with complex design

When you impute missing data for a complex sample and use the
bootstrap, you are working at the interface of three issues: complex
survey designs, resampling, and missing data. Expect the problems
with any one of them to be magnified by the others; sometimes their
interactions lead to situations that have no solution at all.

The best-researched procedure is as follows (Shao and Sitter
1996, http://www.citeulike.org/user/ctacmo/article/1269394):

1. Select an appropriate bootstrap subsample that takes into account
your complex survey design. Usually, if you have n_h PSUs in stratum h,
you would want to select n_h-1 PSUs with replacement from those n_h.
Drawing n_h-1 rather than n_h PSUs amounts to a scaling factor that
does not matter asymptotically in large samples, but does matter in
typical survey settings, where n_h is often as low as 2.

2. Run your imputation procedure on that bootstrap sample: estimate
the models, produce (a single!) imputed set.

3. If you had any other non-response and post-stratification
adjustments working on your weights, perform those and get modified
weights for your current sub-sample.

4. Store that as a new data set, or run your estimation and store the
results (in Stata, the mechanics go through the -post- command).

5. Repeat steps 1-4 sufficiently many times, using whatever number of
replications you prefer for the bootstrap. That is not the 3 or 5
typical of multiple imputation; it is more like 200 or 500.

6. Combine the results -- in Stata, you might be able to trick the
bootstrap post-estimation commands into accepting the .dta file
produced by those -post- commands as their input.
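
A minimal sketch of the arithmetic behind step 1, written in Python
rather than Stata purely for illustration. It assumes the common
rescaling in which a PSU drawn r times out of n_h-1 draws has its
weight multiplied by r * n_h/(n_h-1). The function name, the toy data,
and that rescaling rule are assumptions made for this example; they
are not taken from -ice- or from any particular Stata command.

```python
import random
from collections import defaultdict


def rescaled_bootstrap_weights(strata, psus, weights, rng):
    """Return one replicate of bootstrap weights.

    In each stratum, n_h - 1 PSUs are drawn with replacement from the
    n_h available, and the original weights are rescaled accordingly.
    """
    # Collect the distinct PSUs in each stratum, in order of appearance.
    psus_by_stratum = defaultdict(list)
    for h, p in zip(strata, psus):
        if p not in psus_by_stratum[h]:
            psus_by_stratum[h].append(p)

    # Count how many times each (stratum, PSU) pair is drawn.
    draws = defaultdict(int)
    for h, plist in psus_by_stratum.items():
        n_h = len(plist)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} needs at least 2 PSUs")
        for _ in range(n_h - 1):
            draws[(h, rng.choice(plist))] += 1

    # Assumed rescaling: w* = w * n_h / (n_h - 1) * (times PSU was drawn).
    new_weights = []
    for h, p, w in zip(strata, psus, weights):
        n_h = len(psus_by_stratum[h])
        new_weights.append(w * n_h / (n_h - 1) * draws[(h, p)])
    return new_weights


# Toy data: stratum "A" has 2 PSUs, stratum "B" has 3; two units per PSU.
strata = ["A", "A", "A", "A", "B", "B", "B", "B", "B", "B"]
psus = [1, 1, 2, 2, 1, 1, 2, 2, 3, 3]
weights = [10.0] * 4 + [5.0] * 6

rng = random.Random(2008)  # fixed seed so the replicate is reproducible
bsw = rescaled_bootstrap_weights(strata, psus, weights, rng)
print(bsw)
```

With equal weights and equal-sized PSUs, as in the toy data, each
stratum's weight total is preserved in every replicate (40 in stratum
A, 30 in stratum B). With n_h = 2, only one PSU per stratum keeps a
nonzero weight, which is why the scaling factor cannot be ignored in
such designs.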

Whether -ice- does all of that, I have no idea, but I doubt it.
While this may be a reasonably straightforward algorithm, it may have
enough subtle points (like redoing the pweights) to keep it out of a
canned routine. If I were doing all of this, I would write my own
resampling scheme for step 1, run -ice- for step 2, do the adjustments
in step 3, and -post- the results in step 4. Step 3 is feasible if you
are the data provider and know all of those corrective schemes; if you
are using public data, there may not be much you can do without access
to the internal variables and the population counts that might have
been used for post-stratification. All of this is better organized
through Jeff Pitblado's -bs4rw- and my -bsweights-: the former takes
care of the bootstrap cycles, and the latter of the bootstrap
subsampling and reweighting (and scaling and whatnot).

If you are happy to skip the weight adjustment step, then you can
follow an outline like this. First, write your own wrapper to supply
to -bs4rw-; it should accept weights as an input and have all the
other variable names hard-coded (I am using -zip- as an arbitrary
estimation command):

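program define myestim, eclass
  * accept pweights or iweights; the "/" makes `exp' hold the weight
  * expression without the leading "="
  syntax [pw iw/]
  * step 2: a single imputation, using the current weights
  ice whatever [pw=`exp'] , m(1) other options
  * estimate with the same weights
  zip whatever [pw=`exp'], inflate(whatever)
end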

Then, set up the replication weights:

bsweights bsw* , rep(200) n(-1)

Then, run your bootstrap:

bs4rw , rw(bsw*) : myestim [pw=original weight]
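
Conceptually, the outline above reruns the wrapper once per column of
replicate weights and then summarizes the spread of the results. A
rough Python sketch of that loop follows. Everything in it is a
stand-in chosen for illustration: weighted-mean imputation replaces
-ice-, a weighted mean replaces -zip-, the four hard-coded weight
columns replace the 200 produced by -bsweights-, and the variance
uses a divisor of R, which is an assumption rather than a statement
about what -bs4rw- computes.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean of values; assumes the weights do not sum to zero."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def impute_then_estimate(y, w):
    """Stand-in for the wrapper: impute once, then estimate, using w."""
    # Single imputation: fill gaps with the weighted mean of observed values.
    observed = [(v, wi) for v, wi in zip(y, w) if v is not None]
    fill = weighted_mean([v for v, _ in observed], [wi for _, wi in observed])
    completed = [fill if v is None else v for v in y]
    # Estimation on the completed data with the same weights.
    return weighted_mean(completed, w)


def bootstrap_se(y, replicate_weights):
    """Rerun the wrapper for each weight column and combine the results."""
    estimates = [impute_then_estimate(y, w) for w in replicate_weights]
    center = sum(estimates) / len(estimates)
    # Assumed combination rule: mean squared deviation, divisor R.
    variance = sum((e - center) ** 2 for e in estimates) / len(estimates)
    return estimates, math.sqrt(variance)


# Toy data: 4 units, one missing outcome; 2 strata with 2 PSUs each,
# so each replicate keeps one PSU per stratum with a doubled weight.
y = [1.0, None, 3.0, 5.0]
replicate_weights = [
    [2.0, 0.0, 2.0, 0.0],
    [2.0, 0.0, 0.0, 2.0],
    [0.0, 2.0, 2.0, 0.0],
    [0.0, 2.0, 0.0, 2.0],
]

estimates, se = bootstrap_se(y, replicate_weights)
print(estimates, se)
```

The point of the design is that the imputation sits inside the
function that is rerun for every replicate, so the imputation model
is re-estimated each time instead of being fitted once on the full
sample.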

(Wow, with some formatting and a substantive example, it would make a
neat Stata Journal contribution :))

On 10/28/08, Ergo, Alex <aergo@jhsph.edu> wrote:
> Dear All,
>
>  What is the best way to account for the complex survey design when using the 'ice' command to impute missing values? Is it through the use of the 'boot' option combined with the use of weights? Or can it somehow be accounted for when specifying the cmd() option?


--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



