Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: Multiple imputation with survey replicate weights

From	Joshua Mitts <[email protected]>
To	[email protected]
Subject	Re: st: Multiple imputation with survey replicate weights
Date	Thu, 20 Feb 2014 13:48:56 -0500

Thanks very much.

Josh

On Thu, Feb 20, 2014 at 10:33 AM, Stas Kolenikov <[email protected]> wrote:
> Stata is doing the right job in preventing you from doing dubious
> things. The interface of complex survey data inference and multiple
> imputation is surprisingly poorly studied given its ubiquity. The
> statistically appropriate way to combine imputation and replicate
> weights that I am aware of is to use the bootstrap or BRR approach;
> create a single imputation within each bootstrap/BRR replicate; and
> re-estimate your model with that replicate weight based on imputed
> data. See Shao and Sitter (1996;
> http://www.citeulike.org/user/ctacmo/article/1269394). At the moment,
> this requires custom programming of an estimation command that
> combines one imputation iteration with the command of interest. I am
> vaguely planning to develop a Stata Journal paper to describe the
> process, but it is only at the conceptualization stage now. Here's an
> example (not particularly stable, the combinations of -mi- and -svy-
> are still tricky, as they have conflicting expectations of what is
> known about the data, and I have to force one to ignore the other, and
> vice versa):
>
> webuse nhanes2brr, clear
> gen age2 = age*age
> cap pro drop mymireg
> program define mymireg, properties( svyb )
> syntax [varlist] [if] [in] [pw iw /] , [*]
>   * local macro `weight' contains the type
>   * local macro `exp' contains the weight variable
>   * local macro varlist contains the list of explanatory variables
>   * for the final regression
>   * it is used to keep Stata from thinking that estimation has
>   * already been done
>   preserve
>   mi set wide
>   mi register regular region1 region2 region3 rural black orace age ///
>      age2 tibc tcresult
>   mi register imputed lead zinc copper vitaminc albumin tgresult
>   mi impute chained (pmm) lead zinc copper vitaminc albumin tgresult = ///
>      region1 region2 region3 rural black orace age age2 tibc tcresult ///
>      [pw=`exp'], add(1)
>   mi extract 1, clear
>   logistic highbp lead `varlist' [pw=`exp']
>   restore
> end
> svy brr, saving( lead_imputed_logit, replace ) : mymireg height weight age female
> use lead_imputed_logit, clear
> sum
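>
> A minimal sketch of how the -sum- output above could be reduced to one
> line per coefficient. It assumes that the file written by saving()
> holds one observation per replicate and one numeric variable per
> coefficient. The rescaling of the standard deviation by sqrt((R-1)/R)
> is also an assumption: it treats the BRR variance as dividing by the
> number of replicates R rather than by R-1.
>
> ```stata
> * Assumption: lead_imputed_logit.dta has one row per BRR replicate
> * and one numeric variable per coefficient.
> use lead_imputed_logit, clear
> foreach v of varlist _all {
>     quietly summarize `v'
>     * point estimate: mean of the replicate estimates
>     local b = r(mean)
>     * standard error: root mean squared deviation from that mean,
>     * i.e. the -summarize- SD rescaled from divisor R-1 to divisor R
>     local se = r(sd)*sqrt((r(N)-1)/r(N))
>     display as text %-24s "`v'" as result %12.4f `b' %12.4f `se'
> }
> ```
>
> With many replicates the rescaling barely changes the numbers, so the
> plain -sum- output is usually close enough for a first look.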
>
> Use at your own risk. Let me repeat: USE AT YOUR OWN RISK. Maybe like this:
>
> use at_your_own_risk, clear?
>
> A few caveats:
> 1. -svy brr- will report point estimates based on a single imputation;
> these are useless, and would need to be discarded
> 2. The right coefficients and the standard errors come out of the
> -summarize- in the end. I used to be able to produce them with -bs4rw-
> followed by -estat bootstrap-, but for whatever reason it stopped
> working (it used to in 2010) -- probably the internal format of what
> -bootstrap- expects changed, and what -bs4rw- supplies is no longer
> compatible with it.
> 3. I used the equivalence between the bootstrap and BRR; things will
> not work appropriately with jackknife, as it does not provide enough
> sampling variability, and the imputation model will be too close to
> that based on the full data. Hence, sampling variability in the
> imputation model will be insufficient, and the standard errors will be
> underestimated. Likewise, the compressed replicate weight variability
> methods (BRR with Fay's adjustment; mean bootstrap) may not be able to
> generate enough sampling variability in the imputation process,
> either.
> 4. As you clearly see, the code is cumbersome, and probably not
> particularly efficient -- I may have been able to better deal with -mi
> extract-, for instance, and all these -preserve-s are obviously going
> to eat up a good fraction of computing time with large data sets.
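>
> For reference, the point of caveat 3 about compressed replicate
> weights can be written out with the usual BRR formulas. The notation
> is introduced here for illustration: R replicates, replicate estimates
> \hat\theta_{(r)}, full-sample weights w_i, and Fay coefficient f (with
> f = 0 for plain BRR). Deviations are taken from the mean of the
> replicates, which is assumed here to match Stata's default.
>
> ```latex
> \hat V_{\mathrm{BRR}}(\hat\theta)
>   = \frac{1}{R\,(1-f)^2} \sum_{r=1}^{R}
>     \bigl(\hat\theta_{(r)} - \bar\theta\bigr)^2,
> \qquad
> \bar\theta = \frac{1}{R} \sum_{r=1}^{R} \hat\theta_{(r)},
>
> w_i^{(r)} =
> \begin{cases}
>   (2-f)\, w_i & \text{if the PSU of unit } i \text{ is selected in replicate } r,\\
>   f\, w_i     & \text{otherwise.}
> \end{cases}
> ```
>
> As f moves toward 1, every replicate's weights move toward the
> full-sample weights, so the imputation model fitted inside each
> replicate moves toward the full-data model. That is the source of the
> concern in caveat 3 that the imputation step may not pick up enough
> sampling variability.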
>
> -- Stas Kolenikov, PhD, PStat (ASA, SSC)
> -- Principal Survey Scientist, Abt SRBI
> -- Opinions stated in this email are mine only, and do not reflect the
> position of my employer
> -- http://stas.kolenikov.name
>
>
>
> On Wed, Feb 19, 2014 at 4:41 PM, Joshua Mitts <[email protected]> wrote:
>> Has anyone found a way to use survey replicate weights with multiply
>> imputed data?  The svy manual states:
>>
>> mi estimate may be used with svy linearized if the estimation command
>> allows mi estimate; it may not be used with svy bootstrap, svy brr,
>> svy jackknife, or svy sdr.
>>
>> And I receive this error when trying to fit a logit model:
>>
>> vce(brr) previously set by mi svyset is not allowed with mi estimate
>>
>> Thanks very much,
>> Josh
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
