
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


Re: st: Multiple imputation with survey replicate weights

From   Richard Williams <>
To, "" <>
Subject   Re: st: Multiple imputation with survey replicate weights
Date   Thu, 20 Feb 2014 10:52:51 -0500

Add this to the wish list for Stata 14 (or 15 or 16...). If nothing else, maybe add some FAQs about what you can and cannot do when trying to use multiple advanced methods simultaneously.

Stas, you say you used to be able to do what you want back in 2010. Can you do it now if you specify version control, e.g.

version 11

At 10:33 AM 2/20/2014, Stas Kolenikov wrote:
Stata is doing the right thing in preventing you from doing dubious
things. The interface of complex survey data inference and multiple
imputation is surprisingly poorly studied given its ubiquity. The
statistically appropriate way to combine imputation and replicate
weights that I am aware of is to use the bootstrap or BRR approach:
create a single imputation within each bootstrap/BRR replicate, and
re-estimate your model with that replicate weight on the imputed
data. See Shao and Sitter (1996). At the moment,
this requires custom programming of an estimation command that
combines one imputation iteration with the command of interest. I am
vaguely planning to develop a Stata Journal paper to describe the
process, but it is only at the conceptualization stage now. Here's an
example. It is not particularly stable: combinations of -mi- and -svy-
are still tricky, as they have conflicting expectations of what is
known about the data, and I have to force each one to ignore the
other:

webuse nhanes2brr, clear
gen age2 = age*age
cap pro drop mymireg
program define mymireg, properties( svyb )
  syntax [varlist] [if] [in] [pw iw /] , [*]
  * local macro `weight' contains the type
  * local macro `exp' contains the weight variable
  * local macro varlist contains the list of explanatory variables for the final regression
  * it is used to keep Stata from thinking that estimation has already been done
  * preserve so that each replicate starts from the original, un-imputed data;
  * Stata restores the data automatically when the program exits
  preserve
  mi set wide
  mi register regular region1 region2 region3 rural black orace age age2 tibc tcresult
  mi register imputed lead zinc copper vitaminc albumin tgresult
  mi impute chained (pmm) lead zinc copper vitaminc albumin tgresult = region1 region2 region3 ///
     rural black orace age age2 tibc tcresult [pw=`exp'], add(1)
  mi extract 1, clear
  logistic highbp lead `varlist' [pw=`exp']
end
svy brr, saving( lead_imputed_logit, replace ) : mymireg height weight age female
use lead_imputed_logit, clear
* means and standard deviations of the replicate estimates
summarize

Use at your own risk. Let me repeat: USE AT YOUR OWN RISK. Maybe like this?

use at_your_own_risk, clear

A few caveats:
1. -svy brr- will report point estimates based on a single imputation;
these are useless, and would need to be discarded
2. The right coefficients and standard errors come out of the
-summarize- at the end. I used to be able to produce them with -bs4rw-
followed by -estat bootstrap-, but for whatever reason that stopped
working (it worked in 2010); probably the internal format of what
-bootstrap- expects changed, and what -bs4rw- supplies is no longer
compatible with it.
3. I used the equivalence between the bootstrap and BRR; things will
not work appropriately with the jackknife, as it does not provide enough
sampling variability: the imputation model in each replicate will be too
close to the one based on the full data. Hence, sampling variability in
the imputation model will be insufficient, and the standard errors will
be underestimated. Likewise, methods that compress replicate weight
variability (BRR with Fay's adjustment; mean bootstrap) may not
generate enough sampling variability in the imputation process.
4. As you can see, the code is cumbersome and probably not
particularly efficient: I could probably have dealt better with -mi
extract-, for instance, and all these -preserve-s are obviously going
to eat up a good fraction of computing time with large data sets.
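The formulas below are a minimal sketch of the arithmetic behind caveat 2. They assume two things: the saved file holds one estimate theta_r per replicate r = 1, ..., R, each computed from its own single imputation, and the standard error is read off as the standard deviation printed by -summarize-. The symbols are illustrative and do not come from Stata output.

```latex
\bar\theta = \frac{1}{R}\sum_{r=1}^{R}\hat\theta_r, \qquad
s^2 = \frac{1}{R-1}\sum_{r=1}^{R}\left(\hat\theta_r-\bar\theta\right)^2, \qquad
v_{\mathrm{BRR}} = \frac{1}{R}\sum_{r=1}^{R}\left(\hat\theta_r-\bar\theta\right)^2 = \frac{R-1}{R}\,s^2
```

Under the bootstrap reading, s is the standard error. The conventional BRR variance without Fay's adjustment divides by R instead of R-1, so s exceeds the BRR standard error by a factor of sqrt(R/(R-1)). That difference is small for typical numbers of replicates.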

-- Stas Kolenikov, PhD, PStat (ASA, SSC)
-- Principal Survey Scientist, Abt SRBI
-- Opinions stated in this email are mine only, and do not reflect the
position of my employer

On Wed, Feb 19, 2014 at 4:41 PM, Joshua Mitts <> wrote:
> Has anyone found a way to use survey replicate weights with multiply
> imputed data?  The svy manual states:
> mi estimate may be used with svy linearized if the estimation command
> allows mi estimate; it may not be used with svy bootstrap, svy brr,
> svy jackknife, or svy sdr.
> And I receive this error when trying to fit a logit model:
> vce(brr) previously set by mi svyset is not allowed with mi estimate
> Thanks very much,
> Josh

Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME:   (574)289-5227
EMAIL:  Richard.A.Williams.5@ND.Edu

*   For searches and help try:

© Copyright 1996–2018 StataCorp LLC