


Re: st: multiple imputation and propensity score

From   "Ariel Linden, DrPH" <[email protected]>
To   <[email protected]>
Subject   Re: st: multiple imputation and propensity score
Date   Thu, 25 Aug 2011 13:40:43 -0400

I have a completely different take on this problem than has been discussed
thus far in the thread. That doesn't mean I think these suggestions are
wrong, but I would tackle the problem differently.

First, my primary concern is hearing that Stefano generates multiple
propensity scores for the same individual and that some of them " makes no
sense, being virtually the same in patients treated with angioplasty or

This is problematic for two reasons. First, the propensity score is not
intended to differentiate between treatment and control units, but instead
to find a common basis between them (e.g., on average, they should have similar
baseline characteristics, with the only difference between them being that
some got treatment and some didn't). Second, I am not sure I agree with
generating multiple propensity scores and then choosing which ones will
represent the match. It is entirely possible under this scenario to generate
completely different matches, with the characteristics being very different
across matched groups. 

Those are my basic concerns. Now to solving the problem. One approach to
dealing with missing values in a covariate is to add a companion indicator
variable describing its "missingness", and then include that indicator in
the propensity score estimation process.
So for example, if we have a variable "gender" with some values missing,
we'd generate another variable called "gender_miss", with a value of 1 if
gender is missing and 0 if not. I can provide references where this approach
is used.
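
A minimal sketch of that indicator step, written in Python purely for
illustration (the thread itself concerns Stata). The sample records, the
helper name add_missing_indicator, and the choice to fill the missing value
with a constant so the row is not dropped are all assumptions, not part of
the suggestion above.

```python
# Hypothetical records; None marks a missing gender value.
patients = [
    {"id": 1, "gender": 1},
    {"id": 2, "gender": None},
    {"id": 3, "gender": 0},
]


def add_missing_indicator(rows, var, fill=0):
    """Return copies of rows with a `<var>_miss` flag (1 = missing, 0 = observed).

    Missing values of `var` are replaced by `fill` (an assumed placeholder),
    so the row can still enter the propensity score model; the flag carries
    the information that the value was not observed.
    """
    out = []
    for row in rows:
        new = dict(row)  # copy, so the input records are left untouched
        is_missing = new[var] is None
        new[var + "_miss"] = 1 if is_missing else 0
        if is_missing:
            new[var] = fill
        out.append(new)
    return out


flagged = add_missing_indicator(patients, "gender")
for row in flagged:
    print(row)
```

Both gender and gender_miss would then be entered as covariates in the
propensity score model.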

This solution could be problematic if there are too many missing values
across many different variables, but that is perhaps beyond the scope of
this discussion.

I hope this helps.


Date: Wed, 24 Aug 2011 12:48:47 -0500
From: Stas Kolenikov <[email protected]>
Subject: Re: st: multiple imputation and propensity score

On Wed, Aug 24, 2011 at 11:39 AM, Stefano Di Bartolomeo
> In truth I am trying to be humble and apply the best methodology I can. I
> got tricked into this problem in 2 simple steps. First I read 'A Guide to
> Imputing Missing Data with Stata by Mark Lunt', which is a step by step
> guide for non-pundits like me. Throughout the guide a propensity score is
> the main goal of the examples. So I got the feeling that multiple imputation
> is good for propensity score and did that. Then, I reviewed the recent
> literature on propensity scores and it seems that matching is the technique
> that most reduces bias as compared to stratification on quintiles or
> inclusion of PS as covariate. And again, tried to follow the suggestion. Now
> I understand I have to give up one of the two techniques.

I believe you could still see your approach through with both MI and
PS. For that, you would need to:

1. create multiple imputations using -ice- or the official -mi-.
2. write your own estimation program (say you named it -mi_ps_st-) that would
2a. run a logistic regression as the propensity score model
2b. generate propensity scores
2c. run your survival model
2d. Ideally, you'd want to correct the standard errors in the survival
model for the fact that you have created some of the regressors. It is
possible to do that in the linear regression context (see Hardin
(2002)), but I
don't know if this approach is generalizable to -streg-.
3. run your -mi_ps_st- prefixed by -mim- (or, respectively, -mi
estimate-) to combine the estimates and standard errors. Remember that
MI only makes sense when you have the final parameter estimates and
their standard errors. The intermediate results, like specific
imputations, or observation-level averages across them, as you thought
initially for your propensity scores, may not be very meaningful.
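
The combining in step 3 follows Rubin's rules: the pooled point estimate is
the average of the per-imputation estimates, and the pooled variance adds the
average within-imputation variance to an inflated between-imputation
variance. Below is a minimal sketch in Python, for illustration only; it is
not what -mim- or -mi estimate- run internally, and the function name
combine_rubin and the input numbers are made up.

```python
from math import sqrt
from statistics import mean, variance


def combine_rubin(estimates, std_errors):
    """Pool one coefficient across M imputed datasets using Rubin's rules.

    estimates  : the coefficient estimated on each imputed dataset
    std_errors : its standard error on each imputed dataset
    Returns (pooled_estimate, pooled_std_error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two imputations and matching lengths")
    pooled = mean(estimates)                      # average point estimate
    within = mean([se ** 2 for se in std_errors])  # average within-imputation variance
    between = variance(estimates)                 # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between        # total variance
    return pooled, sqrt(total)


# Hypothetical coefficients and standard errors from three imputed datasets.
est, se = combine_rubin([0.5, 0.7, 0.6], [0.1, 0.1, 0.1])
print(est, se)
```

Note that only final estimates and their standard errors enter the
calculation, which is why averaging the propensity scores themselves across
imputations is not part of it.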

The guide you referred to is dated, in the sense that Stata 12
incorporates MICE methodology in the official -mi-. The guide would
still be applicable to Stata 11. I also did not like its reliance on the
author's own programs, although that is sometimes inevitable (I
tend to trust the stuff that underwent some minimal checks at SJ or
SSC a little bit better).

BTW, I don't think it is at all possible to get the right standard
errors from matching, so you would probably have to let that
methodology go, anyway. So you would have to look into other options
with your survival model.

