


Re: st: Combining multiple imputation with propensity score matching

From   Austin Nichols <>
Subject   Re: st: Combining multiple imputation with propensity score matching
Date   Tue, 2 Mar 2010 12:50:02 -0500

David Kantor <> :

Swapping potential matches to optimize a sum of squared distances or
sum of distances is not "too complex" to do in Stata, though it would
no doubt be processor-intensive for a reasonable-sized problem, and
faster in Mata, and faster still in C.  In any case, I agree it is
probably not worth the effort to program it, though the case at hand
is a particularly simple one (matching on propensity only, not a
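
A minimal sketch of the swapping idea for matching on a scalar propensity score, written in Python for illustration only (it is not code from this thread, and all function and variable names are hypothetical). It assumes a one-to-one matching already exists and repeatedly swaps the controls of two treated cases whenever the swap lowers the sum of absolute distances. This is a local search over the controls already in use, so it is not guaranteed to find the global optimum.

```python
def total_distance(pairs, p_treated, p_control):
    """Sum of absolute propensity-score differences over all matched pairs."""
    return sum(abs(p_treated[t] - p_control[c]) for t, c in pairs.items())


def improve_by_swapping(pairs, p_treated, p_control, tol=1e-12):
    """Swap controls between two treated cases while that lowers the total.

    pairs     : dict, treated id -> control id (one-to-one; hypothetical layout)
    p_treated : dict, treated id -> propensity score
    p_control : dict, control id -> propensity score
    """
    pairs = dict(pairs)  # work on a copy, leave the input untouched
    improved = True
    while improved:
        improved = False
        treated = list(pairs)
        for i, a in enumerate(treated):
            for b in treated[i + 1:]:
                ca, cb = pairs[a], pairs[b]
                now = (abs(p_treated[a] - p_control[ca])
                       + abs(p_treated[b] - p_control[cb]))
                swapped = (abs(p_treated[a] - p_control[cb])
                           + abs(p_treated[b] - p_control[ca]))
                # Accept only strict improvements so the loop must terminate.
                if swapped < now - tol:
                    pairs[a], pairs[b] = cb, ca
                    improved = True
    return pairs
```

Each full sweep compares every pair of treated cases, which is why this kind of search gets expensive as the problem grows.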

On Tue, Mar 2, 2010 at 12:24 PM, David Kantor <> wrote:
> Hi.
> As the author of mahapick, I would like to mention that, indeed, it does not
> pick unique matches. (This could be an avenue for future development.)
> You can specify that it generates a multitude of match candidates, which is
> virtually a queue, in order of closeness, of possible matches for each
> primary ("treated") case. You then can take this and run a loop that visits
> primary cases in a random order. For each such case,
>  select the best candidate for the given primary case;
>  remove that selected match as a candidate for use in later passes through
> the loop.
> I recommend that, if you want more than one match (say 3) per primary case,
> you run this loop several (3) times (maintaining the same data
> structure that disqualifies candidates from future matching) -- rather than
> selecting, say, the best 3 matches for each case in one pass through the
> loop. The latter method might enable earlier cases in the loop to grab
> better matches.
> Of course, this has a random element to the process. You may or may not like
> that. But you need some way of deciding who gets a given candidate if it is
> matched to more than one primary case.
> I had done this selection process once, several years ago; I might be able
> to dig up the code if necessary. My co-worker also had a plan to somehow
> optimize the process by swapping matches in order to minimize the sum of the
> distances. That was too complex to be done in Stata, and we abandoned it. I
> understand that the task was taken up by others (in C, I suppose), but the
> result was no better than the original random process.
> --David
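
The loop David describes can be sketched as follows. This is an illustration in Python, not his original code and not part of mahapick. The input layout (a dict of candidate lists per treated case) and every name here are assumptions. Treated cases are visited in random order, each takes its closest still-available candidate, and the whole pass is repeated once per desired match so that early cases cannot take several good controls at once.

```python
import random


def greedy_unique_matches(candidates, n_matches=1, seed=None):
    """Pick unique controls for each treated case, one pass per match.

    candidates : dict, treated id -> list of (distance, control id) tuples
                 (hypothetical layout for the candidate queue)
    n_matches  : number of passes, i.e. matches sought per treated case
    seed       : optional seed so the random visiting order is reproducible
    """
    rng = random.Random(seed)
    # Order each candidate list by closeness, nearest first.
    queues = {t: sorted(c) for t, c in candidates.items()}
    used = set()                      # controls already given away
    matches = {t: [] for t in queues}

    for _ in range(n_matches):
        order = list(queues)
        rng.shuffle(order)            # fresh random visiting order each pass
        for t in order:
            for dist, control in queues[t]:
                if control not in used:
                    used.add(control)
                    matches[t].append((control, dist))
                    break             # at most one new match per pass
    return matches
```

A treated case whose candidates have all been taken simply ends up with fewer matches. Because the `used` set persists across passes, no control is ever assigned twice.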
> At 11:17 AM 3/2/2010, John E. Cornell wrote:
>> Dear Stata Folks:
>> I have a large and somewhat complicated multi-site dataset that requires
>> the use of multiple imputation to fill in missing lab values that I need to
>> generate propensity scores for three classes of drugs. I used the new
>> multiple imputation procedure based on multivariate normal regression to
>> fill in the missing lab values. We created 20 imputed datasets in the flong
>> format, and used logistic regression to compute and save the propensity
>> scores in logit form within each imputed set. We used mahapick to match
>> cases (being on one or more of the three agents) to controls (never on
>> any of the three agents). This worked well, but there are three problems we
>> encountered at this stage. First, the procedure selects the closest match,
>> but the actual distance may be very large, so we needed to edit the matches to
>> maintain a subset of cases with reasonable closeness. Second, the procedure
>> may match the same control to more than one case, so we needed to restrict
>> the sample to unique matches. Finally, the number of matches varied between
>> imputed sets.
>> It does not appear that the mi estimate command can handle this situation.
>> So, we are left with the prospect of writing our own code to compute and
>> combine the model estimates. We are relatively novice Stata programmers at
>> the moment, and we would welcome any suggestions, references, etc. that the
>> Stata community could provide that will help us solve this problem.
>> Cheers,
>> John E. Cornell, Ph.D.
>> Professor
>> Department of Epidemiology and Biostatistics
>> University of Texas Health Science Center, San Antonio
>> 7703 Floyd Curl Drive
>> San Antonio, Texas 78229-3900
>> [...]
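
For combining estimates by hand, the standard starting point is Rubin's combination rules: average the per-imputation estimates, then add the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The sketch below is in Python for illustration and is not from this thread. The function name and inputs are hypothetical, and it assumes each imputed dataset yields a scalar estimate of the same quantity along with its variance. Whether that assumption holds when the matched samples differ between imputed sets is a question the thread leaves open.

```python
from statistics import mean, variance


def combine_rubin(estimates, variances):
    """Pool scalar estimates from m imputed datasets with Rubin's rules.

    estimates : per-imputation point estimates of one quantity
    variances : per-imputation squared standard errors, same order
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 divisor)
    total = within + (1 + 1 / m) * between
    return q_bar, total
```

The pooled standard error is the square root of the returned total variance.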
