Re: st: Combining multiple imputation with propensity score matching

From   David Kantor
Subject   Re: st: Combining multiple imputation with propensity score matching
Date   Tue, 02 Mar 2010 12:24:12 -0500

As the author of mahapick, I would like to mention that, indeed, it does not pick unique matches. (This could be an avenue for future development.) You can specify that it generate multiple match candidates for each primary ("treated") case, which gives you, in effect, a queue of possible matches in order of closeness. You can then run a loop that visits the primary cases in a random order. For each such case:
  - select the best remaining candidate for the given primary case;
  - remove that selected match as a candidate for use in later passes through the loop.
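
The two steps above can be sketched in a few lines. This is a language-neutral illustration written in Python, not Stata code and not part of mahapick. The function name `pick_unique_matches`, the dictionary layout of the candidate queues, and the ids in the example are all hypothetical.

```python
import random


def pick_unique_matches(candidates, seed=None):
    """Give each primary case at most one control, never reusing a control.

    candidates: dict mapping a primary case id to a list of control ids,
    ordered from closest to farthest (hypothetical layout).
    Returns a dict mapping each primary case id to its chosen control id,
    or None if every candidate in its queue was already taken.
    """
    rng = random.Random(seed)
    order = list(candidates)
    rng.shuffle(order)  # visit primary cases in a random order

    used = set()  # controls already handed out
    chosen = {}
    for case in order:
        # best candidate = first one in the queue that is still available
        pick = next((c for c in candidates[case] if c not in used), None)
        chosen[case] = pick
        if pick is not None:
            used.add(pick)  # disqualify it for later cases
    return chosen


# Made-up queues for three treated cases
queues = {
    "t1": ["c1", "c2", "c3"],
    "t2": ["c1", "c3"],
    "t3": ["c2", "c1"],
}
matches = pick_unique_matches(queues, seed=1)
```

Depending on the visiting order, a case with a short queue can end up with no match at all, so longer candidate queues leave more room for every case to be served.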

If you want more than one match (say 3) per primary case, I recommend running this loop several (3) times, maintaining the same data structure that disqualifies candidates from future matching, rather than selecting, say, the best 3 matches for each case in one pass through the loop. The latter method would let cases visited early in the loop take several of the best candidates before later cases get even a first pick.
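
A minimal sketch of the repeated-loop idea, again in Python rather than Stata and with hypothetical names (`pick_matches_in_rounds`, the queue layout, the ids). The post does not say whether the visiting order is reshuffled on each pass; reshuffling here is an assumption.

```python
import random


def pick_matches_in_rounds(candidates, rounds=3, seed=None):
    """Hand out one control per primary case per round, never reusing a control.

    candidates: dict mapping a primary case id to a list of control ids,
    ordered from closest to farthest (hypothetical layout).
    Returns a dict mapping each primary case id to the list of controls it
    received, which may be shorter than `rounds` if its queue ran out.
    """
    rng = random.Random(seed)
    used = set()  # shared across rounds, so a control is given out only once
    chosen = {case: [] for case in candidates}
    for _ in range(rounds):
        order = list(candidates)
        rng.shuffle(order)  # assumption: new random order on every pass
        for case in order:
            pick = next((c for c in candidates[case] if c not in used), None)
            if pick is not None:
                chosen[case].append(pick)
                used.add(pick)
    return chosen


# Two treated cases competing for the same four controls (made-up ids)
queues = {
    "t1": ["c1", "c2", "c3", "c4"],
    "t2": ["c1", "c2", "c3", "c4"],
}
result = pick_matches_in_rounds(queues, rounds=2, seed=7)
```

In this example each case ends up with exactly one of the two closest controls (c1 or c2) and one of the remaining two, whatever the random order. Taking the best two in a single pass would instead give both c1 and c2 to whichever case happened to be visited first.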

Of course, this introduces a random element into the process. You may or may not like that. But you need some way of deciding who gets a given candidate if it is matched to more than one primary case.

I had done this selection process once, several years ago; I might be able to dig up the code if necessary. My co-worker also had a plan to somehow optimize the process by swapping matches in order to minimize the sum of the distances. That was too complex to be done in Stata, and we abandoned it. I understand that the task was taken up by others (in C, I suppose), but the result was no better than the original random process.


At 11:17 AM 3/2/2010, John E. Cornell wrote:

Dear Stata Folks:

I have a large and somewhat complicated multi-site dataset that requires the use of multiple imputation to fill in missing lab values that I need to generate propensity scores for three classes of drugs. I used the new multiple imputation procedure based on multivariate normal regression to fill in the missing lab values. We created 20 imputed datasets in the flong format and used logistic regression to compute and save the propensity scores in logit form within each imputed set. We used mahapick to match cases (being on one or more of the three agents) to controls (never on any of the three agents). This worked well, but we encountered three problems at this stage. First, the procedure selects the closest match, but the actual distance may be very large, so we needed to edit the matches to retain only the subset of cases with reasonably close matches. Second, the procedure may match the same control to more than one case, so we needed to restrict the sample to unique matches. Finally, the number of matches varied between imputed sets.

It does not appear that the mi estimate command can handle this situation. So, we are left with the prospect of writing our own code to compute and combine the model estimates. We are relatively novice Stata programmers at the moment, and we would welcome any suggestions, references, etc. that the Stata community could provide that will help us solve this problem.
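
For reference, the standard pooling rules for multiple imputation (Rubin's rules) are short enough to compute by hand once each imputed dataset has produced a point estimate and its variance. The sketch below is Python, not Stata; the function name `combine_rubin` and the numbers in the example are made up. Whether these rules are appropriate when the matched sample differs across imputed sets is a separate question that this sketch does not settle.

```python
from statistics import mean, variance


def combine_rubin(estimates, variances):
    """Pool per-imputation results with Rubin's rules.

    estimates: point estimates of one coefficient, one per imputed dataset.
    variances: the corresponding squared standard errors.
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Made-up results from three imputed datasets
pooled, total_var = combine_rubin([0.5, 0.7, 0.6], [0.04, 0.05, 0.06])
pooled_se = total_var ** 0.5
```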


John E. Cornell, Ph.D.
Department of Epidemiology and Biostatistics
University of Texas Health Science Center, San Antonio
7703 Floyd Curl Drive
San Antonio, Texas 78229-3900
