Re: st: One to N matching

From   "Austin Nichols" <>
Subject   Re: st: One to N matching
Date   Tue, 18 Nov 2008 14:08:17 -0500

Gao Liu--
Note that -vmatch- does not select k nearest neighbors without
replacement, though it will find all matches within caliper (i.e. all
obs j with p_j no more than r away from a particular observation's
value p_i), which is not guaranteed to get you any closer to the goal.
 -psmatch2- will select k nearest neighbors, but only with
replacement, and it only saves the identifier of the first matched
observation. Probably the best thing for you to do is to clone
-psmatch2- into a new file as -mypsmatch2- and modify the Mata code to
save additional identifiers in a Stata matrix.  But you can also loop
over observations and match the hard way.  It is unclear to me why you
would ever want to do this; matching k obs without replacement makes
the calculation of standard errors much harder, and the bootstrap is
not an option with matching.  And you have to decide what to do about

Here's a quickly cobbled together version of matching by hand by
looping over observations; no warranty expressed or implied that it
will be appropriate or easy to adapt for your application...

use, clear
g case=educ>16
qui logit case exper* smsa south
predict p
set seed 123
g double u=uniform()
* cases first (in random order), so that _id 1..r(N) index the case obs
gsort -case u
g _id=_n
g z=case
loc n=4
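* placeholders for the ids and scores of the n matches
forv j=1/`n' {
 g match`j'=.
 g p`j'=.
 }
count if case==1
forv i=1/`r(N)' {
 g diff=abs(p-p[`i'])
 * unused non-cases (z==0) sort first, nearest score on top
 sort z diff
 qui forv j=1/`n' {
  loc match`j'=_id[`j']
  loc p`j'=p[`j']
  * mark as used so it cannot be matched again
  replace z=. in `j'
  }
 drop diff
 sort _id
 qui forv j=1/`n' {
  replace match`j'=`match`j'' in `i'
  replace p`j'=`p`j'' in `i'
  }
 }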
li _id p* match* in 1/15, noo clean
li _id p* match* in 351/360, noo clean
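
One way to check the "used only once" requirement after the loop has run is to stack the match identifiers and look for repeats. This is only a sketch: it assumes the variables _id and match1-match4 created above, and the names caseid and slot are made up for illustration.

```stata
* sketch: verify that no matched _id is used by more than one observation
preserve
* keep only the rows that actually received matches
keep if !missing(match1)
keep _id match*
* caseid and slot are illustrative names, not part of the code above
rename _id caseid
reshape long match, i(caseid) j(slot)
drop if missing(match)
* with matching without replacement, no surplus copies should be reported
duplicates report match
restore
```

If -duplicates report- shows any surplus copies, some observation was matched more than once.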

Let me just repeat--I think this is a bad idea, in the sense that I
cannot think of a reason to do this as opposed to using -psmatch2- or
-nnmatch- (also on SSC) or reweighting.  See also on reweighting.
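
For reference, the earlier suggestion quoted below (repeat the matching under several random orderings and combine the results with the variance rules for multiple imputation) would look roughly as follows. The notation is assumed here, not taken from the thread: M is the number of repetitions, \hat\theta_m the estimate from repetition m, and U_m its estimated variance. Whether these rules give valid standard errors for matching estimators is not established in this thread.

```latex
\bar\theta = \frac{1}{M}\sum_{m=1}^{M}\hat\theta_m, \qquad
W = \frac{1}{M}\sum_{m=1}^{M} U_m, \qquad
B = \frac{1}{M-1}\sum_{m=1}^{M}\bigl(\hat\theta_m-\bar\theta\bigr)^2,
\qquad
T = W + \Bigl(1+\frac{1}{M}\Bigr)B
```

Here T is the total variance attached to the combined estimate \bar\theta, with W the within-repetition and B the between-repetition component.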

On Tue, Nov 18, 2008 at 1:24 PM, Richard Goldstein
<> wrote:
> -vmatch- does provide 1 to N because it finds all matches for each case; in
> a recent match that I did I found anywhere from 1 to 18 matches for each of
> my cases
> Rich
> Gao Liu wrote:
>> Thank you, Richard and Austin,
>> The score in my dataset is actually from psmatch2. Although psmatch2
>> provides 1 to n matching, it does not indicate which non-case
>> observations are matched to the case obs, except for the nearest one.
>> Also it allows duplicated matching. That is why I plan to do it by
>> hand.
>> I just checked the command, vmatch. I did not find that it provides the
>> option of 1 to N matching.
>> Austin, can you describe some more how to do it by hand?
>> best
>> Gao
>> On Tue, Nov 18, 2008 at 11:32 AM, Austin Nichols
>> <> wrote:
>>> Gao Liu--
>>> Actually, I think you are looking for -psmatch2- (findit psmatch2).
>>> Or did you want to program the matching by hand?  That is also
>>> possible, and not very hard in the case where all you want is the
>>> nearest N matches.  However, note that the order of matching will
>>> matter in the situation you describe--matching without replacement--so
>>> you should probably do the matching many times and compute statistics
>>> using the rules of variance computation for multiple imputation.
>>> On Tue, Nov 18, 2008 at 11:07 AM, Richard Goldstein
>>> <> wrote:
>>>> for already existing programs, rather than writing your own, I would
>>>> start
>>>> with -vmatch- (user-written, type -findit vmatch-)
>>>> I'm not sure it will cover your last criterion (used only once) but if
>>>> not
>>>> it should be easy to eliminate those
>>>> Rich
>>>> Gao Liu wrote:
>>>>> Dear Statalist:
>>>>> I have a question about one to N matching.
>>>>> I have a dataset containing three variables: id, score, case, where
>>>>> case is a dummy variable indicating whether or not the observation is
>>>>> in the case group. How can I match each case observation to N non-case
>>>>> observation based on the score?  Each case observation matches to the
>>>>> N non-case observations with the closest scores, but no case
>>>>> observation can match the same observation (i.e. the non-case
>>>>> observation can be used only one time).
>>>>> Thank you
>>>>> Best
>>>>> Gao
