Statalist



Re: st: One to N matching


From   "Austin Nichols" <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: One to N matching
Date   Tue, 18 Nov 2008 14:08:17 -0500

Gao Liu--
Note that -vmatch- does not select k nearest neighbors without
replacement, though it will find all matches within a caliper (i.e. all
obs j with p_j no more than r away from a particular observation's
value p_i), which is not guaranteed to get you any closer to the goal.
-psmatch2- will select k nearest neighbors, but only with
replacement, and it only saves the identifier of the first matched
observation. Probably the best thing for you to do is to clone
-psmatch2- into a new file as -mypsmatch2- and modify the Mata code to
save additional identifiers in a Stata matrix.  But you can also loop
over observations and match the hard way.  It is unclear to me why you
would ever want to do this; matching k obs without replacement makes
the calculation of standard errors much harder, and the standard bootstrap
is not valid for matching estimators.  And you have to decide what to do
about ties...
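To make the caliper point concrete, here is a small sketch on hypothetical toy data. The scores, the variable names, and the caliper width of 3 are made up for illustration and are not from the thread. Every control within the caliper counts as a match, so the number of matches varies by case, and the same control can be counted for more than one case:

```stata
* Hypothetical toy data (not from the thread): 2 cases, 3 controls
clear
input id case score
1 1 50
2 1 52
3 0 51
4 0 55
5 0 40
end

* Caliper matching: for each case, count ALL controls whose score lies
* within `caliper' of the case's score (the width 3 is arbitrary)
local caliper = 3
quietly gen n_within = .
quietly levelsof id if case == 1, local(cases)
foreach i of local cases {
    quietly summarize score if id == `i', meanonly
    local s = r(mean)
    quietly count if case == 0 & abs(score - `s') <= `caliper'
    quietly replace n_within = r(N) if id == `i'
}
list id score n_within if case == 1, noobs clean
```

Case 1 (score 50) has one control within the caliper and case 2 (score 52) has two. Control 3 (score 51) is inside both calipers, which is why caliper matching on its own does not give exactly N distinct controls per case.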

Here's a quickly cobbled-together version of matching by hand, looping
over observations; no warranty expressed or implied that it will be
appropriate or easy to adapt for your application...

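use http://pped.org/stata/card, clear
g case=educ>16                  // case indicator
qui logit case exper* smsa south
predict p                       // propensity score
set seed 123
g double u=runiform()           // random order in which cases are matched
gsort -case u                   // cases first (obs 1..N1), then controls
g _id=_n
g z=case                        // 0 = unused control, 1 = case, . = used control
loc n=4                         // number of controls per case
forv j=1/`n' {
 g match`j'=.
 g p`j'=.
 }
count if case==1
forv i=1/`r(N)' {
 * distance from case i (obs i) to every observation
 g diff=abs(p-p[`i'])
 * unused controls (z==0) sort first, nearest first; ties broken by u
 sort z diff u
 qui forv j=1/`n' {
  loc match`j'=_id[`j']
  loc p`j'=p[`j']
  replace z=. in `j'            // mark this control as used
  }
 drop diff
 sort _id                       // restore order so case i is obs i again
 qui forv j=1/`n' {
  replace match`j'=`match`j'' in `i'
  replace p`j'=`p`j'' in `i'
  }
 }
li _id p* match* in 1/15, noo clean
li _id p* match* in 351/360, noo clean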
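Because the loop matches cases one at a time, a case processed early can take a control that a later case would also have wanted, so the result depends on the order of the cases (randomized above through u). A minimal sketch on hypothetical toy data (made-up scores, one match per case, ties broken by id) runs the same greedy rule in both orders:

```stata
* Hypothetical toy data (not from the thread): 2 cases, 3 controls
clear
input id case score
1 1 50
2 1 52
3 0 51
4 0 55
5 0 40
end

* Greedy 1-to-1 nearest-neighbor matching without replacement, run twice:
* cases processed in the order 1 2, then in the order 2 1
foreach seq in "1 2" "2 1" {
    local suffix = subinstr("`seq'", " ", "", .)
    quietly gen byte used = 0
    quietly gen match_`suffix' = .
    foreach i of local seq {
        quietly summarize score if id == `i', meanonly
        local s = r(mean)
        * distance to controls not yet used (missing for everyone else)
        quietly gen dist = abs(score - `s') if case == 0 & used == 0
        sort dist id            // nearest unused control first
        local m = id[1]
        quietly replace used = 1 if id == `m'
        quietly replace match_`suffix' = `m' if id == `i'
        drop dist
    }
    drop used
}
sort id
list id score match_12 match_21 if case == 1, noobs clean
```

In the order 1 2, case 1 gets control 3 and case 2 gets control 4; in the order 2 1 the assignments swap, because both cases are closest to control 3.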

Let me just repeat--I think this is a bad idea, in the sense that I
cannot think of a reason to do this as opposed to using -psmatch2- or
-nnmatch- (also on SSC) or reweighting.  See also
http://pped.org/stata/erratum.pdf on reweighting.
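The earlier suggestion in this thread (quoted below) to repeat the matching many times and use the variance rules for multiple imputation presumably refers to Rubin's combining rules. As a sketch, assume m repeated matchings (e.g. with different seeds for u), each giving an estimate \hat{Q}_i with an estimated variance U_i from some single-sample formula; whether these rules give valid inference for repeated matching, rather than imputation, is an assumption here and is not established in the thread:

```latex
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i, \qquad
\bar{U} = \frac{1}{m}\sum_{i=1}^{m} U_i, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{Q}_i - \bar{Q}\bigr)^2, \qquad
T = \bar{U} + \Bigl(1 + \frac{1}{m}\Bigr) B
```

Here \bar{Q} is the combined estimate and \sqrt{T} its standard error; B captures how much the estimate moves across different matching orders.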


On Tue, Nov 18, 2008 at 1:24 PM, Richard Goldstein
<richgold@ix.netcom.com> wrote:
> -vmatch- does provide 1-to-N matching because it finds all matches for each
> case; in a recent match that I did, I found anywhere from 1 to 18 matches
> for each of my cases.
>
> Rich
>
> Gao Liu wrote:
>>
>> Thank you, Richard and Austin,
>>
>> The score in my dataset is actually from psmatch2. Although psmatch2
>> provides 1-to-N matching, it does not indicate which non-case
>> observations are matched to each case observation, except for the
>> nearest one. It also allows duplicate matches (matching with
>> replacement). That is why I plan to do it by hand.
>>
>> I just checked the command vmatch. I did not find that it provides the
>> option of 1-to-N matching.
>>
>> Austin, can you describe in more detail how to do it by hand?
>>
>> best
>>
>> Gao
>>
>> On Tue, Nov 18, 2008 at 11:32 AM, Austin Nichols
>> <austinnichols@gmail.com> wrote:
>>>
>>> Gao Liu--
>>> Actually, I think you are looking for -psmatch2- (findit psmatch2).
>>> Or did you want to program the matching by hand?  That is also
>>> possible, and not very hard in the case where all you want is the
>>> nearest N matches.  However, note that the order of matching will
>>> matter in the situation you describe--matching without replacement--so
>>> you should probably do the matching many times and compute statistics
>>> using the rules of variance computation for multiple imputation.
>>>
>>> On Tue, Nov 18, 2008 at 11:07 AM, Richard Goldstein
>>> <richgold@ix.netcom.com> wrote:
>>>>
>>>> For already existing programs, rather than writing your own, I would
>>>> start with -vmatch- (user-written, type -findit vmatch-).
>>>>
>>>> I'm not sure it will cover your last criterion (each non-case
>>>> observation used only once), but if not, it should be easy to
>>>> eliminate those.
>>>>
>>>> Rich
>>>>
>>>> Gao Liu wrote:
>>>>>
>>>>> Dear Statalist:
>>>>>
>>>>> I have a question about one-to-N matching.
>>>>>
>>>>> I have a dataset containing three variables: id, score, case, where
>>>>> case is a dummy variable indicating whether or not the observation is
>>>>> in the case group. How can I match each case observation to N non-case
>>>>> observations based on the score?  Each case observation should be
>>>>> matched to the N non-case observations with the closest scores, but no
>>>>> two case observations can be matched to the same non-case observation
>>>>> (i.e. each non-case observation can be used only once).
>>>>>
>>>>> Thank you
>>>>>
>>>>> Best
>>>>>
>>>>> Gao



© Copyright 1996–2014 StataCorp LP