Statalist



Re: st: One to N matching


From   "Gao Liu" <[email protected]>
To   [email protected]
Subject   Re: st: One to N matching
Date   Tue, 18 Nov 2008 14:13:51 -0500

Thanks, Austin,

I'll check out your do file. Thanks

Best

Gao

On Tue, Nov 18, 2008 at 2:08 PM, Austin Nichols <[email protected]> wrote:
> Gao Liu--
> Note that -vmatch- does not select k nearest neighbors without
> replacement, though it will find all matches within caliper (i.e. all
> obs j with p_j no more than r away from a particular observation's
> value p_i), which is not guaranteed to get you any closer to the goal.
>  -psmatch2- will select k nearest neighbors, but only with
> replacement, and it only saves the identifier of the first matched
> observation. Probably the best thing for you to do is to clone
> -psmatch2- into a new file as -mypsmatch2- and modify the Mata code to
> save additional identifiers in a Stata matrix.  But you can also loop
> over observations and match the hard way.  It is unclear to me why you
> would ever want to do this; matching k obs without replacement makes
> the calculation of standard errors much harder, and the bootstrap is
> not an option with matching.  And you have to decide what to do about
> ties...
>
> Here's a quickly cobbled together version of matching by hand by
> looping over observations; no warranty expressed or implied that it
> will be appropriate or easy to adapt for your application...
>
> use http://pped.org/stata/card, clear
> * "case" indicator and estimated propensity score p
> g case=educ>16
> qui logit case exper* smsa south
> predict p
> set seed 123
> g double u=uniform()
> * random order within group, cases first, so that obs 1 to r(N) are the cases
> gsort -case u
> g _id=_n
> * z marks the pool: 0=unused non-case, 1=case, .=non-case already used
> g z=case
> loc n=4
> forv j=1/`n' {
>  g match`j'=.
>  g p`j'=.
>  }
> count if case==1
> forv i=1/`r(N)' {
>  g diff=abs(p-p[`i'])
>  * unused non-cases sort first, closest score first
>  sort z diff
>  qui forv j=1/`n' {
>   loc match`j'=_id[`j']
>   loc p`j'=p[`j']
>   replace z=. in `j'
>   }
>  drop diff
>  sort _id
>  qui forv j=1/`n' {
>   replace match`j'=`match`j'' in `i'
>   replace p`j'=`p`j'' in `i'
>   }
>  }
> li _id p* match* in 1/15, noo clean
> li _id p* match* in 351/360, noo clean
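
The caliper idea mentioned earlier in this message can be layered on top of the hand-rolled matches. The following is only a minimal sketch, assuming the variables p, p1-p4 and match1-match4 created by the loop above are still in memory; the caliper width r and the helper variable toofar are hypothetical names chosen for illustration, not part of the original post.

```stata
* Hypothetical caliper width; pick a value suited to the scale of p
local r = 0.01
* Must equal the number of matches per case used in the loop above
local n = 4
forvalues j = 1/`n' {
    * flag matches whose score is more than r away from the case's own score
    generate byte toofar`j' = abs(p - p`j') > `r' & !missing(p`j')
    replace match`j' = . if toofar`j'
    replace p`j' = . if toofar`j'
    drop toofar`j'
}
```

Note that a non-case observation discarded this way is not returned to the pool, so a case can end up with fewer than N matches.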
>
> Let me just repeat--I think this is a bad idea, in the sense that I
> cannot think of a reason to do this as opposed to using -psmatch2- or
> -nnmatch- (also on SSC) or reweighting.  See also
> http://pped.org/stata/erratum.pdf on reweighting.
>
>
> On Tue, Nov 18, 2008 at 1:24 PM, Richard Goldstein
> <[email protected]> wrote:
>> -vmatch- does provide 1 to N because it finds all matches for each case; in
>> a recent match that I did I found anywhere from 1 to 18 matches for each of
>> my cases
>>
>> Rich
>>
>> Gao Liu wrote:
>>>
>>> Thank you, Richard and Austin,
>>>
>>> The score in my dataset is actually from psmatch2. Although psmatch2
>>> provides 1 to n matching, it does not indicate which non-case
>>> observations are matched to each case obs, except for the nearest one.
>>> It also allows the same observation to be matched more than once. That
>>> is why I plan to do it by hand.
>>>
>>> I just checked the command, vmatch. I did not find that it provides the
>>> option of 1 to N matching.
>>>
>>> Austin, can you describe in more detail how to do it by hand?
>>>
>>> best
>>>
>>> Gao
>>>
>>> On Tue, Nov 18, 2008 at 11:32 AM, Austin Nichols
>>> <[email protected]> wrote:
>>>>
>>>> Gao Liu--
>>>> Actually, I think you are looking for -psmatch2- (findit psmatch2).
>>>> Or did you want to program the matching by hand?  That is also
>>>> possible, and not very hard in the case where all you want is the
>>>> nearest N matches.  However, note that the order of matching will
>>>> matter in the situation you describe--matching without replacement--so
>>>> you should probably do the matching many times and compute statistics
>>>> using the rules of variance computation for multiple imputation.
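
The multiple-imputation combining rules referred to above are usually written as follows. This is a sketch under assumed notation (none of these symbols appear in the original message): the matching is repeated M times in different random orders, the m-th run gives a point estimate \hat{\theta}_m and a variance estimate U_m, and T is the total variance attached to the pooled estimate \bar{\theta}.

```latex
\bar{\theta} = \frac{1}{M}\sum_{m=1}^{M}\hat{\theta}_m, \qquad
W = \frac{1}{M}\sum_{m=1}^{M} U_m, \qquad
B = \frac{1}{M-1}\sum_{m=1}^{M}\bigl(\hat{\theta}_m-\bar{\theta}\bigr)^2, \qquad
T = W + \Bigl(1+\frac{1}{M}\Bigr)B
```

Here W is the average within-run variance and B is the between-run variance, which captures how much the estimate moves with the order of matching.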
>>>>
>>>> On Tue, Nov 18, 2008 at 11:07 AM, Richard Goldstein
>>>> <[email protected]> wrote:
>>>>>
>>>>> for already existing programs, rather than writing your own, I would
>>>>> start with -vmatch- (user-written, type -findit vmatch-)
>>>>>
>>>>> I'm not sure it will cover your last criterion (used only once) but if
>>>>> not it should be easy to eliminate those
>>>>>
>>>>> Rich
>>>>>
>>>>> Gao Liu wrote:
>>>>>>
>>>>>> Dear Statalist:
>>>>>>
>>>>>> I have a question about one to N matching.
>>>>>>
>>>>>> I have a dataset containing three variables: id, score, case, where
>>>>>> case is a dummy variable indicating whether or not the observation is
>>>>>> in the case group. How can I match each case observation to N non-case
>>>>>> observations based on the score?  Each case observation should be matched
>>>>>> to the N non-case observations with the closest scores, but no two case
>>>>>> observations can match the same non-case observation (i.e. each non-case
>>>>>> observation can be used only once).
>>>>>>
>>>>>> Thank you
>>>>>>
>>>>>> Best
>>>>>>
>>>>>> Gao
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>


