
From: "Austin Nichols" <austinnichols@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: One to N matching
Date: Tue, 18 Nov 2008 14:08:17 -0500

Gao Liu--

Note that -vmatch- does not select k nearest neighbors without replacement, though it will find all matches within a caliper (i.e. all obs j with p_j no more than r away from a particular observation's value p_i), which is not guaranteed to get you any closer to the goal. -psmatch2- will select k nearest neighbors, but only with replacement, and it saves the identifier of only the first matched observation. Probably the best thing for you to do is to clone -psmatch2- into a new file as -mypsmatch2- and modify the Mata code to save additional identifiers in a Stata matrix. But you can also loop over observations and match the hard way.

It is unclear to me why you would ever want to do this; matching k obs without replacement makes the calculation of standard errors much harder, and the bootstrap is not an option with matching. And you have to decide what to do about ties...

Here's a quickly cobbled together version of matching by hand by looping over observations; no warranty expressed or implied that it will be appropriate or easy to adapt for your application...

use http://pped.org/stata/card, clear
g case=educ>16
* estimate a propensity score p
qui logit case exper* smsa south
predict p
set seed 123
g double u=uniform()
* cases first, in random order, so that obs 1 to r(N) below are the cases
gsort -case u
g _id=_n
* z flags availability: 0=unused non-case, 1=case, .=already matched
g z=case
loc n=4
forv j=1/`n' {
 g match`j'=.
 g p`j'=.
 }
count if case==1
forv i=1/`r(N)' {
 g diff=abs(p-p[`i'])
 * unused non-cases (z==0) sort first, closest score on top
 sort z diff
 qui forv j=1/`n' {
  loc match`j'=_id[`j']
  loc p`j'=p[`j']
  replace z=. in `j'
  }
 drop diff
 sort _id
 qui forv j=1/`n' {
  replace match`j'=`match`j'' in `i'
  replace p`j'=`p`j'' in `i'
  }
 }
li _id p* match* in 1/15, noo clean
li _id p* match* in 351/360, noo clean

Let me just repeat: I think this is a bad idea, in the sense that I cannot think of a reason to do this as opposed to using -psmatch2- or -nnmatch- (also on SSC) or reweighting. See also http://pped.org/stata/erratum.pdf on reweighting.
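One way to check that a loop like the one above really matched without replacement is to stack the saved identifiers and confirm that none repeats. This is only a minimal sketch: it assumes the variables _id and match1 through match4 created above are still in memory, and that no other variable name begins with "match".

```stata
* Sketch only: check that no matched observation is used more than once.
* Assumes _id and match1-match4 exist and no other variable starts with "match".
preserve
* keep only the rows that received matches
keep if !missing(match1)
keep _id match*
* one row per (matched observation, k-th match)
reshape long match, i(_id) j(k)
* isid exits with an error if any matched _id appears twice
isid match
restore
```

If -isid- exits without error, each saved identifier appears exactly once; an error means some observation was reused (or a match is missing).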
On Tue, Nov 18, 2008 at 1:24 PM, Richard Goldstein <richgold@ix.netcom.com> wrote:
> -vmatch- does provide 1 to N because it finds all matches for each case; in
> a recent match that I did I found anywhere from 1 to 18 matches for each of
> my cases
>
> Rich
>
> Gao Liu wrote:
>>
>> Thank you, Richard and Austin,
>>
>> The score in my dataset is actually from psmatch2. Although psmatch2
>> provides 1 to n matching, it does not indicate which non-case
>> observations are matched to the case obs, except for the nearest one.
>> Also it allows duplicated matching. That is why I plan to do it by
>> hand.
>>
>> I just checked the command, vmatch. I did not find that it provides the
>> option of 1 to N matching.
>>
>> Austin, can you describe some more how to do it by hand?
>>
>> best
>>
>> Gao
>>
>> On Tue, Nov 18, 2008 at 11:32 AM, Austin Nichols
>> <austinnichols@gmail.com> wrote:
>>>
>>> Gao Liu--
>>> Actually, I think you are looking for -psmatch2- (findit psmatch2).
>>> Or did you want to program the matching by hand? That is also
>>> possible, and not very hard in the case where all you want is the
>>> nearest N matches. However, note that the order of matching will
>>> matter in the situation you describe--matching without replacement--so
>>> you should probably do the matching many times and compute statistics
>>> using the rules of variance computation for multiple imputation.
>>>
>>> On Tue, Nov 18, 2008 at 11:07 AM, Richard Goldstein
>>> <richgold@ix.netcom.com> wrote:
>>>>
>>>> for already existing programs, rather than writing your own, I would
>>>> start with -vmatch- (user-written, type -findit vmatch-)
>>>>
>>>> I'm not sure it will cover your last criterion (used only once) but if
>>>> not it should be easy to eliminate those
>>>>
>>>> Rich
>>>>
>>>> Gao Liu wrote:
>>>>>
>>>>> Dear Statalist:
>>>>>
>>>>> I have a question about one to N matching.
>>>>>
>>>>> I have a dataset containing three variables: id, score, case, where
>>>>> case is a dummy variable indicating whether or not the observation is
>>>>> in the case group. How can I match each case observation to N non-case
>>>>> observations based on the score? Each case observation matches to the
>>>>> N non-case observations with the closest scores, but no two case
>>>>> observations can match the same observation (i.e. a non-case
>>>>> observation can be used only one time).
>>>>>
>>>>> Thank you
>>>>>
>>>>> Best
>>>>>
>>>>> Gao
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

