Re: st: Matching samples in Stata


From   Paula Arce <paulaarce@rocketmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Matching samples in Stata
Date   Thu, 11 Oct 2012 18:40:17 +0100 (BST)

Hi David,

I finally got round to matching my sample. I matched the two samples on family education level and gender:

mahapick ed_level_fam sex, idvar( "ID") genfile(D:\matched) nummatches(4) full treated(course)

where course is 1 for medicine and 0 for other - as in my analyses I want to compare medicine students vs. the others.  I created a file 'matched' as I intend to import the relevant variables into it so that I can just run the analyses for this.

Ideally I want to only keep the first match.

However, when I check for duplicates using 

duplicates list ID

I find that many of the matched respondents are the same for different medicine students. 

Can you suggest what I am doing wrong, and any way around this, please?

Thanks,
Paula
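
The repeated IDs are probably not a sign of a mistake. As David's earlier message (quoted below) suggests, mahapick just produces the matching and does not itself select unique matches, so the same non-medicine student can be picked for several medicine students. With only two categorical matching variables, many candidates will also tie at a distance of zero. A minimal sketch of one way to select unique matches afterwards, using a randomized processing order, follows. It is not David's own procedure. The toy data and the names treated_id, control_id and matchnum are hypothetical stand-ins for whatever the matches file actually contains, and integer IDs are assumed.

```stata
* Sketch only: greedy one-to-one selection from a long list of candidate pairs.
* treated_id, control_id and matchnum are hypothetical names; the toy data
* stand in for the real matches file (one row per treated-candidate pair).
clear
input treated_id control_id matchnum
1 101 1
1 102 2
2 101 1
2 103 2
3 102 1
3 101 2
end

* One random draw per treated ID, so the processing order is randomized.
set seed 12345
bysort treated_id (matchnum): gen double u = runiform() if _n == 1
by treated_id: replace u = u[1]
sort u treated_id matchnum

* Walk down the list; accept a pair only if neither ID has been used so far.
gen byte chosen = 0
local done_treated
local used_controls
forvalues i = 1/`=_N' {
    local t = treated_id[`i']
    local c = control_id[`i']
    local t_done : list t in done_treated
    local c_used : list c in used_controls
    if !`t_done' & !`c_used' {
        quietly replace chosen = 1 in `i'
        local done_treated `done_treated' `t'
        local used_controls `used_controls' `c'
    }
}

* Keep the accepted pairs; -isid- stops with an error if a control repeats.
keep if chosen
isid control_id
list treated_id control_id matchnum, noobs
```

With a real matches file, load it with -use- instead of the -input- block and substitute its variable names (check them with -describe- first). A treated observation whose candidates have all been taken ends up with no match, so requesting several candidates per treated observation, as nummatches(4) does, leaves room for a fallback.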

----- Original Message -----
From: David Kantor <kantor.d@att.net>
To: statalist@hsphsun2.harvard.edu
Sent: Wednesday, 3 October 2012, 16:29
Subject: Re: st: Matching samples in Stata

Hello Paula,

At 07:29 AM 10/3/2012, you wrote:
> Thanks David,
> 
> mahapick is very user-friendly; what's the main difference between mahapick and psmatch2? or are they pretty much equivalent?
> [...]

I actually have never used psmatch2 or psmatch, though I have tried to read through one or the other on some occasions (and borrowed a bit).
I don't really know much about what it does, but my impression is that, in comparison to mahapick, it...
a) has several different options and constraints for the distance measure, in addition to Mahalanobis;
b) can do a selection of unique matches using a randomized selection order;
c) can perform various analyses on the resulting matching -- whereas mahapick just gets you the matching.

I believe that if you specify psmatch2 with a mahalanobis distance, you should get the same distance measure as you would in mahapick.
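
For reference, the textbook definition of the Mahalanobis distance between two covariate vectors x_i and x_j, where S is the covariance matrix of the matching variables, is given below. Which observations are used to estimate S, and whether a given program reports d or its square, are not stated in this thread and are worth checking in each command's help file.

```latex
d(x_i, x_j) = \sqrt{(x_i - x_j)^{\top} S^{-1} (x_i - x_j)}
```

Two programs will return the same distances only if they also use the same S.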

In my own usage of mahapick, I had sometimes done a randomized selection, but in a subsequent separate procedure (that I have not made into a publishable program).

Thanks for saying that mahapick is user-friendly. I often worry that there are too many options to keep track of -- including one that is a vestige of its first incarnation, which I would not advise using.

It may be helpful to know that the mahapick suite has several other programs for just obtaining the distance measure:
        mahascore: generates the distance between every observation and one specific point or observation;
        mahascores: generates the distance between every pair of observations (or possibly a limited set of pairs);
        mahascore2: computes the (single) distance between two specified points or the centroids of specified populations.

HTH
--David

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


