



Re: st: Statistical Matching


From   Austin Nichols <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Statistical Matching
Date   Tue, 19 Jul 2011 12:38:43 -0400

Gillette, Ryan (Volunteer) <Ryan_K_Gillette@omb.eop.gov>:
You can still use propensity scores, defining dataset 1 as T=0 and
dataset 2 as T=1 and e.g. running a logit of T on X in the appended
datasets.  Without more detail, it is hard to offer specific advice.
No user-written software is required, but there is much available to
download.  You can define a multivariate distance metric and get the
minimum-distance observations as matches, or you can do exact matching
by simply sorting appropriately, resampling with replacement the
appropriate number of times to achieve identical marginal
distributions, and then doing an unmatched -merge-.  This is
especially easy if you have weights in each dataset that sum to the
same population total.  N.B. -sort- can also be used to match on one
continuous variable, by rank within categories of the discrete variables.
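
To make the rank-matching idea concrete, here is a minimal sketch of the logic, written in Python rather than Stata purely for illustration. The record layout (id, category, x) and the function name rank_match are assumptions, not anything from the original question, and the sketch skips the resampling step: where a category has more records in one dataset than the other, the extras are simply left unmatched.

```python
from collections import defaultdict


def rank_match(left, right):
    """Pair records by rank of x within each category g.

    left, right: lists of (id, g, x) tuples (a hypothetical layout).
    Returns a list of (left_id, right_id) pairs.
    """
    def sorted_groups(rows):
        # Collect (x, id) per category, then sort each category by x.
        groups = defaultdict(list)
        for rid, g, x in rows:
            groups[g].append((x, rid))
        for g in groups:
            groups[g].sort()
        return groups

    lg, rg = sorted_groups(left), sorted_groups(right)
    pairs = []
    # Only categories present in both datasets can be matched.
    for g in sorted(lg.keys() & rg.keys()):
        # zip stops at the shorter list, so surplus records stay unmatched.
        for (_, lid), (_, rid) in zip(lg[g], rg[g]):
            pairs.append((lid, rid))
    return pairs


if __name__ == "__main__":
    d1 = [(1, "a", 5.0), (2, "a", 1.0), (3, "b", 2.0)]
    d2 = [(10, "a", 0.5), (11, "a", 9.0), (12, "b", 3.0), (13, "b", 7.0)]
    print(rank_match(d1, d2))
```

Because the only expensive step is a sort within each category, this approach should scale to millions of observations far better than a pairwise distance search.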

On Tue, Jul 19, 2011 at 12:26 PM, Gillette, Ryan (Volunteer)
<Ryan_K_Gillette@omb.eop.gov> wrote:
> Hello,
>
> I am trying to match comparable observations between two large datasets (300,000 to 3 million observations, depending on which ones I decide to use). I am not trying to calculate a treatment effect, but rather identify the id number or observation number of an observation's closest match. I am matching across a few variables, some of which I want to weight more than others in terms of required precision. I don't think I will be able to use a propensity score, as it doesn't seem appropriate for my task.
>
> Does anyone know a program in Stata that can do these things? I have used -nnmatch- before, but with such a large dataset I worry it could take days to process. Is there a way to speed it up? Any ideas would be much appreciated!
>
> Thanks,
>
> Ryan


