

From: "Ariel Linden, DrPH" <ariel.linden@gmail.com>
To: <statalist@hsphsun2.harvard.edu>
Subject: re: Re: st: Statistical Matching
Date: Wed, 20 Jul 2011 14:21:01 -0700

I agree with Austin (always!) that there is no reason why the propensity score could not be used here. In fact, it probably makes more sense when you have such a huge N.

I would suggest you look at -cem- (a user-written program by Matt Blackwell and Gary King at Harvard). That program will allow you to match on several variables or on the propensity score - your choice. I am not sure how well it will perform in such a large dataset though.

Ariel

Date: Tue, 19 Jul 2011 12:38:43 -0400
From: Austin Nichols <austinnichols@gmail.com>
Subject: Re: st: Statistical Matching

Gillette, Ryan (Volunteer) <Ryan_K_Gillette@omb.eop.gov>:

You can still use propensity scores, defining dataset 1 as T=0 and dataset 2 as T=1 and e.g. running a logit of T on X in the appended datasets. Without more detail, it is hard to offer specific advice. No user-written software is required, but there is much available to download.

You can define a multivariate distance metric and get the minimum-distance observations as matches, or you can do exact matching by simply sorting appropriately, resampling with replacement the appropriate number of times to achieve identical marginal distributions, and then doing an unmatched -merge-. This is especially easy if you have weights in each dataset that sum to the same population total.

N.B. the -sort- can be used to match on one continuous variable by rank within categories of discrete variables.

On Tue, Jul 19, 2011 at 12:26 PM, Gillette, Ryan (Volunteer) <Ryan_K_Gillette@omb.eop.gov> wrote:
> Hello,
>
> I am trying to match comparable observations between two large datasets (300,000 to 3 million observations, depending which ones I decide to use). I am not trying to calculate a treatment effect, but rather identify the id number or observation number of an observation's closest match. I am matching across a few variables, some of which I want to weight more than others in terms of required precision. I don't think I will be able to use a propensity score, as it doesn't seem appropriate for my task.
>
> Does anyone know a program in Stata that can do these things? I have used -nnmatch- before, but with such a large dataset I worry it could take days to process. Is there a way to speed it up? Any ideas would be much appreciated!
>
> Thanks,
>
> Ryan

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
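
As a rough illustration of the sort-based idea in Austin's N.B. (matching on one continuous variable by rank within categories of discrete variables), here is a minimal sketch. It is written in Python with the standard library only, not Stata syntax, and it is not code from anyone in this thread. The function name `rank_match`, the field names `id`, `sex` and `income`, and the toy records are all hypothetical. It assumes every record carries an `id`, and it skips the resampling step Austin describes, so surplus observations in the larger cell simply stay unmatched.

```python
from collections import defaultdict


def rank_match(data_a, data_b, by, on):
    """Pair records from two datasets by rank of `on` within cells of `by`.

    data_a, data_b: lists of dicts, each with an "id" key (an assumption).
    by: list of discrete variables that must agree exactly.
    on: one continuous variable, matched by sorted position within a cell.
    """
    def group(rows):
        # Bucket rows by their discrete-variable values, then sort each
        # bucket on the continuous variable so position equals rank.
        cells = defaultdict(list)
        for row in rows:
            cells[tuple(row[k] for k in by)].append(row)
        for members in cells.values():
            members.sort(key=lambda r: r[on])
        return cells

    cells_a, cells_b = group(data_a), group(data_b)
    pairs = []
    for key, rows_a in cells_a.items():
        rows_b = cells_b.get(key, [])
        # zip stops at the shorter list, so extra records are left unmatched.
        for rec_a, rec_b in zip(rows_a, rows_b):
            pairs.append((rec_a["id"], rec_b["id"]))
    return pairs


# Hypothetical toy data: match on income rank within sex.
dataset1 = [
    {"id": "a1", "sex": "F", "income": 30},
    {"id": "a2", "sex": "F", "income": 50},
    {"id": "a3", "sex": "M", "income": 40},
]
dataset2 = [
    {"id": "b1", "sex": "M", "income": 42},
    {"id": "b2", "sex": "F", "income": 55},
    {"id": "b3", "sex": "F", "income": 28},
    {"id": "b4", "sex": "F", "income": 90},
]

pairs = rank_match(dataset1, dataset2, by=["sex"], on="income")
print(pairs)
```

Because the work is grouping plus one sort per cell, this approach scales roughly as a sort rather than as an all-pairs distance search, which is why it may suit datasets of the size Ryan mentions. It matches by rank only, so it does not guarantee that paired values of the continuous variable are close.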
