From: "Ariel Linden. DrPH" <ariel.linden@gmail.com>
To: <statalist@hsphsun2.harvard.edu>
Subject: re: Re: st: CEM speed up.
Date: Sun, 30 Dec 2012 12:55:03 -1000

To clarify, Austin's re-weighting code is an alternative approach to CEM and, given your data, probably a better one to consider. If you choose to stay with -cem- (findit cem), it will continue to run slowly because you have so many observations. You didn't specify why you chose to parcel mileage into those specific categories, but that may slow things down a bit as well.

If at the end of the day all you want is the weights, I think you are better off using Austin's logic to generate the weights and then running your outcome model with those weights:

regress Y x [pw=w], robust

Ariel

Date: Sat, 29 Dec 2012 15:11:30 -0500
From: Austin Nichols <austinnichols@gmail.com>
Subject: Re: st: CEM speed up.

Hemang <Hemang.Subramanian@scheller.gatech.edu> :
The logic is, divide conditioning variables into categories and then reweight within categories by the nonparametric propensity score:

egen c=cut(mileage), at(25000,50000,75000,100000,150000)
egen p=mean(treated), by(c)
gen w=cond(treated,1/p,1/(1-p))

On Fri, Dec 28, 2012 at 7:07 PM, Subramanian, Hemang <Hemang.Subramanian@scheller.gatech.edu> wrote:
> Hi Stata-list,
>
> I need help with speeding up the execution of CEM. I'm running coarsened exact matching on about 1.8 million records (rows).
> My machine is a fairly powerful server: 32 Gig - 4 QuadCore processor.
> I need the CEM-generated weights to run my regressions to validate the effect of treatment on my dependent variable.
> My command is as follows:
> cem matchid(#0) mileage(25000 50000 75000 100000 150000) , tr(treated)
>
> matchid is a generated ID with about 1,50,000 values and is a stratification variable.
> mileage is a discrete variable with a set of values from 0 to 900,000.
> treated is the treatment indicator, with a value of 1 or 0.
> I am trying to obtain matches within each bucket (i.e., matchid) or create strata within each matchid.
> The weights generated by cem will further be used to deduce the causal effect.
> I tried using the noimb option, which suppresses the L1 vector distance calculations, and it does help with smaller data.
> Can anyone suggest alternate ways to speed up the command's execution, or ways by which I could split up the above query, or point me to the logic that does the weight calculation in CEM?
>
> warm regards.
> Hemang C Subramanian
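
Putting the pieces of this thread together, the re-weighting approach might be sketched as below. This is only an illustration under stated assumptions, not anyone's tested code: y is a hypothetical outcome variable (none is named in the thread), the 0 and 900001 end points are added so that every mileage value in the stated 0 to 900,000 range falls in a bin, and stratifying on matchid as well as the mileage bin is a guess at what the original cem call was meant to do. Note also that these are inverse-probability weights and need not equal the weights that cem itself would report.

```stata
* Sketch only: y is a hypothetical outcome variable; treated, mileage
* and matchid are the variables named in the thread.

* Coarsen mileage. The 0 and 900001 end points are an assumption, added
* so that no observation in the stated range is left with a missing bin.
egen c = cut(mileage), at(0,25000,50000,75000,100000,150000,900001)

* Share of treated units within each matchid-by-bin stratum
* (the nonparametric propensity score).
bysort matchid c: egen p = mean(treated)

* Strata containing only treated or only control units have no overlap,
* so their weight is left missing rather than set to 1/0 or to 1.
gen double w = cond(treated, 1/p, 1/(1-p)) if p > 0 & p < 1

* Outcome model; observations with missing w drop out of the estimation.
regress y treated [pw=w], vce(robust)
```

Because this needs only one sort and two passes over the data, it should scale to a couple of million rows far more easily than a matching routine, although that has not been timed here.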