



re: Re: st: CEM speed up.


From   "Ariel Linden. DrPH" <ariel.linden@gmail.com>
To   <statalist@hsphsun2.harvard.edu>
Subject   re: Re: st: CEM speed up.
Date   Sun, 30 Dec 2012 12:55:03 -1000

To clarify, Austin's re-weighting code is an alternative approach to CEM
and, given your data, probably a better one to consider.

If you choose to stay with -cem- (findit cem), it will continue to run
slowly because you have so many observations. You didn't specify why
you chose to parcel mileage into those specific categories, but that may
slow things down a bit as well.

If at the end of the day all you want is the weights, I think you are better
off using Austin's logic to generate the weights and then run your outcome
model with those weights:

regress y x [pw=w], vce(robust)
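
For anyone who wants to try the whole sequence before running it on 1.8 million rows, here is a minimal sketch on Stata's shipped auto dataset. It is only an illustration: foreign stands in for treated, mpg for mileage, price for the outcome, and the break points are made up for this toy data. The weights are inverse-probability weights, so they need not match what -cem- itself would report.

```stata
* Sketch only: the auto dataset stands in for the real data.
* foreign plays the role of treated, mpg the role of mileage.
sysuse auto, clear

* Coarsen mpg; the outer breaks (10 and 50) bracket the observed range
egen c = cut(mpg), at(10,20,30,50)

* Share treated within each bin (the nonparametric propensity score)
egen double p = mean(foreign), by(c)

* Inverse-probability weights; bins without both groups stay missing
gen double w = cond(foreign, 1/p, 1/(1-p)) if p > 0 & p < 1

* Outcome model; pweights already imply robust standard errors
regress price foreign [pw=w], vce(robust)
```

Observations with a missing weight drop out of the regression automatically, which is the analogue of unmatched observations being discarded.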

Ariel

Date: Sat, 29 Dec 2012 15:11:30 -0500
From: Austin Nichols <austinnichols@gmail.com>
Subject: Re: st: CEM speed up.

Hemang <Hemang.Subramanian@scheller.gatech.edu> :
The logic is, divide conditioning variables into categories and then
reweight within categories by the nonparametric propensity score:
egen c=cut(mileage), at(25000,50000,75000,100000,150000)
egen p=mean(treated), by(c)
gen w=cond(treated,1/p,1/(1-p))
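
The original question also stratifies on matchid, so the same logic can be applied within matchid-by-mileage cells. The following is a sketch on simulated data, not the poster's data: the sample size, the number of matchid values, the treatment share, and the variable names stratum and w are all assumptions for illustration. The outer break points 0 and 900001 are also an addition, since egen's cut() returns missing for values below the first or at or above the last break point.

```stata
* Sketch only: simulated data with hypothetical sizes
clear
set seed 12345
set obs 10000
gen long matchid = ceil(50 * runiform())
gen double mileage = floor(900001 * runiform())
gen byte treated = runiform() < 0.3

* Breaks extended to 0 and 900001 (an assumption) so every
* observation falls in some bin instead of getting a missing value
egen c = cut(mileage), at(0,25000,50000,75000,100000,150000,900001)

* One stratum per matchid-by-bin cell
egen long stratum = group(matchid c)

* Share treated within each stratum
egen double p = mean(treated), by(stratum)

* Weights only where the stratum has both treated and controls
gen double w = cond(treated, 1/p, 1/(1-p)) if p > 0 & p < 1
```

Strata made up entirely of treated or entirely of control observations get a missing weight, so they are excluded from any weighted model, much as unmatched strata would be.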

On Fri, Dec 28, 2012 at 7:07 PM, Subramanian, Hemang
<Hemang.Subramanian@scheller.gatech.edu> wrote:
> Hi Stata-list,
>
> I need help speeding up the execution of CEM. I'm running coarsened
> exact matching on about 1.8 million records (rows).
> My machine is a fairly powerful server (32 GB of RAM, 4 quad-core
> processors).
> I need the CEM-generated weights to run my regressions to validate the
> effect of treatment on my dependent variable.
> My command is as follows:
> cem matchid(#0) mileage(25000 50000 75000 100000 150000) , tr(treated)
>
> matchid is a generated ID with about 1,50,000 values and is a
> stratification variable.
> mileage is a discrete variable with values from 0 to 900,000.
> treated is the treatment indicator, with a value of 1 or 0.
> I am trying to obtain matches within each bucket (i.e., matchid), or
> create strata within each matchid.
> The weights generated by CEM will then be used to deduce the causal
> effect.
> I tried the noimb option, which suppresses the L1 vector distance
> calculations, and it does help with smaller data.
> Can anyone suggest alternate ways to speed up the command's execution,
> ways by which I could split up the above query, or point me to the logic
> that does the weight calculation in CEM?
>
> Warm regards,
> Hemang C Subramanian
