

From: Austin Nichols <austinnichols@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Propensity Score Matching with Multiple Categorical Variables with Multiple Categories...Dummy Variables?
Date: Fri, 13 Jul 2012 11:12:40 -0400

Ariel and Pete--

Estimating a logit with dummies is one way to combine across distinct combinations of the 15 observables to estimate a propensity score. A fully nonparametric propensity score would include every possible interaction as well, or simply compute the mean of treatment across all cells (possibly millions of cells). If any cells have pscore 0 or 1, and some are almost certain to be degenerate in that way, then you must combine that cell with another; one way of doing that is using the marginal across some subset of categories. The logit with no interactions is one particular method of combining across cells.

sysuse auto
logit foreign i.rep78
predict p if e(sample)
egen m=mean(foreign), by(rep78)
su m p if p<.
* Note that if you do not restrict using if e(sample)
* the estimated p=.818 for rep78=1
* (taken from excl cat rep78=5) when it should be zero.
ta rep78, mi sum(foreign)
ta rep78, mi sum(m)
ta rep78, mi sum(p)
g fakecat=round(mpg,10)
logit foreign i.rep78##i.fakecat
predict p2 if e(sample)
egen m2=mean(foreign), by(rep78 fakecat)
su m2 p2 if p2<.

On Fri, Jul 13, 2012 at 10:19 AM, Ariel Linden, DrPH <ariel.linden@gmail.com> wrote:
> Hi Pete,
>
> Since estimation of the propensity score is nothing more than a logistic (or
> probit) regression model, you could leave the categorical variables as-is
> and use the "i." prefix to denote that they are categorical, such as i.race.
> The regression output will show you that the levels of the categorical
> variable have been dealt with accordingly (including if any of the levels
> are dropped from the model). See for example:
>
> sysuse auto
> logit foreign i.rep78
>
> On the other hand, you could certainly create dummy variables for the
> categorical variable. However, if you have a large number of covariates,
> your dataset will start looking ugly in a hurry. In any case, your results
> will be identical:
>
> tab rep78, gen(rep78_)
> logit foreign rep78_1- rep78_5
>
> I hope this helps
>
> Ariel
>
> Date: Fri, 13 Jul 2012 10:06:14 +0700
> From: TA Stat <tastat@gmail.com>
> Subject: st: Propensity Score Matching with Multiple Categorical Variables
> with Multiple Categories...Dummy Variables?
>
> Dear All
>
> In PS matching, I am wondering about how to handle multiple
> categorical variables e.g. 15 variables. Each variable has multiple
> categories e.g. 3-5 categories. Do I have to create dummy variables,
> (n-1 for each variable), for all those categorical variables before
> calculating propensity score?
>
> Thanks
> Pete
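The advice in the reply about combining degenerate cells (cells where the pscore is exactly 0 or 1) could be sketched roughly as follows on the same auto data. This is only an illustration under stated assumptions: the variable names degenerate, rep78c, pc and mc are made up here, and folding levels 1 and 2 of rep78 into level 3 is an arbitrary choice for the example, not a recommendation from the thread.

```stata
sysuse auto, clear

* Fully nonparametric pscore: mean of treatment within each rep78 cell
egen m = mean(foreign), by(rep78)

* Flag degenerate cells, i.e. cells whose pscore is exactly 0 or 1
* (degenerate is a hypothetical name used only for this sketch)
gen byte degenerate = (m == 0 | m == 1) if !missing(rep78)
tab rep78 degenerate

* Assumed, illustrative combination: fold levels 1 and 2 into level 3
* (rep78c is a hypothetical name; other groupings are equally possible)
recode rep78 (1 2 = 3), gen(rep78c)

* Re-estimate on the coarsened categories
logit foreign i.rep78c
predict pc if e(sample)
egen mc = mean(foreign), by(rep78c)

* With a single categorical regressor the logit is saturated,
* so pc should agree with the cell mean mc wherever pc is nonmissing
su mc pc if pc < .
```

Which cells to merge is a substantive decision; with many categorical covariates one would more likely coarsen along the least important variables, in the spirit of "the marginal across some subset of categories" mentioned above.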
