Re: st: Propensity Score Matching with Multiple Categorical Variables with Multiple Categories...Dummy Variables?


From   TA Stat <tastat@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Propensity Score Matching with Multiple Categorical Variables with Multiple Categories...Dummy Variables?
Date   Sat, 14 Jul 2012 12:51:19 +0700

Thanks, everyone, for the advice.  I am working out how to collapse some
categories of each variable in a way that is meaningful for my research
question.  I will keep an eye out for any additional advice.

Pete
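
One way the collapsing step could look, sketched on the auto data used elsewhere in this thread. The three-level grouping of rep78 is an arbitrary assumption for illustration, not a recommendation, and rep78_3cat, m3 and p3 are hypothetical names. The count near the end flags observations in cells where treatment does not vary, which are the cells that would still need to be combined.

```stata
sysuse auto, clear
* Collapse rep78 (1-5) into three levels; this grouping is illustrative only
recode rep78 (1 2 3 = 1 "fair or worse") (4 = 2 "good") (5 = 3 "excellent"), gen(rep78_3cat)
* Cell means of treatment: a mean of exactly 0 or 1 marks a degenerate cell
egen m3 = mean(foreign), by(rep78_3cat)
ta rep78_3cat, mi sum(foreign)
* Count observations that sit in degenerate cells (nonmissing categories only)
count if inlist(m3, 0, 1) & !missing(rep78_3cat)
* Re-estimate the propensity score on the collapsed variable
logit foreign i.rep78_3cat
predict p3 if e(sample)
```

If the count is nonzero, a coarser grouping (or dropping that variable from the cell definition) would be the next thing to try.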

On Fri, Jul 13, 2012 at 10:12 PM, Austin Nichols
<austinnichols@gmail.com> wrote:
> Ariel and Pete--
> Estimating a logit with dummies is one way to combine across distinct
> combinations of the 15 observables to estimate a propensity score. A
> fully nonparametric propensity score would include every possible
> interaction as well, or simply compute the mean of treatment across
> all cells (possibly millions of cells).  If any cells have pscore 0 or
> 1, and some are almost certain to be degenerate in that way, then you
> must combine that cell with another; one way of doing that is using
> the marginal across some subset of categories. The logit with no
> interactions is one particular method of combining across cells.
>
> sysuse auto
> * One dummy per category of rep78, no interactions
> logit foreign i.rep78
> predict p if e(sample)
> * Cell means of treatment, for comparison with the logit predictions
> egen m=mean(foreign), by(rep78)
> su m p if p<.
> * Note that if you do not restrict using if e(sample)
> * the estimated p=.818 for rep78=1
> * (taken from the excluded category rep78=5) when it should be zero.
> ta rep78, mi sum(foreign)
> ta rep78, mi sum(m)
> ta rep78, mi sum(p)
>
> * Add a second, artificial categorical variable and interact it with rep78
> g fakecat=round(mpg,10)
> logit foreign i.rep78##i.fakecat
> predict p2 if e(sample)
> egen m2=mean(foreign), by(rep78 fakecat)
> su m2 p2 if p2<.
>
>
> On Fri, Jul 13, 2012 at 10:19 AM, Ariel Linden, DrPH
> <ariel.linden@gmail.com> wrote:
>> Hi Pete,
>>
>> Since estimation of the propensity score is nothing more than a logistic (or
>> probit) regression model, you could leave the categorical variables as-is
>> and use the "i." prefix to denote that they are categorical, such as i.race.
>> The regression output will show you that the levels of the categorical
>> variable have been dealt with accordingly (including if any of the levels
>> are dropped from the model). See for example:
>>
>> sysuse auto
>> logit foreign i.rep78
>>
>> On the other hand, you could certainly create dummy variables for the
>> categorical variable. However, if you have a large number of covariates,
>> your dataset will start looking ugly in a hurry. In any case, your results
>> will be identical:
>>
>> tab rep78, gen(rep78_)
>> logit foreign rep78_1-rep78_5
>>
>> I hope this helps
>>
>> Ariel
>>
>> Date: Fri, 13 Jul 2012 10:06:14 +0700
>> From: TA Stat <tastat@gmail.com>
>> Subject: st: Propensity Score Matching with Multiple Categorical Variables
>> with Multiple Categories...Dummy Variables?
>>
>> Dear All
>>
>> In PS matching, I am wondering how to handle multiple categorical
>> variables, e.g. 15 variables, each with multiple categories, e.g.
>> 3-5 categories.  Do I have to create dummy variables (n-1 for each
>> variable) for all those categorical variables before calculating
>> the propensity score?
>>
>> Thanks
>> Pete
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

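For a longer list of categorical covariates, such as the 15 in the original question, the i. prefix can be attached in a loop instead of generating dummies by hand. This is a minimal sketch, assuming the nlsw88 example dataset shipped with Stata stands in for the real data and union stands in for the treatment; the covariate list and the names cats, rhs and pscore are illustrative only.

```stata
sysuse nlsw88, clear
* Stand-in categorical covariates; replace with the real list of variables
local cats race collgrad married industry occupation
* Prefix each variable with i. so Stata builds the indicator terms itself
local rhs
foreach v of local cats {
    local rhs `rhs' i.`v'
}
logit union `rhs'
* Restrict predictions to the estimation sample so that observations
* dropped from the model do not receive a misleading score
predict pscore if e(sample)
su pscore
```

Stata reports any levels it drops because they predict the outcome perfectly, so the logit output is worth reading before the scores are used for matching.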
