


re: st: Standardized difference of means after PS matching

From   "Ariel Linden. DrPH" <>
To   <>
Subject   re: st: Standardized difference of means after PS matching
Date   Wed, 29 Aug 2012 14:12:22 -0700

Hi Adam,

It is not clear to me which step in your process is giving you different
results. In which program are you using c.race vs i.race? I am not sure that
-pbalchk- (user-written program by Mark Lunt) accepts the prefix -c.-, and
furthermore, I don't understand why you'd treat a multi-category
variable (such as race) as continuous to begin with. You'd certainly end up
with a result that would be meaningless.
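
To illustrate why a mean of category codes is hard to interpret, here is a minimal sketch in Python. It uses made-up data and a common definition of the standardized difference (difference in means divided by the square root of the average of the two group variances). Whether -pbalchk- uses exactly this formula is an assumption here, and the function name std_diff and the race codes 1, 2, 3 are hypothetical.

```python
from math import sqrt

def std_diff(x_treated, x_control):
    """Standardized difference: (mean_t - mean_c) / sqrt((var_t + var_c) / 2)."""
    def mean(v):
        return sum(v) / len(v)

    def var(v):
        # Sample variance with the n - 1 denominator.
        m = mean(v)
        return sum((a - m) ** 2 for a in v) / (len(v) - 1)

    pooled_sd = sqrt((var(x_treated) + var(x_control)) / 2)
    return (mean(x_treated) - mean(x_control)) / pooled_sd

# Hypothetical race codes (1, 2, 3) with very different distributions
# in the two groups but the same mean code.
treated = [1] * 45 + [2] * 10 + [3] * 45
control = [1] * 10 + [2] * 80 + [3] * 10

# Race treated as one continuous variable: the mean codes coincide.
sd_continuous = std_diff(treated, control)

# Race treated as a set of dummies, one standardized difference per category.
sd_dummies = {
    k: std_diff([int(r == k) for r in treated],
                [int(r == k) for r in control])
    for k in (1, 2, 3)
}

print("continuous:", sd_continuous)
print("dummies:", sd_dummies)
```

In this constructed case the "continuous" standardized difference is zero even though every category is badly imbalanced, so an apparently better balance figure for a continuous-coded categorical variable need not reflect balance across its categories.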

As far as calculating balance on a binary variable (assuming it is binary),
your results should not differ much (between treating the variable as
continuous and computing a proportion, or treating it as a count and using
chi2) if you have sufficient sample sizes (see what happens when you compare
chi2 with a t-test for proportions). However, if you have a multi-category
variable, then I believe you'd need to create dummies for use in

In any case, I can't really provide more guidance, since I am not sure exactly
what is going on given the limited information you provided.
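
The comparison of chi2 with a test of proportions for a binary covariate, mentioned above, can be sketched as follows. This is an illustration only: it uses the large-sample z-test with a pooled proportion in place of a t-test, the Pearson chi2 without continuity correction, and invented counts (30 of 100 versus 45 of 100). The function names are hypothetical.

```python
from math import sqrt

def two_prop_z(x1, n1, x2, n2):
    """z statistic for H0: p1 == p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def pearson_chi2_2x2(x1, n1, x2, n2):
    """Pearson chi2 (no continuity correction) for the 2x2 table of counts."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    total = n1 + n2
    row_totals = [n1, n2]
    col_totals = [x1 + x2, total - (x1 + x2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

# Invented counts: 30 of 100 treated and 45 of 100 controls have the trait.
z = two_prop_z(30, 100, 45, 100)
chi2 = pearson_chi2_2x2(30, 100, 45, 100)

print("z squared:", z * z)
print("chi2:     ", chi2)
```

For a 2x2 table the squared pooled z statistic equals the uncorrected Pearson chi2, so the two approaches lead to the same conclusion for a binary covariate. A t-test on the 0/1 variable differs only in its variance estimate and reference distribution, which matters little in large samples.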


Date: Wed, 29 Aug 2012 00:15:15 -0400
From: Adam Olszewski <>
Subject: st: Standardized difference of means after PS matching

I noted something surprising today and I was wondering if any
stata-listers have some insight into this issue.
I have been using the (SSC derived user program) -psmatch2- for
propensity score matching and (also SSC-derived) -pbalchk- for
evaluation of balance after matching using standardized differences of
means (SDM).
I noted that the balance improves (sometimes quite dramatically!) if, in the
PS logistic model, I replace categorical variables with a 'plain'
non-factorized version (e.g. use "c.race" rather than "i.race" as a
variable). This of course makes no sense as a rational "real-world"
use of a variable; however, the goal of PS analysis is to achieve
balance and not to make a sensible predictive model.
I wonder, however, whether this improvement could be an artifact of how the
SDMs are calculated by the -pbalchk- command (of course I calculate
categorized proportion differences using the -xi- syntax, i.e.
"i.race"). I would not like to exploit a mathematical quirk to have
"better" (?) results, but on the other hand I can't find anything
particularly conceptually wrong with it.
Would any psmatch2/pbalchk users disagree?
Best regards,
Adam Olszewski
