Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: "Ariel Linden" <ariel.linden@gmail.com>
To: <statalist@hsphsun2.harvard.edu>
Subject: re: Re: st: RE: Propensity score balancing property
Date: Wed, 1 Jan 2014 10:50:59 -0500
While I agree with Adam about the technical steps involved in PS-matching (or weighting) strategies, there is an even more fundamental problem here: the theoretical and content basis for the process.

In a randomized controlled trial, one hopes that all baseline characteristics, observed and unobserved, are equivalent between the treatment and control groups, and that any imbalances are due to chance. With observational data, we have no such luxury. We therefore need to try to achieve balance on as many characteristics as we can get our hands on (i.e., the observed data) and "hope" that any unmeasured confounding will not be large enough to bias the outcomes.

Orrin is running through the data, eliminating whichever variables don't appear to be balanced between the treatment and control groups. This does not eliminate bias; it moves it into the "out of sight, out of mind" category. In other words, not only is there the implicit risk of unobserved confounding, but additional bias is likely from confounding by variables that were measured (and available) but dropped from the model.

I am particularly concerned when I see that there are 14,000 untreated observations and 800 treated observations, and yet suitable matches cannot be found on a relatively small number of covariates. Without seeing the data, I would expect this to mean that the treated group is wildly different from the untreated group on baseline characteristics. This further suggests to me that if the controls cannot be adequately matched to these treated subjects (or even to a subset of them), the extrapolation needed to "bridge the gap" in those differences will render the outcome analysis useless (or, at the very least, ungeneralizable).

Orrin, I suggest that you consider these issues first and foremost, and spend some time investigating why these groups are so different at baseline, and why you cannot achieve balance even though you theoretically have 17.5 untreated subjects to match to every treated unit.
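One quick way to look at the problem Ariel describes is to check whether the estimated propensity scores of the two groups actually overlap. The Python sketch below is a minimal illustration under stated assumptions, not what -pscore-'s comsup option does internally: it assumes you have exported the estimated scores (e.g. the variable created by pscore(myscore)) into two lists, uses made-up numbers, and the function name outside_control_range is hypothetical. It applies a crude min/max rule, flagging treated units whose score lies outside the range of the control scores.

```python
def outside_control_range(ps_treated, ps_control):
    """Return the treated propensity scores lying outside the [min, max]
    range of the control scores: a crude min/max overlap check."""
    lo, hi = min(ps_control), max(ps_control)
    return [p for p in ps_treated if p < lo or p > hi]


# Hypothetical scores in which treated units cluster at high values.
ps_treated = [0.35, 0.62, 0.81, 0.93, 0.97]
ps_control = [0.02, 0.05, 0.11, 0.24, 0.40, 0.88]

off_support = outside_control_range(ps_treated, ps_control)
share = len(off_support) / len(ps_treated)
print(f"{len(off_support)} of {len(ps_treated)} treated ({share:.0%}) "
      "lie outside the control range")

# Orrin's raw ratio of untreated to treated observations.
print(14000 / 800)  # 17.5
```

A high ratio of controls to treated units, like Orrin's 17.5 to 1, does not guarantee overlap: treated units whose scores exceed every control score have no comparable controls, and any estimate for them depends on the kind of extrapolation Ariel warns about.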
I hope this helps,
Ariel

Date: Tue, 31 Dec 2013 15:43:14 -0500
From: Adam Olszewski <adam.olszewski@gmail.com>
Subject: Re: st: RE: Propensity score balancing property

I believe that -pscore- performs the propensity score analysis using the stratification method. It estimates the score, subdivides the population into "blocks" (typically quintiles), and then requires that covariate balance be achieved within each stratum. Unless you have a large number of observations, this may not be achievable at all, even though a p-value threshold of 0.01 is already quite lenient (you could lower it further, but the method remains picky). It is very difficult to adjust the PS model with this method, because you have to keep track of balance in each stratum separately.

Overall, this is not a very popular way of doing propensity score analysis nowadays. There are various matching and weighting methods that may be more attractive, although they have their own weaknesses: matching may discard a number of observations unless performed carefully, and weighting may produce an unrealistic, distorted pseudo-population if the PS model is misspecified or there are major outliers.

The -pstest- command can accommodate an alternative, non-stratified test of balance, which will likely get rid of your problem. You should, however, use the balance check that is appropriate for your PS methodology. In any case, as widely discussed in the relevant literature, balance tests that rely on sample-size-dependent statistics (t-tests, chi2 tests, etc.) are not really the best approach. Standardized differences of means and proportions (see, e.g., the user-written -pbalchk- command) and, particularly, a thorough assessment of the cumulative distributions may be more appropriate, even though this requires more work than just running a series of t-tests.

I hope that might help you design your study better.
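To make Adam's point about sample-size-dependent statistics concrete, here is a minimal Python sketch (not the -pbalchk- or -pstest- implementation) working from hypothetical summary statistics. It computes the standardized difference d = (mean_t - mean_c) / sqrt((sd_t^2 + sd_c^2) / 2), a variant for binary covariates that uses p(1 - p) as the variance, and a Welch t-statistic for comparison. The function names and numbers are illustrative assumptions.

```python
import math


def std_diff(mean_t, sd_t, mean_c, sd_c):
    """Standardized difference: mean difference divided by the square root
    of the average of the two group variances. Does not depend on n."""
    return (mean_t - mean_c) / math.sqrt((sd_t**2 + sd_c**2) / 2)


def std_diff_prop(p_t, p_c):
    """Same idea for a binary covariate, using p(1 - p) as the variance."""
    return (p_t - p_c) / math.sqrt((p_t * (1 - p_t) + p_c * (1 - p_c)) / 2)


def welch_t(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Welch two-sample t-statistic: for a fixed mean difference and fixed
    SDs it grows with the square root of the group sizes."""
    return (mean_t - mean_c) / math.sqrt(sd_t**2 / n_t + sd_c**2 / n_c)


# Hypothetical covariate summaries: same means and SDs, different n.
for n in (50, 5000):
    d = std_diff(60.0, 10.0, 59.0, 10.0)
    t = welch_t(60.0, 10.0, n, 59.0, 10.0, n)
    print(f"n per group = {n:4d}: d = {d:.2f}, t = {t:.2f}")

# Hypothetical binary covariate (e.g. 30% vs 25% female).
print(f"binary d = {std_diff_prop(0.30, 0.25):.3f}")
```

With identical means and SDs, d stays at 0.10 in both cases while t rises from 0.5 to 5, so a t-test at p < 0.01 would flag only the larger sample as imbalanced even though the groups differ by exactly the same amount. Cutoffs such as |d| < 0.1 are often used as a rule of thumb for standardized differences.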
Best,
AO

On Tue, Dec 31, 2013 at 3:19 PM, Joe Canner <jcanner1@jhmi.edu> wrote:
> Orrin,
>
> I have had this same problem with -pscore-. My gut feeling is that it is using an overly conservative definition of "balance", although I'm not sure how to prove such an accusation. In any case, you may want to try -psmatch2- (also available from SSC), which seems to have a more realistic view of balance.
>
> Regards,
> Joe Canner
> Johns Hopkins University School of Medicine
> ________________________________________
> From: owner-statalist@hsphsun2.harvard.edu [owner-statalist@hsphsun2.harvard.edu] on behalf of Orrin Pail [orrinpail@gmail.com]
> Sent: Tuesday, December 31, 2013 2:46 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: st: Propensity score balancing property
>
> Hello,
>
> I am trying to use -pscore- for propensity score matching and I am
> having difficulty satisfying the balancing property.
>
> My covariates (xlist below) include 8 different variables. I have been
> trying different combinations of them, sometimes removing some, adding
> others. I also tried including interaction terms in my xlist, but
> Stata says that they are not allowed. No matter what I do, the
> balancing property is not met.
>
> The closest I get to satisfying the balancing property is when I use
> five of the variables. In this case, the output says that the final
> number of blocks is 8 and that three of the variables (they are listed
> separately) are not balanced in block 7.
>
> Does anyone have any recommendations on how I can satisfy the
> balancing property besides adding more variables? From what I
> understand, my propensity score analysis would be useless without the
> balancing property being met, so I would appreciate any help!
>
> The command I am using is below (I defined treatment and xlist prior
> to this command):
>
>     pscore $treatment $xlist, pscore(myscore) blockid(myblock) comsup
>
> Thanks!
>
> Orrin
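The stratified check Adam describes, and the "not balanced in block 7" message Orrin reports, can be illustrated with a simplified Python sketch. It sorts observations by the estimated propensity score, cuts them into k blocks of roughly equal size (quintiles when k = 5), and compares one covariate between treated and controls within each block using a Welch t-statistic with a normal-approximation 1% cutoff. This is an assumption-laden illustration, not the -pscore- algorithm itself, which arrives at its own number of blocks (8 in Orrin's case) through its own rules. The function names, the 2.576 cutoff, and the data are hypothetical.

```python
import math
from statistics import mean, variance

Z_CRIT = 2.576  # two-sided 1% critical value (normal approximation to t)


def make_blocks(pscores, k=5):
    """Assign each observation to one of k blocks of roughly equal size,
    ordered by propensity score (quintiles when k = 5)."""
    order = sorted(range(len(pscores)), key=lambda i: pscores[i])
    block = [0] * len(pscores)
    for rank, i in enumerate(order):
        block[i] = rank * k // len(pscores)
    return block


def unbalanced_blocks(pscores, treated, x, k=5):
    """Return (block, t) for blocks where covariate x differs between
    treated and controls at roughly the 1% level. Blocks with fewer than
    two treated or two control units are skipped (variance undefined)."""
    block = make_blocks(pscores, k)
    flagged = []
    for b in range(k):
        xt = [x[i] for i in range(len(x)) if block[i] == b and treated[i]]
        xc = [x[i] for i in range(len(x)) if block[i] == b and not treated[i]]
        if len(xt) < 2 or len(xc) < 2:
            continue
        se = math.sqrt(variance(xt) / len(xt) + variance(xc) / len(xc))
        if se == 0:
            continue
        t = (mean(xt) - mean(xc)) / se
        if abs(t) > Z_CRIT:
            flagged.append((b, round(t, 2)))
    return flagged


# Hypothetical data: x is balanced everywhere except in the top block,
# where treated units get a +10 shift.
pscores = [i / 100 for i in range(100)]
treated = [i % 2 == 1 for i in range(100)]
x = [i % 7 + (10 if (i % 2 == 1 and i >= 80) else 0) for i in range(100)]
print(unbalanced_blocks(pscores, treated, x))
```

Because every block has to pass separately, a covariate that is imbalanced in a single block is enough to fail the whole balancing property, which matches what Orrin sees in block 7.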