
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


re: st: propensity score (diff group sizes: treatment >>> control)

From   "Ariel Linden" <>
To   <>
Subject   re: st: propensity score (diff group sizes: treatment >>> control)
Date   Wed, 19 Feb 2014 10:04:17 -0500

First off, it is expected on the Statalist that you state which programs you
are using and where you got them. In this case, I am assuming from the
command lines you provided that you are using -pscore- and -psmatch2-
(both are user-written programs; see -findit pscore- and -findit psmatch2-).

Second, you have only 40 controls to match against 500 treated. This
basically leaves you with two choices: 

(1) Match with replacement. The potential problem here is that you will likely
reuse some (or all) of these controls so many times that I'd question the
generalizability of the results: can one control really serve as the
counterfactual for 100 treated individuals? That does not seem to be a very
good strategy. In any case, if you go this route you'll need to use a
frequency weight to account for the number of times that each control was used.
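To make the frequency-weight point concrete, here is a minimal Python sketch of 1:1 nearest-neighbour matching with replacement. It is not Stata code and not how -psmatch2- works internally. The function names and toy numbers are hypothetical. Each control's weight is the number of treated units matched to it, and the ATT computed with those weights equals the average of the matched-pair differences.

```python
from collections import Counter


def match_with_replacement(treated_ps, control_ps):
    """1:1 nearest-neighbour matching on the propensity score, WITH replacement.

    Returns the matched control index for each treated unit and a frequency
    weight per control (= number of treated units it was matched to).
    """
    matches = []
    for p in treated_ps:
        # closest control by absolute score distance; ties go to the lower index
        j = min(range(len(control_ps)), key=lambda k: (abs(control_ps[k] - p), k))
        matches.append(j)
    counts = Counter(matches)
    weights = [counts.get(j, 0) for j in range(len(control_ps))]
    return matches, weights


def att_from_weights(treated_y, control_y, weights):
    """ATT = mean treated outcome minus frequency-weighted mean control outcome."""
    mean_t = sum(treated_y) / len(treated_y)
    mean_c = sum(w * y for w, y in zip(weights, control_y)) / sum(weights)
    return mean_t - mean_c
```

With treated scores [0.80, 0.82, 0.85, 0.30] and control scores [0.81, 0.35, 0.50], the first control is matched three times (weights [3, 1, 0]). That heavy reuse of a single control is exactly the concern raised above.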

(2) A perhaps more reasonable approach would be to flip the matching so that
you are matching treated units to controls. In other words, find the 40 or more
treated units that are most comparable on observed characteristics to those 40
controls. This changes the treatment effect you estimate to the ATC (average
treatment effect on the controls).

In both cases above, I would suggest that you stick with a pair-matching
algorithm (e.g., nearest neighbor) rather than kernel matching. It will be
easier to visually inspect the matches to see whether they pass the "sniff test".
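One way to run that sniff test is sketched in Python below: list each matched pair side by side. The field names 'ps', 'age', 'gender' and 'living' are hypothetical stand-ins for the score and the Age, Gender and Living covariates. The pair list is assumed to come from whatever pair-matching step was used.

```python
def pair_report(pairs, treated, controls):
    """One row per matched pair (treated index i, control index j) for eyeballing.

    treated / controls are lists of dicts with hypothetical keys
    'ps', 'age', 'gender', 'living'.
    """
    rows = []
    for i, j in pairs:
        t, c = treated[i], controls[j]
        ps_gap = round(abs(t["ps"] - c["ps"]), 3)
        age_gap = abs(t["age"] - c["age"])
        # categorical covariates on which the two members of the pair disagree
        mismatched = [v for v in ("gender", "living") if t[v] != c[v]]
        rows.append((i, j, ps_gap, age_gap, mismatched))
    return rows


if __name__ == "__main__":
    treated = [{"ps": 0.62, "age": 71, "gender": "F", "living": "alone"}]
    controls = [{"ps": 0.60, "age": 69, "gender": "F", "living": "with others"}]
    for row in pair_report([(0, 0)], treated, controls):
        print("T%d vs C%d: ps gap %.3f, age gap %d, differs on %s" % row)
```

Kernel matching has no single partner per treated unit, so no comparable pair-by-pair listing exists there.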

I hope this helps.

Date: Tue, 18 Feb 2014 17:04:03 +0000
From: <>
Subject: st: propensity score (diff group sizes: treatment >>> control)

Dear list,

I am evaluating an intervention for which I have a control group (N=40) and a
treatment group (N=500).
I am using propensity score matching to match the two groups on
sociodemographics (age, gender, living status).
I am considering two methods and would be delighted to receive advice
from anybody who has encountered the same problem.

1/ First method
I calculated the propensity score and then performed PSM (kernel method).

pscore Group Age Gender Living, pscore(myscore) blockid(myblock)

psmatch2 Group, outcome(GP_Times) pscore(myscore) kernel(normal)
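For reference, here is a minimal Python sketch of what kernel matching with a normal (Gaussian) kernel computes for the ATT. It assumes the textbook form of the estimator, not the exact internals of -psmatch2-. The function name and the bandwidth passed in are hypothetical. Every treated unit's counterfactual is a weighted mean of all control outcomes. With only 40 controls, the same 40 outcomes therefore enter every one of the 500 counterfactuals.

```python
import math


def normal_kernel_att(treated_ps, treated_y, control_ps, control_y, bandwidth):
    """Kernel-matching ATT with a Gaussian kernel on the propensity score.

    Control j's weight for treated unit i is K((p_j - p_i) / h); the kernel's
    normalising constant cancels in the weighted mean, so exp(-u^2 / 2) suffices.
    """
    diffs = []
    for p_i, y_i in zip(treated_ps, treated_y):
        w = [math.exp(-0.5 * ((p_j - p_i) / bandwidth) ** 2) for p_j in control_ps]
        counterfactual = sum(wj * yj for wj, yj in zip(w, control_y)) / sum(w)
        diffs.append(y_i - counterfactual)
    return sum(diffs) / len(diffs)
```

The bandwidth is a free parameter here. No particular -psmatch2- default is assumed.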

2/ Second method
Because of the large difference in group sizes, and because my treatment
group is the larger one, I tried reversing the groups.
I created a new variable ('Group_opposite') in which the original treatment
group (N=500) is coded as control and the original control group (N=40) as
treatment, then calculated the new propensity score, and finally performed
PSM (kernel method).

pscore Group Age Gender Living, pscore(myscore2) blockid(myblock2)

psmatch2 Group_opposite, outcome(GP_Times) pscore(myscore2) kernel(normal)
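The Python sketch below shows what recoding the groups does. It uses hypothetical toy data and function names. To keep the arithmetic checkable, it uses nearest-neighbour matching with replacement rather than kernel matching, but the same logic applies to kernel weights.

The matching distance |p_i - p_j| is unchanged when every score is replaced by 1 - p. Assuming the reported effect is "treated minus matched controls", the "ATT" after flipping is therefore the mean over the original controls of (control outcome minus matched treated outcome). That is minus the ATC on the original coding, a different quantity from the original ATT.

```python
def nn_att(treated_ps, treated_y, control_ps, control_y):
    """ATT: each treated unit vs. its nearest control (with replacement)."""
    total = 0.0
    for p, y in zip(treated_ps, treated_y):
        j = min(range(len(control_ps)), key=lambda k: (abs(control_ps[k] - p), k))
        total += y - control_y[j]
    return total / len(treated_ps)


def nn_atc(treated_ps, treated_y, control_ps, control_y):
    """ATC: each control vs. its nearest treated unit (with replacement)."""
    total = 0.0
    for p, y in zip(control_ps, control_y):
        j = min(range(len(treated_ps)), key=lambda k: (abs(treated_ps[k] - p), k))
        total += treated_y[j] - y
    return total / len(control_ps)


# Hypothetical toy data: A = original treatment group, B = original control group
a_ps, a_y = [0.9, 0.6, 0.2], [10, 8, 4]
b_ps, b_y = [0.8, 0.25], [6, 3]

att = nn_att(a_ps, a_y, b_ps, b_y)  # effect on the original treated
atc = nn_atc(a_ps, a_y, b_ps, b_y)  # effect on the original controls
# Flipped coding: B is now "treated" and its score is 1 - p
flipped = nn_att([1 - p for p in b_ps], b_y, [1 - p for p in a_ps], a_y)
```

Here att is 7/3 and atc is 2.5, while flipped comes out as -2.5.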

The values calculated using the second propensity score seem to be more
conservative than the first ones, and the results more acceptable.
Would it be right to use the second method instead of the first one?

I look forward to your advice; many thanks in advance,

Best wishes,


Valentina Iemmi | Research Officer 
London School of Economics and Political Science | Personal Social Services
Research Unit - PSSRU
Houghton Street | London WC2A 2AE 

