


re: RE: st: propensity matching and matched pair analysis

From   "Ariel Linden. DrPH" <>
To   <>
Subject   re: RE: st: propensity matching and matched pair analysis
Date   Tue, 21 Aug 2012 09:34:36 -0700


If I understand you correctly, the reviewers suggested that you use
statistics designed for 1:1 matching when you are, in fact, using a 1:2
approach? I don't want to get in an argument with reviewers (unless it's my
own paper), but there is no literature supporting that argument.

This excerpt from page 13 of Stuart, 2010 (reference below) describes the
issue perfectly:

5.1 After k : 1 Matching

When each treated individual has received k matches, the outcome analysis
proceeds using the matched samples, as if those samples had been generated
through randomization. There is debate about whether the analysis needs to
account for the matched pair nature of the data (Austin, 2007). However,
there are at least two reasons why it is not necessary to account for the
matched pairs (Schafer and Kang, 2008; Stuart, 2008). First, conditioning on
the variables that were used in the matching process (such as through a
regression model) is sufficient. Second, propensity score matching, in fact,
does not guarantee that the individual pairs will be well-matched on the
full set of covariates, only that groups of individuals with similar
propensity scores will have similar covariate distributions. Thus, it is
more common to simply pool all the matches into matched treated and control
groups and run analyses using the groups as a whole, rather than using the
individual matched pairs.

If you (or the reviewers) still insist that non-parametric statistics
must be used here, I suggest you consider -somersd- (a complete statistical
suite for non-parametric statistics written by Roger Newson and found on

For example, a simple command after you've preprocessed your data with
matching would be:

somersd treatment outcome, tr(z) 

If you want to account for the groupings of the data you could use the
weights generated by -psmatch2-, which account for the number of controls
for each treated unit (in the case of 1:2 matching, treated units will have
a weight of 1 and each control a weight of 0.5).

somersd treatment outcome [pw=_weight], tr(z)
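
As a rough sketch of the whole sequence, the preprocessing and a check of
the weights might look like the lines below. The covariate names x1, x2 and
x3 are placeholders, and I am assuming -psmatch2- was run with neighbor(2).
Unmatched observations should have a missing _weight and so drop out of
the weighted command.

        * install once from SSC if needed
        ssc install psmatch2
        ssc install somersd

        * 1:2 nearest-neighbor matching on the propensity score
        * (x1 x2 x3 are placeholder covariates)
        psmatch2 treatment x1 x2 x3, neighbor(2)

        * inspect the generated weights before relying on them
        tabulate _weight _treated

        * weighted analysis on the matched sample
        somersd treatment outcome [pw=_weight], tr(z)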

This is only a simple case. -somersd- in fact allows for much more control
over the analysis than this. For example, you can cluster and stratify.
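
A hedged illustration of the clustering option: assuming you have built a
matched-set identifier (called matchgrp here, a hypothetical name that
-psmatch2- does not create), the matched set can be declared as the
sampling unit:

        * matchgrp is a hypothetical matched-set identifier
        somersd treatment outcome [pw=_weight], tr(z) cluster(matchgrp)

The -somersd- help file describes the stratification options.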

Most importantly, you will not need to rearrange your data into a different
format than it already is.
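
Should a paired analysis be required after all, here is a hedged sketch. As
far as I know, -mcc- reads the counts from two variables in the data (one
row per 1:1 pair), while -mcci- is the immediate form that takes typed-in
cell counts. McNemar's test itself is defined for 1:1 pairs; for 1:2
matched sets, conditional logistic regression is one commonly used
alternative. The names exposed_case, exposed_control, exposure and matchgrp
below are hypothetical.

        * 1:1 pairs, wide format: one row per pair, two 0/1 variables
        mcc exposed_case exposed_control

        * 1:k matched sets, long format: one row per patient
        * (matchgrp is a hypothetical matched-set identifier)
        clogit _treated exposure, group(matchgrp)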


REFERENCE: Elizabeth A. Stuart. Matching Methods for Causal Inference: A
Review and a Look Forward. Statistical Science  2010, Vol. 25, No. 1, 1-21

Date: Mon, 20 Aug 2012 14:07:33 +0000
From: Claude Beaty <>
Subject: RE: st: propensity matching and matched pair analysis


I apologize for the delay in responding to your email. I appreciate your
opinion on the use of parametric statistics and would love to avoid this
whole issue based on these reasons, but the article I am submitting based on
these data is subject to peer review, and the reviewers would like me to
account for the pairing in my statistics (I initially used the t-test,
Fisher's exact test and chi-squared analysis on my paired data to describe
the covariate relationships based on disease presence).

As for my questions: 
1) Can McNemar's test be used in a 2:1 analysis or only in a 1:1 analysis?
If this is the wrong test, what would be a better analysis for a 2:1 match
(Mantel-Haenszel etc)? 

Assuming this is the right test, I realize that these case-controls need to
be matched and analyzed as a group for the McNemar's and sign rank tests. I
also know the groupings, as you indicated and I previously mentioned, based
on the _id and _nk variables created by the -psmatch2- command. Currently,
my data set is 1800 observations (rows) of patient information. I can create
a grouping variable with the 600 groups (3 patients per group as per the 2:1
matching) by hand but would prefer not to. 

2) Is there an easy way to create these groups or do I have to write out 600
lines of code, as each group will have 3 different _id codes associated with

Finally, my understanding of the -mcc- command for McNemar's test is that
the individual numbers associated with the various boxes of the concordant
table are to be entered by the user, not calculated by the program. I
believe this means I need to manually interrogate every grouping by exposure
to a variable of interest and then sum these results by hand, to enter them
into a final McNemar's concordant table describing all of the groups
relationships based on this one variable. This process would then have to be
repeated for every variable of interest. As I have 600 groups and >40
variables of interest, this could prove to be prohibitively tedious.

3) Is this understanding of the command correct? If so, is there a way in
which the program can calculate one discordant table for all groups but
based on individual intra-group interactions?

Thank you.

Claude A. Beaty Jr., M.D.
Halsted Surgical Resident
Cardiac Surgery Research Fellow
The Johns Hopkins Hospital

-----Original Message-----
[] On Behalf Of Ariel Linden.
Sent: Saturday, August 18, 2012 12:41 PM
Subject: re: st: propensity matching and matched pair analysis

Hi Claude,

First off, as part of the Statalist requirements, listers are asked to note
what program they are using and where it is found. -psmatch2- is a
user-written program and can be found on SSC (findit psmatch2).

In regards to your query, you have a couple different things going on here.
First, there is still an ongoing debate as to whether you need to run
non-parametric statistics after matching. This issue is even less clear when
you use 1:k matches, since matched individuals will likely differ on some
covariates even though they appear to be balanced at the cohort level (so on
average the groups will be comparable, but any two matched individuals may
not be). This is exactly what happens in an RCT - the random assignment
ensures balance on covariates at the aggregate level, but not necessarily
between any two people. 

What this means is that you can use parametric statistics if they are more
suitable to answer your particular research question. In fact, there is even
an ongoing debate as to whether the researcher should weight the
observations in 1:k matching when a fixed ratio matching is applied (which
you did using a fixed 2 controls for every 1 treated). Had you used variable
matching, weighting would have been necessary.

The next issue you ask about is identifying the specific treated and matched
controls. As per the help file, several new variables are generated:

        _id In the case of one-to-one and nearest-neighbors matching, a new
        identifier created for all observations.

        _nk In the case of one-to-one and nearest-neighbors matching, for
        treatment observation, it stores the observation number of the k-th
        matched control observation. Do not forget to sort by _id if you
        want to use the observation number (id) of, for example, the 1st nearest
        neighbor as in

        sort _id
        g x_of_match = x[_n1]

        _nn In the case of nearest-neighbors matching, for every treatment
        observation, it stores the number of matched control observations.

Thus, you can order the matches so that they fall into groups.
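
One way to do this without typing the groups by hand could be sketched as
follows. It assumes 2 neighbors (so that _n1 and _n2 exist), that no
observations have been dropped or added since -psmatch2- ran, and that each
control is matched to only one treated observation (if a control is reused,
the last assignment overwrites the earlier ones). The names matchgrp and
grpsize are made up for illustration.

        * after this sort, observation number equals _id
        sort _id

        * treated observations with a match start their own group
        gen long matchgrp = _id if _treated == 1 & !missing(_n1)

        * copy each treated observation's _id onto its matched controls
        forvalues i = 1/`=_N' {
            if _treated[`i'] == 1 {
                forvalues k = 1/2 {
                    local j = _n`k'[`i']
                    if !missing(`j') {
                        quietly replace matchgrp = _id[`i'] in `j'
                    }
                }
            }
        }

        * check: each complete group should hold 3 observations
        bysort matchgrp: gen byte grpsize = _N if !missing(matchgrp)
        tabulate grpsize

        * restore the order that the _n1 and _n2 subscripts rely on
        sort _id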

I hope this helps


Date: Fri, 17 Aug 2012 20:09:21 +0000
From: Claude Beaty <>
Subject: st: propensity matching and matched pair analysis


I have utilized the psmatch2 command to successfully create a 2:1 nearest
neighbor match in my data based upon the presence of a specific disease.
This has resulted in 630 cases and 1200 controls. I can identify the matches
by the unique ID numbers created during the match process. In order to
accurately analyze this data, I know that I need to run McNemar's test for
categorical variables and the sign rank test for continuous variables to
account for the pairing. However, currently the data are listed as 1800
different observations. Is there an easy way to group the cases with their
controls, or do I have to manually create 630 groups by hand and then sum
the results from each group to run these analyses?

Claude A. Beaty Jr., M.D.
Halsted Surgical Resident
Cardiac Surgery Research Fellow
The Johns Hopkins Hospital

