

From: "Ariel Linden. DrPH" <ariel.linden@gmail.com>
To: <statalist@hsphsun2.harvard.edu>
Subject: re: RE: st: propensity matching and matched pair analysis
Date: Tue, 21 Aug 2012 09:34:36 -0700

Claude,

If I understand you correctly, the reviewers suggested that you apply statistics designed for 1:1 matching when you are, in fact, using a 1:2 approach? I don't want to get into an argument with reviewers (unless it's my own paper), but there is no literature supporting that argument. This excerpt from page 13 of Stuart, 2010 (reference below) describes the issue perfectly:

    5.1 After k : 1 Matching

    When each treated individual has received k matches, the outcome analysis proceeds using the matched samples, as if those samples had been generated through randomization. There is debate about whether the analysis needs to account for the matched pair nature of the data (Austin, 2007). However, there are at least two reasons why it is not necessary to account for the matched pairs (Schafer and Kang, 2008; Stuart, 2008). First, conditioning on the variables that were used in the matching process (such as through a regression model) is sufficient. Second, propensity score matching, in fact, does not guarantee that the individual pairs will be well-matched on the full set of covariates, only that groups of individuals with similar propensity scores will have similar covariate distributions. Thus, it is more common to simply pool all the matches into matched treated and control groups and run analyses using the groups as a whole, rather than using the individual matched pairs.

If you (or the reviewers) still insist that non-parametric statistics must be used here, I suggest you consider -somersd- (a complete statistical suite for non-parametric statistics written by Roger Newson and found on ssc).
For example, a simple command after you've preprocessed your data with matching would be:

    somersd treatment outcome, tr(z)

If you want to account for the groupings of the data, you could use the weights generated by -psmatch2-, which account for the number of controls for each treated unit (in the case of 1:2 matching, each treated unit will have a weight of 1 and each control a weight of 0.5):

    somersd treatment outcome [pw=_weight], tr(z)

This is a simple case I am providing. -somersd- in fact allows for much more control of the data than this. For example, you can cluster and stratify, etc. Most importantly, you will not need to rearrange your data into a different format than it already is.

Ariel

REFERENCE:
Elizabeth A. Stuart. Matching Methods for Causal Inference: A Review and a Look Forward. Statistical Science 2010, Vol. 25, No. 1, 1-21

Date: Mon, 20 Aug 2012 14:07:33 +0000
From: Claude Beaty <cbeaty1@jhmi.edu>
Subject: RE: st: propensity matching and matched pair analysis

Ariel,

I apologize for the delay in responding to your email. I appreciate your opinion on the use of parametric statistics and would love to avoid this whole issue for these reasons, but the article I am submitting based on these data is subject to peer review, and the reviewers would like me to account for the pairing in my statistics (I initially used the t-test, Fisher's exact test and chi2 analysis on my paired data to describe the covariate relationships based on disease presence).

As for my questions:

1) Can McNemar's test be used in a 2:1 analysis or only in a 1:1 analysis? If this is the wrong test, what would be a better analysis for a 2:1 match (Mantel-Haenszel etc.)?

Assuming this is the right test, I realize that these cases and controls need to be matched and analyzed as a group for the McNemar's and signed-rank tests. I also know the groupings, as you indicated and I previously mentioned, based on the _id and _nk variables created by the -psmatch2- command.
Currently, my data set is 1800 observations (rows) of patient information. I can create a grouping variable with the 600 groups (3 patients per group as per the 2:1 matching) by hand but would prefer not to.

2) Is there an easy way to create these groups, or do I have to write out 600 lines of code, as each group will have 3 different _id codes associated with it?

Finally, my understanding of the -mcc- command for McNemar's test is that the individual numbers associated with the various boxes of the concordant table are to be entered by the user, not calculated by the program. I believe this means I need to manually interrogate every grouping by exposure to a variable of interest and then sum these results by hand, to enter them into a final McNemar's concordant table describing all of the groups' relationships based on this one variable. This process would then have to be repeated for every variable of interest. As I have 600 groups and >40 variables of interest, this could prove to be prohibitively tedious.

3) Is this understanding of the command correct? If so, is there a way in which the program can calculate one discordant table for all groups but based on individual intra-group interactions?

Thank you.

Claude A. Beaty Jr., M.D.
Halsted Surgical Resident
Cardiac Surgery Research Fellow
The Johns Hopkins Hospital

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Ariel Linden. DrPH
Sent: Saturday, August 18, 2012 12:41 PM
To: statalist@hsphsun2.harvard.edu
Subject: re: st: propensity matching and matched pair analysis

Hi Claude,

First off, as part of the Statalist requirements, listers are asked to note what program they are using and where it is found. -psmatch2- is a user-written program and can be found on ssc (findit psmatch2).

In regard to your query, you have a couple of different things going on here.
First, there is still an ongoing debate as to whether you need to run non-parametric statistics after matching. This issue is even less clear when you use 1:k matches, since matched individuals will likely differ on some covariates even though they appear to be balanced at the cohort level (so on average the groups will be comparable, but any two matched individuals may not be). This is exactly what happens in an RCT: the random assignment ensures balance on covariates at the aggregate level, but not necessarily between any two people. What this means is that you can use parametric statistics if they are more suitable to answer your particular research question.

In fact, there is even an ongoing debate as to whether the researcher should weight the observations in 1:k matching when fixed-ratio matching is applied (which you did, using a fixed 2 controls for every 1 treated). Had you used variable-ratio matching, weighting would have been necessary.

The next issue you ask about is identifying the specific treated units and their matched controls. As per the help file, several new variables are generated:

_id   In the case of one-to-one and nearest-neighbors matching, a new identifier created for all observations.

_nk   In the case of one-to-one and nearest-neighbors matching, for every treatment observation, it stores the observation number of the k-th matched control observation. Do not forget to sort by _id if you want to use the observation number (id) of, for example, the 1st nearest neighbor, as in

          sort _id
          g x_of_match = x[_n1]

_nn   In the case of nearest-neighbors matching, for every treatment observation, it stores the number of matched control observations.

Thus, you can order the matches so that they fall into groups.
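As a minimal sketch of that grouping step (not a definitive implementation), the following builds one row per group member from the variables described above. It assumes -psmatch2- has just been run with 2 nearest neighbors, so that _id, _treated, _n1 and _n2 are in memory; the names group, role and member are hypothetical and chosen only for illustration.

```stata
* Sketch only: assumes -psmatch2- (from SSC) was just run with 2 nearest
* neighbors, so _id, _treated, _n1 and _n2 exist in memory.
* group, role and member are hypothetical names used for illustration.
preserve
keep if _treated == 1 & !missing(_n1)   // treated units that received a match
gen long group = _id                    // label each group by its treated unit
gen long member0 = _id                  // the treated unit itself
rename _n1 member1                      // 1st matched control
rename _n2 member2                      // 2nd matched control
keep group member0 member1 member2
reshape long member, i(group) j(role)   // role: 0 = treated, 1/2 = k-th control
drop if missing(member)                 // treated units with fewer than 2 matches
rename member _id
tempfile groups
save `groups'
restore

* Attach the group label to the full data; only matched observations are kept.
drop if missing(_id)                    // rows without an identifier cannot be grouped
merge 1:m _id using `groups', keep(match) nogenerate
sort group role
```

Run as one block in a do-file, since the temporary file name is a local macro. If a control was matched to more than one treated unit (possible when matching with replacement), it appears once in each of its groups, which is why the merge is 1:m. The resulting group variable can then be used with by-group processing, for example `bysort group:`.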
I hope this helps.

Ariel

Date: Fri, 17 Aug 2012 20:09:21 +0000
From: Claude Beaty <cbeaty1@jhmi.edu>
Subject: st: propensity matching and matched pair analysis

All,

I have used the psmatch2 command to successfully create a 2:1 nearest neighbor match in my data based upon the presence of a specific disease. This has resulted in 630 cases and 1200 controls. I can identify the matches by the unique ID numbers created during the match process. In order to accurately analyze these data, I know that I need to run McNemar's test for categorical variables and the signed-rank test for continuous variables to account for the pairing. However, currently the data are listed as 1800 different observations. Is there an easy way to group the cases with their controls, or do I have to manually create 630 groups by hand and then sum the results from each group to run these analyses?

Claude A. Beaty Jr., M.D.
Halsted Surgical Resident
Cardiac Surgery Research Fellow
The Johns Hopkins Hospital

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

