

From   David Airey <>
Subject   Re: st: Re: RE: CASE-CONTROL STUDY
Date   Sun, 15 Mar 2009 12:12:57 -0500


At that URL there are two files, optmatch2.ado and optmatch2.hlp.

You can save these files in a directory on Stata's ado-path (for example, your personal ado directory), and the command will then be available at the Stata command line. There won't be any menus.

You might also track down the author. He has offered email assistance with this program.


help for optmatch2

Optimal Matching


        optmatch2 casecontrol varlist [if] [in] [, options ]

    options               description
      minc(#)              Minimum control:case ratio.
      maxc(#)              Maximum number of controls per set.
      nc(#)                Total number of controls to include in match.
      gen(newvar)          A new variable to contain the number of the case-control
                            set each subject belongs to.
      caliper(#)           Limit on acceptable matching.
      measure(string)      Type of dissimilarity measure to use.
      epsilon(#)           Stability constant.
      repeat               If requested number of controls cannot be matched, produce
                            match with as many controls as possible.


The command optmatch2 performs optimal matching using the network flow methodology outlined in Rosenbaum (1989). The variable casecontrol contains 1 for cases and 0 for controls. The variable(s) on which matching is to be performed are given by varlist. If there is more than one variable in varlist, there are a number of ways of calculating a distance between a case and a control: see the option measure below for more information.
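
To make "optimal" concrete, here is a minimal Python sketch (not Stata, and not how optmatch2 works internally: it uses exhaustive search on a toy example rather than network flow). It compares the 1:1 match with the smallest total distance against a greedy nearest-neighbour match. The ages and the function names are made up for illustration.

```python
from itertools import permutations

def optimal_pairs(cases, controls):
    """Return (total_distance, pairs) for the 1:1 match with the smallest total.

    Exhaustive search, so only practical for a handful of subjects.
    """
    best_total, best_pairs = None, None
    # Try every way of assigning a distinct control to each case.
    for chosen in permutations(range(len(controls)), len(cases)):
        total = sum(abs(cases[i] - controls[j]) for i, j in enumerate(chosen))
        if best_total is None or total < best_total:
            best_total, best_pairs = total, list(enumerate(chosen))
    return best_total, best_pairs

def greedy_pairs(cases, controls):
    """Match each case in turn to its nearest control that is still unused."""
    unused = set(range(len(controls)))
    total, pairs = 0, []
    for i, c in enumerate(cases):
        j = min(unused, key=lambda k: (abs(c - controls[k]), k))
        unused.remove(j)
        total += abs(c - controls[j])
        pairs.append((i, j))
    return total, pairs

case_ages = [70, 74]      # hypothetical cases
control_ages = [73, 60]   # hypothetical controls

print(optimal_pairs(case_ages, control_ages))  # (11, [(0, 1), (1, 0)])
print(greedy_pairs(case_ages, control_ages))   # (17, [(0, 0), (1, 1)])
```

Greedy matching gives the first case its best control and leaves the second case with a poor one. The optimal match accepts a slightly worse pairing for the first case so that the total distance is smaller.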

Options
        +------+
    ----+ Main +---------------------------------------------------------------

minc(#) Minimum control:case ratio. May be less than 1: e.g. 0.5 means the same
        control can be matched to 2 cases. Default value is 1.

maxc(#) Maximum number of controls per case-control set. Must be an integer >= 1;
        default value is 1.

nc(#) Total number of controls to be used in the match. Defaults to all the
        controls in the dataset. Can be set to any integer less than or equal to
        this; requesting more controls than exist in the dataset will cause
        optmatch2 to fail with an error message.

gen(newvar) If given, this will create a new variable containing an identifier
        for the case-control set each individual belongs to. If it is not given,
        a variable called set is created, unless it already exists, in which case
        optmatch2 will fail with an error message.

caliper(#) This sets the maximum allowable discrepancy between a case and a
        control within a matched set. By default, no caliper is set and every
        control can, in theory, be matched to any case.

measure(string) This is only of importance if there is more than one variable in
        varlist. In that case, it determines the metric used to convert
        differences in several variables into one overall dissimilarity measure.
        The standard measures that Stata can use are outlined in measure_option.
        Of these, optmatch2 can use L(#), Lpower(#) and Linfinity and their
        various aliases, with the default being L2. In addition, it accepts the
        value mahal to use the Mahalanobis distance.

epsilon(#) Default value is 0.000001. Technically, the optimal matching method
        only works if all discrepancies between cases and controls are greater
        than zero. This value is added to all discrepancies to ensure that this
        is the case. The value of epsilon can affect the matching if minc < 1:
        see Hansen and Klopfer (2006) for a discussion of this.

repeat It may be impossible for optmatch2 to find a matching that uses the
        requested number of controls (nc). This may be a logical impossibility
        (there are not that many controls in the data) or an empirical one (if
        you use caliper to define the maximum allowable discrepancy in a match,
        it may not be possible to match all controls to a case). If you give the
        repeat option, optmatch2 will report how many controls it can match, then
        perform the matching with that number of controls. Otherwise, it will
        simply report the maximum number of controls it could match.
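
As a rough illustration of the measure, epsilon and caliper options, the Python sketch below computes a few of the named dissimilarity measures for one case and one control. It is not optmatch2's code. The function names and the example values are hypothetical, the Mahalanobis distance is left out, and checking the caliper before adding epsilon is an assumption (the help text does not say which comes first).

```python
def minkowski(a, b, p):
    """L(p): (sum of |a_k - b_k|^p) ^ (1/p). p = 2 is the Euclidean distance (L2)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def minkowski_power(a, b, p):
    """Lpower(p): the same sum as L(p), without the final root."""
    return sum(abs(x - y) ** p for x, y in zip(a, b))

def linfinity(a, b):
    """Linfinity: the largest absolute difference on any single variable."""
    return max(abs(x - y) for x, y in zip(a, b))

def discrepancy(a, b, measure, epsilon=1e-6, caliper=None):
    """Distance plus epsilon, or None if the pair falls outside the caliper.

    Assumption: the caliper is compared with the raw distance, and epsilon
    is added afterwards so that every allowed discrepancy is above zero.
    """
    d = measure(a, b)
    if caliper is not None and d > caliper:
        return None  # this case-control pair may not be matched
    return d + epsilon

# Hypothetical matching variables: (age, number of medications)
case = (70, 4)
control = (73, 8)

print(minkowski(case, control, 1))        # L1: 3 + 4
print(minkowski(case, control, 2))        # L2: square root of 9 + 16
print(minkowski_power(case, control, 2))  # Lpower(2): 9 + 16
print(linfinity(case, control))           # Linfinity: max(3, 4)
print(discrepancy(case, case, linfinity)) # identical subjects still get epsilon
```

With a caliper of 4.5, this pair would be allowed under Linfinity (distance 4) but ruled out under L2 (distance 5), so the choice of measure and the caliper have to be considered together.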


The command optmatch2 produces matched sets, that is, groups consisting of one or more cases and one or more controls, with the dissimilarities between subjects in a set being as small as possible. By default, it produces matched pairs (1 case and 1 control), but this can be changed using the options minc, maxc and nc. For example, minc(1) maxc(1) will produce the default 1 to 1 matching, whilst minc(3) maxc(3) will produce sets which all consist of 1 case and 3 controls.

More complex matchings can be achieved by using values of minc less than 1. For example, minc(0.5) maxc(2) will produce sets consisting of either 1 control and 2 cases, 1 control and 1 case, or 2 controls and 1 case.
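
The examples above can be reproduced with a small Python sketch that lists which set compositions a given minc and maxc would allow. This is only one reading of the help text, not taken from the optmatch2 source: it assumes a set contains either one case with up to maxc controls or, when minc is below 1, one control shared by up to 1/minc cases. The function name allowed_sets is hypothetical.

```python
import math

def allowed_sets(minc, maxc):
    """List the (cases, controls) compositions a matched set may have.

    Assumed reading of the help text: a set has either one case with up to
    maxc controls, or (when minc < 1) one control shared by up to 1/minc cases.
    """
    tol = 1e-9  # guards against floating-point error in 1/minc
    sets = []
    if minc < 1:
        max_cases = math.floor(1 / minc + tol)
        # One control shared by several cases, largest sharing first.
        sets += [(m, 1) for m in range(max_cases, 1, -1)]
        min_controls = 1
    else:
        min_controls = math.ceil(minc - tol)
    # One case with min_controls up to maxc controls.
    sets += [(1, k) for k in range(min_controls, maxc + 1)]
    return sets

print(allowed_sets(1, 1))    # [(1, 1)]
print(allowed_sets(3, 3))    # [(1, 3)]
print(allowed_sets(0.5, 2))  # [(2, 1), (1, 1), (1, 2)]
```

Each tuple is (number of cases, number of controls), so the last line corresponds to the three kinds of set listed for minc(0.5) maxc(2).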


Ben B. Hansen and Stephanie Olsen Klopfer "Optimal Full Matching and Related Designs via Network Flows" (2006) Journal of Computational and Graphical Statistics 15(3): 609-627.

Paul R. Rosenbaum "Optimal Matching for Observational Studies" (1989) Journal of the American Statistical Association 84(408): 1024-1032.


    Mark Lunt, ARC Epidemiology Unit

    The University of Manchester

Please email if you encounter problems with this

On Mar 15, 2009, at 11:52 AM, Ishay Barat wrote:

Dear Kieran and David

There can be lots of arguments for why one designs a backward-looking study rather than a forward-looking one. In my case, I am responsible for a lot of patients passing through my department, and from time to time I need to carry out quality control of our patient management. It would have been nice to have a quarter of a million dollars and 3 years to carry out a study, but that's not reality.


And now for your question.

As my objective is geriatric patients, and my data include the general internal medicine ward clientele, I would like to reduce the noise that younger and far healthier patients introduce into my data.

By matching some crucial parameters like age, sex, medication and disease, I may get answers to my questions.

As to my anagram. It is just for fun and nothing else.

As to the reference to

I installed the files, but can not find the command in the menus.

*¸..· ´¨)) -:¦:-        *
  ¸.·´ .
(( -:¦:- * Ishay *  -:¦:-
  ´·..          ..·´
             ((¸¸.·´* -:¦:-


Matching is an element of the design of a study, planned before the data is collected, and should be done for efficiency, not control. If you already have the data, you gain nothing by matching. You have a sample size of 2,500. If you match these data in the way you have indicated, you will end up with a matched sample size of 1,200. Why would you want to discard over half of your data?

You should analyse the data as they are and control for age, sex, etc in the analysis.

Kieran McCaul MPH PhD
WA Centre for Health & Ageing (M573)
University of Western Australia
Level 6, Ainslie House
48 Murray St
Perth 6000
Phone: (08) 9224-2140
Fax: (08) 9224 8009
Epidemiology is so beautiful and provides such an important perspective on human life and death, but an incredible amount of rubbish is published. Richard Peto (2007)

-----Original Message-----
From: [ ] On Behalf Of Ishay Barat
Sent: Sunday, 15 March 2009 1:28 AM


I've got a data set containing about 2500 patients, of which 300 are of
interest to me (Group A).

I would like to extract a sample of 900 patients (Group B) from the data set who match Group A in age, sex and some other parameters: a classical
case-control study with 3 controls for each case.

Does anybody have a clue what the syntax looks like?

*¸..· ´¨)) -:¦:-        *
  ¸.·´ .
(( -:¦:- * Ishay *  -:¦:-
  ´·..          ..·´
             ((¸¸.·´* -:¦:-
