
Re: st: Matching in STATA

From   "Salah Mahmud" <>
Subject   Re: st: Matching in STATA
Date   Fri, 20 Jun 2008 09:48:57 -0500

For completeness, also see [ST] sttocc -- Convert survival-time
data to case-control data.

sttocc automates the process of sampling matched controls. It is
intended to generate nested case-control data from cohort data, but
it should not be difficult to "fool" it into sampling from
cross-sectional data.
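For readers unfamiliar with nested case-control sampling, the idea behind -sttocc- can be sketched as follows. This is Python, written only to illustrate the logic; it is not Stata code and not sttocc's actual implementation. The Subject record and its fields (id, exit_time, failed) are hypothetical. Subjects whose exit time ties with the case's failure time are excluded here for simplicity, and real software may handle ties differently.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    id: str            # hypothetical subject identifier
    exit_time: float   # time of failure or censoring
    failed: bool       # True if the subject is a case


def risk_set_sample(cohort, n_controls, seed=0):
    """For each case, draw up to n_controls subjects still at risk at its failure time.

    'At risk' here means exit time strictly later than the case's failure time
    (ties excluded for simplicity). A subject may serve as a control for several
    cases and may later become a case itself, as in nested case-control designs.
    """
    rng = random.Random(seed)
    sampled_sets = []
    cases = sorted((s for s in cohort if s.failed), key=lambda s: s.exit_time)
    for case in cases:
        at_risk = [s for s in cohort if s.exit_time > case.exit_time]
        controls = rng.sample(at_risk, min(n_controls, len(at_risk)))
        sampled_sets.append((case, controls))
    return sampled_sets
```

Late cases have small risk sets, which is why the function may return fewer than n_controls controls for them.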

You will still need to create the grouped age variable, as described in the posts above.

In my experience, unless you have lots of data and only one or two
matching variables, you will need several rounds of matching with
increasingly permissive age grouping to find matches for all your
cases. This could be implemented in a loop in which each iteration
drops the age-grouping variable and recreates it slightly cruder
than in the previous round.

For instance,

round   age-group variable
1       agegroup = age ("exact" matching)
2       agegroup = age collapsed into 2-year intervals
3       agegroup = age collapsed into 3-year intervals
...     and so on.

Of course, you will need to exclude any matched cases (and perhaps
controls) before merging the unmatched cases with the remaining
controls in the next round.
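Below is a minimal sketch of the multi-round scheme described above. It is written in Python only to illustrate the logic; in Stata you would use your own loop, grouping, and merge steps. The sketch assumes hypothetical Person records with id, sex, and age. It matches 1:1 on sex and the current age group, and it removes matched cases and controls before each new round. The function names and the default widths (1, 2, 3) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    id: str    # hypothetical identifier
    sex: int
    age: int


def age_group(age, width):
    # width 1 reproduces exact age; otherwise the lower bound of the width-year band
    return (age // width) * width


def match_in_rounds(cases, controls, widths=(1, 2, 3)):
    """1:1 matching on sex and age group, coarsening the age grouping each round.

    Matched cases and controls are removed before the next round, so each
    control is used at most once. Returns (pairs, unmatched_cases), where each
    pair is (round number, case, control).
    """
    pairs = []
    unmatched = list(cases)
    pool = list(controls)
    for round_no, width in enumerate(widths, start=1):
        still_unmatched = []
        for case in unmatched:
            key = (case.sex, age_group(case.age, width))
            # first remaining control with the same sex and age group, if any
            match = next((c for c in pool if (c.sex, age_group(c.age, width)) == key), None)
            if match is None:
                still_unmatched.append(case)
            else:
                pool.remove(match)  # exclude the matched control from later rounds
                pairs.append((round_no, case, match))
        unmatched = still_unmatched  # only unmatched cases go into the next round
    return pairs, unmatched
```

Because the bands are fixed (with width 2 they are 40-41, 42-43, ...), a 41-year-old case and a 42-year-old control fall into different groups even though they are only one year apart. That is the price of grouping, and it is one reason several rounds may be needed.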

On Fri, Jun 20, 2008 at 6:25 AM, Svend Juul <> wrote:
> Henry wrote:
> I would like to carry out some matching for a case-control study using
> STATA, but it's proving to be a bit challenging for me. I have checked
> the archives, but a query close to mine on Statalist was not answered
> in 2004. Is there a way of matching cases to controls within a
> range of values? For age, say, a 40-year-old case-patient could be
> matched to a 38-, 39-, 40-, 41-, or 42-year-old control-patient. I have
> used the -merge- command to merge two datasets by sex and age of
> patients, but it only matches a 40-year-old case to a 40-year-old
> control. For this case I am still interested in 1:1 matching, but what
> if I extend this to 1:n matching? I want to have something of this sort:
> case-patient  case-age  sex  control-patient  control-age
>        00b7        35    1             00YP           35
>        00b7        35    1             0XC1           33
>        00b7        35    1             0001           36
> ==================================================================
> I get the impression that the data have already been collected and that
> the purpose of matching is to facilitate analysis (at the cost of
> dropping some of the control observations). Actually, matching
> complicates rather than facilitates analysis in case-control studies;
> at the very least, you need to use conditional logistic regression (or
> -mcc-) to analyse the data correctly. So, if my impression is right, the
> recommendation is to analyse with -logistic- (or -cc-), including the
> potential confounders of interest, without matching and without removing
> any of the control observations. A variable like age could be grouped,
> e.g., in five-year groups.
> Anyway, if you want or need to match, the usual way is to categorize
> a variable into, e.g., five-year groups: 30-34, 35-39, etc. This is
> more convenient, and it also facilitates reporting the results (you can
> stratify by age group).
> Hope this helps
> Svend
> __________________________________________
> Svend Juul
> Institut for Folkesundhed, Afdeling for Epidemiologi
> (Institute of Public Health, Department of Epidemiology)
> Vennelyst Boulevard 6
> DK-8000  Aarhus C, Denmark
> Phone:  +45 8942 6090
> Home:   +45 8693 7796
> Email:
> __________________________________________
> *
> *   For searches and help try:
> *
> *
> *
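Henry's original request can also be sketched without grouping. He wants each case matched to controls of the same sex whose age lies within a range (e.g. +/-2 years), with 1:n matching allowed. In Stata this is commonly done by forming all case-control pairs within sex (for example with -joinby-) and then keeping the pairs whose age difference is within the range. The Python below illustrates the same logic under stated assumptions: hypothetical Person records, greedy selection of the closest-aged controls first, and each control used at most once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    id: str    # hypothetical identifier
    sex: int
    age: int


def caliper_match(cases, controls, caliper=2, n=1):
    """Greedy 1:n matching: same sex and |case age - control age| <= caliper.

    Each control is used at most once; for every case the closest-in-age
    eligible controls are taken first. Returns rows of
    (case id, case age, sex, control id, control age).
    """
    available = list(controls)
    rows = []
    for case in cases:
        eligible = [c for c in available
                    if c.sex == case.sex and abs(c.age - case.age) <= caliper]
        # stable sort: controls with equal age distance keep their input order
        eligible.sort(key=lambda c: abs(c.age - case.age))
        for control in eligible[:n]:
            available.remove(control)  # a control cannot be reused by a later case
            rows.append((case.id, case.age, case.sex, control.id, control.age))
    return rows
```

With n=3 and caliper=2, a 35-year-old case gets up to three same-sex controls aged 33 to 37. Greedy matching depends on the order in which cases are processed, because an early case can take a control that a later case needed. Results may therefore change if the case order changes.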
