Statalist



Re: st: Matching in STATA


From   Henry <jakanyada@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Matching in STATA
Date   Mon, 23 Jun 2008 11:06:58 +0100

Dear Listers,
Salah, I am using secondary data from a primary care database. It
contains patient details: medical, therapy and demographic
information. In this case, I guess I will have to do some matching.
Not all patients in the database have the outcome of interest, so
in order to use classical logistic regression models to evaluate the risk
of the outcome with exposure, allowing for possible confounders, we need to
find controls that are as close to the cases as possible. Several
variables are possible confounders, but we have chosen age (at time of
outcome) and sex for the matching. At this stage I primarily want to drop
the control patients that have no matching cases, so as to have a
manageable dataset. I will then try to implement the suggestions given
by listers at the analysis stage.
Hope this sheds more light
Henry
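
A minimal sketch of the pruning step described above: keep every case, and
keep a control only if at least one case shares its sex and age band. It is
written in Python purely to make the logic explicit; it is not Stata code.
The field names (id, case, sex, age), the 5-year band width and the toy
records are all assumptions made for illustration.

```python
def age_group(age, width=5):
    """Lower bound of the age band, e.g. 37 -> 35 for 5-year bands."""
    return (age // width) * width


def drop_unmatched_controls(patients, width=5):
    """Keep all cases, plus controls whose (sex, age band) stratum holds a case."""
    case_strata = {
        (p["sex"], age_group(p["age"], width))
        for p in patients
        if p["case"] == 1
    }
    return [
        p for p in patients
        if p["case"] == 1
        or (p["sex"], age_group(p["age"], width)) in case_strata
    ]


# Hypothetical toy records, not real data.
patients = [
    {"id": "A", "case": 1, "sex": 1, "age": 35},
    {"id": "B", "case": 0, "sex": 1, "age": 38},  # same sex, band 35-39: kept
    {"id": "C", "case": 0, "sex": 2, "age": 36},  # no case of this sex: dropped
    {"id": "D", "case": 0, "sex": 1, "age": 52},  # no case in band 50-54: dropped
]
kept = drop_unmatched_controls(patients)
print([p["id"] for p in kept])
```

This only shrinks the dataset; it does not pair anyone up, so the analysis
would still need to account for the strata.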


On Mon, Jun 23, 2008 at 3:25 AM, Salah Mahmud <salah.mahmud@gmail.com> wrote:
> Hi Henry,
>
> As has been suggested before, I'm not sure that matching in the
> setting of pre-collected data is worth the trouble unless you intend
> to go back and collect more data on the sampled subset.
>
> The way I see it, matching, when employed during the design phase,
> permits control for important confounders while maximizing study
> power. Another use for matching is as a way of adjusting for
> confounders that are difficult to characterize and quantify (e.g.,
> confounding by genetic makeup or neighbourhood). Neither scenario seems
> to apply here, but I could be missing something.
>
> Best wishes,
>
> salah mahmud
>
>
>
> On 6/22/08, Henry <jakanyada@gmail.com> wrote:
>> Many thanks for your suggestions, which will help me a great deal.
>> Svend, I am using longitudinal primary care data extracted by Dec 2002.
>> A case-control study design has been used to identify an association
>> between an outcome (event) and an exposure. I would like to control
>> for potential confounders (age and sex), hence the matching on these
>> variables.
>> I am interested in a 2x2 table from which to obtain the odds ratios.
>> This should give me the concordant and discordant pairs once the
>> matching is done.
>> Conditional logistic regression will definitely be the way forward for
>> our analysis.
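
For 1:1 matched pairs, the 2x2 table mentioned above cross-classifies each
pair by whether the case and the control were exposed, and the matched odds
ratio is the ratio of the two discordant cells. A minimal Python sketch of
that arithmetic follows (an illustration only, not Stata code; the function
names and the toy pairs are assumptions):

```python
def matched_pair_table(pairs):
    """Cross-tabulate 1:1 matched pairs.

    pairs: iterable of (case_exposed, control_exposed), each 0 or 1.
    Returns counts keyed by (case_exposed, control_exposed).
    """
    table = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
    for case_exposed, control_exposed in pairs:
        table[(case_exposed, control_exposed)] += 1
    return table


def matched_odds_ratio(table):
    """Ratio of discordant pairs: case-only exposed over control-only exposed."""
    case_only = table[(1, 0)]
    control_only = table[(0, 1)]
    if control_only == 0:
        return None  # undefined when no pair has only the control exposed
    return case_only / control_only


# Hypothetical pairs, not real data.
pairs = [(1, 1), (1, 0), (1, 0), (1, 0), (0, 1), (0, 0)]
table = matched_pair_table(pairs)
odds_ratio = matched_odds_ratio(table)
print(table, odds_ratio)
```

The concordant pairs, (1, 1) and (0, 0), do not enter the estimate at all,
which is why the pair ID has to be carried through to the analysis.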
>> My problem can be solved by what listers suggested: using dummy
>> variables and then matching on age groups, excluding already matched
>> cases.
>> I guess this should also be possible when we extend to 1:n matching
>> (though I am not sure until I try it out).
>> Another important variable to introduce will be a pair ID for matched pairs.
>> I had looked at what Maarten suggested but wasn't sure how to
>> implement the packages, and was also reluctant to use -sttocc- given
>> the nature of my data, but with these suggestions I can have a second look.
>>
>> *************************************************************
>> Dear Henry,
>> Guess I would generate a new dummy variable (for both data sets) for
>> the case where you want to merge by age-groups and then merge by this
>> new age-group.
>> Kind regards,
>> Andrea
>> *************************************************************
>> There are now quite a lot of packages available in this area, see:
>> -findit match treatment- (and try some other searches with -findit-)
>> -- Maarten
>> *************************************************************
>> I get the impression that data have already been collected, and that
>> the purpose of matching is to facilitate analysis (at the cost of
>> dropping some of the control observations). Actually, matching
>> complicates rather than facilitates analysis in case-control studies;
>> at least you need to use conditional logistic regression (or -mcc-) to
>> analyse correctly. So, if my impression is right, the recommendation
>> is to analyse with -logistic- (or -cc-) including the potential
>> confounders of interest, without matching and without removing any of
>> the control observations. A variable like age could be grouped, e.g.,
>> in five-year groups.
>> Anyway, if you want or need to match, the usual way is to categorize
>> a variable in, e.g., five-year groups: 30-34, 35-39, etc. This is
>> handier, and it also facilitates reporting the results (you can
>> stratify by age group).
>> Hope this helps
>> Svend
>> **************************************************************
>> On Fri, Jun 20, 2008 at 3:48 PM, Salah Mahmud <salah.mahmud@gmail.com> wrote:
>> > For completeness, also see    [ST] sttocc -- Convert survival-time
>> > data to case-control data
>> >
>> > sttocc automates the process of sampling matched controls. It is
>> > intended to generate nested case-control data from cohort data, but
>> > it should not be difficult to "fool" it into sampling from
>> > cross-sectional data.
>> >
>> > You still need to create the grouped age variable as per the posts above.
>> >
>> > In my experience, you will require several rounds of matching with
>> > increasingly permissive age grouping to find matches for all your cases,
>> > unless you have lots of data and only 1 or 2 matching variables. This
>> > could be implemented within a for loop where each successive iteration
>> > drops and then re-creates an age grouping variable that is slightly
>> > cruder than its predecessor.
>> >
>> > For instance,
>> > round    age group variable
>> > 1        agegroup = age ("exact" matching)
>> > 2        agegroup = age collapsed into 2-year intervals
>> > 3        agegroup = age collapsed into 3-year intervals
>> > and so on.
>> >
>> > Of course, you will need to exclude any matched cases (and perhaps
>> > controls) before merging the unmatched cases with the remaining
>> > controls.
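
The rounds described above could be sketched roughly as follows. This is
Python used only to spell out the loop, not Stata code, and it makes several
assumptions: 1:1 matching, each control used at most once, ages held as
whole years, bands formed by integer division, and made-up field names and
records.

```python
def match_in_rounds(cases, controls, widths=(1, 2, 3)):
    """Match cases to controls on (sex, age band), widening the band each round.

    Returns (pairs, unmatched_cases); each pair is (case_id, control_id, width).
    """
    pairs = []
    unmatched = list(cases)
    available = list(controls)
    for width in widths:
        # Re-create the age grouping for this round and pool the controls.
        pool = {}
        for ctrl in available:
            key = (ctrl["sex"], ctrl["age"] // width)
            pool.setdefault(key, []).append(ctrl)
        still_unmatched = []
        for case in unmatched:
            candidates = pool.get((case["sex"], case["age"] // width))
            if candidates:
                ctrl = candidates.pop(0)  # take the first free control
                pairs.append((case["id"], ctrl["id"], width))
            else:
                still_unmatched.append(case)
        # Exclude matched cases and used controls before the next round.
        used = {ctrl_id for _, ctrl_id, _ in pairs}
        available = [c for c in available if c["id"] not in used]
        unmatched = still_unmatched
    return pairs, unmatched


# Hypothetical toy records, not real data.
cases = [
    {"id": "c1", "sex": 1, "age": 40},
    {"id": "c2", "sex": 1, "age": 43},
    {"id": "c3", "sex": 2, "age": 60},
]
controls = [
    {"id": "k1", "sex": 1, "age": 40},
    {"id": "k2", "sex": 1, "age": 42},
    {"id": "k3", "sex": 2, "age": 30},
]
pairs, unmatched = match_in_rounds(cases, controls)
print(pairs, [c["id"] for c in unmatched])
```

Recording the width at which each pair was formed makes it easy to see
afterwards how many matches needed the cruder groupings.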
>> >
>> > On Fri, Jun 20, 2008 at 6:25 AM, Svend Juul <SJ@soci.au.dk> wrote:
>> >>
>> >> Henry wrote:
>> >>
>> >> I would like to carry out some matching for a case-control study using
>> >> Stata, but it's proving to be a bit challenging for me. I have checked
>> >> the archives, but a query close to mine on Statalist in 2004 was not
>> >> answered. Could there be a way of matching cases to controls within a
>> >> range of values? Say, for age, a 40yr old case-patient could be matched
>> >> to a 38, 39, 40, 41 or 42yr old control-patient. I have used
>> >> the -merge- command to merge two datasets by sex and age of patients,
>> >> but it only works for a 40yr old case matching a 40yr old control. For
>> >> now I am still interested in 1:1 matching, but what if I extend
>> >> this to a 1:n match? I want to have something of this sort:
>> >>
>> >> case-patient  case-age  sex  control-patient  control-age
>> >>        00b7        35    1             00YP           35
>> >>        00b7        35    1             0XC1           33
>> >>        00b7        35    1             0001           36
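
One way to picture the range matching asked about above is a greedy search:
for each case, take up to n unused controls of the same sex whose age lies
within a fixed number of years, closest first. The Python sketch below only
illustrates that logic and is not Stata code. The caliper of 2 years, the
field names, and the two extra controls (0ZZ9 and 0AB2) are assumptions; the
other IDs and ages are taken from the layout above.

```python
def caliper_match(cases, controls, caliper=2, n=1):
    """Greedy 1:n matching on sex and on age within +/- caliper years.

    Each control is used at most once. Returns rows of
    (case_id, case_age, sex, control_id, control_age).
    """
    used = set()
    rows = []
    for case in cases:
        eligible = [
            c for c in controls
            if c["id"] not in used
            and c["sex"] == case["sex"]
            and abs(c["age"] - case["age"]) <= caliper
        ]
        # Closest ages first; ties keep their original order.
        eligible.sort(key=lambda c: abs(c["age"] - case["age"]))
        for ctrl in eligible[:n]:
            used.add(ctrl["id"])
            rows.append(
                (case["id"], case["age"], case["sex"], ctrl["id"], ctrl["age"])
            )
    return rows


cases = [{"id": "00b7", "sex": 1, "age": 35}]
controls = [
    {"id": "00YP", "sex": 1, "age": 35},
    {"id": "0XC1", "sex": 1, "age": 33},
    {"id": "0001", "sex": 1, "age": 36},
    {"id": "0ZZ9", "sex": 1, "age": 40},  # hypothetical: outside the caliper
    {"id": "0AB2", "sex": 2, "age": 35},  # hypothetical: different sex
]
rows = caliper_match(cases, controls, caliper=2, n=3)
for row in rows:
    print(row)
```

Because cases are processed in order, an early case can take a control that
would have been the only option for a later one, so the result depends on
the sort order of the cases.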
>> >>
>> >> ==================================================================
>> >>
>> >> I get the impression that data have already been collected, and that
>> >> the purpose of matching is to facilitate analysis (at the cost of
>> >> dropping some of the control observations). Actually, matching
>> >> complicates rather than facilitates analysis in case-control studies;
>> >> at least you need to use conditional logistic regression (or -mcc-) to
>> >> analyse correctly. So, if my impression is right, the recommendation
>> >> is to analyse with -logistic- (or -cc-) including the potential
>> >> confounders of interest, without matching and without removing any of
>> >> the control observations. A variable like age could be grouped, e.g.,
>> >> in five-year groups.
>> >>
>> >> Anyway, if you want or need to match, the usual way is to categorize
>> >> a variable in, e.g., five-year groups: 30-34, 35-39, etc. This is
>> >> handier, and it also facilitates reporting the results (you can
>> >> stratify by age group).
>> >>
>> >> Hope this helps
>> >> Svend
>> >>
>> >>
>> >> __________________________________________
>> >>
>> >> Svend Juul
>> >> Institut for Folkesundhed, Afdeling for Epidemiologi
>> >> (Institute of Public Health, Department of Epidemiology)
>> >> Vennelyst Boulevard 6
>> >> DK-8000  Aarhus C, Denmark
>> >> Phone:  +45 8942 6090
>> >> Home:   +45 8693 7796
>> >> Email:  sj@soci.au.dk
>> >> __________________________________________
>> >>
>> >> *
>> >> *   For searches and help try:
>> >> *   http://www.stata.com/support/faqs/res/findit.html
>> >> *   http://www.stata.com/support/statalist/faq
>> >> *   http://www.ats.ucla.edu/stat/stata/
© Copyright 1996–2014 StataCorp LP