Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Generating a matched pair sample for a case-control study


From   "Lacy,Michael" <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: Generating a matched pair sample for a case-control study
Date   Sun, 9 Dec 2012 20:45:05 +0000

On Fri, 7 Dec 2012, "Sacamano, Paul L." <[email protected]> wrote:

>I need to generate a sample of controls matching the age frequency distribution of the cases.
>
>Matching will be 1 (case) to 2 (controls). There are a total 63 cases that have already been randomly
>selected, and I need to match them with 126 controls from a pool of subjects. Cases have a 30-day
>hospital readmission, controls do not. I currently have all the cases in a Stata file. The pool for
>selecting matched controls is an Excel file that I can easily copy and paste into Stata.
>
>Is there a Stata command to generate a sample of matched pairs based on the age frequency distribution
>for cases that have already been randomly selected?
>
>Thanks for the help, Paul
>
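
On getting the control pool out of Excel: rather than copying and pasting, the workbook
can be read directly. The following is only a minimal sketch. The file name
control_pool.xlsx and the variables id and age are assumptions made up for illustration,
and the first half merely writes a small mock workbook so that the example runs on its own.

```stata
// mock a tiny control pool and write it to a hypothetical workbook
clear
set obs 5
gen int id = _n
gen byte age = 40 + _n
export excel using "control_pool.xlsx", firstrow(variables) replace

// read the workbook back; firstrow takes variable names from row 1
import excel using "control_pool.xlsx", firstrow clear
gen byte case = 0   // everyone in the pool is a control
describe
```

With a real workbook only the import excel line and the case indicator would be needed,
after which the file can be saved and appended to the cases.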

For a single attribute, frequency matching and pair matching are not distinguishable, right?


The following takes a file of controls and matches them 2:1, by single year
of age, to the individuals in a file of 63 cases. There may not be enough
controls at a given age to supply two per case; the mock data below contain
such a shortfall, and the final count command detects it.

clear
// mock up control data 
set seed 846
set obs 500  // don't know how many controls you have
gen byte case = 0
gen byte age = 20 +ceil(65*runiform())  // broad age range assumed
tempfile controls
sort age 
save `controls'
clear
// mock up cases
set obs 63
gen byte case = 1
gen byte age = 20 +ceil(65*runiform())
//
// The real stuff starts here; you have an existing control file you can append to your cases.
append using `controls'
gen rand = runiform()
sort age case rand
by age: egen ncases = total(case)  // number of cases at each age
keep if (ncases >=1) // age groups with no cases are irrelevant
//
// The following keeps the first 2 controls  for each case within each age group
by age: keep if (case ==1) | ((_n <= 2*ncases) & (case == 0))
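tab2 age case
// Count the controls retained at each age; a nonzero count below means
// some ages have fewer than 2 controls per case.
by age: egen ncontrols = total(case == 0)
count if (ncontrols < 2*ncases)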
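
If explicit matched-set identifiers are wanted, or a list of the ages where the pool ran
short, one possible extension is sketched below. It rebuilds the same kind of mock data so
that it runs on its own. The variable names seq, slot, setid and first are hypothetical
helpers, and the pairing of controls to cases within an age is arbitrary, which is harmless
when age is the only matching attribute.

```stata
clear
set seed 846
// mock controls and cases (sizes and age range are assumptions)
set obs 500
gen byte case = 0
gen byte age = 20 + ceil(65*runiform())
tempfile controls
save `controls'
clear
set obs 63
gen byte case = 1
gen byte age = 20 + ceil(65*runiform())
append using `controls'
gen rand = runiform()

// keep up to 2 randomly ordered controls per case within each age
bysort age: egen ncases = total(case)
keep if ncases >= 1
bysort age case (rand): gen seq = _n
keep if case == 1 | seq <= 2*ncases

// cases take slots 1, 2, ...; controls 1-2 share slot 1, controls 3-4 slot 2, ...
gen slot = cond(case == 1, seq, ceil(seq/2))
egen setid = group(age slot)

// list, once per age, the ages at which the pool ran short
bysort age: egen ncontrols = total(case == 0)
egen byte first = tag(age)
list age ncases ncontrols if first & ncontrols < 2*ncases, noobs
```

Each value of setid then identifies one case with at most two controls, and could be
supplied to a grouped analysis, for instance through the group() option of clogit.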

Regards,

Mike Lacy
Dept. of Sociology
Colorado State University
Fort Collins CO 80523-1784

