Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: "Lacy,Michael" <Michael.Lacy@colostate.edu>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: Generating a matched pair sample for a case-control study
Date: Sun, 9 Dec 2012 20:45:05 +0000
On Fri, 7 Dec 2012, "Sacamano, Paul L." <psacaman@jhsph.edu> wrote:
>I need to generate a sample of controls matching the age frequency distribution of the cases.
>
>Matching will be 1 (case) to 2 (controls). There are a total of 63 cases that have already been
>randomly selected, and I need to match them with 126 controls from a pool of subjects. Cases have a
>30-day hospital readmission, controls do not. I currently have all the cases in a Stata file. The pool
>for selecting matched controls is an Excel file that I can easily copy and paste into Stata.
>
>Is there a Stata command to generate a sample of matched pairs based on the age frequency distribution
>for cases that have already been randomly selected?
>
>Thanks for the help,
>Paul

For a single attribute, frequency matching and pair matching are not distinguishable, right?

The following takes a file of controls and matches them 2:1, by single year of age, to the individuals in a file of 63 cases. Some ages may not have enough controls to match every case; the example data below produce that situation, and the code detects it.

clear // mock up control data
set seed 846
set obs 500 // don't know how many controls you have
gen byte case = 0
gen byte age = 20 + ceil(65*runiform()) // broad age range assumed
tempfile controls
sort age
save `controls'

clear // mock up cases
set obs 63
gen byte case = 1
gen byte age = 20 + ceil(65*runiform())
//
// The real stuff starts here; you have an existing control file you can append to your cases.
append using `controls'
gen rand = runiform() // random sort key, so the controls kept within each age are a random draw
sort age case rand
by age: egen ncases = total(case)
keep if (ncases >= 1) // age groups with no cases are irrelevant
//
// The following keeps the first 2 controls for each case within each age group.
// Within an age, controls (case == 0) sort ahead of cases, so _n counts controls first.
by age: keep if (case == 1) | ((_n <= 2*ncases) & (case == 0))
tab2 age case
// Count observations in age groups that could not supply 2 controls per case
by age: egen ncontrols = total(case == 0)
count if (ncontrols < 2*ncases)

Regards,
Mike Lacy
Assoc. Prof./Dir. Grad. Studies
Dept. of Sociology
Colorado State University
Fort Collins CO 80523-1784
970.491.6721 (voice)

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
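Paul mentions that his control pool is an Excel file. Instead of copying and pasting it, the pool could be read with Stata's import excel command (Stata 12 or later) and saved as the control file that the matching code appends to the cases. This is a minimal sketch, not part of Mike's reply: the workbook name control_pool.xlsx is hypothetical, the sheet is assumed to have variable names in its first row (one of them age), and the first block writes a mock workbook only so the example runs on its own.

```stata
* Demo only: write a mock workbook standing in for the real control pool
* (control_pool.xlsx is a hypothetical file name)
clear
set seed 846
set obs 500
gen byte age = 20 + ceil(65*runiform())
export excel using "control_pool.xlsx", firstrow(variables) replace

* Read the pool from Excel; firstrow takes variable names from row 1
import excel using "control_pool.xlsx", firstrow clear
gen byte case = 0        // everyone in the pool is a potential control
sort age
tempfile controls
save `controls'
```

With a real workbook, drop the demo block and run the rest in the same do-file in place of the control mock-up. A tempfile is deleted when the do-file ends, so save to a permanent file name if the control file is needed later.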