
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: identifying age-matched controls in a cohort study


From   Phil Clayton <[email protected]>
To   [email protected]
Subject   Re: st: identifying age-matched controls in a cohort study
Date   Fri, 23 Aug 2013 17:40:41 +1000

There are different approaches. I use -psmatch2- (SSC) because it's quite convenient and fast. It's also trivial to extend your matching to use a propensity score rather than a single variable.
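
As a rough sketch of the propensity-score version: sex and diabetes here are hypothetical covariates that are not in the example data further down, so substitute your own. -psmatch2- estimates the score with a probit model by default; the logit option switches it to a logit model.

* sketch only: sex and diabetes are assumed variable names
* 1:1 nearest-neighbour matching without replacement on the estimated score
psmatch2 exposed age sex diabetes, logit n(1) noreplace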

You don't need to calculate the age at exposure - you can just match on age at the start of the study (or even date of birth).

If someone was exposed part-way through the study, do you want to allow them to be a non-exposed control for someone who was exposed earlier?

If not, you could match iteratively, repeating the match until every exposed patient has an unexposed control who was still alive on that patient's exposure date. Here is an example. It's probably not the most efficient way of doing it, but it doesn't take long to run.

Phil

* example cohort data
* 1000 exposed and 20000 unexposed people
* study period is 01jan2005 to 31dec2009
clear
set obs 21000
gen id=_n

* random date of exposure during the study period for first 1000 people
set seed 12345
gen expdate=td(01jan2005) + trunc(runiform()*5*365.25) if id<=1000
format %td expdate

* in this simulation we'll assume that the exposed patients are 1.2x as likely
* to die as the unexposed
* we'll make 20% of the unexposed die, so 240 of the exposed patients will get a
* death date - but if it's before their expdate we'll not count it
gen deathdate=td(01jan2005) + trunc(runiform()*5*365.25) if id<=240
replace deathdate=. if deathdate<=expdate

* random death date for 20% of the other 20000 people
* (assume the rest are alive at the end of 2009)
replace deathdate=td(01jan2005) + trunc(runiform()*5*365.25) if inrange(id, 1001, 5000)
format %td deathdate

* age at start of study
gen age=rnormal(50, 5)

**** we have now finished constructing our dummy dataset ****

* now we classify patients as ever-exposed vs never-exposed
gen byte exposed=!missing(expdate)

* we need to know who's available for each match run
* at the start everyone is available
gen byte available=1

* pair is a new variable indicating each matched pair
* initially it's missing for everyone
gen pair=.

* now iteratively match until everyone's been matched to 1 neighbour
local finished=0
while `finished'==0 {
	* randomly sort the data before running -psmatch2- (in case of ties)
	gen double rsort=runiform()
	sort rsort
	drop rsort
	
	psmatch2 exposed if available, pscore(age) n(1) noreplace
	
	* -psmatch2- creates an ID variable for everyone called _id
	* for matched treated patients, the untreated match's ID is stored in
	* the treated patient's _n1 variable
	* anyone matched has a _weight of 1
	sort _id
	
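	* a match is accepted only if the control was still alive on the case's
	* exposure date (a missing deathdate counts as alive, because missing is
	* larger than any date in Stata)
	* for accepted matches the exposed case's pair becomes the control's ID,
	* and the case is then no longer "available" for future matching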
	quietly replace pair=id[_n1] if exposed & _weight==1 & deathdate[_n1]>expdate
	quietly replace available=0 if exposed & _weight==1 & deathdate[_n1]>expdate
	
	* now loop through the matched controls and update their pair variable to
	* become their ID (there is probably a more efficient way to do this)
	qui levelsof _n1 if exposed & _weight==1 & deathdate[_n1]>expdate, local(matches)
	qui foreach id of local matches {
		quietly replace pair=id if _id==`id'
	}
	
	* anyone who was a matched control in this run should be made unavailable for further
	* matching
	* (to prevent endlessly matching exposed patients with dead ones)
	quietly replace available=0 if !exposed & _weight==1
	
	* see if we need to do any more match runs
	quietly count if exposed & available
	display "Patients still needing a match: " r(N)
	if r(N)==0 local finished=1
}

* pairs now have a "pair" variable; these are the observations we want to use
gen byte touse=!missing(pair)

* confirm that we now have 1000 pairs
tab exposed if touse

* confirm that the ages are well matched
tabstat age if touse, by(exposed) s(n mean sd q)
bysort pair (exposed): gen agediff=age[1] - age[2] if touse
sum agediff, d

* confirm that the death dates of the controls are after the exposure dates
bysort pair (exposed): assert deathdate[1]>expdate[2] if touse

* now we can set up a survival analysis
* start date is the date of exposure and end date is death or 31dec2009
bysort pair (exposed): gen start=expdate[2] if touse
gen end=deathdate
replace end=td(31dec2009) if missing(end)
gen byte died=!missing(deathdate)
stset end, fail(died) origin(time start) scale(365.25) if(touse)
sts graph, by(exposed)
stcox exposed
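
* a possible refinement, not part of the analysis above: because the data are
* matched pairs, one option is to stratify the Cox model on the pair
* identifier so that each case is compared only with its own control
* (whether this is preferable depends on the study design)
stcox exposed, strata(pair)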




On 21/08/2013, at 6:49 AM, "Smit, Menno" <[email protected]> wrote:

> Dear all,
> 
> I am analysing data from a large cohort study in which some individuals become exposed during the 5 year observation period. For each exposed individual, how can I identify the nearest age-matched, unexposed individual that is alive on the date that the exposed become exposed?
> 
> Many thanks,
> Menno
> 
> MD in Tropical Medicine “Mother & Child Health”
> Research Assistant in Malaria Epidemiology
> KEMRI/CDC, P.O.Box 1578, Kisumu 40100, Kenya.
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/



