# Re: st: selecting random matched controls: survival

 From "Austin Nichols" <[email protected]> To [email protected] Subject Re: st: selecting random matched controls: survival Date Fri, 28 Dec 2007 15:29:36 -0500

```
Michael--
As I understand it, the problem is to randomly select without replacement
two "control" obs with tx==0 to match each "treatment" obs where tx==1
subject to the constraint that _t for each "control" obs is >=
leadtime for the "treatment" obs.   In this case, the conceptually
simplest approach is to loop over each treatment case in decreasing
order of leadtime (since the highest values for leadtime also have the
smallest sets of possible matches, if I understand you correctly).

Let me modify ID 11 to have _t==.31 so the cases you gave can be easily matched:

set seed 12345
clear
input ID tx _t leadtime
1   1  1.0 0.4
2   1  1.2 0.3
11  0  .31  .
12  0  0.9  .
13  0  1.1  .
14  0  0.5  .
end
g obs=_n
qui levelsof obs if tx==1, loc(is)
g match=.
qui foreach i of local is {
sort obs
local lt=leadtime[`i']
local m=ID[`i']
g u=uniform() if mi(match) & tx==0 & _t>=`lt'
sort u
replace match=`m' in 1/2
drop u
}
li

There is almost certainly a more efficient way to do this, perhaps
using Mata, and possibly some of Ben Jann's contributions, but the
above is simple enough to be understood easily.

The idea is to pick each "treatment" obs in turn, sort the possible
matches randomly, and allocate the first two (or put in another number
instead of 2 to pick more matches) possible matches to the "treatment"
obs whose turn it is (by assigning its ID to both those control obs in
the "match" variable).

What you need to do downstream from here may necessitate some other
processing, but the general approach can be modified to suit many
purposes.

If you run into trouble with this approach, a likely culprit is that a
condition like _t>=`lt' can fail to be satisfied even when it looks
like it should be (when both are .3 to all appearances, for example;
see various FAQs e.g.
http://www.stata.com/support/faqs/data/float.html) and a bunch of
missings are generated for your u variable.  This kind of thing can be
hard to track down, but you can
set trace on
and remove the -qui- qualifiers to see what is going on inside the
loop.  You can also put a few commands of the form -list in 1/5- or
somesuch inside the loop to see what changes are being made to the
data at each step.

Here's an example where it goes wrong:

set seed 12345
clear
input ID tx _t leadtime
1   1  1.0 0.4
2   1  1.2 0.3
11  0  .3  .
12  0  0.9  .
13  0  1.1  .
14  0  0.5  .
end
g obs=_n
qui levelsof obs if tx==1, loc(is)
g match=.
qui foreach i of local is {
sort obs
local lt=leadtime[`i']
local m=ID[`i']
g u=uniform() if mi(match) & tx==0 & _t>=`lt'
sort u
replace match=`m' in 1/2
drop u
}
li

which can be sorted out with a simple trick:

set seed 12345
clear
input ID tx _t leadtime
1   1  1.0 0.4
2   1  1.2 0.3
11  0  .3  .
12  0  0.9  .
13  0  1.1  .
14  0  0.5  .
end
g obs=_n
qui levelsof obs if tx==1, loc(is)
g match=.
qui foreach i of local is {
sort obs
local lt=leadtime[`i']
local m=ID[`i']
g u=uniform() if mi(match) & tx==0 & _t>=float(`lt')
sort u
replace match=`m' in 1/2
drop u
}
li

but even better would be

set seed 12345
clear
input ID tx _t leadtime
1   1  1.0 0.4
2   1  1.2 0.3
11  0  .3  .
12  0  0.9  .
13  0  1.1  .
14  0  0.5  .
end
g obs=_n
qui levelsof obs if tx==1, loc(is)
g match=.
qui foreach i of local is {
sort obs
local lt=leadtime[`i']
local m=ID[`i']
g u=uniform() if mi(match) & tx==0 & _t>=float(`lt')
sort u, stable
assert u<. in 1/2
replace match=`m' in 1/2
drop u
}
li

(see -help sort- and -help assert- for details).
```
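The post recommends visiting treatment cases in decreasing order of leadtime, but the loops above simply follow the order in which the data were entered. A minimal sketch of one way to enforce that order is given below. It is not from the original post: the -gsort- step, the -forvalues- loop, and the local `k' (number of controls per treatment case) are assumptions added for illustration. It also compares _t against leadtime[`i'] directly rather than through a macro copy, which is one possible way to sidestep the float-precision issue described in the post.

```stata
* Sketch only: toy data from the thread; `k' is a hypothetical local
set seed 12345
clear
input ID tx _t leadtime
1   1  1.0 0.4
2   1  1.2 0.3
11  0  .3  .
12  0  0.9  .
13  0  1.1  .
14  0  0.5  .
end
local k 2
* Put treatment obs first, longest leadtime first, then fix that order in obs
gsort -tx -leadtime ID
g obs=_n
qui count if tx==1
local n1=r(N)
g match=.
forvalues i=1/`n1' {
    sort obs
    local m=ID[`i']
    * Both sides are stored as float here, so no macro round trip is involved
    g u=uniform() if mi(match) & tx==0 & _t>=leadtime[`i']
    sort u, stable
    * Stop with an error if fewer than `k' eligible controls remain
    assert u<. in 1/`k'
    replace match=`m' in 1/`k'
    drop u
}
sort obs
li
```

With real data, changing `k' changes the number of controls drawn per treatment case, and the -assert- halts the loop as soon as a treatment case runs out of eligible controls.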