Re: st: selecting random matched controls: survival

From	"Austin Nichols" <[email protected]>
To	[email protected]
Subject	Re: st: selecting random matched controls: survival
Date	Fri, 28 Dec 2007 15:29:36 -0500

Michael--
As I understand it, the problem is to randomly select, without replacement,
two "control" obs with tx==0 to match each "treatment" obs where tx==1,
subject to the constraint that _t for each "control" obs is >=
leadtime for the "treatment" obs.  In this case, the conceptually
simplest approach is to loop over the treatment cases in decreasing
order of leadtime (since the cases with the highest values of leadtime
also have the smallest sets of possible matches, if I understand you
correctly).

Let me modify ID 11 to have _t==.31 so the cases you gave can be easily matched:

set seed 12345
clear
input ID  tx _t  leadtime
      1   1  1.0 0.4
      2   1  1.2 0.3
      11  0  .31  .
      12  0  0.9  .
      13  0  1.1  .
      14  0  0.5  .
end
gsort -tx -leadtime
g obs=_n
qui levelsof obs if tx==1, loc(is)
g match=.
qui foreach i of local is {
 sort obs
 local m=ID[`i']
 local lt=leadtime[`i']
 g u=uniform() if mi(match) & tx==0 & _t>=`lt'
 sort u
 replace match=`m' in 1/2
 drop u
 }
li

There is almost certainly a more efficient way to do this, perhaps
using Mata, and possibly some of Ben Jann's contributions, but the
above is simple enough to be understood easily.

The idea is to pick each "treatment" obs in turn, sort the possible
matches randomly, and allocate the first two (or put in another number
instead of 2 to pick more matches) possible matches to the "treatment"
obs whose turn it is (by assigning its ID to both those control obs in
the "match" variable).
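
A minimal sketch of that generalization, assuming the same toy data as
above: the number of controls per "treatment" obs is held in a local
macro (called k here, a name introduced only for illustration) so that
it can be changed in one place.

```stata
* Sketch only: same loop as above, with the number of matches in local k.
* k is a hypothetical name, not part of the original example.
set seed 12345
clear
input ID  tx _t  leadtime
      1   1  1.0 0.4
      2   1  1.2 0.3
      11  0  .31  .
      12  0  0.9  .
      13  0  1.1  .
      14  0  0.5  .
end
local k 2
gsort -tx -leadtime
g obs=_n
qui levelsof obs if tx==1, loc(is)
g match=.
qui foreach i of local is {
 * restore the original order so that obs `i' is the current treatment case
 sort obs
 local m=ID[`i']
 local lt=leadtime[`i']
 * random number for each still-unmatched, eligible control; missing otherwise
 g u=uniform() if mi(match) & tx==0 & _t>=`lt'
 * missing values sort last, so eligible controls come first
 sort u
 replace match=`m' in 1/`k'
 drop u
 }
li
```

With a larger k this toy data set would run out of eligible controls, so
the sketch is only meaningful when enough candidates exist for every
"treatment" obs.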

What you need to do downstream from here may necessitate some other
processing, but the general approach can be modified to suit many
purposes.

If you run into trouble with this approach, a likely culprit is that a
condition like _t>=`lt' can fail to be satisfied even when it looks
like it should be (when both are .3 to all appearances, for example;
see various FAQs on float precision, e.g.
http://www.stata.com/support/faqs/data/float.html), so that a bunch of
missings are generated for your u variable.  This kind of thing can be
hard to track down, but you can
 set trace on
and remove the -qui- qualifiers to see what is going on inside the
loop.  You can also put a few commands of the form -list in 1/5- or
somesuch inside the loop to see what changes are being made to the
data at each step.
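
The precision issue can be looked at in isolation.  The sketch below
assumes a single float variable (x) and a local macro (v), both names
made up for illustration; it copies the stored value into the macro and
then compares the variable against the macro with and without float().
If the two counts differ, the plain comparison is being affected by the
way the value was written into the macro.

```stata
* Sketch: a float value copied into a local macro and compared with itself.
* x and v are hypothetical names used only for this illustration.
clear
input float x
.3
end
local v = x[1]
* show the text that the macro actually holds
display "`v'"
* comparison against the macro's decimal text (the risky form)
count if x >= `v'
* comparison after rounding the macro back to float precision
count if x >= float(`v')
```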

Here's an example where it goes wrong (ID 11 now has _t==.3, the same
value as the leadtime for ID 2):

set seed 12345
clear
input ID  tx _t  leadtime
      1   1  1.0 0.4
      2   1  1.2 0.3
      11  0  .3  .
      12  0  0.9  .
      13  0  1.1  .
      14  0  0.5  .
end
gsort -tx -leadtime
g obs=_n
qui levelsof obs if tx==1, loc(is)
g match=.
qui foreach i of local is {
 sort obs
 local m=ID[`i']
 local lt=leadtime[`i']
 g u=uniform() if mi(match) & tx==0 & _t>=`lt'
 sort u
 replace match=`m' in 1/2
 drop u
 }
li

which can be sorted out with a simple trick, wrapping the macro in float():

set seed 12345
clear
input ID  tx _t  leadtime
      1   1  1.0 0.4
      2   1  1.2 0.3
      11  0  .3  .
      12  0  0.9  .
      13  0  1.1  .
      14  0  0.5  .
end
gsort -tx -leadtime
g obs=_n
qui levelsof obs if tx==1, loc(is)
g match=.
qui foreach i of local is {
 sort obs
 local m=ID[`i']
 local lt=leadtime[`i']
 g u=uniform() if mi(match) & tx==0 & _t>=float(`lt')
 sort u
 replace match=`m' in 1/2
 drop u
 }
li

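but even better would be to add a stable sort and an assert, so that the
loop stops with an error if fewer than two possible matches remain: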

set seed 12345
clear
input ID  tx _t  leadtime
      1   1  1.0 0.4
      2   1  1.2 0.3
      11  0  .3  .
      12  0  0.9  .
      13  0  1.1  .
      14  0  0.5  .
end
gsort -tx -leadtime
g obs=_n
qui levelsof obs if tx==1, loc(is)
g match=.
qui foreach i of local is {
 sort obs
 local m=ID[`i']
 local lt=leadtime[`i']
 g u=uniform() if mi(match) & tx==0 & _t>=float(`lt')
 sort u, stable
 assert u<. in 1/2
 replace match=`m' in 1/2
 drop u
 }
li

(see -help sort- and -help assert- for details).
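
Once a loop like this has run, a quick check that every "treatment" obs
ended up with exactly two controls may also be worthwhile.  The sketch
below runs on a hypothetical assignment of match values (typed in by
hand, not the actual output of the seeded runs above), and nctl is a
name introduced here for illustration.

```stata
* Sketch: verify a finished assignment; the match values are hypothetical.
clear
input ID  tx match
      1   1  .
      2   1  .
      11  0  2
      12  0  1
      13  0  2
      14  0  1
end
* no treatment obs should have been assigned as somebody's control
assert mi(match) if tx==1
* number of control obs sharing each value of match
egen nctl = total(tx==0), by(match)
* every matched control should belong to a group of exactly two
assert nctl==2 if !mi(match)
tab match if tx==0, missing
```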