
From: Michael McCulloch <[email protected]>
To: [email protected]
Subject: Re: st: selecting random matched controls: survival
Date: Fri, 28 Dec 2007 12:46:33 -0800

Thank you very very much, Austin. I'm going to study these notes and after testing in simulation, apply to my dataset.

Best wishes for a wonderful holiday,

Michael

Michael--

As I understand, the problem is to randomly select without replacement two "control" obs with tx==0 to match each "treatment" obs where tx==1 subject to the constraint that _t for each "control" obs is >= leadtime for the "treatment" obs. In this case, the conceptually simplest approach is to loop over each treatment case in decreasing order of leadtime (since the highest values for leadtime also have the smallest sets of possible matches, if I understand you correctly).

Let me modify ID 11 to have _t==.31 so the cases you gave can be easily matched:

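set seed 12345
clear
input ID tx _t leadtime
1 1 1.0 0.4
2 1 1.2 0.3
11 0 .31 .
12 0 0.9 .
13 0 1.1 .
14 0 0.5 .
end
* treatment obs first, in decreasing order of leadtime
gsort -tx -leadtime
g obs=_n
qui levelsof obs if tx==1, loc(is)
g match=.
qui foreach i of local is {
    * restore the original order so that observation i is the current treatment case
    sort obs
    local m=ID[`i']
    local lt=leadtime[`i']
    * random draw for each still-unmatched control that satisfies the constraint
    g u=uniform() if mi(match) & tx==0 & _t>=`lt'
    * missing values sort last, so eligible controls come first in random order
    sort u
    replace match=`m' in 1/2
    drop u
}
li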

There is almost certainly a more efficient way to do this, perhaps using Mata, and possibly some of Ben Jann's contributions, but the above is simple enough to be understood easily.

The idea is to pick each "treatment" obs in turn, sort the possible matches randomly, and allocate the first two (or put in another number instead of 2 to pick more matches) possible matches to the "treatment" obs whose turn it is (by assigning its ID to both those control obs in the "match" variable).

What you need to do downstream from here may necessitate some other processing, but the general approach can be modified to suit many purposes.
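One possible downstream step is simply to verify the result. The sketch below assumes a matched dataset of the kind the loop above produces; the particular values of match are a hypothetical outcome typed in by hand, and the names nmatch, lt_case and cases are made up for illustration. It uses the -merge m:1- syntax of Stata 11 and later.

```stata
* hypothetical outcome of the matching loop (match values entered by hand)
clear
input ID tx _t leadtime match
1 1 1.0 0.4 .
2 1 1.2 0.3 .
11 0 .31 . 2
12 0 0.9 . 1
13 0 1.1 . 2
14 0 0.5 . 1
end

* each treatment ID should have received exactly two controls
bysort match: gen nmatch = _N if !mi(match)
assert nmatch == 2 if !mi(match)

* attach the leadtime of the matched treatment obs to each control
preserve
keep if tx == 1
keep ID leadtime
rename ID match
rename leadtime lt_case
tempfile cases
save `cases'
restore
merge m:1 match using `cases', keep(master match) nogenerate

* the constraint: control _t must be at least the case's leadtime
assert _t >= lt_case if tx == 0 & !mi(match)
list ID tx _t match lt_case
```

If either -assert- fails, the do-file stops at that line, which points to the part of the matching that needs a closer look.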

If you run into trouble with this approach, a likely culprit is that a condition like _t>=`lt' can fail to be satisfied even when it looks like it should be (when both are .3 to all appearances, for example; see various FAQs, e.g. http://www.stata.com/support/faqs/data/float.html), so that a bunch of missings are generated for your u variable. This kind of thing can be hard to track down, but you can

set trace on

and remove the -qui- qualifiers to see what is going on inside the loop. You can also put a few commands of the form -list in 1/5- or somesuch inside the loop to see what changes are being made to the data at each step.
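To see the precision issue in isolation, here is a minimal sketch. It assumes the default storage type is float (that is, -set type- has not been changed), and the variable name lead is made up for the illustration.

```stata
* .3 has no exact binary representation, so the double-precision literal .3
* and the float-precision value differ
display .3 == float(.3)
* (displays 0)

clear
set obs 1
* stored as a float under the default -set type float-
generate lead = .3
* the macro holds a formatted decimal string, not the stored float itself
local lt = lead[1]
display "`lt'"

* comparing the variable with the raw macro contents may fail ...
count if lead >= `lt'
* ... while rounding the macro contents back to float precision recovers the match
count if lead >= float(`lt')
```

Comparing the two counts shows whether the raw macro comparison drops the observation; the version wrapped in float() should count it.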

Here's an example where it goes wrong:

set seed 12345
clear
input ID tx _t leadtime
1 1 1.0 0.4
2 1 1.2 0.3
11 0 .3 .
12 0 0.9 .
13 0 1.1 .
14 0 0.5 .
end
gsort -tx -leadtime
g obs=_n
qui levelsof obs if tx==1, loc(is)
g match=.
qui foreach i of local is {
    sort obs
    local m=ID[`i']
    local lt=leadtime[`i']
    g u=uniform() if mi(match) & tx==0 & _t>=`lt'
    sort u
    replace match=`m' in 1/2
    drop u
}
li

which can be sorted out with a simple trick:

set seed 12345
clear
input ID tx _t leadtime
1 1 1.0 0.4
2 1 1.2 0.3
11 0 .3 .
12 0 0.9 .
13 0 1.1 .
14 0 0.5 .
end
gsort -tx -leadtime
g obs=_n
qui levelsof obs if tx==1, loc(is)
g match=.
qui foreach i of local is {
    sort obs
    local m=ID[`i']
    local lt=leadtime[`i']
    g u=uniform() if mi(match) & tx==0 & _t>=float(`lt')
    sort u
    replace match=`m' in 1/2
    drop u
}
li

but even better would be

set seed 12345
clear
input ID tx _t leadtime
1 1 1.0 0.4
2 1 1.2 0.3
11 0 .3 .
12 0 0.9 .
13 0 1.1 .
14 0 0.5 .
end
gsort -tx -leadtime
g obs=_n
qui levelsof obs if tx==1, loc(is)
g match=.
qui foreach i of local is {
    sort obs
    local m=ID[`i']
    local lt=leadtime[`i']
    g u=uniform() if mi(match) & tx==0 & _t>=float(`lt')
    sort u, stable
    assert u<. in 1/2
    replace match=`m' in 1/2
    drop u
}
li

(see -help sort- and -help assert- for details).
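The -assert- stops the run as soon as a treatment obs has fewer than two eligible controls. If you would rather skip such a case and carry on, one possible variation (a sketch, not a recommendation) is to count the eligible controls first. Here ID 11 is given _t==.2, a made-up value, so that ID 2 is left with only one eligible control; the locals k and n are also just for illustration, and runiform() is the current name for uniform().

```stata
set seed 12345
clear
input ID tx _t leadtime
1 1 1.0 0.4
2 1 1.2 0.3
11 0 .2 .
12 0 0.9 .
13 0 1.1 .
14 0 0.5 .
end
* number of controls wanted per treatment obs
local k = 2
gsort -tx -leadtime
generate obs = _n
quietly levelsof obs if tx==1, local(is)
generate match = .
foreach i of local is {
    sort obs
    local m = ID[`i']
    local lt = leadtime[`i']
    quietly generate u = runiform() if mi(match) & tx==0 & _t >= float(`lt')
    sort u, stable
    * how many eligible, still-unmatched controls are there?
    quietly count if u < .
    local n = r(N)
    if `n' >= `k' {
        quietly replace match = `m' in 1/`k'
    }
    else {
        display as text "ID `m': only `n' eligible control(s); none assigned"
    }
    drop u
}
list
```

With these data, ID 1 gets two controls and ID 2 gets none, and the message is printed instead of an error.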

*

* For searches and help try:

* http://www.stata.com/support/faqs/res/findit.html

* http://www.stata.com/support/statalist/faq

* http://www.ats.ucla.edu/stat/stata/


References:
  Re: st: selecting random matched controls: survival
  From: "Austin Nichols" <[email protected]>
