Statalist The Stata Listserver



Re: st: listing, then restricting a complete sample space


From   "Svend Juul" <[email protected]>
To   <[email protected]>
Subject   Re: st: listing, then restricting a complete sample space
Date   Wed, 21 Mar 2007 15:35:00 +0100

Martin wrote:
 
I have a number of regions A, B, C...

I am picking one of two areas in each region for a randomised study
 
So my first allocation will be A1, B1, C1....
Second A2, B1, C1....
Third A1, B2, C1
...etc
...
...

I think I will have 2^n possible allocations where n is the number of
regions.
 
The problem is some of these allocations are pathological. For example I
might pick the poorest area in each region, or the richest. For that
reason I want to do a 'restricted randomisation', that is, delete these
pathological outcomes from the sample space before randomising. So I 
restrict my sample space by firstly listing out all the 2^n allocations
(in my case n=20) and then scoring them by using the data I have on each area.
 
So e.g.  I might get my 20 areas as follows
A2, B1, C2, D1, E1...
 
I then use the data I have on deprivation in the 2nd area in region A,
the first area in region B and so on, and decide that I want to exclude
this particular allocation from being chosen.
 
I would like to produce a dataset with all the possible allocations in
it. Either of the form
A1  B1  C1 D1 E1
A2  B1  C1 D1 E1
 
For 1048576 lines or
 
Variable names  A1    A2    B1    B2    C1    C2    D1    D2...
                 1     0     1     0     1     0     1     0
                 0     1     1     0     1     0     1     0
 
for 1048576 lines
 
------------------------------------------------------------
 
I understand your problem like this:
 
You want to perform a randomised study in 20 regions. From each region
you pick two areas, and you want to randomise to determine which area
gets which "treatment". You are worried, however, that by randomising
so few units, the risk of major imbalance is high.
 
I agree on the problem, but I don't think your strategy is practical.
Try to run this do-file:
 
------------------------------------
// The 20 regions:
clear
set obs 20
set seed 12345
gen region = _n
gen index1 = uniform()
gen index2 = uniform()

// Rank regions according to the difference area1 - area2:
gen diff = index1 - index2
sort diff
// mod(_n,4) cycles 1,2,3,0; recode it to the pattern 1,2,2,1,1,2,2,1,...
gen intervention = mod(_n,4)
recode intervention (0 1=1)(2 3=2)
list
------------------------------------
 
We have 20 regions; numbered 1-20. The two areas in a region are
characterized by some deprivation index (index1, index2). Now calculate
the difference between indexes and sort according to this difference.

Next, devise an alternating allocation to area1 and area2. I chose:
1,2,2,1,1,2,2, etc.; it is unbiased (as opposed to 1,2,1,2,1, which
would always give area1 the lower-ranked region in each adjacent pair
of the sorted list). This will prevent the major imbalance that can
occur when randomizing few units. Just a suggestion.
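
To see whether the alternating allocation balances the two arms, one could
compare the deprivation index of the areas that get the intervention with
that of the areas that do not. The sketch below rebuilds the simulated data
from the do-file and adds two variables, treated and untreated; these names
are made up for the illustration, and it assumes that intervention==1 means
area1 receives the intervention and intervention==2 means area2 does. It
keeps uniform() to match the do-file; newer Stata versions offer runiform().

```stata
// Rebuild the simulated example data (same steps as the do-file)
clear
set obs 20
set seed 12345
gen region = _n
gen index1 = uniform()
gen index2 = uniform()
gen diff = index1 - index2
sort diff
gen intervention = mod(_n,4)
recode intervention (0 1=1)(2 3=2)

// Assumption: intervention==1 -> area1 is treated; intervention==2 -> area2
// treated/untreated are hypothetical names introduced for this sketch
gen treated   = cond(intervention == 1, index1, index2)
gen untreated = cond(intervention == 1, index2, index1)

// Similar means in the two arms suggest the allocation is balanced
summarize treated untreated

// Paired comparison within regions
ttest treated == untreated
```

With real data, index1 and index2 would be the observed deprivation indexes
rather than random numbers, and the same comparison applies.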
 
Svend

 
________________________________________________________ 
 
Svend Juul
Institut for Folkesundhed, Afdeling for Epidemiologi
(Institute of Public Health, Department of Epidemiology)
Vennelyst Boulevard 6 
DK-8000 Aarhus C,  Denmark 
Phone, work:  +45 8942 6090 
Phone, home:  +45 8693 7796 
Fax:          +45 8613 1580 
E-mail:       [email protected] 
_________________________________________________________ 
