
RE: st: Verify randomization in a large sample

From   "Kieran McCaul" <[email protected]>
To   <[email protected]>
Subject   RE: st: Verify randomization in a large sample
Date   Wed, 1 Oct 2008 10:36:01 +0800

If the purpose is to check "balance" after randomization, I can't see how any statistical testing will help.

Statistical tests test a null hypothesis against an alternative.

The null is essentially "any differences are no greater than would be expected by chance alone".  The alternative is "differences are so large that they are unlikely to be due to chance".

If the study has demonstrably been randomized, then all differences, no matter how extreme, are due to chance.  

Lack of balance, which some people seem to obsess about, is not an indication of failure of the randomization process.  Lack of balance will occur.  It will occur. Always.

The purpose of randomization is to remove bias, not to achieve balance.

Lack of balance will be a problem if it biases the comparison between arms of the study.  So adjust for the lack of balance in the analysis.
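
A minimal simulation sketch of the point above, not taken from the original post: it repeats a pure coin-flip randomization many times and records how often a t test on a baseline covariate flags "imbalance" at the 5% level. The program name balcheck, the variables age and t, the sample size and the seed are all hypothetical choices made for illustration.

```stata
* Sketch only: all names and numbers here are illustrative assumptions.
clear all
set seed 20081001

program define balcheck, rclass
    drop _all
    set obs 200
    generate age = rnormal(40, 10)      // baseline covariate, unrelated to t
    generate t   = runiform() < 0.5     // assignment by chance alone
    ttest age, by(t)
    return scalar p = r(p)              // two-sided p-value from -ttest-
end

* Repeat the randomization 1000 times, keeping the p-value each time
simulate p = r(p), reps(1000) nodots: balcheck

generate reject = p < 0.05
summarize reject                        // share of "significant" imbalances
```

Because the assignment is random by construction, every rejection here is a chance result, and the rejection share should sit close to the nominal 5%. Adjusting in the analysis then amounts to including the covariate in the outcome model, e.g. -regress y t age- for a hypothetical outcome y.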

Kieran McCaul MPH PhD
WA Centre for Health & Ageing (M573)
University of Western Australia
Level 6, Ainslie House
48 Murray St
Perth 6000
Phone: (08) 9224-2140
Fax: (08) 9224 8009
email: [email protected] 
The fact that no one understands you doesn't make you an artist.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Austin Nichols
Sent: Wednesday, 1 October 2008 10:05 AM
To: [email protected]
Subject: Re: st: Verify randomization in a large sample

José Luis Chávez Calva <[email protected]>:
The only way to verify randomization is to observe the randomization
mechanism.  But you can check the balance by comparing means of
several variables in the dataset like age, gender, language, etc.
across categories.  For example, if you have treatment and control
groups defined by a variable t (=0 for control and =1 for treatment),
you can do
 hotelling age gender language etc, by(t)
 reg t age gender language etc
to get an F test of the null that all means are the same.  Assuming
variances may differ, you can
 reg t age gender language etc, r
and for alternative models you can run logit or probit instead (to get
a chi2 test).  Presumably, for a categorical t you could run
 mlogit t age gender language etc
or -mprobit- assuming a specific error distribution under the null of
randomization (in which case the X vars should not help you predict
t).  All of that is just for comparisons of means; for higher moments
you will need tests of equality of distributions (e.g. -ksmirnov-) or
graphical methods (e.g. -qqplot-).
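
The checks described above can be sketched end to end on simulated data. This is an illustration under stated assumptions rather than part of the original reply: the variables female and language are hypothetical 0/1 stand-ins for the covariates mentioned, and t is generated by a coin flip, so the null of randomization holds by construction.

```stata
* Sketch on simulated data; variable names and parameters are assumptions.
clear
set seed 12345
set obs 1000
generate age      = rnormal(40, 10)
generate female   = runiform() < 0.5
generate language = runiform() < 0.3    // indicator for one language group
generate t        = runiform() < 0.5    // randomized binary assignment

* Joint test that all covariate means are equal across arms
hotelling age female language, by(t)

* Regress assignment on covariates; the overall F test in the header
* tests the null that no covariate predicts t
regress t age female language
regress t age female language, vce(robust)   // allows unequal variances

* Logit alternative; the header reports a chi2 test of the same null
logit t age female language

* Beyond means: compare the whole distribution of a continuous covariate
ksmirnov age, by(t)

* Graphical comparison: -qqplot- needs one variable per arm
generate age0 = age if t == 0
generate age1 = age if t == 1
qqplot age0 age1
```

With real data, the -generate- lines would be dropped and the covariate list replaced by the variables actually recorded before assignment.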

On Tue, Sep 30, 2008 at 8:18 PM, José Luis Chávez Calva
<[email protected]> wrote:
> Dear Stata users:
> I have a dataset on household income with a large number of
> individuals. The set contains one variable indicating the locality
> where each individual lives and another one indicating the household
> to which this individual belongs. I would like to know how to
> verify randomization both at locality and household level using
> several variables in the dataset like age, gender, language, etc.

*   For searches and help try:
