
Re: st: Imbalance in control versus treated group, and weights

From   Paul Seed <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: Imbalance in control versus treated group, and weights
Date   Tue, 14 Oct 2008 17:40:39 +0100

Dear Alexander,
I don't think you need to do anything as complicated as Stas suggests.

If I understand you correctly, the imbalance between your two "randomised" groups arises because the control group includes certain people (those meeting a specific criterion, say c = 1 rather than c = 0) who are ineligible for the treatment group. Apart from this, the groups are balanced. If it is really as simple as described, the solution is simple: drop the subjects with c = 1 from the control group, as they are ineligible for the treatment group. You should then have two balanced groups of 3,000 subjects each. If there are (by accident) a few subjects with c = 1 in the treatment group, they should also be dropped.
The sample is rather smaller, but the randomised comparison is valid.
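
As a rough sketch of that restriction (written in Python rather than Stata, and using made-up records with hypothetical field names `arm` and `c`), the idea is simply to filter both arms on c = 0 before making the comparison:

```python
from collections import Counter

def restrict_to_eligible(subjects):
    """Keep only subjects with c == 0, regardless of arm."""
    return [s for s in subjects if s["c"] == 0]

# Hypothetical toy records; "arm" and "c" are illustrative names only.
subjects = [
    {"id": 1, "arm": "treated", "c": 0},
    {"id": 2, "arm": "treated", "c": 1},  # treated by accident despite c = 1
    {"id": 3, "arm": "control", "c": 0},
    {"id": 4, "arm": "control", "c": 1},  # ineligible for treatment
    {"id": 5, "arm": "control", "c": 0},
]

eligible = restrict_to_eligible(subjects)

# Count how many subjects remain in each arm after the restriction.
arm_counts = Counter(s["arm"] for s in eligible)
print(arm_counts)
```

The comparison of outcomes between arms would then be run on `eligible` only.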

Best wishes,


Paul T Seed MSc CStat, Lecturer in Medical Statistics,
tel  (+44) (0) 20 7188 3642, fax (+44) (0) 20 7620 1227
Wednesdays: (+4) (0) 20 7848 4148

[email protected], [email protected]

King's College London, Division of Reproduction and Endocrinology
St Thomas' Hospital, Westminster Bridge Road, London SE1 7EH

 SV: st: Imbalance in control versus treated group, and weights

Date: Mon, 13 Oct 2008 08:44:02 +0200
From: <[email protected]>
Subject: SV: st: Imbalance in control versus treated group, and weights

Stas, thanks for this. I'll have a go at your idea.

Best wishes,

-----Original message-----
From: [email protected] [mailto:[email protected]] On behalf of Stas Kolenikov
Sent: 10 October 2008 00:15
To: [email protected]
Subject: Re: st: Imbalance in control versus treated group, and weights

I am a design-based inference guy, I know too much of survey statistics and too little of anything else :)). So here are my two design-based cents.

If you had say 5000 people with z=1 all sampled, and out of 5000 remaining z=0 people, 3000 were sampled, I would just treat those as strata with differential probabilities of selection:
Pr[selection|z=1]=1, Pr[selection|z=0]=3/5, so the pweight to go along with the first group is 1, while the weight to go along with the second group is 5/3=1.667. That should actually be about the same reweighting idea that Austin suggested originally.
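
The arithmetic above can be sketched as follows. This is only an illustration in Python, not Stata code; the stratum sizes are the hypothetical ones from the example, and the names `strata` and `pweight` are made up for the sketch.

```python
from fractions import Fraction

# Hypothetical strata from the example: z -> (population size, number sampled).
strata = {1: (5000, 5000), 0: (5000, 3000)}

def pweight(population, sampled):
    """Probability weight = 1 / Pr[selection] = population / sampled."""
    return Fraction(population, sampled)

weights = {z: pweight(pop, n) for z, (pop, n) in strata.items()}

# Sanity check: the weights summed over the sample should recover
# the total population size (5000 + 5000).
weighted_total = sum(weights[z] * n for z, (_, n) in strata.items())

for z in sorted(weights, reverse=True):
    print(f"z={z}: pweight = {float(weights[z]):.3f}")
print("weighted total:", weighted_total)
```

Exact fractions are used so that 5/3 is not rounded before the weighted total is checked.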

There is a literature on an area that would seem to be related to your problem, population-based case-control studies, which takes the problem to the extreme: it is the dependent variable itself that is used as the criterion for sampling. Usually this applies to rare diseases, where all the cases are taken into the data set (Prob[selection]=1, weight=1) and controls are sampled from the population (Prob[selection] is a tiny number, weight = 1e5 or something like that). The interest is often in the probability of having the disease conditional on some covariates, and miraculously enough you can estimate this model using maximum likelihood without weights -- the only parameter that will be biased is the intercept. Alastair Scott from New Zealand is the guy who knows all about it; see
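
A small numerical illustration of the intercept-only bias, under simplifying assumptions: one binary covariate x, a saturated logistic model (so the ML estimates are just the empirical log odds), and expected cell counts rather than a random draw. All counts and sampling probabilities below are invented for the sketch.

```python
import math

def saturated_logit(case_x1, case_x0, ctrl_x1, ctrl_x0):
    """ML estimates for logit Pr(case) = a + b*x with a single binary x.

    In this saturated model, a is the log odds of being a case at x = 0
    and b is the log odds ratio comparing x = 1 with x = 0.
    """
    intercept = math.log(case_x0 / ctrl_x0)
    slope = math.log(case_x1 / ctrl_x1) - intercept
    return intercept, slope

# Hypothetical population counts for a rare disease.
population = {"case_x1": 200, "case_x0": 100,
              "ctrl_x1": 300_000, "ctrl_x0": 600_000}

# Assumed sampling probabilities: all cases, 1 in 1000 controls.
p_case, p_ctrl = 1.0, 0.001

# Expected cell counts in the resulting case-control sample.
case_control = {
    cell: count * (p_case if cell.startswith("case") else p_ctrl)
    for cell, count in population.items()
}

intercept_pop, slope_pop = saturated_logit(**population)
intercept_cc, slope_cc = saturated_logit(**case_control)

print("slope (population, case-control):", slope_pop, slope_cc)
print("intercept shift:", intercept_cc - intercept_pop)
print("log(p_case / p_ctrl):", math.log(p_case / p_ctrl))
```

In this setup the slope is the same in both tables, and the unweighted intercept differs from the population one by log(p_case / p_ctrl), which is why only the intercept needs correcting.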

On 10/8/08, [email protected] <[email protected]> wrote:
Thank you for the advice. Very helpful!

 In this specific case z is a dummy, and if z=1 then this will increase the likelihood of observing x=1. And yes, I do observe outcomes for the group that was supposed to be treated but was not.

 Best wishes,

 -----Original message-----
 From: [email protected] [mailto:[email protected]] On behalf of Austin
 Sent: 8 October 2008 18:39
 To: [email protected]
 Subject: Re: st: Imbalance in control versus treated group, and weights

 It is possible that some kind of propensity score reweighting or regression discontinuity design would be appropriate here, but without much more information, it is hard to offer any specific advice.  How does z affect x in the group supposed to have x=1?  Do you observe outcomes for the group supposed to have x=1 but having x=0? Etc.

 Running a probit with the assumption E(y)=F(b0+b1*x+b2*z) seems unlikely to recover a good estimate of the effect of x on y unless that assumption is actually true!

 On Wed, Oct 8, 2008 at 12:23 PM,  <[email protected]> wrote:
 > Dear Statalisters,
 > I have the following problem. I have given a sample of 10000 people as targets for receiving an offer, and I have a control group of 5000 people. I know that the potentially treated group and the control group are representative. However, without my knowledge only 8000 of the 10000 targets were treated, and a specific criterion was used to pick those 8000 from the 10000.
 > This has created an imbalance between my control group and those treated; the imbalance has been identified and concerns only one variable. I want to investigate whether the offer could reduce the defection rate of customers, but the variable that created this imbalance is known to have a huge impact on the defection rate. To reduce this problem I would like to use weights in Stata, but I am unsure how to approach this. Any tips would be greatly appreciated.
 > Also, say that I did not correct for this, and did the following probit model with the following variables, y=defected/not defected, x=treated/control, z=factor that created imbalance:
 >        y=b0+b1*x+b2*z
 > would it be appropriate to say that it was possible to control for the imbalance by including it as an independent variable in this fashion?
 > Best wishes,
 > Alexander Severinsen
