


Re: st: "Separation" issue in clustered/Longitudinal binary data.

From   Maarten buis <>
Subject   Re: st: "Separation" issue in clustered/Longitudinal binary data.
Date   Wed, 22 Dec 2010 10:23:21 +0000 (GMT)

--- On Wed, 22/12/10, asked:
> > The outcome variable is a binary variable (a patient
> > reported drug's side effect) with repeated measures for
> > three waves. Now I have an intervention (whether the
> > participant received the drug). <snip>

--- On Wed, 22/12/10, Maarten buis answered:
> I may be missing something obvious, but don't you need to
> use the drug in order to experience its side-effects? <snip>
> If something like that is happening in your data, then it is
> hard to see how an "effect" of your treatment could have a
> meaningful substantive interpretation. 

To expand a bit on this answer: the problems with separation
are a logical consequence of how we define effects in
"logit-like" models, where the effect is a ratio of odds.
Consider the example below:

*--------------- begin example ------------------
// get some data and prepare it
sysuse auto, clear
gen byte good = rep78 > 3 if rep78 < .
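// a variable that is always 1; with -nocons- it takes the place of the constant
gen byte baseline = 1

// estimate a logistic regression, reporting odds ratios
logit good i.foreign baseline, or nocons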
*---------------- end example ---------------------
(For more on examples I sent to the Statalist see: )

The number reported for baseline is the baseline odds:
the number of successes per failure for someone (in this
case some car) who has the value 0 on all other covariates.
So for a domestic (=US) car we expect to find .297 cars
with a good repair record for every car with a bad repair
record. The effect of foreign tells us that the odds of
having a good repair record is 20.18 times larger for
foreign cars than for domestic cars.

It is also instructive to look at the odds in each group
separately. In the example below we do not leave the
indicator for the reference category out of the model,
but instead exclude the constant.

*---------------- begin example -------------------
// get the odds for foreign and domestic cars
logit good ibn.foreign, nocons or

// odds ratio is a well-chosen name for this statistic,
// as it is literally a ratio of odds
di exp(_b[1.foreign])/exp(_b[0.foreign])
*----------------- end example --------------------

Here we see that, as before, the odds of having a good
repair record for a domestic car is .297 good cars for
every bad car. We can now also see that for a foreign car
the odds of having a good repair record is 6 good cars for
every bad car. The odds ratio we found in the first example
is thus literally the ratio of these odds (6/.297, which up
to rounding is 20.18).
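
The same numbers can be recovered by hand from the counts in
the data. A minimal sketch, in the same style as the examples
above: count the cars with a good and a bad repair record in
each group, divide to get the odds, and divide the two odds to
get the odds ratio. The scalar names s0, f0, s1, f1, odds0 and
odds1 are just illustrative choices.

*---------------- begin example -------------------
sysuse auto, clear
gen byte good = rep78 > 3 if rep78 < .

// domestic cars (foreign == 0): good and bad repair records
// (cars with missing rep78 have missing good and are not counted)
count if good == 1 & foreign == 0
scalar s0 = r(N)
count if good == 0 & foreign == 0
scalar f0 = r(N)

// foreign cars (foreign == 1): good and bad repair records
count if good == 1 & foreign == 1
scalar s1 = r(N)
count if good == 0 & foreign == 1
scalar f1 = r(N)

// odds = successes per failure; odds ratio = ratio of the two odds
scalar odds0 = scalar(s0)/scalar(f0)
scalar odds1 = scalar(s1)/scalar(f1)
di "odds domestic: " scalar(odds0)
di "odds foreign:  " scalar(odds1)
di "odds ratio:    " scalar(odds1)/scalar(odds0)
*----------------- end example --------------------

In the auto data this gives 11 good and 37 bad domestic cars
and 18 good and 3 bad foreign cars, so the odds ratio is
(18/3)/(11/37) = 222/11, about 20.18.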

In your case the baseline odds is 0: among patients who
have not been given the drug there are 0 patients who
experience the side-effects for every patient who does
not experience the side-effects. How many times larger
is the odds of experiencing the side-effects if the
baseline odds is 0? There is just no answer to that
question. You can also see this by noticing that the odds
ratio is in that case some number divided by 0, which is
undefined.
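
To see this in a small artificial example, one can make up a
version of the auto data in which, by construction, no
domestic car has a good repair record, mirroring patients who
cannot have side-effects because they never received the drug.
This is a sketch on made-up data, not your data. The baseline
odds is then 0, the odds ratio becomes a division by 0 (which
Stata returns as missing), and -logit- should flag the perfect
prediction rather than report a finite odds ratio for foreign.

*---------------- begin example -------------------
// artificial data: no domestic car has a good repair record
sysuse auto, clear
gen byte good = rep78 > 3 if rep78 < .
replace good = 0 if foreign == 0 & good < .

// the cell for good domestic cars is now empty
tab good foreign

// counts of good and bad repair records in each group
count if good == 1 & foreign == 0
scalar s0 = r(N)
count if good == 0 & foreign == 0
scalar f0 = r(N)
count if good == 1 & foreign == 1
scalar s1 = r(N)
count if good == 0 & foreign == 1
scalar f1 = r(N)

// the baseline (domestic) odds is 0 successes per failure
di "baseline odds: " scalar(s0)/scalar(f0)

// the odds ratio divides by that 0, so Stata returns missing (.)
di "odds ratio:    " (scalar(s1)/scalar(f1))/(scalar(s0)/scalar(f0))

// -logit- checks for perfect prediction: being domestic predicts
// a bad record perfectly, so it should drop foreign and the
// domestic cars instead of reporting an odds ratio for foreign
logit good i.foreign, or
*----------------- end example --------------------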

As I understand it, what commands like -firthlogit- do is
assume that the baseline odds isn't really 0 in the
population, but that the odds is so small that, just
because of randomness, your sample happened not to contain
any successes in the baseline group. However, if the
baseline odds is truly 0, as is probably by definition the
case in your data, then these methods cannot help. You
can run these programs, but the results just don't mean
anything.

Hope this helps,

Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen

