Re: st: "Separation" issue in clustered/Longitudinal binary data.

From   <>
Subject   Re: st: "Separation" issue in clustered/Longitudinal binary data.
Date   Wed, 22 Dec 2010 15:01:51 -0600 (CST)

You are absolutely right! In my case, the research design 
didn't collect the "side effects" data if the respondents were 
not on meds (there was a skip pattern). Usually in clinical 
research, there will be a placebo group serving as the 
reference (so that the denominator in the odds ratio won't be a 
true zero). I just hoped that the baseline data, where most of 
the respondents were not on meds and didn't report any SE, 
could serve as their own control group. But I forgot the most 
basic assumption of the odds ratio. Thank you so much for your 

However, just out of curiosity - although my question has 
been well answered - is there any way to model completely or 
quasi-completely separated binary outcomes in longitudinal 
data? I now find that the Firth correction is available for 
logistic and Cox proportional hazards regression, but I can't 
find any equivalent for longitudinal data. I can easily foresee 
this being an issue in longitudinal data analysis if the 
outcome variable is a binary variable. 

Best, and happy Christmas.

Cheenghee M Koh   

---- Original message ----
>Date: Wed, 22 Dec 2010 10:23:21 +0000 (GMT)
>From: (on behalf of Maarten Buis <>)
>Subject: Re: st: "Separation" issue in clustered/Longitudinal binary data.
>--- On Wed, 22/12/10, asked:
>> > The outcome variable is a binary variable (a patient
>> > reported drug's side effect) with repeated measures for
>> > three waves. Now I have an intervention (whether the
>> > participant received the drug). <snip>
>--- On Wed, 22/12/10, Maarten Buis answered:
>> I may be missing something obvious, but don't you need to
>> use the drug in order to experience its side-effects? 
>> If something like that is happening in your data, then it 
>> is hard to see how an "effect" of your treatment could have a
>> meaningful substantive interpretation. 
>To expand a bit on this answer: The problems with separation 
>are a logical consequence of how we define effects in 
>"logit-like models". The effect is a ratio of odds. Consider 
>the example below:
>*--------------- begin example ------------------
>// get some data and prepare it
>sysuse auto, clear
>gen byte good = rep78 > 3 if rep78 < .
>gen byte baseline = 1
>// estimate a logistic regression
>logit good i.foreign baseline, or nocons
>*---------------- end example ---------------------
>(For more on examples I sent to the Statalist see: 
> )
>The number reported for baseline is the baseline odds,
>the number of successes per failure for someone (in this
>case some car) who has the value 0 on all covariates. So
>for a domestic (=US) car we expect to find .297 cars
>with a good repair record for every car with a bad repair
>record. The effect of foreign tells us that the odds of
>having a good repair record is 20.18 times larger for 
>foreign cars than for domestic cars.
>It is also instructive to look at the individual odds.
>In the example below we did not leave the variable for 
>the reference category out of the model, but instead
>excluded the constant.
>*---------------- begin example -------------------
>// get the odds for foreign and domestic cars
>logit good ibn.foreign, nocons or
>// odds ratio is a well chosen name for this statistic,
>// as it is literally a ratio of odds
>di exp(_b[1.foreign])/exp(_b[0.foreign])
>*----------------- end example --------------------
>Here we see that, as before, the odds of having a good
>repair record for domestic cars is .297 good cars for every
>bad car. We can now also see that the odds of having a good
>repair record for foreign cars is 6 good cars for every bad
>car. The odds ratio we found in the first example is thus
>literally the ratio of these odds: 6/.297 = 20.18.
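>As a quick check, the same numbers can be recovered by hand 
>from the cross-tabulation. This is only a sketch: the counts 
>in the comment are what -tab- should show for the auto data 
>shipped with Stata, so verify them against your own output.
>*---------------- begin example -------------------
>sysuse auto, clear
>gen byte good = rep78 > 3 if rep78 < .
>// cross-tabulate repair record against car origin
>tab good foreign
>// assumed counts: domestic 11 good, 37 bad; foreign 18 good, 3 bad
>di 11/37            // odds for domestic cars
>di 18/3             // odds for foreign cars
>di (18/3)/(11/37)   // odds ratio: ratio of the two odds
>*----------------- end example --------------------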
>In your case your baseline odds is 0: for patients who
>have not been given the drug there are 0 patients who
>experience the side-effects for every patient who did 
>not experience the side-effects. How many times larger 
>is the odds of experiencing the side-effects if the 
>baseline is 0? There is just no answer to that question.
>You can also see that by noticing that the odds ratio is
>in that case some number divided by 0, which is undefined.
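>The arithmetic can be sketched with made-up counts. The 
>numbers below are hypothetical, not from the study discussed 
>here: 40 patients off the drug, none with side-effects, and 
>40 patients on the drug, 15 of them with side-effects.
>*---------------- begin example -------------------
>// hypothetical counts, for illustration only
>// odds without the drug: 0 patients with side-effects
>// for 40 patients without
>di 0/40
>// odds with the drug: 15 with side-effects for 25 without
>di 15/25
>// odds ratio: division by a baseline odds of 0;
>// Stata displays missing (.) because the ratio is undefined
>di (15/25)/(0/40)
>*----------------- end example --------------------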
>As I understand it, what commands like -firthlogit- do is 
>assume that the baseline odds isn't really 0 in the 
>population, but that the odds is so small that, just 
>because of randomness, your sample by accident did not contain 
>any successes in your baseline group. However, if the 
>baseline odds is truly 0, as is probably the case by 
>definition in your data, then these methods cannot help. You 
>can run these programs, but the results just don't mean 
>Hope this helps,
>Maarten L. Buis
>Institut fuer Soziologie
>Universitaet Tuebingen
>Wilhelmstrasse 36
>72074 Tuebingen