From: Maarten buis <maartenbuis@yahoo.co.uk>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: "Separation" issue in clustered/Longitudinal binary data.
Date: Wed, 22 Dec 2010 10:23:21 +0000 (GMT)

--- On Wed, 22/12/10, sigontw@uchicago.edu asked:
> > The outcome variable is a binary variable (a patient
> > reported drug's side effect) with repeated measures for
> > three waves. Now I have an intervention (whether the
> > participant received the drug).

<snip>

--- On Wed, 22/12/10, Maarten buis answered:
> I may be missing something obvious, but don't you need to
> use the drug in order to experience its side-effects.

<snip>

> If something like that is happening in your data, then it is
> hard to see how an "effect" of your treatment could have a
> meaningful substantive interpretation.

To expand a bit on this answer: The problems with separation are
a logical consequence of how we define effects in logit-like
models. The effect is a ratio of odds. Consider the example below:

*--------------- begin example ------------------
// get some data and prepare it
sysuse auto, clear
gen byte good = rep78 > 3 if rep78 < .
gen byte baseline = 1

// estimate a logistic regression
logit good i.foreign baseline, or nocons
*---------------- end example ---------------------
(For more on examples I sent to the Statalist see:
http://www.maartenbuis.nl/example_faq )

The number reported for baseline is the baseline odds: the number
of successes per failure for someone (in this case some car) who
has the value 0 on all covariates. So for a domestic (=US) car we
expect to find .297 cars with a good repair record for every car
with a bad repair record. The effect of foreign tells us that the
odds of having a good repair record are 20.18 times larger for
foreign cars than for domestic cars.

It is also instructive to look at the individual odds. In the
example below we did not leave the variable for the reference
category out of the model, but instead excluded the constant.

*---------------- begin example -------------------
// get the odds for foreign and domestic cars
logit good ibn.foreign, nocons or

// odds ratio is a well chosen name for this statistic,
// as it is literally a ratio of odds
di exp(_b[1.foreign])/exp(_b[0.foreign])
*----------------- end example --------------------

Here we see that, as before, the odds of having a good repair
record for domestic cars are .297 good cars for every bad car.
We can now also see that the odds of having a good repair record
for foreign cars are 6 good cars for every bad car. The odds ratio
we found in the first example is thus literally the ratio of these
odds.

In your case your baseline odds are 0: for patients who have not
been given the drug there are 0 patients who experience the
side-effects for every patient who did not experience the
side-effects. How many times larger are the odds of experiencing
the side-effects if the baseline is 0? There is just no answer to
that question. You can also see that by noticing that the odds
ratio is in that case some number divided by 0, which is undefined.

As I understand it, what commands like -firthlogit- do is assume
that the baseline odds aren't really 0 in the population, but that
the odds are so small that, just because of randomness, your sample
happened not to contain any successes in your baseline group.
However, if the baseline odds are truly 0, as is probably the case
by definition in your data, then these methods cannot help. You can
run these programs, but the results just don't mean anything.

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------
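
P.S. The zero-baseline-odds argument can also be checked by hand.
The sketch below is only an illustration: the counts are made up
(they are not the original poster's data), and the variable names
drug, sidefx and n are assumptions chosen for this example. It sets
up a small table in which nobody in the untreated group reports the
side-effect and then computes the two odds and their ratio directly.

*---------------- begin example -------------------
// hypothetical counts: no untreated patient reports the side-effect
clear
input drug sidefx n
0 0 50
1 0 30
1 1 20
end

// 2x2 table; the cell (drug = 0, sidefx = 1) is empty
tabulate drug sidefx [fweight = n]

// odds within each group: successes per failure
di "odds, no drug: " 0/50
di "odds, drug:    " 20/30

// the odds ratio is a division by 0,
// which Stata evaluates to missing (.)
di "odds ratio:    " (20/30)/(0/50)
*----------------- end example --------------------

Running -logit sidefx i.drug [fweight = n], or- on these counts
should, as far as I can tell, lead Stata to note that drug predicts
the outcome perfectly instead of reporting an odds ratio, which is
the same problem seen from the estimation side.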