On Wed, 21 Nov 2001, William Gould wrote:

> Lee Sieswerda <Lee.Sieswerda@tbdhu.com> wrote,
>
> > Cytel makes LogXact for doing "exact" logistic regression (www.cytel.com).
> > In their ads they have something called the "Cytel Challenge" where they ask
> > people to try to fit a logistic regression model to the following data:
> >
> >     Diar    AB   Age   Hosp
> >        0     0     0      0
> >        6     0     0      1
> >      1.9     0     1      0
> >      2.9     0     1      1
> >      100     1     1      1
> >
> > The percentage of patients with diarrhea (Diar) is the outcome and the other
> > three variables are predictors: [...] Taking the challenge using
> > -logistic- in Stata fails to produce a converged model. I really don't know
> > the details of how LogXact manages to fit this model, but my question is:
> > would it not be possible to program Stata to do "exact" logistic regression
> > and be able to fit this model? Or is there something inherently different
> > about Cytel's software that it can accomplish this and Stata cannot?
>
> I take issue with Lee's comment that "Stata fails to produce a converged
> model" -- Lee did something wrong -- but I do not take issue with Cytel's
> ad (although I have not seen it).
>
> Something evidently got left out of the ad or the posting because, to do the
> above example, we need to know the population sizes. Nevertheless, I went to
> Cytel's web site and found a longer problem on which the ad was obviously
> based. The URL is http://www.cytel.com/new.pages/LX.ex.04.html. On the
> web, the problem is longer.
>
> In the longer problem, there are more observations, the population is
> included, and there are five independent variables: Cephalexin, Clindomycin,
> Sex, Age, and LOS. In any case, the web site says,
>
>     Challenge: Try fitting a logistic regression model to the data with
>     all five covariates included.
>
> so let's do that and see exactly the point Cytel wishes to make.
>
> After loading the data, I had 18 observations and the first five looked like
> this:
>
> . list in 1/5
>
>       diarrhea   totno  cephalex  clindomy   sex   age   los
>   1.         0     174         0         0     0     0     0
>   2.         1     113         0         0     0     0     1
>   3.         0     349         0         0     0     1     0
>   4.        16     451         0         0     0     1     1
>   5.         0     213         0         0     1     0     0
>
> To estimate this model, I must use the -blogit- command since that is
> Stata's logit command for estimating when the dependent data contain
> counts of the positive outcomes and the total population. I also specify
> the -or- option to obtain odds ratios. Here is the result of running the
> model:
>
> ==============================================================================
> . blogit diarrhea totno cep cli sex age los, or
> note: cephalex~=0 predicts success perfectly
>       cephalex dropped and 2 obs not used
>
>
> Logit estimates                                   Number of obs   =       2488
>                                                   LR chi2(4)      =      91.48
>                                                   Prob > chi2     =     0.0000
> Log likelihood = -218.30047                       Pseudo R2       =     0.1732
>
> ------------------------------------------------------------------------------
>     _outcome | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>     clindomy |   9.198602    2.89523     7.05   0.000     4.963739    17.04648
>          sex |   .8263463   .2336678    -0.67   0.500      .474751    1.438329
>          age |   2.440564   1.176263     1.85   0.064      .948947    6.276803
>          los |   11.84492   7.113316     4.12   0.000      3.65051    38.43354
> ------------------------------------------------------------------------------
> ==============================================================================
>
> What Cytel wants you to notice is Stata's message
>
>     note: cephalex~=0 predicts success perfectly
>           cephalex dropped and 2 obs not used
>
> That is the point of their challenge: When cephalex is nonzero, there is
> always a positive outcome:
>
> . list if ceph==1
>
>       diarrhea   totno  cephalex  clindomy   sex   age   los
>  17.         1       1         1         0     0     1     1
>  18.         4       4         1         0     1     1     1
>
> There are a total of 5 patients who were observed with cephalex==1 and all
> five patients suffered from diarrhea. How do you interpret that?
> Does that mean cephalex==1 always results in diarrhea? Well, of course it
> does not. With only five such patients, Cytel's computationally intensive
> methods were able to put a confidence interval around the result:
> [27.52, infinity]. Very nice. (I would like somebody to explain to me why
> the point estimate is a finite 207.40 rather than infinite, but I'm sure
> Cytel has carefully considered the answers they produce).
>
> In any case, Stata smartly recognized its limitations and estimated the model
> conditional on cephalex==0. Some other packages might not have recognized the
> problem and gotten messed up in the numerics.
>
> I leave it for you to decide how important it is to put a confidence interval
> around cephalexin in this particular case, but without question there are
> problems for which doing this kind of thing is important.
>
> -- Bill
> wgould@stata.com
> *
> *   Help is available at
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
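
For anyone who wants to see the perfect prediction directly, here is a minimal sketch. It assumes nothing beyond the seven observations that appear in the two listings quoted above (rows 1-5, 17, and 18); the other eleven observations are not shown in the post, so they are not entered here. The commented-out -blogit- line at the end is only my reading of what "estimated the model conditional on cephalex==0" amounts to. It needs the full 18-observation dataset and I have not checked it against the output above.

```stata
* Sketch only: the seven observations listed in the post, not the full data.
clear
input diarrhea totno cephalex clindomy sex age los
 0 174 0 0 0 0 0
 1 113 0 0 0 0 1
 0 349 0 0 0 1 0
16 451 0 0 0 1 1
 0 213 0 0 1 0 0
 1   1 1 0 0 1 1
 4   4 1 0 1 1 1
end

* Sum the positive outcomes and the population within each cephalex group.
* For cephalex==1 the two sums are equal (5 and 5): every patient had the
* outcome, which is what "predicts success perfectly" refers to.
tabstat diarrhea totno, by(cephalex) statistics(sum)

* Observed proportion per row; it equals 1 in both cephalex==1 rows.
generate double prop = diarrhea/totno
list diarrhea totno prop if cephalex == 1

* With the full 18 observations in memory (an assumption; they are not
* entered above), one could then fit the remaining covariates on the
* cephalex==0 observations only:
* blogit diarrhea totno clindomy sex age los if cephalex == 0, or
```

Because every patient with cephalex==1 had diarrhea, the likelihood keeps increasing as the cephalex coefficient grows, so there is no finite maximum likelihood estimate for it. That is why Stata drops the variable and the two observations instead of iterating forever.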

