# Re: st: Permutations and logistic regression (Stata 8)

From: Phil Schumm

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: Permutations and logistic regression (Stata 8)

Date: Wed, 29 Jun 2005 12:05:19 -0500

At 7:55 AM -0700 6/29/05, n p wrote:
> I have a small dataset with 21 subjects. Seven out of the 21 subjects experienced a specific event during a predetermined amount of time whereas the remaining did not. I would like to investigate the effect of various continuous variables which have been measured at the beginning of the experiment on the probability of the event while adjusting for gender and weight. I would normally go with a logistic regression e.g.
>
>     xi:logit event continuous_var weight i.gender
>
> but I am thinking that the sample size is too small. Is it correct to use permutations to deal with the small sample size and if yes is the following syntax correct?
>
>     permute event "xi:logit event continuous_var weight i.gender" _b ,reps(5000)

Nikos,

A nice idea, but I don't believe the command above will give you what you want. The reason is that although you are using -permute-, you are still using -logit- (i.e., unconditional maximum likelihood) to estimate the parameters for each permutation. Thus, you won't have estimates for those permutations where there is complete separation (i.e., where your covariates perfectly predict the response), and the resulting incomplete permutation distribution will be incorrect. This problem will be particularly pronounced with a small dataset which, of course, is exactly when the issue of exact analyses arises. In fact, the typical extreme example is the case where the unconditional MLE doesn't exist even for the original (un-permuted) data.
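To make the separation problem concrete, here is a minimal sketch in Python (not Stata). It assumes the simplest case of a single covariate, where complete separation just means that every event has a larger (or every event a smaller) covariate value than every non-event; with several covariates the check is more involved and separation is more likely. The function names and the data are made up for illustration.

```python
from itertools import combinations
from math import comb

def is_completely_separated(x, y):
    """True if a single covariate x perfectly splits events from non-events."""
    events = [xi for xi, yi in zip(x, y) if yi == 1]
    non_events = [xi for xi, yi in zip(x, y) if yi == 0]
    if not events or not non_events:
        return True  # no variation in the outcome, so no finite MLE either
    # Strict inequalities: ties at the boundary (quasi-complete separation)
    # are not flagged by this simplified check.
    return min(events) > max(non_events) or max(events) < min(non_events)

def count_separated_arrangements(x, n_events):
    """Enumerate every placement of n_events events; count the separated ones."""
    n = len(x)
    separated = 0
    for event_idx in combinations(range(n), n_events):
        chosen = set(event_idx)
        y = [1 if i in chosen else 0 for i in range(n)]
        if is_completely_separated(x, y):
            separated += 1
    return separated, comb(n, n_events)

x = [1.2, 2.5, 3.1, 4.8, 5.0, 6.7]  # made-up covariate values
print(count_separated_arrangements(x, 2))  # -> (2, 15)
```

In this toy example 2 of the 15 possible arrangements have no finite unconditional MLE, so a permutation distribution built from fitted coefficients would silently be missing exactly the most extreme arrangements.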

To use -permute- to do hypothesis testing within the context of a logistic regression model, you'd need to base the test(s) on the sufficient statistic(s) for your model rather than on the unconditional maximum likelihood estimate. This would be pretty straightforward, and I believe you could use -permute- to do it. However, to get actual parameter estimates and standard errors, you'd need to maximize the appropriate conditional likelihood, and even then I believe that there are situations where a maximum does not exist (in such cases, a different estimator is needed).
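As a rough illustration of a test based on the sufficient statistic, the following Python sketch (not Stata) assumes a model with a single covariate x and no adjustment variables. In that model the sufficient statistic for the slope is the sum of x over the subjects who had the event, and it is always defined, whether or not the data are separated. The two-sided p-value here measures distance from the permutation mean and uses the (c+1)/(n+1) convention; both are choices of this sketch, and all names and data are hypothetical.

```python
import random

def sufficient_stat(x, y):
    """Sum of the covariate over subjects with the event (y == 1)."""
    return sum(xi for xi, yi in zip(x, y) if yi == 1)

def permutation_pvalue(x, y, reps=5000, seed=2005):
    """Two-sided Monte Carlo permutation p-value for H0: no effect of x."""
    rng = random.Random(seed)
    observed = sufficient_stat(x, y)
    # Mean of the statistic when the events are assigned at random.
    null_mean = sum(y) * sum(x) / len(x)
    observed_dev = abs(observed - null_mean)
    y_perm = list(y)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(y_perm)  # keeps the total number of events fixed
        dev = abs(sufficient_stat(x, y_perm) - null_mean)
        if dev >= observed_dev - 1e-12:  # small tolerance for float noise
            extreme += 1
    # Count the observed arrangement itself so the p-value is never zero.
    return observed, (extreme + 1) / (reps + 1)

# Made-up data: 21 subjects, 7 events, as in the design described above.
x = list(range(1, 22))
y = [0] * 14 + [1] * 7
stat, p = permutation_pvalue(x, y)
print(stat, p)
```

The made-up data are deliberately completely separated (the seven events have the seven largest x values), which is precisely the situation where a test built on the fitted coefficient would have nothing to work with.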

One more comment. With the -permute- command as you've specified it above, you are conditioning only on the total number of events. In many problems, however, there are other covariates whose coefficients you may wish to treat as nuisance parameters, and your inference should condition on the sufficient statistics for these as well. If these are discrete variables, you may be able to use the strata() option of -permute- to obtain the appropriate conditional permutation distribution.
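The idea of permuting within strata can be sketched as follows, again in Python rather than Stata. For a discrete nuisance covariate such as gender, shuffling the outcome only within each level keeps the number of events per level fixed, which is what conditioning on that covariate's sufficient statistic amounts to. This is an illustration of the idea only, not a description of how -permute- implements strata(); the function names and data are hypothetical.

```python
import random
from collections import defaultdict

def stratified_shuffle(y, strata, rng):
    """Return a copy of y with values permuted only within each stratum."""
    groups = defaultdict(list)
    for i, s in enumerate(strata):
        groups[s].append(i)
    y_new = list(y)
    for idx in groups.values():
        values = [y[i] for i in idx]
        rng.shuffle(values)
        for i, v in zip(idx, values):
            y_new[i] = v
    return y_new

def events_per_stratum(y, strata):
    """Number of events (y == 1) in each stratum."""
    counts = defaultdict(int)
    for yi, s in zip(y, strata):
        counts[s] += yi
    return dict(counts)

# Made-up data: the event counts by gender survive every shuffle.
rng = random.Random(42)
y = [1, 0, 0, 1, 1, 0, 0, 0]
gender = ["f", "f", "f", "m", "m", "m", "m", "m"]
y_perm = stratified_shuffle(y, gender, rng)
print(events_per_stratum(y, gender), events_per_stratum(y_perm, gender))
```

A continuous adjustment variable such as weight cannot be handled this way, since each distinct value would form its own stratum and leave nothing to permute.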

WARNING: Please note that this is a topic (i.e., exact logistic regression) which I know almost nothing about, and so you should consume my response very critically. You can read the theory in Cox and Snell (1989), and I know that there are several accessible papers in the statistical literature (sorry, I don't have any references easily available at the moment). Hopefully those on the list more knowledgeable than I will correct any misstatements I may have made.

-- Phil

Cox, D. R., and E. J. Snell. 1989. Analysis of Binary Data, 2nd edn. New York: Chapman & Hall.