The Stata listserver

Re: Re: st: Permutations and logistic regression (Stata 8)

From   Joseph Coveney <>
To   Statalist <>
Subject   Re: Re: st: Permutations and logistic regression (Stata 8)
Date   Thu, 30 Jun 2005 11:24:47 +0900

Phil Schumm wrote:

At 7:55 AM -0700 6/29/05, n p wrote:
>I have a small dataset with 21 subjects. Seven of the 21
>subjects experienced a specific event during a predetermined amount
>of time, whereas the remaining 14 did not.  I would like to investigate
>the effect of various continuous variables which have been measured 
>at the beginning of the experiment on the probability of the event 
>while adjusting for gender and weight. I would normally go with a 
>logistic regression e.g.
>xi:logit event continuous_var weight i.gender
>but I am thinking that the sample size is too small.  Is it correct
>to use permutations to deal with the small sample size, and if so, is
>the following syntax correct?
>permute event "xi:logit event continuous_var weight i.gender" _b ,reps(5000)


A nice idea, but I don't believe the command above will give you what 
you want.  The reason is that although you are using -permute-, you 
are still using -logit- (i.e., unconditional maximum likelihood) to 
estimate the parameters for each permutation.  Thus, you won't have 
estimates for those permutations where there is complete separation 
(i.e., where your covariates perfectly predict the response), and the 
resulting incomplete permutation distribution will be incorrect. 


Permuting the continuous variable instead of the outcome wouldn't guarantee that
the likelihood can be evaluated in every permutation, but might it help avoid
separation?  Granted, there are assumptions such as exchangeability to
consider, especially with weight and sex, but it might be worth looking into.

I'm not sure what Nikos intends _b to be.  Is it meant as the test statistic
(a Wald Z or the like) for the continuous variable?  The permutation test
computes the test statistic on the unpermuted data, then permutes one variable
many times and counts how often the permuted-data test statistic is as extreme
as the observed one or more so.  So Nikos might have to write a short rclass
program that returns the test statistic in a scalar, say,

return scalar Z = _b[continuous_var] / _se[continuous_var]

and then run -permute continuous_var "mylogit" r(Z), reps(5000)-.  You can also
return the likelihood-ratio chi-square test statistic if you run -logit- twice
(full model, and reduced model without continuous_var) in the rclass program.
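A minimal sketch of such a program, in Stata 8 syntax to match the -permute- call above.  The program name -mylogit- and the variable names come from the thread; the returned scalars Z and LR, the temporary names, and the seed are illustrative assumptions.  It assumes gender is coded 0/1 and there are no missing values, so the full and reduced models use the same observations.  If a permutation produces separation, -logit- should either stop with an error or drop continuous_var; either way -mylogit- fails and -permute- records that replication as missing, which is the incomplete distribution Phil warned about.

```stata
* Hypothetical rclass wrapper for -permute- (Stata 8 syntax).
* Returns the Wald Z and the likelihood-ratio chi-square for continuous_var.
capture program drop mylogit
program define mylogit, rclass
    version 8
    tempvar touse
    tempname llfull
    * Full model, refitted on the current (possibly permuted) data
    quietly xi: logit event continuous_var weight i.gender
    return scalar Z = _b[continuous_var] / _se[continuous_var]
    scalar `llfull' = e(ll)
    quietly generate byte `touse' = e(sample)
    * Reduced model without continuous_var, on the same estimation sample
    quietly xi: logit event weight i.gender if `touse'
    return scalar LR = 2 * (`llfull' - e(ll))
end

* Permute continuous_var (not event), as suggested above
set seed 20050630
permute continuous_var "mylogit" r(Z) r(LR), reps(5000)

* Release 9 and later use prefix syntax instead:
* permute continuous_var Z=r(Z) LR=r(LR), reps(5000): mylogit
```

Running -mylogit- once by hand on the unpermuted data shows the observed statistics that -permute- compares against.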

Also, if complete separation does occur occasionally, couldn't you use
simulation (generating data that mimic your observed dataset but under the null
hypothesis for the variable of interest) to find out how much occasional
complete separation, or other failures to converge, affects the Type I error
rate?  I believe that you'll see such failures as red x's in -permute-'s
output, so you'll know just how "occasional" the situation is.  Simulation can
perhaps show whether the frequency of complete separation in a particular
circumstance should be of concern for the objective at hand.
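One way to set that up, sketched in Stata 8 syntax: fit the reduced model (weight and gender only) to the observed data, draw new outcomes from its fitted probabilities so that the null holds by construction, and record how often -logit- fails and how often the Wald test rejects at the 5% level.  The 1,000 replications, the seed, and the names ysim, p0, failed, reject, and nullsim are arbitrary choices, and gender is assumed to be coded 0/1 with no missing data.  To keep the run short, this checks the asymptotic Wald test; putting a -permute- call in place of the Wald step would evaluate the permutation test itself, at thousands of times the cost.

```stata
* Sketch: null simulation keeping the observed covariates (Stata 8 syntax).
* Assumes event, continuous_var, weight and a 0/1 gender are in memory.
set seed 20050630
quietly logit event weight gender        // reduced model: the null
quietly predict double p0, pr            // fitted P(event) under the null
tempname sim
postfile `sim' failed reject using nullsim, replace
forvalues i = 1/1000 {
    capture drop ysim
    * Simulated outcome depends on weight and gender only
    quietly generate byte ysim = uniform() < p0
    capture logit ysim continuous_var weight gender
    local failed = (_rc != 0)
    if !`failed' {
        * _b[] errors out if continuous_var was dropped from the model
        capture local z = _b[continuous_var] / _se[continuous_var]
        local failed = (_rc != 0)
        if !`failed' local failed = (`z' >= .)
    }
    local reject = .
    if !`failed' local reject = (abs(`z') > 1.959964)
    post `sim' (`failed') (`reject')
}
postclose `sim'
preserve
use nullsim, clear
* mean of failed = failure rate; mean of reject = Type I error among fits
summarize failed reject
restore
drop ysim p0
```

A failure rate near zero with a rejection rate near 0.05 would suggest that separation is not a practical concern for this design; a high failure rate would argue for an exact or conditional approach instead.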

After Nikos upgrades to Release 9, check out the -vce(jackknife)- option.  In 
(very) limited evaluation, I've found that it brings the Type I error rate of 
even -xtgee- (-family(gaussian) link(identity)-) nicely into range with 
datasets as small as 25.  (Actually, it became a trace conservative, running a
tad under the nominal level.)  Of course, you can still encounter complete
separation with leave-one-out.
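For reference, the option goes on the estimation command itself.  A sketch for Release 9 reusing Nikos's model; gender still needs the -xi:- prefix there, because factor-variable notation arrived only in a later release.  Leave-one-out fits that fail (through separation, for example) should show up as failed replications in the jackknife dots, so it is worth counting them before trusting the standard error.

```stata
* Release 9 or later: jackknife (leave-one-out) standard errors for logit
xi: logit event continuous_var weight i.gender, vce(jackknife)
* Wald test of continuous_var using the jackknife variance estimate
test continuous_var
```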

Joseph Coveney
