Re: Re: st: Permutations and logistic regression (Stata 8)
Phil Schumm wrote:
At 7:55 AM -0700 6/29/05, n p wrote:
>I have a small dataset with 21 subjects. Seven out of the 21
>subjects experienced a specific event during a predetermined amount
>of time whereas the remaining did not. I would like to investigate
>the effect of various continuous variables which have been measured
>at the beginning of the experiment on the probability of the event
>while adjusting for gender and weight. I would normally go with a
>logistic regression e.g.
>xi:logit event continuous_var weight i.gender
>but I am thinking that the sample size is too small. Is it correct
>to use permutations to deal with the small sample size and if yes is
>the following syntax correct?
>permute event "xi:logit event continuous_var weight i.gender" _b ,reps(5000)
A nice idea, but I don't believe the command above will give you what
you want. The reason is that although you are using -permute-, you
are still using -logit- (i.e., unconditional maximum likelihood) to
estimate the parameters for each permutation. Thus, you won't have
estimates for those permutations where there is complete separation
(i.e., where your covariates perfectly predict the response), and the
resulting incomplete permutation distribution will be incorrect.
It might not guarantee that the likelihood is evaluable in all instances, but
would it help to permute the values of the continuous variable in order to
avoid separation? Granted, there are assumptions like exchangeability to
consider, especially with weight and sex, but it might be worth looking into.
I'm not sure what Nikos's _b is. Is it the test statistic (Wald Z or the like)
for the continuous variable? The permutation test evaluates the test statistic
on the unpermuted data, then permutes a variable a number of times to see in how
many of those permutations the permuted-data test statistic is as or more extreme.
So, Nikos might have to write a short rclass program that returns the test
statistic in a scalar, say,
return scalar Z = _b[continuous_var] / _se[continuous_var]
and then run -permute continuous_var "mylogit" r(Z), reps(5000)-. You can also
return the likelihood ratio chi-square test statistic if you run -logit- twice
(full model, reduced model without continuous_var) in the rclass program.
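Such an rclass program might look something like the sketch below (the program
name -mylogit- and the variable names are taken from Nikos's model; the exact
wrapper is an assumption, not tested code):

```
capture program drop mylogit
program define mylogit, rclass
    * Fit the logistic model; -xi- expands i.gender into indicators
    xi: logit event continuous_var weight i.gender
    * Return the Wald Z statistic for the continuous predictor
    return scalar Z = _b[continuous_var] / _se[continuous_var]
end

. permute continuous_var "mylogit" r(Z), reps(5000)
```

For the likelihood-ratio version, the program would fit the full and reduced
models in turn, store each e(ll), and return twice their difference instead of Z.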
Also, if you do have situations where complete separation occurs occasionally,
couldn't you use simulation (generate data that mimic your observed dataset but
under the null hypothesis for the variable of interest) to find out how bad it
is, that is, what effect occasional complete separation or other failures to
converge has on the Type I error rate? (I believe you'll see such failures as
red x's in -permute-'s output, so you'll know just how "occasional" the
situation is.) Simulation can perhaps give guidance as to whether the frequency
of complete separation in a particular circumstance should be of concern for
the objective at hand.
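One way to set that up is a small rclass program run under -simulate-. The
sketch below assumes the null model's fitted probabilities have already been
saved in a variable -phat- (via -predict phat- after fitting the model without
continuous_var); the program name and flag are hypothetical:

```
capture program drop nullsim
program define nullsim, rclass
    * Redraw the response under the null: event depends only on the
    * adjustment covariates, via the saved fitted probabilities phat
    replace event = uniform() < phat
    capture xi: logit event continuous_var weight i.gender
    * Record whether -logit- converged (separation makes it fail)
    return scalar conv = (_rc == 0)
    if _rc == 0 {
        return scalar Z = _b[continuous_var] / _se[continuous_var]
    }
end

. simulate "nullsim" conv = r(conv) Z = r(Z), reps(1000)
```

Tabulating -conv- then shows how often separation bites, and the distribution
of Z shows what it does to the rejection rate.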
After Nikos upgrades to Release 9, check out the -vce(jackknife)- option. In
(very) limited evaluation, I've found that it brings the Type I error rate of
even -xtgee- (-family(gaussian) link(identity)-) nicely into range with
datasets as small as 25. (Actually, it became a trace conservative at a tad
under the nominal.) Of course, you can still encounter complete separation
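In Release 9 the jackknife VCE can be requested directly on many estimation
commands, so for Nikos's model it would presumably be something like:

```
. xi: logit event continuous_var weight i.gender, vce(jackknife)
```

(Whether the jackknife standard errors help much with n = 21 and seven events
is exactly the sort of thing the simulation above could check.)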