Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: Blown up Std. errors in logistic regression with bootstrap

From: Michael Wahman
To: "statalist@hsphsun2.harvard.edu"
Subject: Re: st: Blown up Std. errors in logistic regression with bootstrap
Date: Mon, 13 Dec 2010 16:00:49 +0100

```Dear Maarten and Stas,

```
Thank you very much for your excellent advice! I really appreciate it. As you suggested, I used the noisily option to diagnose the problem. It turned out to be exactly the way you suspected. Most of the models were close to being perfectly determined.
```

Thanks a million for helping me to sort this problem out.

/Michael

On 13 Dec 2010, at 14:47, Maarten Buis wrote:

```
```--- On Mon, 13/12/10, Michael Wahman wrote:
```
```I am doing a study with two different logistic models,
where n is fairly small. In one of the models n is somewhat
bigger (n=107) and one model has a smaller n (n=51). I want
to use robust bootstrapped standard errors to compensate for
the small n, especially in the second model. I've understood
that it is problematic to use MLE when the number of d.f.s
is small, since this model might not be asymptotic.

I have experimented with bootstraps, but the standard
errors in the model become huge. This seems to be associated
with the models with a small number of df.s. If I run the
models with a higher n with a bootstrap, I don’t get this
problem. Neither do I get it when excluding the control
variables.
```
```
Sounds to me like the model is close to being perfectly
determined. An example of perfect determination would be
when you have a continuous variable x and all observations
with values less than 2 on x are failures and all
observations with values more than 2 are successes. When
you get close to being perfectly determined small changes
in the data can lead to huge changes in the parameters,
which would yield the kind of behaviour you found when
using -bootstrap-.

First, I would use very few explanatory variables in that
dataset. The best case scenario would be when the
proportion of "successes" is about 50%. In that case the
variance of the dependent variable is maximum and the data
contains the most information. In that case I might use
4 explanatory variables, maybe even 5. If the proportion of
successes is less than 30% or more than 70% I would use
1 maybe 2 explanatory variables.

Second, I would look at some cross tabulations of your
explanatory variables against your dependent variable, to
see if you can find some problematic explanatory variables.

Third, I would add the -saving()- option in -bootstrap-.
This will save the estimates in each bootstrap sample.

Fourth, if the problem is close to perfect determination
then you might want to take a look at exact logistic
regression (-help exlogistic-).

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```
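Maarten's point about perfect determination can be illustrated with a small sketch. This is not Stata and not anything either poster ran; it is a minimal Python illustration, with hypothetical function names and made-up data, of why bootstrap resamples from a small, nearly separated dataset often end up perfectly separated. In such resamples the logit coefficients have no finite maximum, which is what would inflate the bootstrap standard errors.

```python
import random


def is_perfectly_separated(x, y):
    """Return True if a single cut-point on x splits failures (0) from successes (1).

    A sample containing only one outcome is also treated as degenerate.
    Ties exactly at the cut-point (quasi-separation) are not flagged.
    """
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        return True
    return max(x0) < min(x1) or max(x1) < min(x0)


def share_separated_resamples(x, y, reps=1000, seed=12345):
    """Share of bootstrap resamples (drawn with replacement) that are separated."""
    rng = random.Random(seed)
    n = len(x)
    hits = 0
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        if is_perfectly_separated([x[i] for i in idx], [y[i] for i in idx]):
            hits += 1
    return hits / reps


# Made-up data following the example in the thread:
# every observation below 2 is a failure, every one above 2 is a success.
x = [0.5, 1.0, 1.5, 1.9, 2.1, 2.5, 3.0, 3.5]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(is_perfectly_separated(x, y))  # True

# Swap the two outcomes nearest the cut-point: the full sample now overlaps,
# but any resample that misses one of those two observations is separated again.
y_near = [0, 0, 0, 1, 0, 1, 1, 1]
print(is_perfectly_separated(x, y_near))  # False
print(share_separated_resamples(x, y_near))
```

The last line reports how often a resample loses the overlap. With several explanatory variables the same thing can happen along any linear combination of them, which this one-variable check does not detect; inspecting the replicate estimates saved by -saving()- is the more general diagnostic.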