



Re: st: Blown up Std. errors in logistic regression with bootstrap


From   Maarten Buis <maartenbuis@yahoo.co.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Blown up Std. errors in logistic regression with bootstrap
Date   Mon, 13 Dec 2010 13:47:27 +0000 (GMT)

--- On Mon, 13/12/10, Michael Wahman wrote:
> I am doing a study with two different logistic models,
> where n is fairly small. In one of the models n is somewhat
> bigger (n=107) and one model has a smaller n (n=51). I want
> to use robust bootstrapped standard errors to compensate for
> the small n, especially in the second model. I've understood
> that it is problematic to use MLE when the number of d.f.s
> is small, since this model might not be asymptotic.
> 
> I have experimented with bootstraps, but the standard
> errors in the model become huge. This seems to be associated
> with the models with a small number of d.f.s. If I run the
> models with a higher n with a bootstrap, I don’t get this
> problem. Neither do I get it when excluding the control
> variables. 

Sounds to me like the model is close to being perfectly 
determined. An example of perfect determination would be
when you have a continuous variable x and all observations 
with values less than 2 on x are failures and all 
observations with values more than 2 are successes. When 
you get close to being perfectly determined, small changes 
in the data can lead to huge changes in the parameters, 
which would yield the kind of behaviour you found when 
using -bootstrap-.
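
To make the x < 2 example concrete, here is a minimal sketch of a check for
perfect determination on a single continuous variable. It is written in
Python rather than Stata purely for illustration; the function name and the
data are hypothetical and are not taken from the original question.

```python
def is_perfectly_separated(x, y):
    """Return True if one cut-point on x splits failures (0) from successes (1)."""
    xs_fail = [xi for xi, yi in zip(x, y) if yi == 0]
    xs_succ = [xi for xi, yi in zip(x, y) if yi == 1]
    if not xs_fail or not xs_succ:
        # Only one outcome observed: treated here as a degenerate case.
        return True
    # Separated if every failure lies strictly below every success, or vice versa.
    return max(xs_fail) < min(xs_succ) or max(xs_succ) < min(xs_fail)


# Hypothetical data: every x below 2 is a failure, every x above 2 a success.
x = [0.5, 1.0, 1.5, 1.9, 2.1, 2.5, 3.0]
y = [0, 0, 0, 0, 1, 1, 1]
print(is_perfectly_separated(x, y))        # True

# The same x values with overlapping outcomes are not separated.
y_noisy = [0, 0, 1, 0, 1, 0, 1]
print(is_perfectly_separated(x, y_noisy))  # False
```

A dataset that is only close to this situation can easily become exactly
separated in some bootstrap resamples, which is one way the estimates can
blow up.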

First, I would use very few explanatory variables in that
dataset. The best case scenario would be when the 
proportion of "successes" is about 50%. In that case the
variance of the dependent variable is at its maximum and the
data contain the most information, so I might use
4 explanatory variables, maybe even 5. If the proportion of 
successes is less than 30% or more than 70%, I would use
1, maybe 2, explanatory variables.
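
The point about the variance can be checked with a few lines of arithmetic.
For a binary outcome with success proportion p the variance is p(1 - p),
which peaks at p = 0.5. The sketch below is in Python rather than Stata and
only illustrates that formula; the function name is hypothetical.

```python
def bernoulli_variance(p):
    """Variance of a 0/1 variable whose proportion of successes is p."""
    return p * (1 - p)


# Variance is largest at 50% successes and falls off symmetrically.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f}  variance = {bernoulli_variance(p):.2f}")
```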

Second, I would look at some cross tabulations of your
explanatory variables against your dependent variable, to 
see if you can find some problematic explanatory variables.
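
As a rough sketch of what to look for in such a cross tabulation: a level of
a categorical explanatory variable in which only one outcome ever occurs is a
likely source of trouble. The example below is in Python rather than Stata,
and the variable names and data are hypothetical.

```python
from collections import Counter


def crosstab(x, y):
    """Count observations in each (level of x, outcome) cell."""
    return Counter(zip(x, y))


def levels_with_empty_cells(x, y):
    """Return the levels of x for which some observed outcome never occurs."""
    cells = crosstab(x, y)
    outcomes = sorted(set(y))
    return [level for level in sorted(set(x))
            if any(cells[(level, out)] == 0 for out in outcomes)]


# Hypothetical data: in region "b" every observation is a success.
region = ["a", "a", "a", "b", "b", "b", "c", "c", "c", "c"]
won = [0, 1, 0, 1, 1, 1, 0, 1, 1, 0]

for (level, outcome), count in sorted(crosstab(region, won).items()):
    print(level, outcome, count)
print(levels_with_empty_cells(region, won))  # ['b']
```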

Third, I would add the -saving()- option in -bootstrap-.
This will save the estimates in each bootstrap sample.
Looking at these might help you identify the problem.
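
To illustrate why the per-sample estimates are worth inspecting, here is a
small sketch in Python (not Stata, and not what -bootstrap- does internally).
It assumes the simplest possible case, a logit of y on one binary x, where
the slope equals the log odds ratio of the 2x2 table and so can be computed
without an optimizer. The data, function names and seed are hypothetical.

```python
import math
import random


def log_odds_ratio(x, y):
    """Logit slope of y on a single binary x, i.e. the log odds ratio.

    Returns +inf or -inf when an empty cell makes the estimate infinite,
    and nan when the ratio is undefined (for example, x does not vary).
    """
    n11 = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    n10 = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    n01 = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    n00 = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    num, den = n11 * n00, n10 * n01
    if num == 0 and den == 0:
        return math.nan
    if den == 0:
        return math.inf
    if num == 0:
        return -math.inf
    return math.log(num / den)


def bootstrap_estimates(x, y, reps, seed):
    """Resample observations with replacement and keep every estimate."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(log_odds_ratio([x[i] for i in idx],
                                        [y[i] for i in idx]))
    return estimates


# Hypothetical data: only one failure among the ten observations with x == 1.
x = [1] * 10 + [0] * 10
y = [1] * 9 + [0] * 1 + [1] * 4 + [0] * 6

est = bootstrap_estimates(x, y, reps=500, seed=12345)
finite = [e for e in est if math.isfinite(e)]
print("replicates:", len(est))
print("non-finite estimates:", len(est) - len(finite))
print("largest finite estimate:", max(finite))
```

Resamples that happen to drop the single failure with x == 1 have no finite
estimate. In a real fit these would tend to show up as very large
coefficients in the saved file, and those are what inflate the bootstrap
standard error.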

Fourth, if the problem is close to perfect determination
then you might want to take a look at exact logistic 
regression (-help exlogistic-).

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------


      
