Re: st: Blown up Std. errors in logistic regression with bootstrap


From   Stas Kolenikov <skolenik@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Blown up Std. errors in logistic regression with bootstrap
Date   Mon, 13 Dec 2010 08:12:24 -0500

Oh my. Here are a few problems you probably have.

1. Rare outcomes: if you only have a dozen or so positive outcomes,
and you run a regression with 5 or more variables, what matters is
essentially the (# of positive outcomes)/(# of regressors) ratio, and
your results are unstable to begin with. Now when you bootstrap your
data, you may get some bootstrap samples where there are fewer than
(# of variables) positive outcomes, in which case the model is not
identified. Even if you have just one extra observation with a
positive outcome, you can imagine that the result will be highly
unstable. I am guessing that your model failed to converge more than
once, even though Stata showed those runs to you as dots (it showed
you an "x" where the model failed to converge or was not identified).
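
To see how thin the data are, a minimal sketch along these lines may
help. It assumes the outcome is coded 0/1 with 1 as the positive
outcome, and it reuses the variable names from the model quoted below;
the /// continuations mean it has to be run from a do-file.

```stata
* Fit the model once, then tabulate the outcome on the estimation sample
logit oppcoaltype124 stunipolardistance lgdpgro ldeltaifhpol lifhpol ///
    llndistmag orpreselec parelection dif12party
tabulate oppcoaltype124 if e(sample)

* Positive outcomes per regressor (assumes 1 = positive outcome);
* e(df_m) is the number of model degrees of freedom stored by -logit-
quietly count if oppcoaltype124 == 1 & e(sample)
display "positive outcomes per regressor: " r(N) / e(df_m)
```

If the rarer category is actually 0, count that one instead; the
smaller of the two counts is the one that limits the model.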

2. Your first model does not have clusters. And if you do have
clusters, your problem is exacerbated even further: you now have total
d.f. = # of clusters, and if you have ten, forget about asymptotics.
Besides, you need to specify the -idcluster()- option, and bootstrap the
logit model with the clusters generated by -idcluster()-.
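
A possible sketch of the clustered call, using the variable names from
the post quoted below. The new identifier name bsid, the number of
replications, and the seed are placeholders, and the idea that leftover
-tsset- settings trigger the "repeated time values within panel"
message is an assumption about your data, not something I can verify
from here.

```stata
* Assumption: the data were declared with -tsset- earlier; clearing the
* settings is one way to avoid "repeated time values within panel"
tsset, clear

* Resample whole countries; bsid (hypothetical name) is a new cluster
* identifier that is unique within each bootstrap sample
bootstrap, cluster(country) idcluster(bsid) reps(200) seed(12345): ///
    logit oppcoaltype124 stunipolardistance lgdpgro ldeltaifhpol lifhpol ///
    llndistmag orpreselec parelection dif12party
```

Any command that needs a cluster or panel identifier inside the
bootstrap should refer to bsid rather than country, because a country
drawn twice has to count as two separate clusters.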

3. By the way, the bootstrap IS AN ASYMPTOTIC METHOD. Thinking that it
can fix all small sample problems is very misleading. The
justifications of the bootstrap are all based on the theory of weak
convergence, and while it is true that the bootstrap provides some
improvements in CI coverage in small samples, you do not seem to be
utilizing these in your analysis -- you only do the standard errors
which are asymptotically equivalent to -robust- or -cluster()- ones
(provided you don't screw up the latter, which is quite easy to do).
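
If you do want the bootstrap confidence intervals rather than just the
standard errors, a minimal sketch could look like this. The number of
replications and the seed are arbitrary choices for illustration, and
with your sample sizes the caveats in point 1 still apply.

```stata
* Bootstrap the model with more replications than the default 50
bootstrap, reps(1000) seed(12345): ///
    logit oppcoaltype124 stunipolardistance lgdpgro ldeltaifhpol lifhpol ///
    llndistmag orpreselec parelection dif12party

* Report percentile and bias-corrected confidence intervals
estat bootstrap, percentile bc
```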

You might be able to learn a bit more about what's going on with the
-noisily- option of -bootstrap-, which shows the output from the
individual bootstrap sample runs.
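
For example, something like the following (the seed is arbitrary and is
only there so that the run can be repeated):

```stata
* Show the full -logit- output for every bootstrap replication
bootstrap, reps(50) seed(12345) noisily: ///
    logit oppcoaltype124 stunipolardistance lgdpgro ldeltaifhpol lifhpol ///
    llndistmag orpreselec parelection dif12party
```

Look for replications with huge coefficients, dropped variables, or
notes about observations being completely determined.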

On Mon, Dec 13, 2010 at 7:24 AM, Michael Wahman
<Michael.Wahman@svet.lu.se> wrote:
> Dear Statalist subscribers,
>
>
> I have run into a major problem when trying to run a robustness check on
> one of my logistic regression models, using bootstrapped robust standard
> errors.
>
> I am doing a study with two different logistic models, where n is fairly
> small. In one of the models n is somewhat bigger (n=107) and one model has a
> smaller n (n=51). I want to use robust bootstrapped standard errors to
> compensate for the small n, especially in the second model. I've understood
> that it is problematic to use MLE when the number of d.f. is small, since
> this model might not be asymptotic.
>
> I have experimented with bootstraps, but the standard errors in the model
> become huge. This seems to be associated with the models with a small number
> of d.f. If I run the models with a higher n with a bootstrap, I don’t get
> this problem. Neither do I get it when excluding the control variables. I
> have also tried to use the jackknife command. This is marginally better, but
> still all the variables become insignificant.
>
> Below you can see the main model
>
>
>
> . bootstrap: logit  oppcoaltype124 stunipolardistance lgdpgro
> ldeltaifhpol lifhpol llndistmag orpreselec parelection dif12party
> (running logit on estimation sample)
>
> Bootstrap replications (50)
> ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
> .............x....................................    50
>
> Logistic regression                             Number of obs     =        51
>                                                 Replications      =        49
>                                                 Wald chi2(8)      =      0.00
>                                                 Prob > chi2       =    1.0000
> Log likelihood = -10.060672                     Pseudo R2         =    0.6217
>
> ------------------------------------------------------------------------------
>              |   Observed   Bootstrap                         Normal-based
> oppcoalt~124 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
> stunipolar~e |   7.811848   5599.462     0.00   0.999    -10966.93    10982.56
>      lgdpgro |   -.883474   678.2268    -0.00   0.999    -1330.183    1328.417
> ldeltaifhpol |   4.764161   4410.229     0.00   0.999    -8639.125    8648.654
>      lifhpol |   .8676982   764.7985     0.00   0.999     -1498.11    1499.845
>   llndistmag |   .1585821   325.1274     0.00   1.000    -637.0795    637.3966
>   orpreselec |  -1.598148   915.6822    -0.00   0.999    -1796.302    1793.106
>  parelection |  -2.872295   2864.866    -0.00   0.999    -5617.907    5612.162
>   dif12party |   .0150251   51.14297     0.00   1.000    -100.2234    100.2534
>        _cons |   -8.61135   6590.954    -0.00   0.999    -12926.64    12909.42
> ------------------------------------------------------------------------------
> Note: one or more parameters could not be estimated in 1 bootstrap replicate;
>       standard-error estimates include only complete replications.
>
>
>
> Another problem is that I do not succeed in using the cluster option. I have
> tried two different commands
>
>
>
> 1) bootstrap, cluster(siffra):logit oppcoaltype124 stunipolardistance
>  lgdpgro ldeltaifhpol lifhpol llndistmag orpreselec parelection  dif12party
>
> Receives the answer:
>
> repeated time values within panel
> the most likely cause for this error is misspecifying the cluster(),
> idcluster(), or group() option
>
> I am sure I do not have repeated time values. If I run the tsset command,
> there is no problem. There might be some misspecification of the command,
> but I don't understand what that might be.
>
>
>
> 2) logit oppcoaltype124 stunipolardistance lgdpgro ldeltaifhpol lifhpol
> llndistmag orpreselec parelection dif12party, vce(boot) cluster(country)
>
> Receives the answer:
>
> no observations
>
>
> Does anyone see what the problem might be?
>
>
> I would just be enormously thankful if anyone could help me with this. I
> suspect that something is wrong here given that the standard errors increase
> so drastically.


-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
