

From: Stas Kolenikov <skolenik@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Blown up Std. errors in logistic regression with bootstrap
Date: Mon, 13 Dec 2010 08:12:24 -0500

Oh my. Here are a few problems you probably have.

1. Rare outcomes: if you only have a dozen or so positive outcomes, and you run a regression with 5 or more variables, what you essentially have is a very low (# of positive outcomes)/(# of regressors) ratio, and your results are unstable to begin with. Now when you bootstrap your data, you may get some bootstrap samples where there are fewer than (# of variables) positive outcomes, in which case the model is not identified. If you have just one extra observation with a positive outcome, you can imagine that the result will be highly unstable. I am guessing that your model failed to converge more than once even though Stata showed that as a dot to you (and it showed you an "x" when it failed to converge or was not identified).

2. Your first model does not have clusters. And if you do have clusters, your problem is exacerbated even further: you now have total d.f. = # of clusters, and if you have ten, forget about asymptotics. Besides, you need to specify the -idcluster()- option, and bootstrap the logit model with the clusters generated by -idcluster()-.

3. By the way, the bootstrap IS AN ASYMPTOTIC METHOD. Thinking that it can fix all small sample problems is very misleading. The justifications of the bootstrap are all based on the theory of weak convergence, and while it is true that the bootstrap provides some improvements in CI coverage in small samples, you do not seem to be utilizing these in your analysis -- you only do the standard errors, which are asymptotically equivalent to the -robust- or -cluster()- ones (provided you don't screw up the latter, which is quite easy to do).

You might be able to learn a bit more about what's going on with the -noisily- option of the bootstrap, which returns the output from the individual bootstrap sample runs.
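To make point 1 and the -noisily- suggestion concrete, here is a minimal sketch of how one might check this in Stata. It reuses the variable names from the original post; the -seed()- value is an arbitrary assumption added only for reproducibility, and the whole block is meant to be run as one do-file so that the local macro stays defined.

```stata
* Regressors as listed in the original post
local xvars stunipolardistance lgdpgro ldeltaifhpol lifhpol ///
    llndistmag orpreselec parelection dif12party

* How many positive outcomes are there relative to the 8 regressors?
tabulate oppcoaltype124

* Rerun the bootstrap with -noisily- so that the logit output of every
* replicate is displayed; reps(50) matches the post, the seed is arbitrary
bootstrap, reps(50) seed(12345) noisily: logit oppcoaltype124 `xvars'
```

In the per-replicate output, the things to look for would be replicates where logit drops variables, reports outcomes as completely determined, or ends with enormous coefficients; those are the likely source of the inflated bootstrap standard errors.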
On Mon, Dec 13, 2010 at 7:24 AM, Michael Wahman <Michael.Wahman@svet.lu.se> wrote:
> Dear Statalist subscribers,
>
> I have run in to a major problem when trying to run a robustness check on
> one of my logistic regression models, using bootstrapped robust standard
> errors.
>
> I am doing a study with two different logistic models, where n is fairly
> small. In one of the models n is somewhat bigger (n=107) and one model has a
> smaller n (n=51). I want to use robust bootstrapped standard errors to
> compensate for the small n, especially in the second model. I've understood
> that it is problematic to use MLE when the number of d.f.s is small, since
> this model might not be asymptotic.
>
> I have experimented with bootstraps, but the standard errors in the model
> become huge. This seems to be associated with the models with a small number
> of df.s. If I run the models with a higher n with a bootstrap, I don’t get
> this problem. Neither do I get it when excluding the control variables. I
> have also tried to use the jackknife command. This is marginally better, but
> still all the variables become insignificant.
>
> Below you can see the main model
>
> . bootstrap: logit oppcoaltype124 stunipolardistance lgdpgro
>     ldeltaifhpol lifhpol llndistmag orpreselec parelection dif12party
> (running logit on estimation sample)
>
> Bootstrap replications (50)
> ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
> .............x....................................    50
>
> Logistic regression                             Number of obs      =        51
>                                                 Replications       =        49
>                                                 Wald chi2(8)       =      0.00
>                                                 Prob > chi2        =    1.0000
> Log likelihood = -10.060672                     Pseudo R2          =    0.6217
>
> ------------------------------------------------------------------------------
>              |   Observed   Bootstrap                         Normal-based
> oppcoalt~124 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
> stunipolar~e |   7.811848   5599.462     0.00   0.999    -10966.93    10982.56
>      lgdpgro |   -.883474   678.2268    -0.00   0.999    -1330.183    1328.417
> ldeltaifhpol |   4.764161   4410.229     0.00   0.999    -8639.125    8648.654
>      lifhpol |   .8676982   764.7985     0.00   0.999     -1498.11    1499.845
>   llndistmag |   .1585821   325.1274     0.00   1.000    -637.0795    637.3966
>   orpreselec |  -1.598148   915.6822    -0.00   0.999    -1796.302    1793.106
>  parelection |  -2.872295   2864.866    -0.00   0.999    -5617.907    5612.162
>   dif12party |   .0150251   51.14297     0.00   1.000    -100.2234    100.2534
>        _cons |   -8.61135   6590.954    -0.00   0.999    -12926.64    12909.42
> ------------------------------------------------------------------------------
> Note: one or more parameters could not be estimated in 1 bootstrap
>       replicate; standard-error estimates include only complete replications.
>
> Another problem is that I do not succede to use the cluster option. I have
> tried two different commands
>
> 1) bootstrap, cluster(siffra):logit oppcoaltype124 stunipolardistance
> lgdpgro ldeltaifhpol lifhpol llndistmag orpreselec parelection dif12party
>
> Receives the answer:
>
> repeated time values within panel
> the most likely cause for this error is misspecifying the cluster(),
> idcluster(), or group() option
>
> I am sure I do not have repeated time values. If I run the tsset command,
> there is no problem. There might be some misspecification of the command,
> but I don't understand what that might be.
>
> 2) logit oppcoaltype124 stunipolardistance lgdpgro ldeltaifhpol lifhpol
> llndistmag orpreselec parelection dif12party, vce(boot) cluster(country)
>
> Receives the answer:
>
> no observations
>
> Do anyone get what the problem might be?
>
> I would just be enormously thankful if anyone could help me with this. I
> suspect that something is wrong here given that the standard errors increase
> so drastically.
--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.

**References**: **st: Blown up Std. errors in logistic regression with bootstrap** *From:* Michael Wahman <Michael.Wahman@svet.lu.se>
