
From: "Stas Kolenikov" <skolenik@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: bootstrap and XTIVREG2
Date: Sat, 15 Sep 2007 21:46:07 -0500

On 9/15/07, Erasmo Giambona <e.giambona@gmail.com> wrote:
> Thanks very much Stas. The problem is that the estimate goes from a
> p-value of less than 0.01% to a p-value of 19% so I am in the dilemma
> of trying to figure out which is most reliable. I would truly
> appreciate a little bit more of your time. Below you suggest to look
> at the confidence intervals. Are you suggesting to compare the
> bootstrap intervals with the sandwich intervals? Would it make sense
> to check what happens if I increase my repetitions from 1000 to say
> 5000 given that I have more than 1600 clusters?
> I would appreciate any further comments on this.

I don't think increasing the number of bootstrap samples from 1000 to 5000 would change things much, although of course you could try it. Frankly, I'd be at a loss... both methods are justifiable, and so if they diverge that much, I'd say that neither of them is truly reliable. If anything, I would expect this sort of instability from a variable that does not change much within a cluster, although the sandwich estimate should catch that.

As a theoretical possibility, there might be identification issues, so that some bootstrap samples hit an empirically underidentified situation -- say all subsampled clusters have a value of the difficult variable equal to 1, while there are other clusters left out that have a value of 0, so everything is OK in the complete sample. Then the variable is perfectly collinear with the constant in that bootstrap subsample, and its coefficient is thus underidentified. If that, or something like that, is plausible, then you can either catch that situation with the -reject- option, or stratify your sample by that "slowly varying" variable.

It would also be interesting to see how this stuff behaves if you subsample a small fraction of your clusters -- say 100 or 200 out of 1600. This would call for rescaling by the sqrt of the effective sample sizes, and I don't know if Stata does this by default.
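To make the rescaling concrete, here is a minimal Python sketch (not Stata code, and not part of the original message). It assumes a root-n consistent estimator, so that a standard error computed from resamples of m clusters is too large for the full sample of n clusters by a factor of about sqrt(n/m). The function names and the toy data are hypothetical, and a plain mean of cluster-level values stands in for the actual -xtivreg2- coefficient.

```python
import math
import random
import statistics


def rescale_subsample_se(se_m, m, n):
    """Rescale a standard error from resamples of m clusters to n clusters.

    Assumption: the estimator is root-n consistent, so its spread
    shrinks like 1/sqrt(number of clusters).
    """
    if not 0 < m <= n:
        raise ValueError("need 0 < m <= n")
    return se_m * math.sqrt(m / n)


def m_out_of_n_se(values, m, reps=500, seed=12345):
    """Standard error of the mean from resamples of size m, rescaled to len(values).

    `values` holds one number per cluster (hypothetical toy data).
    Each replication draws m clusters with replacement.
    """
    rng = random.Random(seed)
    means = [statistics.fmean(rng.choices(values, k=m)) for _ in range(reps)]
    # Spread across replications reflects a sample of m clusters, so rescale it.
    return rescale_subsample_se(statistics.stdev(means), m, len(values))
```

With m = 100 and n = 1600 the rescaling factor is sqrt(100/1600) = 0.25, so a subsample standard error of 0.08 corresponds to 0.02 at the full sample size.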
This subsampling trick is known to rectify a few difficult situations when you bootstrap a pivotal quantity (the t-statistic rather than the coefficient estimate itself).

BTW what are your clusters? I have not come across a situation with complex survey designs where that would be a reasonable outcome. You should cluster at the highest level possible, which might be regions rather than households if the first stage of sampling was at the level of that region.

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: Please do not reply to my Gmail address as I don't check it regularly.

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
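How often the underidentification scenario described in the message could occur is easy to gauge. The sketch below is Python with a hypothetical function name, not anything from the thread. It assumes a binary variable that is constant within clusters and an unstratified bootstrap that draws whole clusters with replacement, and it returns the probability that a resample contains only one of the two values.

```python
def prob_degenerate_resample(n_clusters, n_zero, n_draws=None):
    """Probability that a cluster resample has no variation in a binary variable.

    n_clusters: clusters in the full sample
    n_zero:     clusters where the variable equals 0 (the rest equal 1)
    n_draws:    clusters drawn per resample (defaults to n_clusters)
    """
    if n_draws is None:
        n_draws = n_clusters
    if n_clusters < 1 or n_draws < 1 or not 0 <= n_zero <= n_clusters:
        raise ValueError("invalid cluster counts")
    p_zero = n_zero / n_clusters
    # "All ones" and "all zeros" are mutually exclusive events, so add them.
    return (1.0 - p_zero) ** n_draws + p_zero ** n_draws
```

As an illustration with made-up numbers: if only 3 of 1600 clusters had the value 0, about 5% of full-size resamples would miss all three (roughly 50 out of 1000 replications), and drawing only 200 clusters per resample would raise that to about 69%.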



© Copyright 1996–2015 StataCorp LP