

From: "Tobias Pfaff" <tobias.pfaff@uni-muenster.de>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: st: Bootstrapping & clustered standard errors (-xtreg-)
Date: Thu, 15 Sep 2011 16:54:07 +0200

Dear Cam,

Thanks for the references! However, I think I will give up on bootstrap with panel data and clustered standard errors. It's too much of a black box for me and maybe still an "embryonic research field" (http://economics.ca/2008/papers/0985.pdf).

Apart from the previously described error with "insufficient observations", I also get a warning that collinearity of some of my dummies changes with bootstrap. And while having only 77,627 obs. in my sample, one bootstrap iteration shows 86,212 observations?? All things which I cannot understand easily.

Anyway, concerning the violation of the assumption of normally distributed residuals, I found a nice paper (*, in German), and transformation of the dependent variable helps me to sufficiently attenuate the violation.

Thanks again for all your efforts!
Tobias

(*) http://www.bwl.uni-kiel.de/bwlinstitute/grad-kolleg/new/typo3conf/ext/naw_securedl/secure.php?u=0&file=/fileadmin/publications/pdf/pdf_03.gif&t=1316188229&hash=7ffb63ce635a228ad74e460767ebc04d

-----Original Message-----
> Date: Mon, 12 Sep 2011 15:05:59 -0400
> Subject: RE: st: Bootstrapping & clustered standard errors (-xtreg-)
> From: Cameron McIntosh <cnm100@hotmail.com>
> To: STATA LIST <statalist@hsphsun2.harvard.edu>

Hi Tobias,

Ok, well your comments below remind me of:

Wang, J., Carpenter, J.R., & Kepler, M.A. (2006). Using SAS to conduct nonparametric residual bootstrap multilevel modeling with a small number of groups. Computer Methods and Programs in Biomedicine, 82(2), 130–143.

I don't know if Stata offers a similar procedure. In conjunction with the above paper, I also strongly recommend taking a look at:

Maas, C.J.M., & Hox, J.J. (2004a). The influence of violations of assumptions on multilevel parameter estimates and their standard errors. Computational Statistics & Data Analysis, 46, 427–440. http://igitur-archive.library.uu.nl/fss/2007-1004-200713/Maas(2004)_influence%20of%20violations.pdf

Maas, C.J.M., & Hox, J.J. (2004b).
Robustness issues in multilevel regression analysis. Statistica Neerlandica, 58, 127–137. http://joophox.net/publist/sn04.pdf

Cam

> From: tobias.pfaff@uni-muenster.de
> To: statalist@hsphsun2.harvard.edu
> Subject: RE: st: Bootstrapping & clustered standard errors (-xtreg-)
> Date: Mon, 12 Sep 2011 17:51:48 +0200
>
> Dear Stas, Bryan,
>
> I was maybe not clear why I want to bootstrap at all:
>
> My fixed effects regression with clustered SE works fine.
> [-xtreg depvar indepvars, fe vce(cluster region) nonest dfadj-]
>
> However, my predicted residuals (-predict res_ue, ue-) are not normally distributed.
> Am I mistaken that I need normally distributed residuals for the t-statistics to be unbiased?
>
> If I'm not mistaken then I would like to do a robustness check with bootstrapped standard errors (where the normal distribution of residuals doesn't matter for the z-statistics to be unbiased) to see if my results change or not.
> And I still get the error message of insufficient observations when trying to bootstrap with clustered SE. Using -idcluster()- does not help.
> I have 76,000 obs., 8100 individuals, 108 clusters, and 36 regressors. I don't think that the bootstrap would produce a sample with fewer cluster id's than regressors.
> So I still don't know why I get the error message after -xtreg depvars indepvars, fe vce(bootstrap, reps(3) seed(1)) cluster(region_svyyear) nonest dfadj-?
>
> WEIGHTS:
> Your arguments regarding the usage of weights were convincing. However, -xtreg- only allows for weights that do not change for the individuals over the years. Our panel dataset has a variable for the design weight that does not change over the years, but this weight does not contain information on non-response. Another weight variable in the dataset contains information on selection probabilities and non-response, but it obviously changes over the years for each individual, and cannot be used with -xtreg-.
> So I wouldn't know how to incorporate information on non-response with -xtreg-?
>
> Earlier in this thread Cameron said that bootstrap only makes sense in my case if I would use "custom bootstrap weights computed by a statistical agency for a complex sampling frame". It seems that bootstrap cannot be used with weights, anyway. I guess that weighted sampling is still not implemented in bootstrap, as stated 8 years ago (http://www.stata.com/statalist/archive/2003-09/msg00180.html).
>
> Thanks very much for your help,
> Tobias
>
> P.S.: I cited the PNAS paper since it is a rare exception in my field (happiness economics) that an empirical paper says something about regression diagnostics at all.
>
> -----Original Message-----
> Date: Thu, 08 Sep 2011 17:20:35 -0400
> Subject: Re: st: Bootstrapping & clustered standard errors (-xtreg-)
> From: Bryan Sayer <bsayer@chrr.osu.edu>
> To: statalist@hsphsun2.harvard.edu
>
> > ... The sampling weights control mostly for unequal probabilities of selection, and for well-designed and well-conducted surveys, non-response adjustments are not that large, while probabilities of selection might differ quite notably.
>
> I disagree with the part about non-response adjustments not being that large. It really depends on the survey. Surveys in the U.S. may have response rates as low as 25 to 30%, meaning that the non-response adjustments may be pretty large.
>
> However, it is really the difference in response rates for different groups that matters. For example, a survey I am working with shows a noticeable difference in response rates between the land-line phone and the cell phone only group.
>
> The design effects for surveys can be broken into pieces for clustering, stratification, and weighting. And weighting can be further classified into the design weights and the non-response adjustments. If one really wanted to pursue the matter.
>
> But more related to the point Stas is making, often the elements of the survey design and weights that are incorporated into the survey will reflect information that is not available to the user. Simply put, it may not be possible to fully condition on the true sample design. This is because some of the elements used in the sample design and weighting process cannot be disclosed in public files for confidentiality reasons.
>
> Working in sampling, I am obviously biased toward using the weights. But fundamentally, I believe that it is often impossible for the user to know whether they have fully conditioned on the sample design or not.
>
> Most likely, lots of smart people worked hard on the sample design and everything that goes into producing the data that you are using. Accept that they (hopefully) did their job well. So if you have the sample design information available to you, I don't see any reason to *not* use it.
>
> My impression is that bootstrapping of complex survey design data, while possibly past its infancy, is probably still not very fully developed. I know lots of very smart people who work on it, but it just does not seem to generalize very well, at least not as well as a Taylor series linearization.
>
> Just my 2 cents worth.
>
> Bryan Sayer
> Monday to Friday, 8:30 to 5:00
> Phone: (614) 442-7369
> FAX: (614) 442-7329
> BSayer@chrr.osu.edu
>
> On 9/8/2011 4:28 PM, Stas Kolenikov wrote:
>
> Tobias,
>
> I would say that you are worried about exactly the wrong things. The sampling weights control mostly for unequal probabilities of selection, and for well-designed and well-conducted surveys, non-response adjustments are not that large, while probabilities of selection might differ quite notably. While it is true that if you can fully condition on the design variables and non-response propensity, you can ignore the weights, I have yet to see an example where that would happen.
> Believing that your model is perfect is... uhm... naive, let's put it mildly; if anything, econometrics moves away from making such strong assumptions as "my model is absolutely right" towards robust methods of inference that would allow for some minor deviations from the "absolutely right" scenario. There are no assumptions of normality made anywhere in the process of calculating the standard errors. All arguments are asymptotic, and you see z- rather than t-statistics in the output. In fact, the arguments justifying the bootstrap are asymptotic, as well. You can still entertain the bootstrap idea, but basically the only way to check that you've done it right is to compare the bootstrap standard errors with the clustered standard errors. If they are about the same, any of them is usable; if they are wildly different (say by more than 50%), I would not use either of them, but I would first check to see that the bootstrap was done right.
>
> I know that PNAS is a huge impact factor journal in natural sciences, but a statistics journal? or an econometrics journal? I mean, it's cool to have a paper there on your resume, but I doubt many statalist subscribers look at this journal for methodological insights (some data miners or bioinformaticians or other statisticians on the margin of computer science do publish in PNAS, though). I would not turn to an essentially applied psychology paper for advice on clustered standard errors.
>
> The error that you report probably comes from the bootstrap producing a sample with fewer cluster identifiers than regressors in your model. Normally, this would be rectified by specifying the -idcluster()- option; however, in some odd cases, the bootstrap samples may still be underidentified. I don't know whether the fixed effects regression should be prone to such empirical underidentification.
> It might be, given that not all of the parameters of an arbitrary model are identified (the slopes of the time-invariant variables aren't).
>
> On Thu, Sep 8, 2011 at 3:30 AM, Tobias Pfaff <tobias.pfaff@uni-muenster.de> wrote:
>
> Dear Stas, Cam,
>
> Thanks for your input!
>
> I want to bootstrap as a robustness check since my residuals of the FE regression are not normally distributed. I chose bootstrapping because it does not assume normality of the residuals (e.g., Headey et al. 2010, appendix p. 3, http://www.pnas.org/content/107/42/17922.full.pdf?with-ds=yes).
>
> If I do bootstrapping with clustered standard errors as Jeff has explained, I get the following error message:
>
> - insufficient observations
> an error occurred when bootstrap executed xtreg, posting missing values -
>
> Cam, you say that I would need custom bootstrap weights. My dataset provides individual weights with adjustments for non-response etc. I do not use weights for the regression because the possible selection bias is mitigated due to the fact that the variables which could cause the bias are included as control variables (e.g., income, employment status). Thus, I would argue that my model is complete and the unweighted analysis leads to unbiased estimators.
>
> 1. Would you still include weights for the bootstrapping?
>
> 2. Does bootstrapping need more degrees of freedom than the normal estimation of -xtreg- so that I get the above error message?
>
> 3. If bootstrapping is not a good idea in this case, what can I do to counter the breach of the normality assumption of the residuals?
> (I already checked transformation of the variables, but that doesn't help)
>
> Regards,
> Tobias
>
> -----Original Message-----
> Date: Wed, 7 Sep 2011 10:24:33 -0400
> Subject: RE: st: Bootstrapping & clustered standard errors (-xtreg-)
> From: Cameron McIntosh <cnm100@hotmail.com>
> To: statalist@hsphsun2.harvard.edu
>
> Stas, Tobias
> I agree with Stas that there is not much point in using the bootstrap in this case, unless you have custom bootstrap weights computed by a statistical agency for a complex sampling frame, which would incorporate adjustments for non-response and calibration to known totals, etc. I don't think that is the case here, so I would go with the -cluster- SEs too.
> My two cents,
> Cam
>
> Date: Wed, 7 Sep 2011 09:03:27 -0500
> Subject: Re: st: Bootstrapping & clustered standard errors (-xtreg-)
> From: skolenik@gmail.com
> To: statalist@hsphsun2.harvard.edu
>
> Tobias,
>
> can you please explain why you need the bootstrap at all? The bootstrap standard errors are equivalent to the regular -cluster- standard errors asymptotically (in this case, with the number of clusters going off to infinity), and, if anything, it is easier to get the bootstrap wrong than right with difficult problems. If the -cluster- option works at all with -xtreg-, I see little reason to use the bootstrap. (Very technically speaking, in my simulations, I've seen the bootstrap standard errors to be more stable than -robust- standard errors with a large number of the bootstrap repetitions that have to be in an appropriate relation with the sample size; whether that carries over to the cluster standard errors, I don't know.)
>
> On Tue, Sep 6, 2011 at 12:25 PM, Tobias Pfaff <tobias.pfaff@uni-muenster.de> wrote:
>
> Dear Statalisters,
>
> I do the following fixed effects regression:
>
> xtreg depvar indepvars, fe vce(cluster region) nonest dfadj
>
> Individuals in the panel are identified by the variable "pid". The time variable is "svyyear". Data were previously declared as panel data with -xtset pid svyyear-.
> Since one of my independent variables is clustered at the regional level (not at the individual level), I use the option -vce(cluster region)-.
>
> Now, I would like to do the same thing with bootstrapped standard errors.
>
> I tried several commands; however, none of them works so far. For example:
>
> xtreg depvar indepvars, fe vce(bootstrap, reps(3) seed(1) cluster(region)) nonest dfadj
>
> ...where I get the error message "option cluster() not allowed".
>
> None of the hints in the manual (e.g., -idcluster()-, -xtset, clear-, -i()- in the main command) were helpful so far.
>
> How can I tell the bootstrapping command that the standard errors should be clustered at the regional level while using "pid" for panel individuals?
>
> Any comments are appreciated!

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
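
A note on the changing observation count reported in the most recent message of this thread: when the bootstrap resamples whole clusters with replacement rather than individual observations, the number of observations can differ from one replication to the next whenever the clusters are of unequal size. That is probably what produced the 86,212 observations, although the thread does not confirm it. The do-file sketch below illustrates the mechanism with -bsample- on toy data; the dataset, the cluster sizes, and the use of the name region are made up for illustration and are not taken from Tobias's data.

```stata
* Toy illustration (hypothetical data): three clusters of unequal size
clear
set seed 1
set obs 60
generate region = cond(_n <= 10, 1, cond(_n <= 30, 2, 3))   // cluster sizes 10, 20, 30

* Draw five cluster bootstrap samples and report the size of each
forvalues r = 1/5 {
    preserve
    bsample, cluster(region)        // resample 3 whole clusters with replacement
    display "replication `r': " _N " observations"
    restore
}
```

Each replication draws three clusters, so in this toy setup the resampled dataset can hold anywhere from 30 observations (the smallest cluster drawn three times) to 90 (the largest drawn three times), and it matches the original 60 only for some draws.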
