

From: "Tobias Pfaff" <tobias.pfaff@uni-muenster.de>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: st: Bootstrapping & clustered standard errors (-xtreg-)
Date: Mon, 12 Sep 2011 17:51:48 +0200

Dear Stas, Bryan,

I was maybe not clear why I want to bootstrap at all: My fixed effects regression with clustered SE works fine. [-xtreg depvar indepvars, fe vce(cluster region) nonest dfadj-]

However, my predicted residuals (-predict res_ue, ue-) are not normally distributed. Am I mistaken that I need normally distributed residuals for the t-statistics to be unbiased?

If I'm not mistaken then I would like to do a robustness check with bootstrapped standard errors (where the normal distribution of residuals doesn't matter for the z-statistics to be unbiased) to see if my results change or not.

And I still get the error message of insufficient observations when trying to bootstrap with clustered SE. Using -idcluster()- does not help. I have 76,000 obs., 8100 individuals, 108 clusters, and 36 regressors. I don't think that the bootstrap would produce a sample with fewer cluster id's than regressors. So I still don't know why I get the error message after -xtreg depvars indepvars, fe vce(bootstrap, reps(3) seed(1)) cluster(region_svyyear) nonest dfadj-?

WEIGHTS: Your arguments regarding the usage of weights were convincing. However, -xtreg- only allows for weights that do not change for the individuals over the years. Our panel dataset has a variable for the design weight that does not change over the years, but this weight does not contain information on non-response. Another weight variable in the dataset contains information on selection probabilities and non-response, but it obviously changes over the years for each individual, and cannot be used with -xtreg-. So I wouldn't know how to incorporate information on non-response with -xtreg-?

Earlier in this thread Cameron said that bootstrap only makes sense in my case if I would use "custom bootstrap weights computed by a statistical agency for a complex sampling frame". It seems that bootstrap cannot be used with weights, anyway.
I guess that weighted sampling is still not implemented in bootstrap, as stated 8 years ago (http://www.stata.com/statalist/archive/2003-09/msg00180.html).

Thanks very much for your help,
Tobias

P.S.: I cited the PNAS paper since it is a rare exception in my field (happiness economics) that an empirical paper says something about regression diagnostics at all.

-----Original Message-----
> Date: Thu, 08 Sep 2011 17:20:35 -0400
> Subject: Re: st: Bootstrapping & clustered standard errors (-xtreg-)
> From: Bryan Sayer <bsayer@chrr.osu.edu>
> To: statalist@hsphsun2.harvard.edu

...

> The sampling weights control mostly for unequal probabilities of selection, and for well-designed and well-conducted surveys, non-response adjustments are not that large, while probabilities of selection might differ quite notably.

I disagree with the part about non-response adjustments not being that large. It really depends on the survey. Surveys in the U.S. may have response rates as low as 25 to 30%, meaning that the non-response adjustments may be pretty large.

However, it is really the difference in response rates for different groups that matters. For example, a survey I am working with shows a noticeable difference in response rates between the land-line phone and the cell-phone-only group.

The design effects for surveys can be broken into pieces for clustering, stratification, and weighting. And weighting can be further classified into the design weights and the non-response adjustments, if one really wanted to pursue the matter.

But more related to the point Stas is making, often the elements of the survey design and weights that are incorporated into the survey will reflect information that is not available to the user. Simply put, it may not be possible to fully condition on the true sample design. This is because some of the elements used in the sample design and weighting process cannot be disclosed in public files for confidentiality reasons.
Working in sampling, I am obviously biased toward using the weights. But fundamentally, I believe that it is often impossible for the user to know whether they have fully conditioned on the sample design or not.

Most likely, lots of smart people worked hard on the sample design and everything that goes into producing the data that you are using. Accept that they (hopefully) did their job well. So if you have the sample design information available to you, I don't see any reason to *not* use it.

My impression is that bootstrapping of complex survey design data, while possibly past its infancy, is probably still not very fully developed. I know lots of very smart people who work on it, but it just does not seem to generalize very well, at least not as well as a Taylor series linearization.

Just my 2 cents worth.

Bryan Sayer
Monday to Friday, 8:30 to 5:00
Phone: (614) 442-7369
FAX: (614) 442-7329
BSayer@chrr.osu.edu

On 9/8/2011 4:28 PM, Stas Kolenikov wrote:

Tobias,

I would say that you are worried about exactly the wrong things. The sampling weights control mostly for unequal probabilities of selection, and for well-designed and well-conducted surveys, non-response adjustments are not that large, while probabilities of selection might differ quite notably. While it is true that if you can fully condition on the design variables and non-response propensity, you can ignore the weights, I am yet to see an example where that would happen. Believing that your model is perfect is... uhm... naive, let's put it mildly; if anything, econometrics moves away from making such strong assumptions as "my model is absolutely right" towards robust methods of inference that would allow for some minor deviations from the "absolutely right" scenario.

There are no assumptions of normality made anywhere in the process of calculating the standard errors. All arguments are asymptotic, and you see z- rather than t-statistics in the output.
In fact, the arguments justifying the bootstrap are asymptotic, as well. You can still entertain the bootstrap idea, but basically the only way to check that you've done it right is to compare the bootstrap standard errors with the clustered standard errors. If they are about the same, any of them is usable; if they are wildly different (say by more than 50%), I would not use either of them, but I would first check to see that the bootstrap was done right.

I know that PNAS is a huge impact factor journal in natural sciences, but a statistics journal? or an econometrics journal? I mean, it's cool to have a paper there on your resume, but I doubt many statalist subscribers look at this journal for methodological insights (some data miners or bioinformaticians or other statisticians on the margin of computer science do publish in PNAS, though). I would not turn to an essentially applied psychology paper for advice on clustered standard errors.

The error that you report probably comes from the bootstrap producing a sample with fewer cluster identifiers than regressors in your model. Normally, this would be rectified by specifying the -idcluster()- option; however, in some odd cases, the bootstrap samples may still be underidentified. I don't know whether the fixed effects regression should be prone to such empirical underidentification. It might be, given that not all of the parameters of an arbitrary model are identified (the slopes of the time-invariant variables aren't).

On Thu, Sep 8, 2011 at 3:30 AM, Tobias Pfaff <tobias.pfaff@uni-muenster.de> wrote:

Dear Stas, Cam,

Thanks for your input!

I want to bootstrap as a robustness check since the residuals of my FE regression are not normally distributed, and bootstrapping does not assume normality of the residuals (e.g., Headey et al. 2010, appendix p. 3, http://www.pnas.org/content/107/42/17922.full.pdf?with-ds=yes).
If I do bootstrapping with clustered standard errors as Jeff has explained, I get the following error message:

    insufficient observations
    an error occurred when bootstrap executed xtreg, posting missing values

Cam, you say that I would need custom bootstrap weights. My dataset provides individual weights with adjustments for non-response etc. I do not use weights for the regression because the possible selection bias is mitigated due to the fact that the variables which could cause the bias are included as control variables (e.g., income, employment status). Thus, I would argue that my model is complete and the unweighted analysis leads to unbiased estimators.

1. Would you still include weights for the bootstrapping?
2. Does bootstrapping need more degrees of freedom than the normal estimation of -xtreg- so that I get the above error message?
3. If bootstrapping is not a good idea in this case, what can I do to counter the breach of the normality assumption of the residuals? (I already checked transformation of the variables, but that doesn't help.)

Regards,
Tobias

-----Original Message-----
Date: Wed, 7 Sep 2011 10:24:33 -0400
Subject: RE: st: Bootstrapping & clustered standard errors (-xtreg-)
From: Cameron McIntosh <cnm100@hotmail.com>
To: statalist@hsphsun2.harvard.edu

Stas, Tobias,

I agree with Stas that there is not much point in using the bootstrap in this case, unless you have custom bootstrap weights computed by a statistical agency for a complex sampling frame, which would incorporate adjustments for non-response and calibration to known totals, etc. I don't think that is the case here, so I would go with the -cluster- SEs too.

My two cents,
Cam

Date: Wed, 7 Sep 2011 09:03:27 -0500
Subject: Re: st: Bootstrapping & clustered standard errors (-xtreg-)
From: skolenik@gmail.com
To: statalist@hsphsun2.harvard.edu

Tobias,

can you please explain why you need the bootstrap at all?
The bootstrap standard errors are equivalent to the regular -cluster- standard errors asymptotically (in this case, with the number of clusters going off to infinity), and, if anything, it is easier to get the bootstrap wrong than right with difficult problems. If the -cluster- option works at all with -xtreg-, I see little reason to use the bootstrap. (Very technically speaking, in my simulations, I've seen the bootstrap standard errors to be more stable than -robust- standard errors with a large number of bootstrap repetitions that have to be in an appropriate relation with the sample size; whether that carries over to the cluster standard errors, I don't know.)

On Tue, Sep 6, 2011 at 12:25 PM, Tobias Pfaff <tobias.pfaff@uni-muenster.de> wrote:

Dear Statalisters,

I do the following fixed effects regression:

    xtreg depvar indepvars, fe vce(cluster region) nonest dfadj

Individuals in the panel are identified by the variable "pid". The time variable is "svyyear". Data were previously declared as panel data with -xtset pid svyyear-.

Since one of my independent variables is clustered at the regional level (not at the individual level), I use the option -vce(cluster region)-.

Now, I would like to do the same thing with bootstrapped standard errors. I tried several commands; however, none of them works so far. For example:

    xtreg depvar indepvars, fe vce(bootstrap, reps(3) seed(1) cluster(region)) nonest dfadj

where I get the error message "option cluster() not allowed".

None of the hints in the manual (e.g., -idcluster()-, -xtset, clear-, -i()- in the main command) were helpful so far.

How can I tell the bootstrapping command that the standard errors should be clustered at the regional level while using "pid" for panel individuals?

Any comments are appreciated!

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
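
One conjecture in the thread is that the "insufficient observations" error arises because a bootstrap resample ends up with fewer distinct cluster identifiers than regressors. The sketch below is a rough plausibility check of that conjecture for the reported 108 clusters and 36 regressors. It is written in Python rather than Stata and does not reproduce what -bootstrap- does internally. It assumes clusters are resampled with replacement, each with equal probability, and that as many clusters are drawn as exist in the data. The function names are hypothetical.

```python
import random


def distinct_clusters_in_resample(n_clusters, rng):
    """Draw n_clusters cluster IDs with replacement and count the distinct IDs drawn."""
    draws = [rng.randrange(n_clusters) for _ in range(n_clusters)]
    return len(set(draws))


def expected_distinct(n_clusters):
    """Expected number of distinct clusters in one resample (closed form)."""
    return n_clusters * (1.0 - (1.0 - 1.0 / n_clusters) ** n_clusters)


def simulate(n_clusters=108, n_regressors=36, reps=10_000, seed=1):
    """Simulate many cluster resamples and summarise the distinct-cluster counts."""
    rng = random.Random(seed)
    counts = [distinct_clusters_in_resample(n_clusters, rng) for _ in range(reps)]
    return {
        "min_distinct": min(counts),
        "mean_distinct": sum(counts) / reps,
        # Share of resamples with fewer distinct clusters than regressors.
        "share_below_regressors": sum(c < n_regressors for c in counts) / reps,
    }


if __name__ == "__main__":
    print("expected distinct clusters:", round(expected_distinct(108), 2))
    print(simulate())
```

Under these assumptions a resample contains roughly 68 distinct clusters on average (108 × (1 − (1 − 1/108)^108)), well above 36, which is in line with Tobias's doubt that too few cluster identifiers explains the error. The check says nothing about the -cluster(region_svyyear)- variant, whose number of clusters is not reported in the thread, nor about other sources of underidentification in the fixed effects model.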

Follow-Ups:
RE: st: Bootstrapping & clustered standard errors (-xtreg-)
From: Cameron McIntosh <cnm100@hotmail.com>
