Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: Bootstrapping & clustered standard errors (-xtreg-)

From	Cameron McIntosh <[email protected]>
To	STATA LIST <[email protected]>
Subject	RE: st: Bootstrapping & clustered standard errors (-xtreg-)
Date	Mon, 12 Sep 2011 15:05:59 -0400
Hi Tobias,
Ok, well your comments below remind me of:
Wang, J., Carpenter, J.R., & Kepler, M.A. (2006). Using SAS to conduct nonparametric residual bootstrap multilevel modeling with a small number of groups. Computer Methods and Programs in Biomedicine, 82(2), 130-143.
I don't know if Stata offers a similar procedure. In conjunction with the above paper, I also strongly recommend taking a look at:
Maas, C.J.M., & Hox, J.J. (2004a). The influence of violations of assumptions on multilevel parameter estimates and their standard errors. Computational Statistics & Data Analysis, 46, 427–440. http://igitur-archive.library.uu.nl/fss/2007-1004-200713/Maas(2004)_influence%20of%20violations.pdf
Maas, C.J.M., & Hox, J.J. (2004b). Robustness issues in multilevel regression analysis. Statistica Neerlandica, 58, 127–137. http://joophox.net/publist/sn04.pdf
Cam
> From: [email protected]
> To: [email protected]
> Subject: RE: st: Bootstrapping & clustered standard errors (-xtreg-)
> Date: Mon, 12 Sep 2011 17:51:48 +0200
> 
> Dear Stas, Bryan,
> 
> I was maybe not clear why I want to bootstrap at all:
> 
> My fixed effects regression with clustered SE works fine.
> [-xtreg depvar indepvars, fe vce(cluster region) nonest dfadj-]
> 
> However, my predicted residuals (-predict res_ue, ue-) are not normally
> distributed.
> Am I mistaken that I need normally distributed residuals for the
> t-statistics to be unbiased?
> 
> If I'm not mistaken then I would like to do a robustness check with
> bootstrapped standard errors (where the normal distribution of residuals
> doesn't matter for the z-statistics to be unbiased) to see if my results
> change or not.
> And I still get the error message of insufficient observations when trying
> to bootstrap with clustered SE. Using -idcluster()- does not help.
> I have 76,000 obs., 8100 individuals, 108 clusters, and 36 regressors. I
> don't think that the bootstrap would produce a sample with fewer cluster
> id's than regressors.
> So why do I still get the error message after -xtreg depvar indepvars, fe
> vce(bootstrap, reps(3) seed(1)) cluster(region_svyyear) nonest dfadj-?
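A quick check of the intuition above, written in Python for illustration (it is not Stata output): when a bootstrap replicate draws G clusters with replacement, any given cluster is left out with probability (1 - 1/G)^G, so the expected number of distinct clusters in a replicate is G(1 - (1 - 1/G)^G). With the 108 clusters mentioned here, that comes to roughly 68, comfortably above the 36 regressors. This suggests the raw cluster count alone is an unlikely explanation for the error. The sketch assumes simple equal-probability resampling of whole clusters.

```python
def expected_distinct_clusters(g: int) -> float:
    """Expected number of distinct clusters in one bootstrap replicate
    that draws g clusters with replacement from g clusters."""
    p_missed = (1.0 - 1.0 / g) ** g  # chance a given cluster is never drawn
    return g * (1.0 - p_missed)


n_clusters, n_regressors = 108, 36
expected = expected_distinct_clusters(n_clusters)
print(f"expected distinct clusters: {expected:.1f} (regressors: {n_regressors})")
```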
> 
> WEIGHTS:
> Your arguments regarding the usage of weights were convincing. However,
> -xtreg- only allows for weights that do not change for the individuals over
> the years. Our panel dataset has a variable for the design weight that does
> not change over the years, but this weight does not contain information on
> non-response. Another weight variable in the dataset contains information on
> selection probabilities and non-response, but it obviously changes over the
> years for each individual, and cannot be used with -xtreg-. So I wouldn't
> know how to incorporate information on non-response with -xtreg-?
> 
> Earlier in this thread Cameron said that bootstrap only makes sense in my
> case if I would use "custom bootstrap weights computed by a statistical
> agency for a complex sampling frame". It seems that bootstrap cannot be used
> with weights, anyway. I guess that weighted sampling is still not
> implemented in bootstrap, as stated 8 years ago
> (http://www.stata.com/statalist/archive/2003-09/msg00180.html).
> 
> Thanks very much for your help,
> Tobias
> 
> P.S.: I cited the PNAS paper since it is a rare exception in my field
> (happiness economics) that an empirical paper says something about
> regression diagnostics at all.
> 
> 
> -----Original Message-----
> > Date: Thu, 08 Sep 2011 17:20:35 -0400
> > Subject: Re: st: Bootstrapping & clustered standard errors (-xtreg-)
> > From: Bryan Sayer <[email protected]>
> > To: [email protected]
> 
>         ... The
>         sampling weights control mostly for unequal probabilities of
>         selection, and for well-designed and well-conducted surveys,
>         non-response adjustments are not that large, while probabilities of
>         selection might differ quite notably.
> 
> 
> I disagree with the part about non-response adjustments not being that
> large. It really depends on the survey. Surveys in the U.S. may have
> response rates as low as 25 to 30%, meaning that the non-response
> adjustments may be pretty large.
> 
> However, it is really the difference in response rates for different groups
> that matters. For example a survey I am working with shows a noticeable
> difference in response rates between the land-line phone and the cell phone
> only group.
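To see why differential response rates matter, here is a minimal Python sketch of a weighting-class non-response adjustment. This is one common way such adjustments are made, not necessarily what any particular survey does. Within each adjustment class, respondents' base weights are divided by the class's weighted response rate. The two classes and their response rates below are hypothetical.

```python
def nonresponse_adjust(base_weights, classes, responded):
    """Weighting-class adjustment: inflate respondents' base weights by the
    inverse of their class's weighted response rate; nonrespondents get 0."""
    class_total, resp_total = {}, {}
    for w, c, r in zip(base_weights, classes, responded):
        class_total[c] = class_total.get(c, 0.0) + w
        if r:
            resp_total[c] = resp_total.get(c, 0.0) + w
    # resp_total[c] exists whenever r is true, so there is no division by
    # zero; a class with no respondents simply loses its weight.
    return [w * class_total[c] / resp_total[c] if r else 0.0
            for w, c, r in zip(base_weights, classes, responded)]


# Hypothetical sample: 4 land-line cases (50% respond), 4 cell-only (25% respond).
classes = ["landline"] * 4 + ["cell_only"] * 4
responded = [True, True, False, False, True, False, False, False]
adjusted = nonresponse_adjust([1.0] * 8, classes, responded)
print(adjusted)  # [2.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0]
```

The cell-only respondent ends up with twice the weight of a land-line respondent, while the total weight of 8 is preserved.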
> 
> The design effects for surveys can be broken into pieces for clustering,
> stratification, and weighting, and weighting can be further split into
> the design weights and the non-response adjustments, if one really wanted to
> pursue the matter.
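For the weighting piece, a common summary is Kish's approximate design effect due to unequal weighting, 1 + CV^2 (with the coefficient of variation computed using divisor n), which equals n * sum(w^2) / (sum(w))^2. It assumes the weights are unrelated to the outcome and ignores the clustering and stratification components, so treat this Python sketch as a rough illustration only; the weights are made-up numbers.

```python
def kish_weighting_deff(weights):
    """Kish's approximate design effect from unequal weighting:
    n * sum(w^2) / (sum(w))^2, i.e. 1 + (coefficient of variation)^2."""
    n = len(weights)
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return n * total_sq / (total * total)


weights = [2.0, 2.0, 4.0]            # hypothetical respondent weights
deff = kish_weighting_deff(weights)  # 1.125
n_eff = len(weights) / deff          # effective sample size, about 2.67
print(deff, n_eff)
```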
> 
> But more related to the point Stas is making, often the elements of the
> survey design and weights that are incorporated into the survey will reflect
> information that is not available to the user. Simply put, it may not be
> possible to fully condition on the true sample design. This is because some
> of the elements used in the sample design and weighting process cannot be
> disclosed in public files for confidentiality reasons.
> 
> Working in sampling, I am obviously biased toward using the weights. But
> fundamentally, I believe that it is often impossible for the user to know
> whether they have fully conditioned on the sample design or not.
> 
> Most likely, lots of smart people worked hard on the sample design and
> everything that goes into producing the data that you are using. Accept that
> they (hopefully) did their job well. So if you have the sample design
> information available to you, I don't see any reason to *not* use it.
> 
> My impression is that bootstrapping of complex survey design data, while
> possibly past its infancy, is probably still not very fully developed. I
> know lots of very smart people who work on it, but it just does not seem to
> generalize very well, at least not as well as a Taylor series linearization.
> 
> Just my 2 cents worth.
> 
> Bryan Sayer
> Monday to Friday, 8:30 to 5:00
> Phone: (614) 442-7369
> FAX:  (614) 442-7329
> [email protected]
> 
> 
> On 9/8/2011 4:28 PM, Stas Kolenikov wrote:
> 
>     Tobias,
> 
>     I would say that you are worried about exactly the wrong things. The
>     sampling weights control mostly for unequal probabilities of
>     selection, and for well-designed and well-conducted surveys,
>     non-response adjustments are not that large, while probabilities of
>     selection might differ quite notably. While it is true that if you can
>     fully condition on the design variables and non-response propensity,
>     you can ignore the weights, I am yet to see an example where that
>     would happen. Believing that your model is perfect is... uhm... naive,
>     let's put it mildly; if anything, econometrics moves away from making
>     such strong assumptions as "my model is absolutely right" towards
>     robust methods of inference that would allow for some minor deviations
>     from the "absolutely right" scenario. There are no assumptions of
>     normality made anywhere in the process of calculating the standard
>     errors. All arguments are asymptotic, and you see z- rather than
>     t-statistics in the output. In fact, the arguments justifying the
>     bootstrap are asymptotic, as well. You can still entertain the
>     bootstrap idea, but basically the only way to check that you've done
>     it right is to compare the bootstrap standard errors with the
>     clustered standard errors. If they are about the same, any of them is
>     usable; if they are wildly different (say by more than 50%), I would
>     not trust either of them, but I would first check to see that the
>     bootstrap was done right.
> 
>     I know that PNAS is a huge impact factor journal in natural sciences,
>     but a statistics journal? or an econometrics journal? I mean, it's
>     cool to have a paper there on your resume, but I doubt many statalist
>     subscribers look at this journal for methodological insights (some
>     data miners or bioinformaticians or other statisticians on the margin
>     of computer science do publish in PNAS, though). I would not turn to
>     an essentially applied psychology paper for advice on clustered
>     standard errors.
> 
>     The error that you report probably comes from the bootstrap producing
>     a sample with fewer cluster identifiers than regressors in your model.
>     Normally, this would be rectified by specifying the -idcluster()- option;
>     however, in some odd cases, the bootstrap samples may still be
>     underidentified. I don't know whether the fixed effects regression
>     should be prone to such empirical underidentification. It might be,
>     given that not all of the parameters of an arbitrary model are
>     identified (the slopes of the time-invariant variables aren't).
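The point about time-invariant variables can be seen directly from the within transformation used by fixed-effects regression: subtracting each panel's mean turns any variable that is constant within panels into a column of zeros, so its slope cannot be estimated. The Python sketch below uses hypothetical variables. The same thing can happen by chance inside a bootstrap replicate if a regressor varies within only a few panels and none of them are drawn, which is one way the empirical underidentification described above could arise.

```python
from collections import defaultdict


def within_transform(values, panel_ids):
    """Fixed-effects 'within' transformation: subtract each panel's mean."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, p in zip(values, panel_ids):
        sums[p] += v
        counts[p] += 1
    return [v - sums[p] / counts[p] for v, p in zip(values, panel_ids)]


pid = [1, 1, 1, 2, 2, 2]
female = [0, 0, 0, 1, 1, 1]          # hypothetical, constant within each pid
income = [10, 12, 14, 20, 19, 24]    # hypothetical, varies over time
print(within_transform(female, pid))  # all zeros: slope not identified
print(within_transform(income, pid))  # [-2.0, 0.0, 2.0, -1.0, -2.0, 3.0]
```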
> 
>     On Thu, Sep 8, 2011 at 3:30 AM, Tobias Pfaff
>     <[email protected]>  wrote:
> 
>         Dear Stas, Cam,
> 
>         Thanks for your input!
> 
>         I want to bootstrap as a robustness check since the residuals of my FE
>         regression are not normally distributed. I chose bootstrapping as the
>         robustness check because it does not assume normality of the residuals
>         (e.g., Headey et al. 2010, appendix p. 3,
>         http://www.pnas.org/content/107/42/17922.full.pdf?with-ds=yes).
> 
>         If I bootstrap with clustered standard errors as Jeff explained, I
>         get the following error message:
>
>         - insufficient observations
>         an error occurred when bootstrap executed xtreg, posting missing values -
> 
>         Cam, you say that I would need custom bootstrap weights. My dataset
>         provides individual weights with adjustments for non-response etc. I
>         do not use weights for the regression because the possible selection
>         bias is mitigated by the fact that the variables which could cause the
>         bias are included as control variables (e.g., income, employment
>         status). Thus, I would argue that my model is complete and the
>         unweighted analysis leads to unbiased estimators.
> 
>         1. Would you still include weights for the bootstrapping?
>
>         2. Does bootstrapping need more degrees of freedom than the normal
>         estimation of -xtreg-, so that I get the above error message?
>
>         3. If bootstrapping is not a good idea in this case, what can I do to
>         counter the violation of the normality assumption for the residuals?
>         (I already tried transforming the variables, but that doesn't help.)
> 
>         Regards,
>         Tobias
> 
> 
>         -----Original Message-----
> 
>             Date: Wed, 7 Sep 2011 10:24:33 -0400
>             Subject: RE: st: Bootstrapping & clustered standard errors (-xtreg-)
>             From: Cameron McIntosh <[email protected]>
>             To: [email protected]
> 
>         Stas, Tobias
>         I agree with Stas that there is not much point in using the bootstrap
>         in this case, unless you have custom bootstrap weights computed by a
>         statistical agency for a complex sampling frame, which would
>         incorporate adjustments for non-response and calibration to known
>         totals, etc. I don't think that is the case here, so I would go with
>         the -cluster- SEs too.
>         My two cents,
>         Cam
> 
> 
>             Date: Wed, 7 Sep 2011 09:03:27 -0500
>             Subject: Re: st: Bootstrapping & clustered standard errors (-xtreg-)
>             From: [email protected]
>             To: [email protected]
> 
>             Tobias,
> 
>             can you please explain why you need the bootstrap at all? The
>             bootstrap standard errors are equivalent to the regular -cluster-
>             standard errors asymptotically (in this case, with the number of
>             clusters going off to infinity), and, if anything, it is easier to
>             get the bootstrap wrong than right with difficult problems. If the
>             -cluster- option works at all with -xtreg-, I see little reason to
>             use the bootstrap. (Very technically speaking, in my simulations,
>             I've seen the bootstrap standard errors be more stable than
>             -robust- standard errors with a large number of bootstrap
>             repetitions, which has to be in an appropriate relation to the
>             sample size; whether that carries over to the cluster standard
>             errors, I don't know.)
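For reference, the analytic counterpart that the bootstrap is compared with here is the cluster-robust (sandwich) variance. Below is a minimal Python sketch for a fixed-effects regression with a single regressor on simulated data: the slope is estimated from within-transformed data, and the "meat" sums the score (demeaned x times residual) within each cluster before squaring. It applies only a G/(G-1) finite-sample factor, a simplification rather than the exact adjustment -xtreg- uses; the names and the data-generating process are assumptions for illustration. The idiosyncratic errors are deliberately skewed, since nothing in the sandwich formula relies on normality.

```python
import random
from collections import defaultdict


def demean(values, groups):
    """Subtract group means (the fixed-effects within transformation)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def fe_slope_cluster_se(y, x, panel, cluster):
    """Within (FE) slope for one regressor plus a cluster-robust SE."""
    yt, xt = demean(y, panel), demean(x, panel)
    sxx = sum(a * a for a in xt)
    beta = sum(a * b for a, b in zip(xt, yt)) / sxx
    score = defaultdict(float)  # per-cluster sum of demeaned x * residual
    for a, b, c in zip(xt, yt, cluster):
        score[c] += a * (b - beta * a)
    g = len(score)
    meat = sum(s * s for s in score.values())
    return beta, ((g / (g - 1)) * meat / (sxx * sxx)) ** 0.5


def simulate(n_regions=60, people=5, years=4, beta=0.5, seed=1):
    """Hypothetical panel: pids nested in regions, region-year shocks in
    both x and the error, skewed (centered exponential) idiosyncratic errors."""
    rng = random.Random(seed)
    y, x, panel, cluster = [], [], [], []
    for r in range(n_regions):
        shocks = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(years)]
        for i in range(people):
            alpha = rng.gauss(0, 1)  # individual fixed effect
            for t in range(years):
                xi = shocks[t][0] + rng.gauss(0, 1)
                e = shocks[t][1] + rng.expovariate(1.0) - 1.0
                y.append(alpha + beta * xi + e)
                x.append(xi)
                panel.append((r, i))
                cluster.append(r)
    return y, x, panel, cluster


y, x, panel, cluster = simulate()
b, se = fe_slope_cluster_se(y, x, panel, cluster)
print(f"beta = {b:.3f}, cluster-robust SE = {se:.3f}")
```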
> 
>             On Tue, Sep 6, 2011 at 12:25 PM, Tobias Pfaff
>             <[email protected]>  wrote:
> 
>                 Dear Statalisters,
> 
>                 I do the following fixed effects regression:
>
>                 xtreg depvar indepvars, fe vce(cluster region) nonest dfadj
>
>                 Individuals in the panel are identified by the variable "pid".
>                 The time variable is "svyyear". Data were previously declared
>                 as panel data with -xtset pid svyyear-.
>                 Since one of my independent variables is clustered at the
>                 regional level (not at the individual level), I use the option
>                 -vce(cluster region)-.
>
>                 Now, I would like to do the same thing with bootstrapped
>                 standard errors.
>
>                 I tried several commands; however, none of them works so far.
>                 For example:
>
>                 xtreg depvar indepvars, fe vce(bootstrap, reps(3) seed(1) cluster(region)) nonest dfadj
>
>                 ...where I get the error message "option cluster() not allowed".
>
>                 None of the hints in the manual (e.g., -idcluster()-,
>                 -xtset, clear-, -i()- in the main command) have helped so far.
>
>                 How can I tell the bootstrapping command that the standard
>                 errors should be clustered at the regional level while using
>                 "pid" for panel individuals?
> 
>                 Any comments are appreciated!
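Conceptually, what is being asked for is a cluster (block) bootstrap: resample whole regions with replacement, give each drawn copy its own identifier (the idea behind the -idcluster()- option discussed in this thread), relabel panels so that an individual appearing in two copies of a region counts as two distinct panels, and re-estimate the fixed-effects model on each replicate. The Python sketch below assumes each pid belongs to a single region and uses a single regressor; it is not how Stata implements -vce(bootstrap)-, and all names and data are hypothetical. Because the model in this thread uses -nonest- (panels need not be nested within clusters), individuals that move between regions would need extra handling that this sketch does not attempt.

```python
import random
from collections import defaultdict


def within_slope(y, x, panel):
    """Fixed-effects (within) slope for a single regressor."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # per panel: sum y, sum x, n
    for yi, xi, p in zip(y, x, panel):
        s = sums[p]
        s[0] += yi
        s[1] += xi
        s[2] += 1
    sxy = sxx = 0.0
    for yi, xi, p in zip(y, x, panel):
        sy, sx, n = sums[p]
        xt = xi - sx / n
        sxy += xt * (yi - sy / n)
        sxx += xt * xt
    return sxy / sxx


def cluster_bootstrap_se(rows, reps=100, seed=1):
    """rows: (region, pid, y, x) tuples. Resample regions with replacement;
    panel ids become (copy number, pid) so duplicated panels stay distinct."""
    rng = random.Random(seed)
    by_region = defaultdict(list)
    for row in rows:
        by_region[row[0]].append(row)
    regions = sorted(by_region)
    betas = []
    for _ in range(reps):
        y, x, panel = [], [], []
        draws = rng.choices(regions, k=len(regions))
        for copy, region in enumerate(draws):
            for _region, pid, yi, xi in by_region[region]:
                y.append(yi)
                x.append(xi)
                panel.append((copy, pid))
        betas.append(within_slope(y, x, panel))
    mean = sum(betas) / reps
    return (sum((b - mean) ** 2 for b in betas) / (reps - 1)) ** 0.5


# Hypothetical data: 40 regions, 5 people each, 4 years, true slope 0.5.
rng = random.Random(7)
rows = []
for region in range(40):
    shocks = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4)]
    for person in range(5):
        alpha = rng.gauss(0, 1)
        for year in range(4):
            xi = shocks[year][0] + rng.gauss(0, 1)
            err = shocks[year][1] + rng.expovariate(1.0) - 1.0
            rows.append((region, (region, person), alpha + 0.5 * xi + err, xi))

print(f"cluster-bootstrap SE: {cluster_bootstrap_se(rows):.3f}")
```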
> 
> 
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
 		 	   		  