
From: Stas Kolenikov <skolenik@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: xtmixed with vce robust or cluster robust
Date: Tue, 1 Sep 2009 11:01:35 -0500

On Mon, Aug 31, 2009 at 4:34 PM, Schaffer, Mark E <M.E.Schaffer@hw.ac.uk> wrote:

>> Now, the meaning of -robust- standard errors after -xtmixed-
>> might be somewhat of a mystery.
>
> Is this analogous to the use of -robust- in a probit estimation? I
> remember reading a discussion by Dixit somewhere (I think it was in the
> book he did for the World Bank about 10 years ago) about how allowing
> for heteroskedasticity in a probit model makes no sense, because if the
> variance isn't constant, a probit is not estimating anything
> consistently (I think this is what the argument was ... several layers
> of brain dust are getting in the way).

Yes, that's probably a similar interpretation problem. I remember that Sophia and Anders were also dismissive of heteroskedasticity in probit models (Rabe-Hesketh and Skrondal, 2004, private communication :)).

> But estimating a probit model
> with cluster-robust *does* make sense, because within-group correlation
> or other failures of independence doesn't imply a probit is useless,
> does mess up the usual classical SEs, and doesn't mess up cluster-robust
> SEs (with enough assumptions etc. etc.)

With binary dependent variable models, there are several interpretations. One is the econometric one: there is an underlying utility u=xb+e, and depending on the shape of the error term, you could have probit, logit, cloglog. And if we have several utilities, and the e's come from a convenient extreme value distribution, you have -mlogit-. Hence you can talk about the variance of that error term e, and say that your identified combination is b/sigma, etc. Another interpretation, coming from statistics, is that you are just modeling the probability of a 0/1 event. There are no variances involved there, although you could play with the functional form a little bit to improve fit.
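The scale point in the latent-utility reading can be checked numerically. What follows is a minimal Python sketch (not Stata, and not taken from the thread): if y = 1 whenever u = xb + e > 0 and e is normal with standard deviation sigma, then P(y=1|x) = Phi(xb/sigma), so multiplying b and sigma by the same constant leaves every probability unchanged and only b/sigma is identified. The function name probit_prob and all numbers are made up for illustration.

```python
import math

def probit_prob(x, b, sigma):
    """P(y = 1 | x) when y = 1[u > 0], u = x*b + e, e ~ N(0, sigma^2)."""
    z = x * b / sigma
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

xs = [-2.0, -0.5, 0.0, 0.7, 1.5]  # illustrative covariate values

# Same ratio b/sigma = 1.2 in both cases; b and sigma are tripled in the second.
p_base = [probit_prob(x, b=1.2, sigma=1.0) for x in xs]
p_scaled = [probit_prob(x, b=3.6, sigma=3.0) for x in xs]

for x, p0, p1 in zip(xs, p_base, p_scaled):
    print(f"x = {x:5.2f}   P(b=1.2, s=1) = {p0:.6f}   P(b=3.6, s=3) = {p1:.6f}")
```

The two columns agree up to floating-point rounding, which is why the data alone cannot separate b from sigma in this interpretation.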
More distant fields like machine learning would probably say, heck with the likelihood and iterative maximization, let's fit a support vector machine model to this (and not worry about standard errors at all).

If you are thinking in econometric terms, and are willing to assume your errors are correlated, then the -cluster- correction does make sense. Although if correlations are sort of constant across clusters, you can build a more efficient estimator, I imagine -- similar argument as in your paper regarding heteroskedastic GMM for linear models. In general, the correlations of the error terms will be blurred by the link function and its derivatives that enter the estimating equations. -xtlogit- or -xtmelogit- probably take better account of this effect than -logit, cluster-.

I would tend to think that once we get down to estimating equations, -cluster- corrections for -probit/logit- would make sense in the statistical interpretation of those models, too. Although in most cases statisticians would prefer to model that correlation directly with GEE or something of the kind. You see, statisticians are less concerned with endogeneity than econometricians are, and would prefer -xtreg, re- over -xtreg, fe- almost any day for efficiency reasons. In the end, most -xt*- applications in statistics are some sort of clinical trials where endogeneity issues are dealt with by randomization procedures where applicable. Although here I am stepping into the biostat land I am less familiar with.

>> With -regress-, the
>> -robust- option is correcting for heteroskedasticity: you
>> believe you modeled the first moments right, but not sure
>> about higher order moments (the second moments, in this
>> case). That's what Mark said: the model is bad, but not as
>> bad as to kill the point estimates.
>> If you have
>> heteroskedasticity, your -xtmixed- model is likely wrong in
>> its variance part, and the variance parameters may not
>> necessarily correspond to well-defined population parameters.
>> If so, what does the inference on these point estimates do?
>
> If I have, say, within-group correlation that the -xtmixed- model
> doesn't model properly, does cluster-robust help? For example, say my
> -xtmixed- model is a lot better than nothing (in efficiency terms) but
> there is still within-group dependence that is not properly modelled,
> and I suspect this. Would this be a reasonable rationale to want to use
> cluster-robust?

I would say it would make good sense if -cluster- is a level higher than the highest level modeled by -xtmixed-. Say you have students nested in classrooms nested in schools, and the latter are sampled by county. It might make sense substantively to build a model around students, classrooms and schools, but counties don't belong in that substantive model. You might still want to correct for the PSU in the sampling procedure with the -cluster- option, then.

Whether correcting for clusters at the same level as you do the modeling makes sense, I do not know. Let's think about an analogy with linear regression again. Suppose you model heteroskedasticity in your regression with a hypothetical -hetreg- command (is there really a command to do it, BTW? I remember examples in the [ML] book that show how to build such a model, but I don't really know if there is a stand-alone command that does it). Would you want to add -hetreg, robust- to it, then, saying something along the lines of "I don't really know if my heteroskedasticity model is any good"? I would probably have some reservations about it, and would at least try some simulations to see how badly the model performs with and without the -robust- option.
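A minimal sketch of the kind of simulation suggested above, written in Python rather than Stata, and covering only the two ends of the comparison: the analogue of -regress- (classical SEs) and of -regress, robust- (sandwich SEs with the n/(n-k) scaling). The hypothetical -hetreg- is not implemented. The data-generating process (error s.d. proportional to |x|), the sample size, the number of replications, the seed, and the function names are all assumptions chosen for illustration.

```python
import math
import random

def ols_slope_with_ses(x, y):
    """Slope of y on x (with intercept), plus classical and HC1 robust SEs."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    dx = [xi - xbar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - ybar) for d, yi in zip(dx, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Classical variance of the slope: s^2 / Sxx, with s^2 = RSS / (n - 2).
    se_classical = math.sqrt(sum(r * r for r in resid) / (n - 2) / sxx)
    # Sandwich variance of the slope: sum(dx^2 * e^2) / Sxx^2, times n / (n - 2).
    meat = sum((d * r) ** 2 for d, r in zip(dx, resid))
    se_robust = math.sqrt(n / (n - 2) * meat) / sxx
    return slope, se_classical, se_robust

def coverage(reps=500, n=200, true_slope=2.0, seed=12345):
    """Share of nominal 95% intervals (normal critical value) covering the true slope."""
    rng = random.Random(seed)
    hits_classical = 0
    hits_robust = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Assumed DGP: the error s.d. is |x|, so the constant-variance model is wrong.
        y = [1.0 + true_slope * xi + abs(xi) * rng.gauss(0.0, 1.0) for xi in x]
        b, se_c, se_r = ols_slope_with_ses(x, y)
        hits_classical += abs(b - true_slope) <= 1.96 * se_c
        hits_robust += abs(b - true_slope) <= 1.96 * se_r
    return hits_classical / reps, hits_robust / reps

cov_classical, cov_robust = coverage()
print(f"classical coverage: {cov_classical:.3f}")
print(f"robust coverage:    {cov_robust:.3f}")
```

Under this particular design the classical intervals should undercover noticeably while the robust ones should sit much closer to the nominal 95%. Extending the comparison to a modeled-variance estimator, with and without a robust correction, would require writing the likelihood for the variance model as well.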
I would imagine that with a misspecified variance model, the performance in confidence intervals would be -regress, robust- better than -hetreg, robust- better than -hetreg- better than -regress-, although I won't bet much in this race.

Disclaimer: again, all said is my thinking aloud.

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
