
From: Stas Kolenikov <skolenik@gmail.com>

To: statalist@hsphsun2.harvard.edu

Subject: st: Rituals [was: Terminology for supposedly all-purpose summaries]

Date: Mon, 11 Oct 2004 10:02:50 -0400

At a recent meeting, somebody mentioned the rituals followed in various disciplines that use statistical methods. In his own field (business and organization research), the one he mentioned was, "If the sample size is smaller than 300, you cannot use structural equation models. If it is greater than that, you are good to go". Of course this shows a lack of understanding of the field (which originated in another branch of the social sciences, at any rate), as a structural equation model may have 6 parameters, or may have 60.

Someone's initial reaction to this comment was that the social sciences in general are obsessed with goodness of fit measures. That's pretty much what you are talking about: the idea of summarizing the fit of a 60-parameter model, with a dozen or two individual equations, by one number. If that number is below 0.9, the fit is bad, and you should abandon the model. If the number is between 0.9 and 1.0, the fit is good. If the number is above 1.0, the model is overfitting. It is kind of neat that one can summarize the fit with one number... the bad news is that there are at least ten such measures developed for structural equation models.

A ritual that economists observe is testing for endogeneity, which usually boils down to testing for a difference between two sets of model estimates (one is biased and the other is inefficient) and claiming that the difference comes from endogeneity, i.e., correlation between a regressor and the error term. A statistically significant answer means that the "weaker" model should be thrown away. This is so widespread that economists have pretty much stopped using random effects models, since the Hausman test usually says that there are statistically significant differences between the random effects and fixed effects models, which means... see above.

In Bayesian statistics, the ritual is to set aside the first 1000 draws from a Markov chain Monte Carlo run as the burn-in period.
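For concreteness, the fixed effects versus random effects ritual described above usually looks something like the following in Stata. This is only a sketch in current Stata syntax: the `nlswork` example dataset and the choice of regressors are assumptions made for illustration and do not come from the discussion itself.

```stata
* Sketch only: the dataset and the regressors are illustrative assumptions
webuse nlswork, clear
xtset idcode

* Fixed effects: consistent either way, but less efficient
xtreg ln_wage age tenure ttl_exp, fe
estimates store fe

* Random effects: efficient only if the effects are uncorrelated with the regressors
xtreg ln_wage age tenure ttl_exp, re
estimates store re

* Hausman test: the always-consistent estimator goes first, the efficient one second
hausman fe re
```

A small p-value from `hausman` is what the ritual reads as "throw away the random effects model". The test itself only says that the two sets of estimates differ by more than sampling error would suggest.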
In the bootstrap world, the ritual is to draw 1000 (some smarter guys make it 999) bootstrap replicates.

The R2 rituals are quite interesting. I don't know whether there are such rituals in the hard sciences, like physics or chemistry, for fitting regression models to experimental data, but my guess would be that R2's should be quite high (above 0.9?). Most social science researchers would be terrified to see an R2 below 0.5 (which means that their measure is not reliable enough), and most labor economists would be totally happy to see an R2 greater than 0.15...

Are there any other interesting rituals you would want to share? Interestingly enough, I was not able to come up with consistent rituals followed by mathematical statisticians.

P.S., so that this is not too much of an off-topic: My own Stata rituals are to start my do-files with

    clear
    set mem whatever I need
    cd wherever I need
    cap log close
    log using whatever I can find later

etc.

P.P.S. Nick, you forgot the magic word "test" in your right column. "The omnibus test" is what I tend to see quite often.

Stas

On Mon, 11 Oct 2004 10:56:02 +0100, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> It seems difficult to resist the temptation to try
> to summarise the performance of any model by one (or
> a few) figures of merit. Like many others, I know
> that any single measure can miss a lot that is
> important, but I often succumb, especially when
> the models are many and the space is short.
>
> I am aware of the following _general_ terminology that
> people use to discuss attempts to pack all the
> information into one number:
>
> factotum    | index
> omnibus     | measure
> portmanteau | statistic
>
> Any of the terms on the left can be combined
> with any of the terms on the right. I guess that
> the portmanteau terminology owes a lot to Lewis
> Carroll's sense of that word.
>
> (Of course there are all sorts of _particular_
> measures, R^2, AIC, BIC, etc., etc., not my
> concern here.)
>
> All these terms have been in the literature for
> at least 50 years. Sometimes they are used positively
> ("look, this test tests for everything at once")
> and sometimes negatively ("yes indeed, what a bad
> idea").
>
> Can anyone add to this list, especially any
> colourful (but not offensive) terms used by
> charismatic teachers, leaders in the field,
> etc.?
>
> I am aware of the technical concept of _sufficiency_,
> not the issue here as I see it.
>
> Nick
> n.j.cox@durham.ac.uk

--
Stas Kolenikov
http://stas.kolenikov.name

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

**References**:
- **st: Terminology for supposedly all-purpose summaries**
  *From:* "Nick Cox" <n.j.cox@durham.ac.uk>


© Copyright 1996–2016 StataCorp LP | Terms of use | Privacy | Contact us | What's new | Site index |