The Stata listserver

st: Rituals [was: Terminology for supposedly all-purpose summaries]

From   Stas Kolenikov <[email protected]>
To   [email protected]
Subject   st: Rituals [was: Terminology for supposedly all-purpose summaries]
Date   Mon, 11 Oct 2004 10:02:50 -0400

At a recent meeting, somebody mentioned the rituals followed in
various disciplines that use statistical methods. In his own field
(business and organization research), the ritual he mentioned was, "If
the sample size is smaller than 300, you cannot use structural
equation models. If it is greater than that, you are good to go." Of
course this shows a lack of understanding of the field (which
originated in another branch of the social sciences, at any rate), as
a structural equation model may have 6 parameters, or it may have 60.

Someone's initial reaction to this comment was that the social
sciences in general are obsessed with goodness of fit measures. That's
pretty much what you are talking about: the idea of summarizing the
fit of a 60-parameter model with a dozen or two individual equations
with one number. If that number is below 0.9, the fit is bad, and you
should abandon the model. If the number is between 0.9 and 1.0, the
fit is good. If the number is above 1.0, the model is overfitting. It
is kind of neat that one can summarize the fit with one number... the
bad news is that there are at least ten such measures developed for
structural equation models.

A ritual economists attend to is testing for endogeneity, which
usually boils down to testing for a difference between two model
estimates (one is biased and the other is inefficient) and claiming
that it comes from endogeneity, i.e., correlation between a regressor
and the error term. A statistically significant answer means that the
"weaker" model should be thrown away. This is so widespread that
economists have pretty much stopped using random-effects models, since
the Hausman test usually says that there are statistically significant
differences between the random-effects and fixed-effects models, which
means... see above.
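For concreteness, the test statistic behind this ritual compares the two sets of estimates: H = (b_c - b_e)'[V_c - V_e]^{-1}(b_c - b_e), which is chi-squared under the null of no endogeneity. A minimal numerical sketch (in Python rather than Stata, and with made-up coefficients and variances, not output from any real model):

```python
import numpy as np

# Hypothetical one-regressor example: b_c is the consistent-but-inefficient
# estimate (e.g. fixed effects), b_e the efficient-under-the-null estimate
# (e.g. random effects). All numbers are invented for illustration.
b_c = np.array([1.0])       # consistent estimate
b_e = np.array([0.9])       # efficient estimate
V_c = np.array([[0.04]])    # variance of b_c
V_e = np.array([[0.01]])    # variance of b_e

diff = b_c - b_e
# Hausman statistic; chi-squared with df = len(diff) under the null
H = float(diff @ np.linalg.inv(V_c - V_e) @ diff)
print(f"Hausman statistic: {H:.4f}")  # 0.01 / 0.03 = 0.3333
```

Here H is small, so the ritual would declare random effects acceptable; a large H sends the researcher to fixed effects, per the paragraph above.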

In Bayesian statistics, the ritual is to discard the first 1000 draws
from a Markov chain Monte Carlo run as the burn-in period. In the
bootstrap world, the ritual is to draw 1000 (some smarter guys make it
999) bootstrap replicates.
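The 999 variant at least has an arithmetic rationale: with B + 1 = 1000 replicates-plus-one, the 2.5% and 97.5% percentile ranks, 0.025 * 1000 = 25 and 0.975 * 1000 = 975, land exactly on order statistics of the replicates. A sketch with simulated data (the sample and seed are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=5.0, scale=2.0, size=200)  # a made-up sample

B = 999  # the "smarter" ritual: 25th and 975th order statistics
boot_means = np.empty(B)
for b in range(B):
    # resample with replacement and record the replicate's mean
    resample = rng.choice(data, size=data.size, replace=True)
    boot_means[b] = resample.mean()

boot_means.sort()
lo, hi = boot_means[24], boot_means[974]  # 25th and 975th order statistics
print(f"95% percentile CI for the mean: ({lo:.3f}, {hi:.3f})")
```

Whether 999 replicates is actually enough for the tails of a given statistic is, of course, exactly the question the ritual skips.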

The R2 rituals are quite interesting. I don't know whether there are
such rituals in the hard sciences, like physics or chemistry, for
fitting regression models to experimental data, but my guess would be
that the R2s should be quite high (above 0.9?). Most social science
researchers would be terrified to see an R2 below 0.5 (which would
mean that their measure is not reliable enough), and most labor
economists would be totally happy to see an R2 greater than 0.15...

Are there any other interesting rituals you would want to share?
Interestingly enough, I was not able to come up with any consistent
rituals followed by mathematical statisticians.

P.S. So that this is not too much off-topic: my own Stata rituals are
to start my do-files with

set mem 50m                   // or however much I need
cd c:/myproject               // or wherever I need
cap log close
log using mylog, replace      // named something I can find later

(the memory size, directory, and log name above are placeholders, of course).


P.P.S. Nick, you forgot the magic word "test" in your right column.
"The omnibus test" is what one tends to see quite often.


On Mon, 11 Oct 2004 10:56:02 +0100, Nick Cox <[email protected]> wrote:
> It seems difficult to resist the temptation to try
> to summarise the performance of any model by one (or
> a few) figures of merit. Like many others, I know
> that any single measure can miss a lot that is
> important, but I often succumb, especially when
> the models are many and the space is short.
> I am aware of the following _general_ terminology that
> people use to discuss attempts to pack all the
> information into one number:
> factotum     |  index
> omnibus      |  measure
> portmanteau  |  statistic
> Any of the terms on the left can be combined
> with any of the terms on the right. I guess that
> the portmanteau terminology owes a lot to Lewis
> Carroll's sense of that word.
> (Of course there are all sorts of _particular_
> measures, R^2, AIC, BIC, etc., etc., not my
> concern here.)
> All these terms have been in the literature for
> at least 50 years. Sometimes they are used positively
> ("look, this test tests for everything at once")
> and sometimes negatively ("yes indeed, what a bad
> idea").
> Can anyone add to this list, especially any
> colourful (but not offensive) terms used by
> charismatic teachers, leaders in the field,
> etc.?
> I am aware of the technical concept of _sufficiency_,
> not the issue here as I see it.
> Nick
> [email protected]

Stas Kolenikov
