Statalist The Stata Listserver



Re: st: hireg help


From   Richard Williams <Richard.A.Williams.5@ND.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: hireg help
Date   Fri, 18 May 2007 16:33:20 -0500

At 11:16 AM 5/18/2007, Austin Nichols wrote:
> Richard--
> Thanks for pointing out the genealogy of -hireg- and -nestreg-. As for
> the second point, are you saying you don't have qualms about -hireg-
> and -nestreg-? True enough that -hireg- is not -stepwise- per se, but
> it is a tool for model selection based on change in R2 or significance
> tests on variables or groups of variables, which is like stepwise
> regression, though "more respectable" as you say...  -nestreg- adds a
> likelihood ratio test option to the Wald stat from -hireg- in addition
> to supporting many other estimation commands (including svy commands),
> but -stepwise- offers similar choices, and even has a -hierarchical-
> option. The question is, to my mind, is the whole enterprise
> statistically legitimate?  Are the calculated standard errors in your
> chosen model adjusted for the fact that you threw out the 17 other
> variables that did not pass muster in the 7 other regressions you
> estimated?  Maybe _mtest should be built in... but I have a feeling
> that a suitable Monte Carlo would reject these methods, even with some
> marginal corrections.
I think there is a HUGE difference between the mindless, atheoretical empirical selection of variables and the specification and testing of a theoretically derived, logical sequence of models. Given enough time, I imagine I could come up with a couple thousand citations of decent articles that used nestreg or its equivalent. Further, I bet that estout and the like make a good chunk of their money from presenting side-by-side comparisons of the results of nested models. Perhaps Ben Jann can check his business records on this. :)

I think there are some false premises in your argument. Sure, people shouldn't just cherry-pick their results, doing dozens of runs and only presenting the ones that came out significant. But heck, people were doing that long before anybody ever thought of nestreg. There is nothing about nestreg that makes it more likely or less likely that you're only going to get selective presentations of results.

Second, I don't see any complaints about nestreg that couldn't also be made about test or lrtest or even just looking at a bunch of individual t-values for coefficients. If you are running a bunch of tests, you may want to use more stringent significance levels, e.g. .01 or a Bonferroni adjustment or whatever. nestreg is hardly unique in that respect.
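If somebody does want a built-in adjustment, Stata's own -test- command already offers one. A minimal sketch (the variable names y, x1, x2, x3 are made up purely for illustration):

```stata
* Illustrative only: y and x1-x3 are hypothetical variable names.
* -test- with the mtest(bonferroni) option reports a separate Wald
* test for each coefficient, with Bonferroni-adjusted p-values, so
* each test is effectively held to alpha divided by the number of tests.
regress y x1 x2 x3
test x1 x2 x3, mtest(bonferroni)
```

The same logic applies whether the tests come from -test-, -lrtest-, or a sequence of blocks in -nestreg-: the adjustment is about how many tests you run, not which command ran them.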

Third, you say that nestreg "is a tool for model selection based on change in R2..." I think that is often a secondary consideration. People who run sequences of models are often more interested in how coefficients change as you go from one model to the next. For example, if race is highly significant in block 1, but insignificant in block 3, then that may suggest that the effects of race are indirect, e.g. race affects education which in turn affects income. Or, if X significantly affects Y in block 1 but the effect of X becomes insignificant in block 2 after income is added, then that may suggest that the relationship between X and Y is spurious and produced by the common cause of income.
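For what it's worth, the pattern I'm describing is easy to watch with -nestreg- itself. Here is a hedged sketch using the nlsw88 data that ships with Stata; the block ordering is purely illustrative, not a serious substantive model:

```stata
* Illustrative only: wage, south, grade, ttl_exp, and tenure are
* variables in the shipped nlsw88 dataset.
sysuse nlsw88, clear

* Block 1: a background characteristic; block 2: education;
* block 3: labor-market experience.
nestreg: regress wage (south) (grade) (ttl_exp tenure)
```

Watching how the block 1 coefficient changes as the later blocks enter is exactly what suggests indirect or spurious relationships, quite apart from the change-in-R2 table.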

To the extent that nestreg is used for model selection, you are usually doing something like moving from a simple model to an increasingly complex model, looking for the most parsimonious model you can justify. Sometimes the blocks are ordered temporally (e.g. characteristics determined at birth like sex and race, followed by vars determined later in life, such as education, followed by more immediate vars). By going through a sequence of models, you may get a feel for how much your life's fate was determined at birth and how much it was affected by later developments. Or, vars might be ordered by content, e.g. demographic vars in one block, attitudinal vars in another. Do attitudinal vars really gain us that much over what we can get just from demographic vars alone?

In sum, I think specifying and testing a logical hierarchy of models can be extremely informative and useful. It isn't just data dredging; it is theory testing. And it can give us a lot of insights that just testing one final model could lead us to overlook. Sure, there can be abuses, but anything you could do wrong with nestreg you could just as easily do wrong some other way. If anything, problems may be less likely, in that using nestreg forces you to logically think through what the sequence of models and tests should be, as opposed to doing things on a more haphazard basis.


-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME: (574)289-5227
EMAIL: Richard.A.Williams.5@ND.Edu
WWW: http://www.nd.edu/~rwilliam

*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
