Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

From: Steven Samuels <sjsamuels@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: FW: help on variable selection problem
Date: Sat, 11 Jun 2011 14:09:03 -0400

Tony,

With this many observations, the student could consider also interactions and nonlinear terms, thus making the selection problem worse! Consider classification accuracy as a criterion. Then a CART-like approach might not only detect interactions and non-linear dependencies but could also cut down on the number of "important" variables. I'd suggest GUIDE at http://www.stat.wisc.edu/~loh/guide.html

After CART or GUIDE has reduced the number of variables, the student could then examine the implied associations in the final model and, maybe, mimic them in a reduced multinomial logistic model. At that point she could use the contributed program -mlogitroc-, available at SSC, to gauge the accuracy of the final model.

Steve

On Jun 10, 2011, at 4:02 PM, Sarah Edgington wrote:

This doesn't seem to me to be a problem from the standpoint of analysis, just interpretation. A large sample size means that the estimates of coefficients are more precise than they would be with a small sample. No matter what your sample size, though, statistical significance isn't equivalent to substantive significance.

My recommendation would be to specify a model that makes sense theoretically and then look at the results. What's "important" for discussion of the results will depend somewhat on the research question but discussing what kind of effect variables of interest have seems to me to be what's important regardless of sample size.

Statistical significance doesn't tell you whether an effect size is large enough to be interesting. It tells you whether a coefficient is estimated precisely enough to be reasonably sure it isn't zero. A precisely estimated small effect is still a small effect.
-Sarah

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Lachenbruch, Peter
Sent: Friday, June 10, 2011 12:40 PM
To: 'statalist@hsphsun2.harvard.edu'
Subject: st: FW: help on variable selection problem

This is not especially a Stata question, but it is driven by an analysis issue...

A student is trying to analyze data from a national survey (no weights needed). She has 26 variables plus 10 years of data. There are about 1,000,000 observations. With this many observations, everything is significantly different from 0. She's using mlogit (predicting medical care expenses), so she'd like to cut down the number of 'important' predictors.

I have thought of several options: backward stepwise (not available with mlogit); look at effect size and insist it be larger than 0.05 - again not available since there are four categories of the response variable; use a Bonferroni inequality on the coefficients and insist on a low p-value to begin with - e.g. try for a size of 0.01 adjusting for 25 tests, so p must be less than 0.0004.

The issue seems to be the huge sample size pushing everything to significance. Does anybody have any ideas?

Tony

Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
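The arithmetic in the thread can be checked with a short sketch. It is written in Python rather than Stata only for illustration, and it is not anything the posters ran. The Bonferroni figures (size 0.01, 25 tests) come from Tony's message; the effect size of 0.01 standard deviations, the unit standard deviation, and the helper names are hypothetical, and a simple z-test of a mean stands in for an mlogit coefficient test.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff under a Bonferroni correction."""
    return alpha / n_tests


def z_statistic(effect, sd, n):
    """z statistic for testing a mean `effect` against zero (hypothetical setup)."""
    return effect / (sd / math.sqrt(n))


def two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Figures from the original message: size 0.01 spread over 25 tests.
threshold = bonferroni_threshold(0.01, 25)  # about 0.0004

# Hypothetical tiny effect (0.01 SD) at two sample sizes.
p_large = two_sided_p(z_statistic(0.01, 1.0, 1_000_000))
p_small = two_sided_p(z_statistic(0.01, 1.0, 1_000))

print("Bonferroni cutoff:", threshold)
print("n = 1,000,000 passes cutoff:", p_large < threshold)
print("n = 1,000 passes cutoff:", p_small < threshold)
```

Under these assumed numbers the tiny effect clears the Bonferroni cutoff at a million observations but not at a thousand, which is consistent with Sarah's point that the test reflects precision rather than the size of the effect.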

**Follow-Ups**:
- **Re: st: RE: FW: help on variable selection problem** *From:* "Lachenbruch, Peter" <Peter.Lachenbruch@oregonstate.edu>

**References**:
- **st: FW: help on variable selection problem** *From:* "Lachenbruch, Peter" <Peter.Lachenbruch@oregonstate.edu>
- **st: RE: FW: help on variable selection problem** *From:* "Sarah Edgington" <sedging@ucla.edu>
