
From: Joerg Luedicke <joerg.luedicke@gmail.com>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: FW: help on variable selection problem

Date: Fri, 10 Jun 2011 16:19:00 -0400

On Fri, Jun 10, 2011 at 3:40 PM, Lachenbruch, Peter <Peter.Lachenbruch@oregonstate.edu> wrote:

> This is not especially a Stata question, but it is driven by an analysis issue...
>
> A student is trying to analyze data from a national survey (no weights needed). She has 26 variables plus 10 years of data. There are about 1,000,000 observations. With this many observations, everything is significantly different from 0. She's using mlogit (predicting medical care expenses), so she'd like to cut down the number of 'important' predictors. I have thought of several options: backward stepwise (not available with mlogit); look at effect size and insist it be larger than 0.05 - again not available since there are four categories of the response variable; use a Bonferroni inequality on the coefficients and insist on a low p-value to begin with - e.g. try for a size of 0.01 adjusting for 25 tests, so p must be less than 0.0004. The issue seems to be the huge sample size pushing everything to significance.
>
> Does anybody have any ideas?

Some $0.02:

1) "She's using mlogit (predicting medical care expenses), so she'd like to cut down the number of 'important' predictors."

I do not quite understand the logic here. Say you have 25 variables, all significant. Now you remove 15, and the remaining 10 in the model are all significant. What would you gain by that? (BTW, aren't "medical care expenses" measured on at least an ordinal scale?)

2) "I have thought of several options: backward stepwise"

This is usually problematic; see http://www.stata.com/support/faqs/stat/stepwise.html

3) "adjusting for 25 tests"

I cannot really see how this is a multiple-comparisons problem. But even if you did make an adjustment like that, it would not really help (see point 4 below).

4) "The issue seems to be the huge sample size pushing everything to significance."

That is why you should look at effect sizes first and care less about p-values.
Just see whether the predictors' contributions are of substantive size. For example, if you find an odds ratio for women of 1.01, you can conclude that there is not much difference across gender, regardless of whether it is "significant" or not. Even if it were significant at p < 0.0004 or whatever other level you might choose, that would not change anything.

J.
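To make points 3 and 4 concrete, here is a minimal Python sketch (not Stata, and not part of the original exchange). It rests on simplifying assumptions: a hypothetical binary outcome (high vs. low medical care expenses) instead of the four-category mlogit outcome, two equal-sized groups (women and men), a 50% baseline rate of "high" for men, and idealized expected cell counts with no sampling noise. The function names `expected_table`, `wald_p_value`, and `bonferroni_threshold` are illustrative only. The sketch holds a small odds ratio of 1.02 fixed, computes a two-sided Wald test of the log odds ratio at two sample sizes, and compares the p-value with the Bonferroni cutoff from the question (0.01 / 25 = 0.0004).

```python
import math


def expected_table(n_per_group, p_men, odds_ratio):
    """Hypothetical 2x2 table of expected counts (no sampling noise).

    Returns (women_high, women_low, men_high, men_low), where "high" means
    high medical care expenses. Women's odds of "high" equal the men's odds
    multiplied by odds_ratio.
    """
    odds_men = p_men / (1 - p_men)
    odds_women = odds_men * odds_ratio
    p_women = odds_women / (1 + odds_women)
    return (n_per_group * p_women, n_per_group * (1 - p_women),
            n_per_group * p_men, n_per_group * (1 - p_men))


def wald_p_value(women_high, women_low, men_high, men_low):
    """Two-sided Wald test of H0: log(odds ratio) = 0 for a 2x2 table."""
    log_or = math.log((women_high * men_low) / (women_low * men_high))
    # Standard error of the log odds ratio (Woolf's formula)
    se = math.sqrt(1 / women_high + 1 / women_low + 1 / men_high + 1 / men_low)
    z = log_or / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under a Bonferroni adjustment."""
    return alpha / n_tests


threshold = bonferroni_threshold(0.01, 25)
# Same effect size (OR = 1.02), total n of 10,000 vs. about 1,000,000
for n in (5_000, 500_000):
    p = wald_p_value(*expected_table(n, p_men=0.5, odds_ratio=1.02))
    print(f"n per group = {n:,}: OR = 1.02, p = {p:.2g}, "
          f"below Bonferroni cutoff {threshold:.4f}: {p < threshold}")
```

Under these assumptions, the same odds ratio is nowhere near significant with 10,000 observations but falls below even the Bonferroni-adjusted cutoff with about 1,000,000. The effect size is identical in both runs, which is why a stricter p-value threshold does not answer whether a predictor matters substantively.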

**References**: **st: FW: help on variable selection problem**, from "Lachenbruch, Peter" <Peter.Lachenbruch@oregonstate.edu>
