
From: David Vaughan <dvk@dvkconsult.com.au>
To: statalist@hsphsun2.harvard.edu
Subject: st: Signficance vs prediction
Date: Wed, 10 Mar 2004 18:10:59 +1100

I know this is pretty simple, but the answer is not obvious in my old texts, and in business I have no expert colleague to turn to.

My purpose is to construct a model that will be used for best-possible prediction from new input data.

I constructed a regression model, based on historical understanding of the domain, using eight predictors, and obtained the following fit statistics:

    F(8,98) = 35.15
    Adj R-squared = 0.7205
    RMSE = 0.90373

I noted that three of the predictors had P>|t| around 0.2-0.24. Eliminating those gave me:

    F(5,101) = 54.49
    Adj R-squared = 0.7295
    RMSE = 0.91067

So significance has gone up, but so has the error. I assume that the larger model over-fits the data, and if I were arguing about causality I would prefer the more compact model. Yet it seems that the larger model does a slightly better job of prediction. How should I think about this?

More generally, where do I stop in a predictive problem (there are other inputs available)? Should I care much about a minor RMSE difference, or just do a "judgement" check on error differences on new data?

I also ran a decent bootstrap (N=1000 replications) on the larger model, and the confidence intervals around all the predictors appeared reasonable for our purpose. Either of the above models serves better than our previous approach, although it seems (opinion) that the larger model does better at the extremes.
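For what it's worth, the kind of out-of-sample check I have in mind could be sketched as K-fold cross-validated RMSE, comparing the full and reduced predictor sets on held-out observations rather than in-sample fit. The sketch below is Python rather than Stata, and everything in it is hypothetical: the data are synthetic stand-ins (107 observations, 8 predictors, the last three only weakly related to the outcome), not my actual data.

```python
import numpy as np

def cv_rmse(X, y, k=5, seed=0):
    """K-fold cross-validated RMSE for an OLS model (intercept added)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    sq_errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        # Fit OLS on the training folds, predict the held-out fold
        Xtr = np.column_stack([np.ones(len(train)), X[train]])
        Xte = np.column_stack([np.ones(len(fold)), X[fold]])
        beta, *_ = np.linalg.lstsq(Xtr, y[train], rcond=None)
        sq_errors.extend((y[fold] - Xte @ beta) ** 2)
    return float(np.sqrt(np.mean(sq_errors)))

# Hypothetical stand-in data: 107 obs, 8 predictors, the last
# three contributing only weakly (coefficients are made up).
rng = np.random.default_rng(42)
n = 107
X = rng.normal(size=(n, 8))
y = (X[:, :5] @ np.array([1.0, -0.8, 0.5, 0.3, 0.2])
     + 0.05 * X[:, 5:].sum(axis=1)
     + rng.normal(scale=0.9, size=n))

rmse_full = cv_rmse(X, y)          # all eight predictors
rmse_small = cv_rmse(X[:, :5], y)  # drop the three weak ones
```

Whichever model yields the lower cross-validated RMSE is the better bet for prediction on new data, regardless of which has the prettier in-sample F or adjusted R-squared.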

Talking to myself, I wonder if I just need more data for analysis (a painful process), but is there a statistical approach to focusing on that extreme-edge issue? Perhaps I should be looking for another inflection point in the model; we have already found one at the other end, which I omitted from the above for brevity. If so, how does one find it other than by trial?

Any advice or reading directions welcome.

thanks

David

*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
