
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Generating logistic regression models for forecasting


From   Jeph Herrin <[email protected]>
To   [email protected]
Subject   Re: st: Generating logistic regression models for forecasting
Date   Wed, 08 Jan 2014 22:14:50 -0500

The point of split sample validation is not to avoid overfitting, but to
ascertain that the model is not sample specific, which is a good idea
when you are forecasting.

Split sample validation is still in vogue. -crossval.ado- (at least the one that Google turns up) does pretty much the same thing: it splits the sample randomly, estimates the model on each sample, and then compares the fit across the random samples.
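As a rough illustration of the split-and-compare idea described above, here is a minimal sketch. It is not the code of -crossval.ado-; the dataset (nlsw88, shipped with Stata), the outcome union, the covariates age, grade and tenure, and the seed are all stand-ins chosen only so the example runs.

```stata
* Sketch only: a random split-sample comparison, not crossval.ado itself.
* Dataset, outcome, covariates and seed are illustrative assumptions.
sysuse nlsw88, clear
set seed 12345

* Randomly assign each observation to one of two halves
generate byte half = runiform() < 0.5

* Fit the same model on the first half and store the results
logit union age grade tenure if half == 1
estimates store half1
* Area under the ROC curve within this half
lroc, nograph

* Repeat on the second half
logit union age grade tenure if half == 0
estimates store half0
lroc, nograph

* Coefficients, standard errors and fit statistics side by side
estimates table half1 half0, se stats(N ll r2_p)
```

If the coefficients and the areas under the ROC curve are similar across the two halves, that is some evidence the model is not specific to one sample.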

cheers,
Jeph

On 1/8/2014 5:16 PM, Brent McSharry (ADHB) wrote:
Dear statalisters

The question is more of a statistics than a Stata question, so
apologies if this is the wrong forum (please feel free to redirect me
to a more appropriate forum).

Years ago, when developing logistic regression models for
forecasting, data would be randomly split into development and
testing data sets (using something like .gen develop = runiform() <
1/3). This is much less popular now, and using Rick Thompson's
crossval.ado I have a very useful tool for testing fit and
discrimination of variables included in the final model.

However, during model development, it would seem logical that
I should no longer limit included data to a randomly defined
development data set (e.g., logit x y z if develop). The whole point of
the old technique (to my understanding) was to avoid overfitting when
choosing which covariates to include. Is it reasonable to develop the
model on the whole data set, and then test the final model using
crossval, or should I be limiting the data set used for model
development somehow (or indeed use some tool for multiple resampling
to obtain model coefficients and estimates, while in the model
development phase)?
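For reference, the older development/testing approach mentioned above might look like the following sketch. The dataset (nlsw88), the outcome and covariates, and the seed are assumptions for illustration only; the split itself follows the gen develop = runiform() < 1/3 line quoted earlier.

```stata
* Sketch only: fit on a random development subset, assess on the rest.
* Dataset, outcome, covariates and seed are illustrative assumptions.
sysuse nlsw88, clear
set seed 2014

* About one third of observations form the development set
generate byte develop = runiform() < 1/3

* Estimate the model on the development set only
logit union age grade tenure if develop

* predict is not restricted to the estimation sample by default,
* so this gives predicted probabilities for held-out observations too
predict double phat

* Discrimination (area under the ROC curve) in the testing set only
roctab union phat if !develop
```

Comparing this held-out area under the curve with the one from the development set gives a rough sense of how optimistic the in-sample fit is.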

Thank you very much.

Brent McSharry MBBS BSc(med) FCICM(paed) Paediatric Intensivist
Starship Children's Hospital Private Bag 92024 Auckland 1142 New
Zealand


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/



