
From |
Nalin Payakachat <npayakac@purdue.edu> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
st: k-fold cross validation |

Date |
Fri, 15 Feb 2008 11:19:45 -0500 |

Hi,

I would like to perform k-fold cross validation using Stata. Here is an explanation of k-fold cross validation (http://www.cs.cmu.edu/~schneide/tut5/node42.html):

K-fold cross validation is one way to improve over the holdout method. The data set is divided into k subsets, and the holdout method is repeated k times. Each time, one of the k subsets is used as the test set and the other k-1 subsets are put together to form a training set. Then the average error across all k trials is computed.

The advantage of this method is that it matters less how the data gets divided. Every data point gets to be in a test set exactly once, and gets to be in a training set k-1 times. The variance of the resulting estimate is reduced as k is increased. The disadvantage of this method is that the training algorithm has to be rerun from scratch k times, which means it takes k times as much computation to make an evaluation.

A variant of this method is to randomly divide the data into a test and training set k different times. The advantage of doing this is that you can independently choose how large each test set is and how many trials you average over.

If anybody could help, I would deeply appreciate it. Thank you so much.

Nalin

--
Nalin Payakachat
Ph.D Student in Pharmacy Administration
Department of Pharmacy Practice
School of Pharmacy and Pharmaceutical Sciences
Purdue University
Tel:(765)4962413 (office)
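For readers who want to see the procedure described in the message spelled out, here is a minimal, language-neutral sketch in Python. It is not Stata code and not a recommended Stata workflow. The helper names (`k_fold_indices`, `fit_line`, `cross_validate`), the one-predictor least-squares model, and the toy data are all assumptions made purely for illustration; the "training algorithm" can be any estimator.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (a stand-in for any training algorithm)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def cross_validate(xs, ys, k=5, seed=0):
    """Average the mean squared test error over the k folds."""
    fold_errors = []
    for test in k_fold_indices(len(xs), k, seed):
        held_out = set(test)
        # Train on the other k-1 folds, refitting from scratch each time.
        train = [i for i in range(len(xs)) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Evaluate only on the held-out fold.
        fold_errors.append(mean((ys[i] - (a + b * xs[i])) ** 2 for i in test))
    return mean(fold_errors)


# Hypothetical toy data: y = 2 + 3x plus Gaussian noise.
rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [2 + 3 * x + rng.gauss(0, 1) for x in xs]
cv_mse = cross_validate(xs, ys, k=5)
print(cv_mse)
```

Each observation lands in exactly one test fold, and the model is refit k times, which is the computational cost the description mentions. The random-split variant would instead draw a fresh test sample on every trial rather than reusing a fixed partition.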

**Follow-Ups**:
- **Re: st: k-fold cross validation**, *From:* Richard Goldstein <richgold@ix.netcom.com>

