Statalist



Re: st: k-fold cross validation


From   Richard Goldstein <richgold@ix.netcom.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: k-fold cross validation
Date   Fri, 15 Feb 2008 11:32:15 -0500

1. see the -jackknife- command for the extreme version of this (k equal to the sample size, i.e. leave-one-out)

2. you may prefer to use the bootstrap -- see the -bootstrap- command

Rich

Nalin Payakachat wrote:
Hi,

I would like to perform k-fold cross validation using Stata. Here is an
explanation of k-fold (http://www.cs.cmu.edu/~schneide/tut5/node42.html):

K-fold cross validation is one way to improve over the holdout method. The data
set is divided into k subsets, and the holdout method is repeated k times. Each
time, one of the k subsets is used as the test set and the other k-1 subsets are
put together to form a training set. Then the average error across all k trials
is computed. The advantage of this method is that it matters less how the data
gets divided. Every data point gets to be in a test set exactly once, and gets
to be in a training set k-1 times. The variance of the resulting estimate is
reduced as k is increased. The disadvantage of this method is that the training
algorithm has to be rerun from scratch k times, which means it takes k times as
much computation to make an evaluation. A variant of this method is to randomly
divide the data into a test and training set k different times. The advantage of
doing this is that you can independently choose how large each test set is and
how many trials you average over.
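The procedure described above is language-agnostic, so here is a minimal sketch of it in Python rather than Stata (the function names and the trivial mean-of-y "model" are illustrative stand-ins for whatever estimator one would actually fit):

```python
import random

def kfold_indices(n, k, seed=0):
    """Randomly split indices 0..n-1 into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def kfold_cv(ys, k=5, seed=0):
    """Average squared test error over k folds.

    The 'model' here is just the training-set mean of y, standing in
    for a real estimator: each fold is held out once as the test set,
    and the model is refit on the other k-1 folds each time.
    """
    folds = kfold_indices(len(ys), k, seed)
    errors = []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(ys)) if i not in held_out]
        yhat = sum(ys[i] for i in train) / len(train)       # fit on k-1 folds
        errors.extend((ys[i] - yhat) ** 2 for i in test)    # score on held-out fold
    return sum(errors) / len(errors)                         # average over all points
```

Because every observation lands in the test set exactly once, the returned value is the average error over all n data points, as the tutorial describes.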

If anybody could help, I would deeply appreciate it.
Thank you so much.

Nalin
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
