Re: st: k-fold cross validation

From   Richard Goldstein
Subject   Re: st: k-fold cross validation
Date   Fri, 15 Feb 2008 11:32:15 -0500

1. See the jackknife command for the extreme version of this, in which each
observation is left out in turn.

2. You may prefer to use the bootstrap -- see the bootstrap command.


Nalin Payakachat wrote:

I would like to perform k-fold cross validation using Stata. Here is an
explanation of k-fold cross validation:

K-fold cross validation is one way to improve over the holdout method. The data
set is divided into k subsets, and the holdout method is repeated k times. Each
time, one of the k subsets is used as the test set and the other k-1 subsets are
put together to form a training set. Then the average error across all k trials
is computed. The advantage of this method is that it matters less how the data
gets divided. Every data point gets to be in a test set exactly once, and gets
to be in a training set k-1 times. The variance of the resulting estimate is
reduced as k is increased. The disadvantage of this method is that the training
algorithm has to be rerun from scratch k times, which means it takes k times as
much computation to make an evaluation. A variant of this method is to randomly
divide the data into a test and training set k different times. The advantage of
doing this is that you can independently choose how large each test set is and
how many trials you average over.
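
The procedure described above can be sketched in a few lines. This is only an illustration, written in Python with the standard library rather than Stata, and it makes several assumptions that are not in the original post: the model is a simple least-squares line, the error measure is mean squared error, and the names k_fold_indices, fit_line and k_fold_mse are hypothetical helpers invented for this example.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def k_fold_mse(xs, ys, k, seed=0):
    """Average the test-set mean squared error over the k folds."""
    folds = k_fold_indices(len(xs), k, seed)
    fold_errors = []
    for test in folds:
        held_out = set(test)
        # Train on the other k-1 folds, refitting from scratch each time.
        train = [i for i in range(len(xs)) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Evaluate only on the fold that was held out.
        fold_errors.append(mean((ys[i] - (a + b * xs[i])) ** 2 for i in test))
    return mean(fold_errors)


# Toy data (made up for illustration): a line plus alternating +/-0.5 noise.
xs = [float(i) for i in range(20)]
ys = [2.0 + 3.0 * x + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]
cv_error = k_fold_mse(xs, ys, k=5)
print("5-fold CV mean squared error:", cv_error)
```

Every index lands in exactly one fold, so each observation is tested once and used for training k-1 times, as the description says. Setting k equal to the number of observations gives the leave-one-out case.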

If anybody could help, I would deeply appreciate it.
Thank you so much.
