
st: k-fold cross validation

From   Nalin Payakachat <>
Subject   st: k-fold cross validation
Date   Fri, 15 Feb 2008 11:19:45 -0500


I would like to perform k-fold cross validation in Stata. Here is an
explanation of k-fold cross validation (

K-fold cross validation is one way to improve over the holdout method. The data
set is divided into k subsets, and the holdout method is repeated k times. Each
time, one of the k subsets is used as the test set and the other k-1 subsets are
put together to form a training set. Then the average error across all k trials
is computed. The advantage of this method is that it matters less how the data
gets divided. Every data point gets to be in a test set exactly once, and gets
to be in a training set k-1 times. The variance of the resulting estimate is
reduced as k is increased. The disadvantage of this method is that the training
algorithm has to be rerun from scratch k times, which means it takes k times as
much computation to make an evaluation. A variant of this method is to randomly
divide the data into a test and training set k different times. The advantage of
doing this is that you can independently choose how large each test set is and
how many trials you average over.
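The procedure quoted above does not depend on any particular package, so here is a minimal sketch of both variants in plain Python (standard library only). It is illustrative, not a Stata implementation. The straight-line least-squares model, the squared-error loss, and the helper names `fit_line`, `kfold_cv`, and `repeated_holdout` are hypothetical stand-ins chosen for the example. The per-fold errors are averaged without weighting, which matches "the average error across all k trials" in the quoted text.

```python
import random
from statistics import mean

# Hypothetical stand-in model: ordinary least squares for y = a + b*x.
def fit_line(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

# Mean squared prediction error of a fitted (a, b) line on a data subset.
def mse(model, xs, ys):
    a, b = model
    return mean((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Shuffle indices 0..n-1 once and deal them into k folds (sizes differ by at most 1).
def kfold_indices(n, k, seed=0):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

# k-fold CV: every point is in the test set exactly once and in training k-1 times.
def kfold_cv(xs, ys, k=5, seed=0):
    errors = []
    for test in kfold_indices(len(xs), k, seed):
        test_set = set(test)
        train = [i for i in range(len(xs)) if i not in test_set]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors.append(mse(model, [xs[i] for i in test], [ys[i] for i in test]))
    return mean(errors)  # unweighted average over the k folds

# Variant: draw a fresh random test/train split on each trial, so the test-set
# size and the number of trials are chosen independently.
def repeated_holdout(xs, ys, test_frac=0.2, trials=10, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    n_test = max(1, round(test_frac * n))
    errors = []
    for _ in range(trials):
        idx = list(range(n))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors.append(mse(model, [xs[i] for i in test], [ys[i] for i in test]))
    return mean(errors)

if __name__ == "__main__":
    # Simulated data (assumption): y = 2 + 3x plus N(0, 1) noise.
    rng = random.Random(42)
    xs = [i / 10 for i in range(100)]
    ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]
    print("10-fold CV MSE:", kfold_cv(xs, ys, k=10))
    print("repeated holdout MSE:", repeated_holdout(xs, ys, test_frac=0.2, trials=20))
```

The folds are built by shuffling once and dealing the indices out. This enforces the "exactly once in a test set" property. The repeated-holdout variant, by contrast, can place a point in several test sets or in none.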

If anybody could help, I would deeply appreciate it.
Thank you so much.

Nalin Payakachat
Ph.D Student in Pharmacy Administration
Department of Pharmacy Practice
School of Pharmacy and Pharmaceutical Sciences
Purdue University
Tel:(765)4962413 (office) 

