Statalist



st: k-fold cross validation


From   Nalin Payakachat <npayakac@purdue.edu>
To   statalist@hsphsun2.harvard.edu
Subject   st: k-fold cross validation
Date   Fri, 15 Feb 2008 11:19:45 -0500

Hi,

I would like to perform k-fold cross validation using Stata. Here is an
explanation of k-fold cross validation (http://www.cs.cmu.edu/~schneide/tut5/node42.html):

K-fold cross validation is one way to improve over the holdout method. The data
set is divided into k subsets, and the holdout method is repeated k times. Each
time, one of the k subsets is used as the test set and the other k-1 subsets are
put together to form a training set. Then the average error across all k trials
is computed. The advantage of this method is that it matters less how the data
gets divided. Every data point gets to be in a test set exactly once, and gets
to be in a training set k-1 times. The variance of the resulting estimate is
reduced as k is increased. The disadvantage of this method is that the training
algorithm has to be rerun from scratch k times, which means it takes k times as
much computation to make an evaluation. A variant of this method is to randomly
divide the data into a test and training set k different times. The advantage of
doing this is that you can independently choose how large each test set is and
how many trials you average over.
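
To make the procedure described above concrete, here is a minimal sketch of
k-fold cross validation. It is written in Python rather than Stata, purely as a
language-neutral illustration of the steps, and it makes several assumptions
that are not part of the quoted explanation: the model is a one-predictor
least-squares line, the error measure is mean squared error, and the data are
made up. The function names (k_fold_indices, fit_line, k_fold_mse) are
hypothetical helpers, not part of any library.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Taking every k-th index gives folds whose sizes differ by at most one.
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (an assumed stand-in for any model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def k_fold_mse(xs, ys, k=5, seed=0):
    """Average the test-set mean squared error over the k folds."""
    folds = k_fold_indices(len(xs), k, seed)
    fold_errors = []
    for test in folds:
        held_out = set(test)
        # The other k-1 folds together form the training set.
        train = [i for i in range(len(xs)) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        mse = sum((ys[i] - (a + b * xs[i])) ** 2 for i in test) / len(test)
        fold_errors.append(mse)
    return sum(fold_errors) / k


# Made-up data for illustration only: y = 2x + 1 plus Gaussian noise.
rng = random.Random(1)
xs = [float(i) for i in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]
print(k_fold_mse(xs, ys, k=5))
```

Each observation lands in exactly one test fold and in the training set for the
other k-1 fits, which matches the description above. The same loop structure
(assign folds, fit on the other folds, predict on the held-out fold, average
the errors) should carry over to whatever model and error measure are actually
of interest.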

If anybody could help, I would deeply appreciate it.
Thank you so much.

Nalin
-- 
Nalin Payakachat
Ph.D Student in Pharmacy Administration
Department of Pharmacy Practice
School of Pharmacy and Pharmaceutical Sciences
Purdue University
Tel:(765)4962413 (office) 






