RE: st: k-fold cross validation

From: "Lachenbruch, Peter" <[email protected]>
To: <[email protected]>
Subject: RE: st: k-fold cross validation
Date: Fri, 15 Feb 2008 09:51:19 -0800

```
I prefer bootstrapping myself.

One issue with leave-one-out (LOO) is that the 'residuals' are correlated,
and with small samples (say n<30 or so) the increase in variance is a
problem.  The method was first proposed by Quenouille in the 1950s, I used
it in the 1960s, and Ned Glick showed in the late 70s that the bootstrap
did a better job (i.e., smaller MSE).

If you want to use k-observations at a time, it may be better to redo
these with random sampling of the k observations.

David Allen did something akin to this with his PRESS criterion
(Prediction Error Sum of Squares) - I don't think Stata has this (at
least under the name of PRESS), but it may be able to give an equivalent
statistic.

The trick to all of this is a simple matrix inversion formula, so that
one only needs to compute the inverse of X'X once and the rest is
multiplications.

Tony

Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Richard
Goldstein
Sent: Friday, February 15, 2008 8:32 AM
To: [email protected]
Subject: Re: st: k-fold cross validation

1. see the jackknife command for the extreme version of this

2. you may prefer to use bootstrap -- see that command

Rich

Nalin Payakachat wrote:
> Hi,
>
> I would like to perform k-fold cross validation using Stata. Here is an
> explanation of k-fold cross validation
> (http://www.cs.cmu.edu/~schneide/tut5/node42.html):
>
> K-fold cross validation is one way to improve over the holdout method.
> The data set is divided into k subsets, and the holdout method is
> repeated k times. Each time, one of the k subsets is used as the test
> set and the other k-1 subsets are put together to form a training set.
> Then the average error across all k trials is computed. The advantage of
> this method is that it matters less how the data gets divided. Every
> data point gets to be in a test set exactly once, and gets to be in a
> training set k-1 times. The variance of the resulting estimate is
> reduced as k is increased. The disadvantage of this method is that the
> training algorithm has to be rerun from scratch k times, which means it
> takes k times as much computation to make an evaluation. A variant of
> this method is to randomly divide the data into a test and training set
> k different times. The advantage of doing this is that you can
> independently choose how large each test set is and how many trials you
> average over.
>
> If anybody could help, I would deeply appreciate it.
> Thank you so much.
>
> Nalin
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```
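The thread above describes two things that a short sketch may make more concrete: the k-fold procedure Nalin quotes, and the remark that leave-one-out errors can be had from a single fit. The Python below is only an illustration under stated assumptions, not the method any poster used and not Stata code. It assumes a simple one-predictor regression, where the leverage has the closed form h_ii = 1/n + (x_i - xbar)^2 / Sxx, so no matrix inverse is needed; the general case uses the diagonal of X(X'X)^-1 X'. The function names `fit_line`, `kfold_sse` and `press`, and the data values, are made up for the example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b


def kfold_sse(xs, ys, k, seed=0):
    """Sum of squared out-of-fold prediction errors over k random folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)          # random assignment to folds
    folds = [idx[i::k] for i in range(k)]     # k roughly equal subsets
    sse = 0.0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        # Refit on the other k-1 folds, then predict the held-out fold.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sse += sum((ys[i] - (a + b * xs[i])) ** 2 for i in test)
    return sse


def press(xs, ys):
    """PRESS from one fit: sum of (e_i / (1 - h_ii))^2."""
    n = len(xs)
    a, b = fit_line(xs, ys)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    total = 0.0
    for x, y in zip(xs, ys):
        h = 1.0 / n + (x - xbar) ** 2 / sxx   # leverage, simple regression only
        e = y - (a + b * x)                   # ordinary residual
        total += (e / (1.0 - h)) ** 2         # leave-one-out residual, squared
    return total


# Made-up illustrative data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.1]

print("4-fold SSE:         ", kfold_sse(xs, ys, k=4))
print("LOO by refitting:   ", kfold_sse(xs, ys, k=len(xs)))
print("PRESS from one fit: ", press(xs, ys))
```

Setting k equal to the sample size turns the k-fold loop into leave-one-out, and for least squares that brute-force total should agree with the single-fit PRESS value up to rounding. For k smaller than n the result depends on the random split, which is why a seed is passed explicitly here.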