# st: RE: dividing a data set into estimation and validation sets

From: "Nick Cox"
Subject: st: RE: dividing a data set into estimation and validation sets
Date: Sun, 3 Apr 2005 15:54:28 +0100

```I don't think you need any section
of the manual as support here, but
FWIW Stata's -sample- doesn't do this.

The unofficial -swor- (-search swor-)
will do it.

But best of all is to think from first
principles. Suppose we decide on
a validation sample of 500: then we
should be explicit about a random
number seed for reproducibility.
Your seed choice may naturally differ,
but here's one

set seed 280352

Then we pick some random numbers
and shuffle:

gen random = uniform()
sort random

The first 500 observations (or however
many you decided on) are one sample:

gen byte validation = _n <= 500

That sample has -validation- 1 and the
other sample has -validation- 0.
Subsequent analyses can be done with

... if validation
... if !validation

Having written that down, I now
remember that this is already an FAQ:

How can I take random samples from an existing dataset?
http://www.stata.com/support/faqs/stat/sampling.html

Nick
n.j.cox@durham.ac.uk

Richard Hiscock

> I would be grateful for some direction to the area in the Stata
> manual that explains how to do the following.
> I am trying to split a dataset (n ~1500) into an estimation sample
> and a validation sample by random sampling (n = 400-500) from the
> dataset.
>
> Later I wish to compare results with that using bstrap techniques

```
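The steps in the reply can be collected into one short do-file. This is only a sketch under stated assumptions: the 1500-observation dataset and the variable `id` are made up here so that the example runs on its own, and `runiform()` is used as the name that more recent Stata releases give to the `uniform()` function in the reply. Sorting on `id` before drawing the random numbers is an added precaution, since the same seed reproduces the same split only if the observations start in the same order.

```stata
* Hypothetical stand-in for the real data: 1500 observations with an id
clear
set obs 1500
generate long id = _n

* Fix the starting order and the seed so that the split is reproducible
sort id
set seed 280352

* Attach a random number to each observation and shuffle;
* id is a tie-breaker in the unlikely event of equal random numbers
generate double random = runiform()
sort random id

* The first 500 observations after shuffling form the validation sample
generate byte validation = _n <= 500

* Restore the original order and check the sizes of the two samples
sort id
tabulate validation
```

Later commands can then be restricted with `if validation` or `if !validation`, as in the reply, for example fitting a model on the estimation sample with `if !validation` and checking it on the observations with `if validation`.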