The Stata listserver

st: RE: dividing a data set into estimation and validation sets

From   "Nick Cox" <>
To   <>
Subject   st: RE: dividing a data set into estimation and validation sets
Date   Sun, 3 Apr 2005 15:54:28 +0100

I don't think you need any section
of the manual as support here, but
FWIW Stata's -sample- doesn't do this:
it drops the observations not sampled
rather than flagging them.

The unofficial -swor- (-search swor-)
will do it. 

But best of all is to think from first 
principles. Suppose we decide on 
a validation sample of 500: then we 
should be explicit about a random 
number seed for reproducibility. 
Your seed choice may naturally differ,
but here's one:

set seed 280352 

Then we pick some random numbers 
and shuffle: 

gen random = uniform()
sort random 

The first 500 observations
are then the validation sample:

gen byte validation = _n <= 500 

Your validation sample has
-validation- 1 and the other sample has
-validation- 0. Subsequent analyses
can be done

... if validation
... if !validation 
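
Put together, the steps above might look like the sketch below.
It is only an illustration under assumptions that are not in the
original question: it uses the -auto- dataset shipped with Stata
(74 observations, so the validation sample is 20 rather than 500),
an arbitrary regression of -price- on -weight- and -mpg-, and
-runiform()-, the newer name for -uniform()-. The variable names
-phat- and -sqerr- are made up for the example.

```stata
* Sketch only: dataset, model and sample size are illustrative assumptions
sysuse auto, clear

* Fix the seed so that the split is reproducible
set seed 280352

* Shuffle the observations by sorting on a random number
gen double random = runiform()
sort random

* Flag the first 20 observations as the validation sample
gen byte validation = _n <= 20

* Fit on the estimation sample only
regress price weight mpg if !validation

* Predict for the validation sample and summarize squared errors
predict double phat if validation
gen double sqerr = (price - phat)^2 if validation
summarize sqerr
```

Storing the random numbers as double makes ties less likely,
so the sort order is less likely to depend on how ties are broken.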

Having written that down, I now 
remember that this is already an FAQ: 

How can I take random samples from an existing dataset?


Richard Hiscock

> I would be grateful for some direction to the area in the 
> stata manual 
> that explains how to do the following
> I am trying to split a dataset (n ~1500) into an estimation 
> sample and a 
> validation sample  by random sampling (n = 400-500) from the dataset
> Later I wish to compare results with that using bstrap techniques
