
st: Randomly Sample Data w/o replacement by a Variable

From   Howard Lempel <>
To   "" <>
Subject   st: Randomly Sample Data w/o replacement by a Variable
Date   Thu, 25 Jun 2009 11:52:50 -0400

Hello all,

I'd like to draw a random sample of X% (by weight) within each of several subsamples of my data, without replacement and without dropping the observations that are not selected.

For example, using the auto dataset, I'd like to create a new variable called "sample" that equals 1 for a randomly selected 75% of foreign cars and a randomly selected 75% of domestic cars (in both groups weighted by the variable weight), and 0 for all other cars.

Here's my current attempt, which requires -xtile2- (SSC).  Does anyone know if there is a way to do this in one line?

**** Start Code *****
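sysuse auto, clear
set seed 4635
* Random sort key (runiform() is the current name for uniform())
gen double random = runiform()
* Weighted quartiles of the random key within each level of foreign
xtile2 rank = random [aw=weight], nq(4) by(foreign)

* Pick 75% for sample: the three lowest weighted quartiles
gen sample = (rank<4)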
****** End Code *********
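A minimal sketch of the same idea without -xtile2-, in three commands rather than one. It assumes that "weighted 75%" means the selected cars should account for about 75% of the total weight within each level of foreign. The cars in each group are put in random order using the random key, their weight is accumulated in that order, and cars are flagged until the running total passes 75% of the group's total weight. The variable name cumw is hypothetical. The car that straddles the 75% cutoff is excluded here, which may differ from how -xtile2- places its cut points, so the two approaches need not select exactly the same cars.

```stata
sysuse auto, clear
set seed 4635

* Random sort key; double precision makes ties very unlikely
generate double random = runiform()

* Within each level of foreign, sort randomly and accumulate weight
* (cumw is a hypothetical helper variable)
bysort foreign (random): generate double cumw = sum(weight)

* Flag cars whose running weight stays within 75% of the group total;
* cumw[_N] is the last running sum in the group, i.e. the group total
by foreign: generate byte sample = (cumw <= 0.75*cumw[_N])
```

For an unweighted 75% of cars in each group, the running sum of weight could be replaced by a count, comparing `_n` with `0.75*_N` inside the same `by foreign:` prefix.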

Thanks for your consideration.

Howie Lempel
Research Assistant
The Brookings Institution | Economic Studies
1775 Massachusetts Ave NW | Washington DC 20036 | p: (202) 238-3576
