st: generating random samples that oversample

From   Jaime Wright <>
Subject   st: generating random samples that oversample
Date   Fri, 13 Jul 2012 20:26:15 -0700


I have been struggling with how best to draw a random sample from a dataset
I am working with, and I found this post helpful:

("How can I take random samples from a dataset?" by Nicholas Cox)

I would like to pull a random sample from the dataset I am currently
working with and then survey the participants in this new sample. Although
I have found some of the posts on randomizing a list helpful, I am having
difficulty finding a process by which I can generate a random sample that
also oversamples specified groups.
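For reference, the basic (unstratified) draw can be sketched as below. This is a minimal Python illustration of one common approach, not Stata code: attach a uniform random number to every record, sort on that key, and keep the first n, which gives a simple random sample without replacement. The participant ids (1 to 3700) and the seed are hypothetical.

```python
import random


def simple_random_sample(ids, n, seed=None):
    """Draw n ids without replacement by sorting on a uniform random key."""
    rng = random.Random(seed)                  # seeded generator for reproducibility
    keyed = [(rng.random(), i) for i in ids]   # attach a random number to each record
    keyed.sort()                               # sort records on the random key
    return [i for _, i in keyed[:n]]           # keep the first n records


# Hypothetical ids 1..3700 standing in for the eligible participants
sample = simple_random_sample(range(1, 3701), 300, seed=2012)
print(len(sample), len(set(sample)))  # -> 300 300
```

In Python, `random.Random(seed).sample(ids, n)` does the same thing in one call; the sort-on-a-random-key version just makes the mechanism visible.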

Most of the information on oversampling that I have found, in both the Stata
guide and Statalist, relates to stratified samples.

I have found the section of the post above on "subdividing into groups"
(especially on how to divide into unequal groups) the most helpful for what
I want to do. Is there an additional step by which I can have Stata generate
stratified samples for the different groups I want to survey, with each
group drawn at the size required to oversample the specified groups?
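Drawing a stratified sample amounts to running a separate simple random sample inside each group, with a fixed target size per group. A minimal Python sketch under stated assumptions: the group labels (`nh_white`, `hispanic`, ...) are hypothetical keys, the group counts are rounded from the percentages in this post against roughly 3700 eligible participants, and the equal target of 60 per group is only an example.

```python
import random
from collections import defaultdict

# Hypothetical eligible pool: counts rounded from the percentages in the post (~3700 total)
POP = {"nh_white": 2623, "hispanic": 377, "asian": 333,
       "african_american": 241, "other": 126}


def build_records(pop):
    """Hypothetical (id, group) records matching the counts in pop."""
    records, pid = [], 0
    for grp, count in pop.items():
        for _ in range(count):
            pid += 1
            records.append((pid, grp))
    return records


def stratified_sample(records, targets, seed=None):
    """Draw targets[g] ids at random, without replacement, within each group g."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for pid, grp in records:
        by_group[grp].append(pid)
    drawn = {}
    for grp, n in targets.items():
        members = by_group[grp]
        if n > len(members):
            raise ValueError(f"target {n} exceeds the {len(members)} members of {grp!r}")
        drawn[grp] = rng.sample(members, n)  # simple random sample inside the stratum
    return drawn


records = build_records(POP)
# Hypothetical targets: equal allocation of 300 letters across the five groups
targets = {g: 60 for g in POP}
drawn = stratified_sample(records, targets, seed=2012)
print({g: len(ids) for g, ids in drawn.items()})
# -> {'nh_white': 60, 'hispanic': 60, 'asian': 60, 'african_american': 60, 'other': 60}
```

The oversampling lives entirely in `targets`: any per-group sizes can be plugged in, as long as none exceeds the group's size.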

Here is a description of my project:

I am using a dataset that includes roughly 4200 participants, of whom
roughly 3700 meet the criteria for my study.

This group is divided into the following racial categories: non-Hispanic
white (70.9%), Hispanic (10.2%), Asian (9.0%), African-American (6.5%), and
other (3.4%). I would like to survey a subset of these participants, but I
want to make sure that I sample a sufficient number of non-white
participants. So far, generating a simple random sample from this dataset
seems straightforward enough; generating a stratified sample with the
correct amount of oversampling seems more difficult.
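How many to draw from each group is a design decision. One simple rule, sketched here in Python, is to guarantee every group a minimum number of slots (the hypothetical `floor` parameter) and then share the remaining slots in proportion to group size, using largest-remainder rounding so the parts add up exactly. The group counts are hypothetical, rounded from the percentages above against roughly 3700; `floor=0` reproduces plain proportional allocation for comparison, and `floor=40` is an arbitrary example.

```python
# Hypothetical eligible pool: counts rounded from the percentages in the post (~3700 total)
POP = {"nh_white": 2623, "hispanic": 377, "asian": 333,
       "african_american": 241, "other": 126}


def allocate(total, pop, floor):
    """Give each group `floor` slots, then split the rest in proportion to group
    size, rounding with the largest-remainder method so the parts sum to `total`."""
    alloc = {g: floor for g in pop}
    remaining = total - floor * len(pop)
    if remaining < 0:
        raise ValueError("floor * number of groups exceeds total")
    n_pop = sum(pop.values())
    quotas = {g: remaining * n / n_pop for g, n in pop.items()}
    base = {g: int(q) for g, q in quotas.items()}
    leftover = remaining - sum(base.values())
    # hand the leftover slots to the groups with the largest fractional parts
    by_fraction = sorted(pop, key=lambda g: quotas[g] - base[g], reverse=True)
    for g in by_fraction[:leftover]:
        base[g] += 1
    for g in pop:
        alloc[g] += base[g]
        if alloc[g] > pop[g]:
            raise ValueError(f"allocation for {g!r} exceeds its population")
    return alloc


print(allocate(300, POP, floor=0))
# -> {'nh_white': 213, 'hispanic': 31, 'asian': 27, 'african_american': 19, 'other': 10}
print(allocate(300, POP, floor=40))
# -> {'nh_white': 111, 'hispanic': 50, 'asian': 49, 'african_american': 47, 'other': 43}
```

With 300 letters, proportional allocation sends only about 10 to the "other" group; a floor of 40 roughly quadruples that at the cost of about 100 fewer letters to non-Hispanic white participants.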

There is also the matter of time and resources. I may be limited to sending
recruitment letters to only 300-400 potential participants, probably closer
to 300. Based on other recruitment letters sent out as part of the larger
study I am drawing from, I expect a response rate of roughly 1/3. At that
point, I also wonder whether I should even be concerned about a
representative sample.

My apologies in advance if I have overlooked a simple approach to the
matter. I have read through some of the posts on stratified samples,
generating samples, and oversampling, but they seem to be primarily about
offsetting clusters. It is also possible that some of these procedures are
quite clear about generating weights, but again, I have not seen weights
generated solely for the purpose of drawing a stratified list; more often
they seem to be computed after a stratified sample has already been drawn.
Finally, some posts exceed my skill level, and perhaps the procedures are
clearly stated there as well. So again, my apologies for asking a question
that I'm sure is addressed somewhere. I do, however, appreciate any time
and guidance on this matter.
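Weights are not needed to draw the stratified list itself; they come in afterwards, when results are analysed. In a stratified draw each member of group h is selected with probability n_h / N_h, so a natural base weight is its inverse, N_h / n_h (in Stata such weights would typically be supplied as `pweight`s). A minimal Python sketch, where both the group counts and the per-group sample sizes are hypothetical (the sizes are one example oversampling allocation of 300):

```python
from fractions import Fraction

# Hypothetical eligible pool (counts rounded from the post's percentages)
POP = {"nh_white": 2623, "hispanic": 377, "asian": 333,
       "african_american": 241, "other": 126}
# Hypothetical numbers drawn per group (an oversampling allocation of 300)
DRAWN = {"nh_white": 111, "hispanic": 50, "asian": 49,
         "african_american": 47, "other": 43}


def design_weights(pop_sizes, sample_sizes):
    """Stratum weight N_h / n_h: the inverse of each member's selection probability."""
    return {g: Fraction(pop_sizes[g], sample_sizes[g]) for g in sample_sizes}


def weighted_share(responses, weights):
    """responses: list of (group, 0/1 outcome). Weighted proportion with outcome 1."""
    total = sum(weights[g] for g, _ in responses)
    return sum(weights[g] * y for g, y in responses) / total


w = design_weights(POP, DRAWN)
print({g: round(float(x), 2) for g, x in w.items()})
# Each stratum's weights add back up to its population, so the total is 3700:
print(sum(DRAWN[g] * w[g] for g in DRAWN))  # -> 3700
```

If only about a third respond, one simple option is to pass respondent counts per group instead of drawn counts, which spreads each group's population over its respondents; this assumes nonresponse is random within each group.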

Thank you,