Re:st: generating random samples that oversample

From   "Lacy,Michael" <>
To   "" <>
Subject   Re:st: generating random samples that oversample
Date   Sun, 15 Jul 2012 13:34:42 +0000

On Fri, 13 Jul 2012 20:26:15 -0700  Jaime Wright wrote:

>I have been struggling on how best to derive a random sample from a dataset
>I am working with and I find this post helpful.
... snip snip, 
>Here is a description of my project:
>I am using a dataset that includes roughly 4200 participants. Out of these
>participants there are roughly 3700 that meet my criteria for my study.
>This group is divided into the following racial categories: non-Hispanic
>white (70.9%), Hispanic (10.2), Asian (9.0), African-American (6.5), and
>other (3.4). So I would like to survey a subset of these participants, yet
>I want to make sure that I sample a sufficient amount of those who are
>non-white. So far, it seems straightforward enough to generate a random
>sample from this dataset, but more difficult to generate a stratified
>sample with the correct amount of oversampling.

-sample- using -if- will allow you to sample by group.

So it sounds to me as though you want something like:

sample 100 if (racecat == 1), count  // non-Hispanic white
sample 50 if (racecat == 2), count   // Hispanic
sample 40 if (racecat == 3), count   // African-American
tabulate racecat
save "my sample.dta"

where 100, 50, ..., and 40 are the sample sizes you have determined
for each group.  I also get the impression that you might have questions
about what these sample sizes should be, i.e., how much "oversampling"
to choose.  That would depend on the goals of your analysis, in particular
the relative importance to you of generalizing about an entire population
of interest vs. comparing the responses of groups within the population.
Roughly speaking, the latter goal would lead to samples of equal size for
each group, while the former would lead to choosing a size for each
group that duplicates its proportion in the source population.
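
As a rough sketch of those two allocation schemes, -sample- also accepts
a by() option, so each scheme can be tried in one line. The variable name
racecat, the seed, and the sizes 10 and 60 are only assumptions for
illustration, not recommendations for your study.

```stata
* Sketch only: racecat, the seed, and the sizes 10 and 60 are assumed values.
set seed 12345                  // fix the seed so the draw can be reproduced

* Option A: proportional allocation (goal: generalize to the whole population)
preserve
sample 10, by(racecat)          // keep 10% of each group; group shares stay roughly the same
tabulate racecat
restore

* Option B: equal allocation (goal: compare groups with one another)
preserve
sample 60, count by(racecat)    // keep 60 observations from each group
tabulate racecat
restore
```

The preserve/restore pairs are there only so that both options can be
compared on the same data; in practice you would pick one scheme, drop
preserve/restore, and save the result.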


Mike Lacy
Dept. of Sociology
Colorado State University
Fort Collins CO 80523-1784
