Re: st: generating random samples that oversample

From   Jaime Wright <[email protected]>
To   [email protected]
Subject   Re: st: generating random samples that oversample
Date   Sun, 15 Jul 2012 10:56:14 -0700

Thanks Mike,

The feedback you provided is helpful.

I have decided that I will restrict myself to only surveying 300
participants from the larger study (with hopes of a 1/3 response
rate). Of these 300, I was going to split them up into white and
non-white (even as I realize that is problematic from a critical
theoretical perspective in terms of homogenizing "non-white"). The
percentage of white participants is 70% and that of non-whites 30%. I
figured I would survey 150 whites and 150 non-whites and then just
adjust for that oversampling by weighting the data.
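
A rough sketch of the weighting arithmetic, using only the figures quoted in this thread (roughly 3700 eligible participants, a 70/30 split, and 150 sampled per group). The macro names are made up for illustration, and the counts are approximations rather than values read from the data. Under these assumptions each sampled white participant stands for about 2590/150, or roughly 17.3 people, and each sampled non-white participant for 1110/150 = 7.4.

```stata
* Assumed inputs, taken from the approximate figures quoted in this thread
local N    = 3700            // eligible participants (approximate)
local N_w  = 0.70 * `N'      // about 2590 white
local N_nw = 0.30 * `N'      // about 1110 non-white

* Design weight = stratum population size / number sampled from the stratum
display "white design weight:     " `N_w'  / 150
display "non-white design weight: " `N_nw' / 150
```

If only about a third of each group responds, the same ratio can be recomputed with the number of respondents in place of 150.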

I believe that this would also allow me to initially sample utilizing
the suggested code below after creating a variable for "non-white."
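
A minimal sketch of that two-group draw, assuming the race variable is called racecat and that 1 codes non-Hispanic white (both are assumptions; the actual name and coding may differ). The names nonwhite, stratum_N, and pw, and the seed value, are hypothetical.

```stata
* Assumption: racecat exists and racecat == 1 means non-Hispanic white
set seed 20120715                     // arbitrary seed, so the draw is reproducible

generate byte nonwhite = (racecat != 1) if !missing(racecat)
drop if missing(nonwhite)             // -sample ... if- would otherwise keep these rows

* Record each stratum's size before sampling; the weights need it later
bysort nonwhite: generate long stratum_N = _N

* Observations that do not satisfy -if- are left untouched by -sample-
sample 150 if nonwhite == 0, count
sample 150 if nonwhite == 1, count

* Design weight = stratum size / number actually kept in that stratum
bysort nonwhite: generate double pw = stratum_N / _N

tabulate nonwhite
svyset [pweight = pw], strata(nonwhite)
```

After fieldwork, pw could be recomputed among respondents only, so that the weights also reflect nonresponse within each group.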

I realize that my sample will not be statistically representative of
the larger data set, but my goal at this point will be to increase the
diversity of those who choose to respond.

Thanks again,
Jaime Wright

On Sun, Jul 15, 2012 at 6:34 AM, Lacy,Michael
<[email protected]> wrote:
> On Fri, 13 Jul 2012 20:26:15 -0700  Jaime Wright wrote:
>>I have been struggling on how best to derive a random sample from a dataset
>>I am working with and I find this post helpful.
> ... snip snip,
>>Here is a description of my project:
>>I am using a dataset that includes roughly 4200 participants. Out of these
>>participants there are roughly 3700 that meet my criteria for my study.
>>This group is divided into the following racial categories: non-Hispanic
>>white (70.9%), Hispanic (10.2%), Asian (9.0%), African-American (6.5%), and
>>other (3.4%). So I would like to survey a subset of these participants, yet
>>I want to make sure that I sample a sufficient number of those who are
>>non-white. So far, it seems straightforward enough to generate a random
>>sample from this dataset, but more difficult to generate a stratified
>>sample with the correct amount of oversampling.
> -sample- using -if- will allow you to sample by group.
> So, it sounds to me as though you want something like:
> sample 100 if (racecat == 1), count  // non-Hispanic white
> sample 50 if (racecat == 2), count  // Hispanic
>  ...
> sample 40 if (racecat == 3), count  // African-American
> tabulate racecat
> save "my sample.dta"
> --------------
> where 100, 50, ..., and 40 are the sample sizes you have determined
> for each group.  I also get the impression that you might have questions
> about what these sample sizes should be, i.e., how much "oversampling"
> to choose.  That would depend on the goals of your analysis, in particular
> the relative importance to you of generalizing about an entire population
> of interest vs. comparing the responses of groups within the population.
> Roughly speaking, the latter goal would lead to samples of equal size for
> each group, while the former would lead to choosing a size for each
> group that duplicates its proportion in the source population.
> Regards,
> Mike Lacy
> Dept. of Sociology
> Colorado State University
> Fort Collins CO 80523-1784
> *
> *   For searches and help try:
> *
> *
> *

Jaime Wright
Doctoral Student
Ethics and Social Theory
Graduate Theological Union