

From: Jaime Wright <jwright@ses.gtu.edu>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: generating random samples that oversample
Date: Sun, 15 Jul 2012 10:56:14 -0700

Thanks Mike,

The feedback you provided is helpful. I have decided to restrict myself to surveying only 300 participants from the larger study (in hopes of a 1/3 response rate). I was going to split these 300 into white and non-white (even as I realize that is problematic from a critical theoretical perspective in terms of homogenizing "non-white"). White participants make up about 70% of the group and non-white participants about 30%. I figured I would survey 150 whites and 150 non-whites and then adjust for that oversampling by weighting the data. I believe this would also allow me to draw the initial sample with the suggested code below, after creating a variable for "non-white." I realize that my sample will not be statistically representative of the larger data set, but my goal at this point is to increase the diversity of those who choose to respond.

Thanks again,
Jaime Wright

On Sun, Jul 15, 2012 at 6:34 AM, Lacy,Michael <Michael.Lacy@colostate.edu> wrote:
> On Fri, 13 Jul 2012 20:26:15 -0700 Jaime Wright wrote:
>
>>Hello,
>>
>>I have been struggling on how best to derive a random sample from a dataset
>>I am working with and I find this post helpful.
>>
> ... snip snip,
>>Here is a description of my project:
>>
>>I am using a dataset that includes roughly 4200 participants. Out of these
>>participants there are roughly 3700 that meet my criteria for my study.
>>
>>This group is divided into the following racial categories: non-Hispanic
>>white (70.9%), Hispanic (10.2), Asian (9.0), African-American (6.5), and
>>other (3.4). So I would like to survey a subset of these participants, yet
>>I want to make sure that I sample a sufficient amount of those who are
>>non-white. So far, it seems straightforward enough to generate a random
>>sample from this dataset, but more difficult to generate a stratified
>>sample with the correct amount of oversampling.
>
> -sample- using -if- will allow you to sample by group.
>
> So, it sounds to me as though you want something like.
>
> sample 100 if (racecat == 1), count // white anglo
> sample 50 if (racecat == 2), count // non-Hispanic white
> ...
> sample 40 if (racecat == 3), count // (African-American)
> tabulate racecat
> save "my sample.dta"
> --------------
>
> where 100, 50, ..., and 40 are the sample sizes you have determined
> for each group. I also get the impression that you might have questions
> about what these sample sizes should be, i.e., how much "oversampling"
> to choose. That would depend on the goals of your analysis, in particular
> the relative importance to you of generalizing about an entire population
> of interest vs. comparing the responses of groups within the population.
> Roughly speaking, the latter goal would lead to samples of equal size for
> each group, while the former would lead to choosing a size for each
> group that duplicates its proportion in the source population.
>
> Regards,
>
> Mike Lacy
> Dept. of Sociology
> Colorado State University
> Fort Collins CO 80523-1784
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/

--
Jaime Wright
Doctoral Student
Ethics and Social Theory
Graduate Theological Union
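
The plan described in this message (150 white and 150 non-white participants, later weighted back to roughly 70/30) could be sketched in Stata along the following lines. This is only a minimal sketch under stated assumptions, not code from the thread: the file name participants.dta, the variable names nonwhite, N_h, n_h and pw, the seed, and the coding racecat == 1 for non-Hispanic white are all hypothetical, and the weights shown correct only for the unequal sampling rates, not for nonresponse.

```stata
* Hypothetical file and variable names; racecat == 1 is assumed to mean non-Hispanic white
use "participants.dta", clear
set seed 20120715                      // arbitrary seed so the draw can be reproduced

* Binary stratum indicator; observations with missing race are dropped
generate byte nonwhite = (racecat != 1) if !missing(racecat)
drop if missing(nonwhite)

* Stratum sizes in the eligible population, recorded before sampling
bysort nonwhite: generate long N_h = _N

* Draw 150 observations from each stratum; -sample ... if- keeps
* all observations that do not satisfy the -if- condition
sample 150 if nonwhite == 0, count
sample 150 if nonwhite == 1, count

* Design weight = population stratum size / sampled stratum size
bysort nonwhite: generate long n_h = _N
generate double pw = N_h / n_h

tabulate nonwhite
svyset _n [pweight = pw], strata(nonwhite)
save "my sample.dta", replace
```

With about 3700 eligible participants split roughly 70/30, each sampled white participant would carry a weight of about 17 and each sampled non-white participant a weight of about 7. If only about a third of those surveyed respond, n_h would need to be recomputed from the respondents in each stratum before the weights are used.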

**References**: **Re: st: generating random samples that oversample**, *From:* "Lacy,Michael" <Michael.Lacy@colostate.edu>
