

From: Jaime Wright <jwright@ses.gtu.edu>
To: statalist@hsphsun2.harvard.edu
Subject: st: generating random samples that oversample
Date: Fri, 13 Jul 2012 20:26:15 -0700

Hello,

I have been struggling with how best to derive a random sample from a dataset I am working with, and I found this post helpful: "How can I take random samples from a dataset?" by Nicholas Cox. I would like to pull a random sample from my dataset and then survey this new sample of participants. Although I find some of the posts on randomizing a list helpful, I am having difficulty finding a process by which I can generate a random sample that also oversamples specified groups. Most of the information I have found in both the Stata guide and Statalist on oversampling is in relation to stratified samples. I have found the information in the link above on "subdividing into groups" (esp. on how to divide into unequal groups) the most helpful for what I want to do. Is there an additional step by which I can have Stata generate stratified samples for the different groups I want to survey, at the numbers required for oversampling specified groups?

Here is a description of my project. I am using a dataset that includes roughly 4200 participants. Of these, roughly 3700 meet the criteria for my study. This group is divided into the following racial categories: non-Hispanic white (70.9%), Hispanic (10.2%), Asian (9.0%), African-American (6.5%), and other (3.4%). I would like to survey a subset of these participants, but I want to make sure that I sample a sufficient number of those who are non-white. So far, it seems straightforward enough to generate a random sample from this dataset, but more difficult to generate a stratified sample with the correct amount of oversampling.

There is also the matter of time and resources. I may be limited to sending out recruitment letters to only 300-400 potential participants, probably closer to 300. Based on other recruitment letters that have gone out as part of the larger study I am pulling from, I expect a response rate of approximately 1/3. At that point, I also wonder whether I should even be concerned about a representative sample.

My apologies in advance if I have overlooked a simple approach to the matter. I have read through some of the posts on stratified samples, generating samples, and oversampling, but they seem to be primarily about offsetting clusters. It is also possible that some of these procedures are quite clear about generating weights, but again, I have not seen weights generated solely for the purpose of producing a stratified list; more often they seem to be generated in response to a stratified sample that already exists. Finally, some posts exceed my skill level, and perhaps the procedures are clearly stated there as well. So again, my apologies for asking this question again, since I'm sure it's addressed somewhere. I do, however, appreciate any time and guidance on this matter.

Thank you,
Jaime

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
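
For illustration only, the kind of draw described in the message (a fixed number of letters per racial group, with the smaller groups oversampled) might be sketched in Stata as below. This is a minimal sketch under stated assumptions, not a recommended design: the variable names `eligible` and `race`, the coding of `race`, the seed, and the allocation of 100 letters to non-Hispanic whites and 50 to each other group (300 in total) are all hypothetical choices made for the example and do not come from the original post.

```stata
* Sketch only. Assumed (hypothetical) variable names and coding:
*   eligible : 0/1 flag for meeting the study criteria
*   race     : 1 = non-Hispanic white, 2 = Hispanic, 3 = Asian,
*              4 = African-American, 5 = other
set seed 20120713                 // any fixed seed makes the draw reproducible
keep if eligible == 1             // roughly 3700 of the 4200 participants

* Letters per group: an illustrative allocation summing to 300.
generate int n_letters = cond(race == 1, 100, 50)

* Shuffle within each group, then flag the first n_letters people in each.
generate double u = runiform()
bysort race (u): generate byte sampled = _n <= n_letters

* Design weight for sampled people: group size divided by letters sent.
bysort race: generate double pw = _N / n_letters if sampled

* Check the number flagged in each group.
tabulate race sampled
```

With the stated response rate of about 1/3, this hypothetical allocation would be expected to yield roughly 33 non-Hispanic white respondents and about 17 in each of the other groups. A proportional draw of 300 would instead send only about 10 letters to the "other" group (3.4% of 300), for an expected 3 or so respondents. The `pw` variable records how many eligible people each sampled person stands for, which is what a later weighted analysis would need to undo the oversampling.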
