Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org**, is already up and running.


From: "Lacy,Michael" <Michael.Lacy@colostate.edu>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re:st: generating random samples that oversample
Date: Sun, 15 Jul 2012 13:34:42 +0000

On Fri, 13 Jul 2012 20:26:15 -0700 Jaime Wright wrote:

> Hello,
>
> I have been struggling on how best to derive a random sample from a dataset
> I am working with and I find this post helpful.
>
> ... snip snip,
>
> Here is a description of my project:
>
> I am using a dataset that includes roughly 4200 participants. Out of these
> participants there are roughly 3700 that meet my criteria for my study.
>
> This group is divided into the following racial categories: non-Hispanic
> white (70.9%), Hispanic (10.2), Asian (9.0), African-American (6.5), and
> other (3.4). So I would like to survey a subset of these participants, yet
> I want to make sure that I sample a sufficient amount of those who are
> non-white. So far, it seems straightforward enough to generate a random
> sample from this dataset, but more difficult to generate a stratified
> sample with the correct amount of oversampling.

-sample- with an -if- condition will let you sample by group. So it sounds to me as though you want something like this:

    set seed 2012                          // hypothetical seed; any fixed value makes the draw reproducible
    sample 100 if (racecat == 1), count    // non-Hispanic white
    sample 50 if (racecat == 2), count     // Hispanic
    * ...                                    (one such line for each remaining group)
    sample 40 if (racecat == 3), count     // African-American
    tabulate racecat                       // check the achieved group sizes
    save "my sample.dta"

where 100, 50, ..., and 40 are the sample sizes you have determined for each group (the -racecat- codes above are only illustrative). The -count- option tells -sample- that the number is a count of observations rather than a percentage, and observations that do not meet the -if- condition are kept untouched, so each line samples only its own group. Because -sample- drops the unselected observations from memory, save the result to a new file rather than over your original data.

I also get the impression that you might have questions about what these sample sizes should be, i.e., how much "oversampling" to choose. That would depend on the goals of your analysis, in particular the relative importance to you of generalizing about an entire population of interest vs. comparing the responses of groups within the population. Roughly speaking, the latter goal would lead to samples of equal size for each group, while the former would lead to choosing a size for each group that matches its proportion in the source population.

Regards,

Mike Lacy
Dept. of Sociology
Colorado State University
Fort Collins CO 80523-1784

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
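To make the allocation trade-off in the reply concrete, here is a worked sketch of the two rules using Jaime's approximate group shares. The total sample size n = 200 is a hypothetical choice, the group sizes N_h are approximated as share × 3700, and the design weight w_h = N_h / n_h (the inverse of each group's sampling fraction) is an added assumption about how an oversampled survey would typically be weighted; none of these numbers come from the original thread.

```latex
% Notation (assumed): N_h = eligible participants in group h,
% N = \sum_h N_h \approx 3700, n = total sample size, H = 5 groups.
\begin{aligned}
\text{proportional allocation:}\quad n_h &= n \, \frac{N_h}{N} \\
\text{equal allocation:}\quad n_h &= \frac{n}{H} \\
\text{design weight:}\quad w_h &= \frac{N_h}{n_h}
\end{aligned}

% Hypothetical n = 200, shares (0.709, 0.102, 0.090, 0.065, 0.034):
% proportional: n_h = (141.8, 20.4, 18.0, 13.0, 6.8) -> (142, 20, 18, 13, 7),
%               so w_h \approx 3700/200 = 18.5 for every group (self-weighting)
% equal:        n_h = 200/5 = 40 per group,
%               so w_h ranges from about 2623/40 \approx 65.6 (white)
%               to about 126/40 \approx 3.1 (other)
```

With equal allocation, estimates for the whole population would need these weights (for example through Stata's -pweight- or -svyset-), whereas a proportional sample is approximately self-weighting.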

**Follow-Ups**: **Re: st: generating random samples that oversample**, *From:* Jaime Wright <jwright@ses.gtu.edu>
