Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: semi-random sampling (how to impose properties of one population onto a subsample of a different population)

From: Ekaterina Hertog <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: semi-random sampling (how to impose properties of one population onto a subsample of a different population)
Date: Sun, 07 Aug 2011 13:05:56 +0400

Dear Steven,
thank you for your help; however, it does not fully solve my problem. Your proposed solution would let me roughly preserve the whole sample's population percentages in a subsample. What I need, however, is to impose population percentages found in a different dataset on the subsample I am creating. Essentially I have two datasets: one of high-income women and one of middle-income women. High-income women tend to be older and are more likely to live in the capital. I need to create a subsample of the middle-income women dataset that matches the high-income women dataset on age and location characteristics.
Does anyone know how to do this in Stata 11?
Ekaterina
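
One way this kind of matching might be approached is sketched below: count the observations in each age-by-location cell of the high-income data, attach those counts to the middle-income data, and keep that many randomly chosen observations per cell. This is only a sketch under stated assumptions, not a method proposed in the thread. The two datasets are simulated stand-ins, and the names capital, age, n_target, n_avail and u are hypothetical.

```stata
* Sketch only: simulated stand-ins for the two datasets; all names are hypothetical
clear
set seed 20110807

* Stand-in for the high-income data: older, more likely to live in the capital
set obs 200
generate byte capital = runiform() < 0.6
generate int age = 25 + floor(6*runiform())
* One row per location-by-age cell, holding the number of women in that cell
contract capital age, freq(n_target)
tempfile targets
save `targets'

* Stand-in for the middle-income data, the pool to be subsampled
clear
set obs 5000
generate byte capital = runiform() < 0.3
generate int age = 20 + floor(15*runiform())

* Attach the target counts; pool cells absent from the target data are dropped,
* and target cells with no pool observations are lost silently
merge m:1 capital age using `targets', keep(match) nogenerate

* Count observations in cells too small to supply their target
bysort capital age: generate long n_avail = _N
count if n_avail < n_target

* Put each cell in random order and keep the first n_target observations
generate double u = runiform()
bysort capital age (u): keep if _n <= n_target
tabulate age capital
```

If the count of short cells is not zero, the subsample cannot reproduce the target counts exactly. One option in that case would be to scale every n_target down by a common factor before the final keep, which preserves the percentages rather than the counts.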

On 07/08/2011 09:08, Steven Samuels wrote:
The following code shows how to take a 10% sample within categories formed by two variables. The sample and whole population percentages will be approximately the same, with the agreement better for larger within-cell sample sizes.

Steve

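*************CODE BEGINS*************
sysuse auto, clear
* Enlarge the example data so the cells are big enough to sample from
expand 6
set seed 842655
* Fold the sparse and missing rep78 categories into category 5
recode rep78 1/2=5 .=5
* Cell percentages in the full data
tab rep78 foreign, cell
* Keep a 10% random sample within each foreign-by-rep78 cell
sample 10, by(foreign rep78)
* Cell percentages in the sample, for comparison
tab rep78 foreign, cell
**************CODE ENDS**************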

On Aug 6, 2011, at 4:23 PM, Ekaterina Hertog wrote:

Dear all,
I need to take a subsample of observations from a big dataset, making sure that the people in the subsample have a given geographic and age profile. I need to make sure that, say, 50% of people in the subsample come from the capital and 50% from other towns. Within each of these 2 locations I want to preserve a certain age structure: say, in a city: 3 people aged 23, 4 people aged 24 …
Within those geographic and age profiles I want to select the observations randomly. Is it possible to do that in Stata 11? Any thoughts on how I would go about it?
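
When the quotas are known in advance, the request above could be sketched with -sample- and its count option, restricted to one cell at a time with an if qualifier. This is an illustration under assumptions rather than a tested solution: the data are simulated, the variable names capital and age are hypothetical, and only the two quotas actually stated above (3 people aged 23 and 4 people aged 24 in one location) are filled in.

```stata
* Sketch only: simulated pool; variable names capital and age are hypothetical
clear
set seed 4711
set obs 3000
generate byte capital = runiform() < 0.5
generate int age = 23 + floor(3*runiform())

* Draw a fixed number of observations at random within each quota cell.
* Observations that do not satisfy the if condition are all kept by -sample-.
sample 3 if capital == 1 & age == 23, count
sample 4 if capital == 1 & age == 24, count

* Cells without a quota were left untouched above, so drop them explicitly
keep if capital == 1 & inlist(age, 23, 24)
tabulate age capital
```

With many cells, typing one -sample- line per cell becomes tedious. Storing the quotas in a small dataset and merging them onto the pool may then be the more practical design.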

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

