Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org**, is already up and running.


From: Steven Samuels <sjsamuels@gmail.com>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: semi-random sampling (how to impose properties of one population onto a subsample of a different population)

Date: Sun, 7 Aug 2011 10:32:36 -0400

Sorry, I misunderstood. Here's code that you can adapt. Note that you set the sample size you want in the first line.

*************CODE BEGINS*************
scalar sampsize = 500
set seed 842655
clear

/* Input Frequencies for External Population
   You can get these from -contract- in the original external data set:
   "contract agegp region, freq(freq1)"
*/
input agegp region freq1
1 1 501
1 2 415
2 1 1809
2 2 3003
3 1 1288
3 2 1400
end

/* Target sample size per cell, proportional to the external frequencies */
egen tot1 = total(freq1)
gen ssize = round(sampsize*freq1/tot1)

/* Check Frequencies */
tab agegp region [fw=freq1], cell
tab agegp region [fw=ssize], cell
sort agegp region
tempfile t1
save `t1'

/* Create Data set to be sampled from the auto data */
sysuse auto, clear
expand 100
rename rep78 agegp
rename foreign region
recode agegp 2=1 5=1 .=1 3=2 4=3  // values 1,2,3
replace region = region + 1       // values 1,2

/* Merge with external counts */
sort agegp region
merge m:1 agegp region using `t1'
tab _merge
drop _merge
egen stratum = group(agegp region)
levelsof stratum, local(levels)
tempfile t2
save `t2'

/* Within each stratum, keep the first ssize observations in random order */
foreach x of local levels {
    use `t2', clear
    keep if stratum==`x'
    gen u = uniform()
    sort u
    keep if _n<=ssize
    tempfile td`x'
    save `td`x''
}

clear
tempfile t0  // empty data set to append to
gen dummy=1
save `t0'
foreach x of local levels {
    append using `td`x''
}
drop dummy

/* Check frequencies again */
tab agegp region, cell missing
save sample1, replace
**************CODE ENDS**************

On Aug 7, 2011, at 5:05 AM, Ekaterina Hertog wrote:

Dear Steven,

Thank you for your help; however, it does not fully solve my problem. Your proposed solution will allow me to roughly preserve the population percentages from the whole sample in a subsample. What I need, however, is to impose the population percentages found in a different dataset on the subsample I am creating. Essentially, I have two datasets: one of high income women and one of middle income women. High income women tend to be older and are more likely to live in the capital. I need to create a subsample of the dataset of middle income women which would match the high income women dataset on age and location characteristics. Does anyone know how to do this in Stata 11?

Ekaterina

On 07/08/2011 09:08, Steven Samuels wrote:

> The following code shows how to take a 10% sample within categories formed by two variables. The sample and whole population percentages will be approximately the same, with the agreement better for larger within-cell sample sizes.
>
> Steve
>
> *************CODE BEGINS*************
> sysuse auto, clear
> expand 6
> set seed 842655
> recode rep78 1/2=5 .=5
> tab rep78 foreign, cell
> sample 10, by(foreign rep78)
> tab rep78 foreign, cell
> **************CODE ENDS**************
>
> On Aug 6, 2011, at 4:23 PM, Ekaterina Hertog wrote:
>
> Dear all,
> I need to take a subsample of observations from a big dataset, making sure that the people in the subsample have a given geographic and age profile. I need to make sure that, say, 50% of people in the subsample come from the capital and 50% from other towns. Within each of these 2 locations I want to preserve a certain age structure: say, in a city, 3 people aged 23, 4 people aged 24 …
> Within those geographic and age profiles I want to select the observations randomly. Is it possible to do that in Stata 11? Any thoughts on how I would go about it?
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
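The stratum-by-stratum loop in the reply above can probably be collapsed into a single -bysort- step. What follows is a minimal sketch of that idea, not Steven's posted method. It reuses the illustrative frequencies and the auto.dta stand-in from the post. The tempfile name `targets`, the use of -runiform()- (the newer name for -uniform()-) and the merge options -keep(match) nogenerate- are choices made for this sketch. It also assumes that every cell of the data being sampled holds at least ssize observations.

```stata
* Sketch only: same illustrative counts and auto.dta stand-in as in the post
clear
set seed 842655
scalar sampsize = 500

* External population cell counts (from -contract- on the external data)
input agegp region freq1
1 1 501
1 2 415
2 1 1809
2 2 3003
3 1 1288
3 2 1400
end
egen tot1 = total(freq1)
gen ssize = round(sampsize*freq1/tot1)   // target n per cell
tempfile targets                         // hypothetical name for this sketch
save `targets'

* Stand-in for the data set to be sampled
sysuse auto, clear
expand 100
rename rep78 agegp
rename foreign region
recode agegp 2=1 5=1 .=1 3=2 4=3         // values 1,2,3
replace region = region + 1              // values 1,2

* Attach each cell's target size, keeping only cells present in both files
merge m:1 agegp region using `targets', keep(match) nogenerate

* Random order within cell, then keep the first ssize observations per cell
gen double u = runiform()
bysort agegp region (u): keep if _n <= ssize

* Check frequencies against the external percentages
tab agegp region, cell
```

With these counts the rounded cell sizes are 30, 25, 107, 178, 77 and 83, which sum to 500. In general -round()- need not reproduce the requested total exactly, so it is worth checking the final -tab- output against sampsize.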

**Follow-Ups**:

- **Re: st: semi-random sampling (how to impose properties of one population onto a subsample of a different population)** *From:* Ekaterina Hertog <ekaterina.hertog@sociology.ox.ac.uk>
- **Re: st: semi-random sampling (how to impose properties of one population onto a subsample of a different population)** *From:* Austin Nichols <austinnichols@gmail.com>

**References**:

- **st: semi-random sampling** *From:* Ekaterina Hertog <ekaterina.hertog@sociology.ox.ac.uk>
- **Re: st: semi-random sampling** *From:* Steven Samuels <sjsamuels@gmail.com>
- *From:* Ekaterina Hertog <ekaterina.hertog@sociology.ox.ac.uk>
