Re: st: Randomly draw from clustered sample

From: Michael Norman Mitchell
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Randomly draw from clustered sample
Date: Fri, 12 Feb 2010 00:55:23 -0800

Dear Susan,

I think the trick is to divide the sampling process into two steps. First, let's create an example dataset to work with...
```
* create example dataset
webuse highschool, clear
keep state id height
save example, replace

```
Now let's sample 20 states (clusters) from this dataset.
```

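use example, clear
* keep only the cluster identifier, so the later merge brings back
* each person's own id and height rather than the retained row's values
keep state
* get one obs per state
duplicates drop state, force
* sample 20 states
sample 20, count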

```
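Because `sample` draws at random, the 20 states selected will differ from run to run. A minimal sketch of a reproducible version of the cluster draw, assuming you want the same states each time: the seed value 12345 is an arbitrary choice, not something from the original post, and the `assert` line is only an added sanity check.

```stata
* assumption: the seed value is arbitrary; any fixed number works
set seed 12345
use example, clear
* keep one row per state, and only the cluster identifier
keep state
duplicates drop
* draw 20 states at random
sample 20, count
* confirm that exactly 20 clusters remain
count
assert r(N) == 20
```

Setting the seed before the draw means rerunning the do-file reproduces the same set of states.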
Now that we have 20 states sampled, let's merge them with the original dataset and keep just the matching observations. This means that we will have all of the persons from the 20 sampled states.
```
merge 1:m state using example
keep if _merge == 3
* show that there are 20 states
tab state

```
Now let's sample 10 per state from within the 20 states.
```

* now sample 10 per state
sample 10, count by(state)
* show that there are 20 states, 10 per state
tab state
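
* A hedged sketch of the same two steps applied to the data in the
* original question, assuming that dataset is in memory and has a
* cluster_id variable. The seed value, the tempfile name "full", and
* the choice of 5 clusters are illustrative assumptions, not part of
* the original post.
set seed 2010
tempfile full
save `full'
* step 1: one row per cluster, then draw 5 clusters
keep cluster_id
duplicates drop
sample 5, count
* step 2: bring back all observations from the drawn clusters
merge 1:m cluster_id using `full', keep(match) nogenerate
* draw 10 observations within each drawn cluster
sample 10, count by(cluster_id)
* expect 5 clusters with 10 observations each
tab cluster_id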

```
I hope that this helps!

Best regards,

Michael N. Mitchell
See the Stata tidbit of the week at...
http://www.MichaelNormanMitchell.com

On 2010-02-12 12.22 AM, Susan Olivia wrote:

Dear Stata listers,

I am wondering whether it is possible to randomly draw a few
clusters from a clustered sample in Stata?

Say my full sample consists of 10,000 observations (100
clusters, each with 100 observations).

I want to randomly draw a few clusters with 10 observations in
each cluster. I tried using the 'sample' command, but it
is not doing what I am after. Below is my attempt; it still
gave me 100 clusters.

Any advice on this is much appreciated.

Thanks,

Susan

```
. summ

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      xcoord |     10000    48.35506    27.67569  -1.426747   100.8945
      ycoord |     10000    47.60003    27.35285  -.4297747   99.15934
  cluster_id |     10000        50.5    28.86751          1        100

. preserve

. sample 10, by(cluster_id)
(9000 observations deleted)

. summ

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      xcoord |      1000    48.36142    27.67134   -.768847   100.4548
      ycoord |      1000    47.58521    27.34904   .2530959    98.3428
  cluster_id |      1000        50.5    28.88051          1        100

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```