Statalist



Re: st: Randomly draw from clustered sample


From   Michael Norman Mitchell <Michael.Norman.Mitchell@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Randomly draw from clustered sample
Date   Fri, 12 Feb 2010 00:55:23 -0800

Dear Susan

I think the trick is to divide the sampling process into two steps. First, let's create an example dataset to work with...

* create example dataset
webuse highschool, clear
keep state id height
save example, replace

  Now let's sample 20 states (clusters) from this dataset...

use example, clear
* get one obs per state
duplicates drop state, force
* keep only the identifier, so the merge below takes id and height from example
keep state
* sample 20 states
sample 20, count

Now that we have 20 states sampled, let's merge them with the original dataset and keep just the matching observations. This gives us all of the persons from the 20 sampled states.

merge 1:m state using example
keep if _merge == 3
* show that there are 20 states
tab state
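
The merge step can also be written more compactly and checked on simulated data. This is only a sketch: the person-level file below (50 states with 30 persons each), the seed value, and the tempfile name persons are made up for illustration, and it assumes the file of sampled states holds nothing but the state identifier, so that id and height come from the person-level file.

```stata
* hypothetical person-level file: 50 states x 30 persons
clear
set seed 2010
set obs 1500
generate state = ceil(_n/30)
generate id = _n
generate height = rnormal(170, 10)
tempfile persons
save `persons'

* one row per state, identifier only, then draw 20 states
keep state
duplicates drop
sample 20, count

* keep(match) retains only the _merge == 3 rows; nogenerate skips creating _merge
merge 1:m state using `persons', keep(match) nogenerate

* 20 states with 30 persons each; isid stops with an error if ids repeat
tab state
isid id
```

Run this from a do-file, since the tempfile name is a local macro. The isid check is a quick way to confirm that each person kept their own id after the merge.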

  Now let's sample 10 per state from within the 20 states.

* now sample 10 per state
sample 10, count by(state)
* show that there are 20 states, 10 per state
tab state
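
For what it is worth, the attempt in the question kept all 100 clusters because 'sample 10, by(cluster_id)' without the count option draws 10 percent of the observations within each cluster, so every cluster survives. The same two-step idea can also be sketched without a merge, here on a simulated stand-in for the 100 x 100 dataset described in the question. The simulated values, the seed, the choice of 20 clusters, and the helper variables u and rank are assumptions made for illustration; only the variable names xcoord, ycoord, and cluster_id come from the question.

```stata
* simulated stand-in for the data in the question (100 clusters of 100 obs)
clear
* arbitrary seed, set only so that the draw can be repeated
set seed 12345
set obs 10000
generate cluster_id = ceil(_n/100)
generate xcoord = 100*runiform()
generate ycoord = 100*runiform()

* step 1: one random number per cluster, copied to every obs in the cluster
bysort cluster_id: generate double u = runiform() if _n == 1
bysort cluster_id (u): replace u = u[1]
* rank the clusters by that number and keep the 20 with the smallest values
egen rank = group(u)
keep if rank <= 20

* step 2: 10 observations from each remaining cluster
sample 10, count by(cluster_id)
drop u rank

* show that there are 20 clusters, 10 per cluster
tab cluster_id
```

Because each cluster gets an independent uniform draw, keeping the 20 smallest amounts to a simple random sample of clusters. The data never leave memory, so no intermediate file is needed.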

  I hope that this helps!

Best regards,

Michael N. Mitchell
See the Stata tidbit of the week at...
http://www.MichaelNormanMitchell.com
Visit me on Facebook at...
http://www.facebook.com/MichaelNormanMitchell

On 2010-02-12 12.22 AM, Susan Olivia wrote:
Dear Stata listers,

I am wondering whether it is possible to randomly draw a few
clusters from a clustered sample in Stata.

Say my full sample consists of 10,000 observations (with 100
clusters and each cluster has 100 observations).

I want to randomly draw a few clusters with 10 observations in
each cluster. I tried using the 'sample' command, but it
is not doing what I am after. Below is my attempt; it still
gave me 100 clusters.

Any advice on this would be much appreciated.

Thanks,

Susan




. summ

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      xcoord |     10000    48.35506    27.67569  -1.426747   100.8945
      ycoord |     10000    47.60003    27.35285  -.4297747   99.15934
  cluster_id |     10000        50.5    28.86751          1        100

. preserve

. sample 10, by(cluster_id)
(9000 observations deleted)

. summ

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      xcoord |      1000    48.36142    27.67134   -.768847   100.4548
      ycoord |      1000    47.58521    27.34904   .2530959    98.3428
  cluster_id |      1000        50.5    28.88051          1        100


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
*


