Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


st: Creating a data subset with subjects chosen at random


From   Amal Khanolkar <[email protected]>
To   "[email protected]" <[email protected]>
Subject   st: Creating a data subset with subjects chosen at random
Date   Fri, 15 Jun 2012 13:15:47 +0000

Hello all,

I have a large dataset with almost 3 million observations. The following is a description of the country of origin of the subjects:

mother's country of |
              birth |      Freq.     Percent        Cum.
--------------------+-----------------------------------
             Sweden |  2,593,143       86.69       86.69
Western Europe + NA |     71,736        2.40       89.09
            Finland |    108,326        3.62       92.71
     Eastern Europe |     15,636        0.52       93.23
             Poland |     18,179        0.61       93.84
      F. Yugoslavia |     34,110        1.14       94.98
        Arab league |      8,687        0.29       95.27
               Iraq |     13,004        0.43       95.71
            Lebanon |     12,295        0.41       96.12
            Somalia |      7,122        0.24       96.36
              Syria |      9,360        0.31       96.67
             Turkey |     22,083        0.74       97.41
               Iran |     11,717        0.39       97.80
         South Asia |      9,341        0.31       98.11
   Ethiopia+Eritrea |      6,917        0.23       98.34
          East asia |     23,162        0.77       99.12
      Latin America |     10,111        0.34       99.46
              Chile |     10,512        0.35       99.81
             Africa |      5,759        0.19      100.00
--------------------+-----------------------------------
              Total |  2,991,200      100.00



- I would like to create a subset of the above dataset that:
  1. consists of 100,000 subjects,
  2. has the same distribution of subjects by country of origin as the parent dataset above, and
  3. is drawn by Stata at random.
  The dataset of course has several other variables, but I would like to define the new subset based on country of origin, as it is my main exposure variable.

- Any idea how I go about doing this?

Thanks!

Regards,

/Amal.
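
The three requirements in the post above amount to a proportionate stratified random sample. A minimal sketch of one way this might be done in Stata follows. It assumes the full dataset is in memory and that the country-of-origin variable is called mcountry; that name is hypothetical, since the post does not give it. The seed value is arbitrary.

```stata
* Assumption: the full dataset is in memory and the country-of-origin
* variable is named mcountry (hypothetical name; substitute your own).
set seed 20120615                 // arbitrary seed, makes the draw reproducible
local target 100000               // desired size of the subset
local N = _N                      // size of the parent dataset

* Attach a uniform random number to every observation, sort by it within
* each country, and keep the first round(n_k * target / N) observations,
* where n_k is the number of observations in that country.
generate double u = runiform()
bysort mcountry (u): keep if _n <= round(_N * `target' / `N')
drop u

* Check the result against the parent table.
tabulate mcountry
count
```

Because each country's quota is rounded separately, the final count can differ from 100,000 by a few observations. Stata's sample command also accepts a by() option, so drawing the same percentage (about 3.34 here) within each country would be a shorter alternative with the same rounding caveat.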

