st: Creating a data subset with subjects chosen at random


From   Amal Khanolkar <Amal.Khanolkar@ki.se>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   st: Creating a data subset with subjects chosen at random
Date   Fri, 15 Jun 2012 13:15:47 +0000

Hello all,

I have a large dataset with almost 3 million observations. The following is a description of the country of origin of the subjects:

mother's country of |
              birth |      Freq.     Percent        Cum.
--------------------+-----------------------------------
             Sweden |  2,593,143       86.69       86.69
Western Europe + NA |     71,736        2.40       89.09
            Finland |    108,326        3.62       92.71
     Eastern Europe |     15,636        0.52       93.23
             Poland |     18,179        0.61       93.84
      F. Yugoslavia |     34,110        1.14       94.98
        Arab league |      8,687        0.29       95.27
               Iraq |     13,004        0.43       95.71
            Lebanon |     12,295        0.41       96.12
            Somalia |      7,122        0.24       96.36
              Syria |      9,360        0.31       96.67
             Turkey |     22,083        0.74       97.41
               Iran |     11,717        0.39       97.80
         South Asia |      9,341        0.31       98.11
   Ethiopia+Eritrea |      6,917        0.23       98.34
          East asia |     23,162        0.77       99.12
      Latin America |     10,111        0.34       99.46
              Chile |     10,512        0.35       99.81
             Africa |      5,759        0.19      100.00
--------------------+-----------------------------------
              Total |  2,991,200      100.00



- I would like to create a subset of the above dataset that (1) consists of 100,000 subjects, (2) has the same distribution of subjects by country of origin as the parent dataset above, and (3) is drawn at random by Stata. The dataset of course has several other variables, but I would like to define the new subset based on country of origin because it is my main exposure variable.

- Any idea how I go about doing this?
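
As an illustration of the arithmetic behind points (1) and (2), the following is a minimal sketch in Python rather than Stata. It assumes proportional allocation with largest-remainder rounding, which is one reasonable reading of "same distribution" and not necessarily the only one. The names `proportional_allocation` and `stratified_sample`, the seed value, and the idea of holding subject ids in one list per country are hypothetical; only the frequencies come from the table above.

```python
import random

# Frequencies copied from the table above (mother's country of birth).
freq = {
    "Sweden": 2593143, "Western Europe + NA": 71736, "Finland": 108326,
    "Eastern Europe": 15636, "Poland": 18179, "F. Yugoslavia": 34110,
    "Arab league": 8687, "Iraq": 13004, "Lebanon": 12295, "Somalia": 7122,
    "Syria": 9360, "Turkey": 22083, "Iran": 11717, "South Asia": 9341,
    "Ethiopia+Eritrea": 6917, "East asia": 23162, "Latin America": 10111,
    "Chile": 10512, "Africa": 5759,
}


def proportional_allocation(freq, n):
    """Split n across strata in proportion to freq (hypothetical helper).

    Uses largest-remainder rounding so the allocations sum to exactly n.
    """
    total = sum(freq.values())
    quota, remainder = {}, {}
    for stratum, count in freq.items():
        # Integer arithmetic keeps the quotas and remainders exact.
        quota[stratum], remainder[stratum] = divmod(n * count, total)
    shortfall = n - sum(quota.values())
    # Give the leftover units to the strata with the largest remainders.
    for stratum in sorted(remainder, key=remainder.get, reverse=True)[:shortfall]:
        quota[stratum] += 1
    return quota


def stratified_sample(strata, alloc, seed=12345):
    """Draw alloc[stratum] ids without replacement from each stratum.

    strata maps stratum name -> list of subject ids (an assumed layout).
    The fixed seed (arbitrary value) makes the draw reproducible.
    """
    rng = random.Random(seed)
    return {name: rng.sample(ids, alloc[name]) for name, ids in strata.items()}


alloc = proportional_allocation(freq, 100_000)
```

The overall sampling fraction is 100,000 / 2,991,200, or about 3.34%, so Sweden would contribute roughly 86,692 subjects and Africa roughly 193. In Stata itself, the built-in `sample` command with its `by()` option, preceded by `set seed`, is probably the natural place to start; that is a pointer rather than a tested solution.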

Thanks!

Regards,

/Amal.

