Re: st: Creating a smaller dataset from a larger one.


From   Le Wang <statauser@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Creating a smaller dataset from a larger one.
Date   Mon, 13 Aug 2012 11:47:42 -0400

Dear Amal,

Stata has a built-in program called -sample- to draw a random sample.
See the link below for a detailed tutorial for this command.

http://www.ats.ucla.edu/stat/stata/faq/sample.htm
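
As a rough sketch of how -sample- might be applied to the question below, assuming the full dataset is already in memory: the seed value is arbitrary, and the output filename "subsample.dta" is only a placeholder, not something from the original post.

```stata
* Sketch only: assumes the full 3-million-subject dataset is already loaded.
* Set a seed (arbitrary value) so the same random sample can be redrawn later.
set seed 20120813

* Keep exactly 100,000 observations chosen at random; all others are dropped.
sample 100000, count

* Alternative: keep a 10% random sample within each country of birth
* (run this instead of the command above, not after it).
* sample 10, by(motherland)

* "subsample.dta" is a placeholder name for the smaller file.
save "subsample.dta", replace
```

Note that -sample- drops the unselected observations from memory, so the result should be saved under a new name rather than over the original file.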

Hope that helps.

Le


On Mon, Aug 13, 2012 at 10:31 AM, Amal Khanolkar <Amal.Khanolkar@ki.se> wrote:
> Hello all,
>
> I have a very large dataset with almost 3 million subjects - great to work with, but a bit difficult to transport or carry with me. I would prefer to create a smaller sub-dataset with, say, 100,000 subjects chosen at random. As I'm interested in studying ethnic differences, I use the variable 'Motherland', which denotes country of birth, in the code below to help create my sub-dataset. However, with the code I'm currently using, I get (I think) the first 100,000 subjects, which is not a random selection. How can I change the code below to choose 100,000 (or any number I wish) subjects at random?
>
> I use the following code to create a subset of my original dataset:
>
> *Creating a subsample of the dataset with say 100,000 subjects*
>
> // create random variable
> gen x = runiform()
>
> // sort by country and x
> sort motherland x
>
> // create a variable within country identifying the first 10% (change this proportion as you wish)
>
> by motherland: gen subsamp = _n <= (_N+0.5)*0.10
>
> tab motherland subsamp, col
>
> tab motherland kon if magecat!=. & education!=. & famsit_new!=. & smoke1!=. & parity!=. & zscore_gest!=. & MBMI2!=. & mlangd!=. & multibirth==2 & subsamp==1, col
>
>
> Thanks for any help,
>
> /Amal.
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/



-- 

~~~~~~~~~~~~~~~~~~~~~~~~
Le Wang, Ph.D
Assistant Professor
Department of Economics
University of New Hampshire

