Re: st: Creating a smaller dataset from a larger one.

From	Richard Williams <[email protected]>
To	[email protected], [email protected]
Subject	Re: st: Creating a smaller dataset from a larger one.
Date	Mon, 13 Aug 2012 16:04:00 -0500

At 10:47 AM 8/13/2012, Le Wang wrote:

Dear Amal,

Stata has a built-in program called -sample- to draw a random sample.
See the link below for a detailed tutorial for this command.

http://www.ats.ucla.edu/stat/stata/faq/sample.htm

Hope that helps.

Le
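As a rough illustration of the -sample- suggestion above, here is a minimal sketch. The toy data, the seed, and the sample sizes are invented for the example; only the variable name motherland is borrowed from the original question. Note that -sample- keeps the drawn observations and drops the rest from memory, hence the -preserve-/-restore- pair around the first draw.

```stata
* Minimal sketch: toy data standing in for the real file (all values invented)
clear
set seed 20120813                            // fix the seed so the draw is reproducible
set obs 30000
generate motherland = ceil(5 * runiform())   // 5 hypothetical countries

* Option 1: exactly 1,000 observations drawn at random from the whole file
preserve
sample 1000, count
count
restore

* Option 2: a 10% random sample drawn separately within each country
sample 10, by(motherland)
tabulate motherland
```

With the real data, the observation count in Option 1 would be 100,000, and the reduced file could then be written out with -save- under a new name so the original is not overwritten.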

I'll add a caution here -- if the data are -svyset-, I don't think you are supposed to create extracts. Stata needs all the cases in order to get the standard errors right. I've never fully understood why, but Statalist has had various threads explaining why you should use -subpop- rather than -if- for selecting cases (and presumably the same logic applies to extracts).
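A short sketch of the -subpop- versus -if- distinction follows. The usual explanation is that dropping cases can remove whole sampling units or strata from the design, so the variance calculation no longer reflects how the sample was drawn. The example assumes Stata's NHANES II example file (fetched with -webuse-, so it needs an internet connection) and assumes that sex == 2 codes women in that file; neither comes from this thread.

```stata
* Sketch with Stata's NHANES II example data (an assumption, not the poster's data)
webuse nhanes2, clear
svyset psuid [pweight = finalwgt], strata(stratid)   // declare the survey design

* Preferred: every case stays in the design; estimation is limited via subpop()
svy, subpop(if sex == 2): mean bpsystol

* For comparison only: -if- discards the other cases before the variance is computed
svy: mean bpsystol if sex == 2
```

The two commands should report the same point estimate, while the standard errors may differ, depending on the design.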

On Mon, Aug 13, 2012 at 10:31 AM, Amal Khanolkar <[email protected]> wrote:
> Hello all,
>
> I have a very large dataset with almost 3 million subjects - great to work with, but a bit difficult to transport or carry with me. I would prefer to create a smaller sub-dataset with, say, 100,000 subjects chosen at random. As I'm interested in studying ethnic differences, I use the variable 'Motherland', which denotes country of birth, in the code below to help create my sub-dataset. However, with the code I'm currently using, I get (I think) the first 100,000 subjects, which is then not at random. How may I change the code below to choose 100,000 (or any number I wish) subjects at random?
>
> I use the following code to create a subset of my original dataset:
>
> *Creating a subsample of the dataset with say 100,000 subjects*
>
> // create random variable
> gen x = runiform()
>
> // sort by country and x
> sort motherland x
>
> // create a variable within country identifying the first 10% (change this proportion as you wish)
>
> by motherland: gen subsamp = _n <= (_N+0.5)*0.10
>
> tab motherland subsamp, col
>
> tab motherland kon if magecat!=. & education!=. & famsit_new!=. & smoke1!=. & parity!=. & zscore_gest!=. & MBMI2!=. & mlangd!=. & multibirth==2 & subsamp==1, col
>
>
> Thanks for any help,
>
> /Amal.
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/



--

~~~~~~~~~~~~~~~~~~~~~~~~
Le Wang, Ph.D
Assistant Professor
Department of Economics
University of New Hampshire

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME:   (574)289-5227
EMAIL:  [email protected]
WWW:    http://www.nd.edu/~rwilliam

References:
- st: Creating a smaller dataset from a larger one.
  - From: Amal Khanolkar <[email protected]>
- Re: st: Creating a smaller dataset from a larger one.
  - From: Le Wang <[email protected]>
