
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

st: Creating a smaller dataset from a larger one.

From	Amal Khanolkar <[email protected]>
To	"[email protected]" <[email protected]>
Subject	st: Creating a smaller dataset from a larger one.
Date	Mon, 13 Aug 2012 14:31:30 +0000

Hello all,

I have a very large dataset with almost 3 million subjects. It is great to work with, but a bit difficult to transport or carry with me. I would like to create a smaller sub-dataset of, say, 100,000 subjects chosen at random. As I'm interested in studying ethnic differences, I use the variable 'motherland', which denotes country of birth, in the code below to help create my sub-dataset. However, with the code I'm currently using, I get (I think) the first 100,000 subjects, which is not a random selection. How can I change the code below to choose 100,000 subjects (or any number I wish) at random?

I use the following code to create a subset of my original dataset:

*Creating a subsample of the dataset with say 100,000 subjects*

// create random variable
gen x = runiform()

// sort by country and x
sort motherland x

// create a variable within country identifying the first 10% (change this proportion as you wish)

by motherland: gen subsamp = _n <= (_N+0.5)*0.10

tab motherland subsamp, col

tab motherland kon if magecat!=. & education!=. & famsit_new!=. & smoke1!=. & parity!=. & zscore_gest!=. & MBMI2!=. & mlangd!=. & multibirth==2 & subsamp==1, col


Thanks for any help,

/Amal.
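
For reference, here is a minimal sketch of two ways such a subsample might be drawn with Stata's built-in -sample- command. It is only a sketch under stated assumptions: the toy data, the seed value, and the file names subsample_random and subsample_by_country are hypothetical and not from the post; only the variable name motherland mirrors it. Because the posted code sorts on a uniform random number within country before flagging the first 10%, that flag already appears to mark a random 10% of each country rather than the first rows of the file; -sample- simply does the same kind of draw in one step.

```stata
* Sketch only: toy data stand in for the real file (hypothetical values)
clear
set seed 20120813                 // any fixed seed makes the draw reproducible
set obs 3000000
gen long id = _n
gen byte motherland = 1 + floor(10 * runiform())   // 10 made-up countries

* Option A: simple random sample of exactly 100,000 observations
preserve
sample 100000, count              // count: the number is a count, not a percent
save subsample_random, replace    // hypothetical file name
restore

* Option B: random 10% drawn separately within each country
preserve
sample 10, by(motherland)         // without count, the number is a percent
save subsample_by_country, replace   // hypothetical file name
restore
```

Option A ignores country, so small countries may end up with very few subjects; Option B keeps each country's share roughly proportional, but the total size then follows from the percentage rather than being fixed in advance.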
Follow-Ups:
- Re: st: Creating a smaller dataset from a larger one.
  - From: Maarten Buis <[email protected]>
- Re: st: Creating a smaller dataset from a larger one.
  - From: Le Wang <[email protected]>
