Re: st: Creating a smaller dataset from a larger one.

From	Maarten Buis <[email protected]>
To	[email protected]
Subject	Re: st: Creating a smaller dataset from a larger one.
Date	Mon, 13 Aug 2012 20:01:55 +0200

On Mon, Aug 13, 2012 at 4:31 PM, Amal Khanolkar wrote:
> I have a very large dataset with almost 3 million subjects - great to work with, but however a bit difficult to transport or carry with me. I prefer to create a smaller sub-dataset with say 100,000 subjects chosen at random.

Alternatively, you could select the variables you want to keep and use
-contract-. For each unique combination of values of these variables it
keeps only one observation, and it records how many original observations
that combination represents in a new variable, _freq. You can then add
the frequency weight -[fw=_freq]- to all subsequent commands that allow
frequency weights, and thus keep all the information on those variables
from your original dataset. If your variables are all categorical, the
reduction in size (and the speed-up in execution of commands) can be
spectacular. However, even with continuous variables the savings can be
considerable, as continuous variables are hardly ever as continuous as
we think.
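
A minimal sketch of the -contract- approach. The auto dataset that ships with Stata and the two variables chosen here are only illustrative assumptions; substitute your own file and variable list.

```stata
* Sketch only: the dataset and variable list are illustrative assumptions
sysuse auto, clear
keep foreign rep78

* remember how many observations we start with
count
local n_before = r(N)

* keep one observation per unique combination of foreign and rep78;
* _freq records how many original observations each one represents
contract foreign rep78

* frequency weights reproduce the tabulation of the full data
tabulate rep78 foreign [fw=_freq]

* the frequencies should add up to the original number of observations
summarize _freq, meanonly
assert r(sum) == `n_before'
```

By default -contract- also keeps combinations that contain missing values, which is why the frequencies add up to the original count. The contracted file can then be saved under a new name and carried around instead of the full dataset.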

Another way to reduce the size of a dataset without losing
information is -compress-, which stores each variable in the smallest
storage type that holds its values exactly.
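
A small sketch of what -compress- does, using made-up variables (id and group are hypothetical names) that are deliberately stored as doubles even though they only hold small integers:

```stata
* Sketch only: toy data with wastefully large storage types
clear
set obs 1000
generate double id = _n
generate double group = mod(_n, 5)

* storage types before
describe

* demote each variable to the smallest type that holds its values exactly
compress

* storage types after
describe
```

No values change; only the storage types do, so results from later commands are unaffected.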

Hope this helps,
Maarten

---------------------------------
Maarten L. Buis
WZB
Reichpietschufer 50
10785 Berlin
Germany

http://www.maartenbuis.nl
---------------------------------