From: Richard Williams <richardwilliams.ndu@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Working with really large Datasets
Date: Mon, 15 Oct 2012 19:09:51 -0500
At 05:52 PM 10/15/2012, Fernando Rios Avila wrote:
Dear Stata listers,

I wonder if anyone here can share some experience with working with really large datasets. I am working with a panel dataset (census-type data) on workers and firms over time. The total number of observations is about 70 million. I want to estimate two-way fixed-effects models, manually including dummies for regions, times, and industries, but with a dataset this size the estimation becomes unmanageable. Does anyone know of, or can anyone direct me to, a strategy for dealing with "too much data"?

I was thinking about drawing random samples (say, 5%): picking individuals at random, keeping them for the whole time they appear in the sample, and then combining the results across samples, much as is done with Multiple Imputation datasets. But I am not sure how valid that procedure would be.

Any suggestions are welcome. Thank you.
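For concreteness, a minimal Stata sketch of that sampling scheme: draw persons at random (not observations) and keep each sampled person's entire panel. The variable name person_id is a placeholder for the actual panel identifier, and the seed is arbitrary.

set seed 20121015                       // arbitrary seed, for reproducibility
egen byte tag = tag(person_id)          // mark one observation per person
gen double u = runiform() if tag        // one uniform draw per person
bysort person_id (u): replace u = u[1]  // copy that draw to the person's other rows
keep if u < .05                         // keep ~5% of persons, with all their years
drop tag u

Repeating this over several seeds and combining the resulting estimates, as the post suggests, would then be a matter of looping over seeds and storing each set of coefficients.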
This FAQ by Cox & Merryman might give you some ideas:
http://www.stata.com/support/faqs/data-management/sampling-clusters/

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574) 631-6668, (574) 631-6463
HOME: (574) 289-5227
EMAIL: Richard.A.Williams.5@ND.Edu
WWW: http://www.nd.edu/~rwilliam

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/