The Stata listserver

st: selecting obs while reading in huge data set


From   "Sascha O. Becker" <sascha.becker@gmx.de>
To   statalist@hsphsun2.harvard.edu
Subject   st: selecting obs while reading in huge data set
Date   Wed, 18 Aug 2004 15:47:51 +0200

Dear Stata users,

I have a huge data set A (2 GB in ASCII) with 40 million observations (workers) but only 10 variables. I have another data set B containing information on a subset of employers, and I want to keep only those workers in data set A who are employed by firms in data set B (the firm ID is one of the variables in data set A).

Since I cannot read in the whole of data set A at once, I can loop over it in chunks, i.e.

a) read in the first 5 million observations from A, merge data set B, keep only the relevant observations, save those
b) read in the next 5 million observations from A, merge data set B, keep only the relevant observations, save those, etc.

but this is extremely time-consuming.

***

Is there also a way to select observations while reading in data set A, i.e. something like

-insheet datasetA using mydata if firmid in datasetB- ?

and if so, is it likely to be faster?
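
To make the idea of selecting while reading concrete, here is a rough sketch of a pre-filter run outside Stata, written in Python. It is not a Stata feature and not a tested solution for this data. It assumes that both data sets are available as comma-delimited text with a header row, that the firm ID column is called firmid in both files, and that the firm IDs from B fit in memory. The function names and the file names in the comments are hypothetical.

```python
import csv


def load_firm_ids(b_path, id_col="firmid"):
    """Collect the firm IDs found in data set B into a set for fast lookup."""
    with open(b_path, newline="") as f:
        return {row[id_col] for row in csv.DictReader(f)}


def filter_workers(a_path, out_path, firm_ids, id_col="firmid"):
    """Stream data set A one row at a time and write only the rows whose
    firm ID appears in firm_ids. Returns the number of rows kept."""
    kept = 0
    with open(a_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Only one row of A is held in memory at any time.
            if row[id_col] in firm_ids:
                writer.writerow(row)
                kept += 1
    return kept


# Hypothetical usage (file names are placeholders):
# ids = load_firm_ids("datasetB.csv")
# filter_workers("datasetA.csv", "datasetA_selected.csv", ids)
```

The reduced file could then be read into Stata in one go. Whether this is actually faster than the chunked merge would have to be checked on the real data.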

Best, Sascha

--

Dr. Sascha O. Becker
Center for Economic Studies
University of Munich
Schackstr. 4 - 80539 Munich, Germany



