
RE: st: selecting obs while reading in huge data set

From	"Steve Stillman" <[email protected]>
To	<[email protected]>
Subject	RE: st: selecting obs while reading in huge data set
Date	Thu, 19 Aug 2004 20:15:53 +1200

Sascha,
I have a recommendation that I wouldn't usually make. I have recently
been working with matched employer-employee data with over 30 million
obs, so we have been running into the same problem as you. SAS handles
merges of large datasets much better than Stata. In particular, proc SQL
is remarkably fast at these types of merges (likely because SQL was
designed with this type of operation in mind).
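For readers without SAS, the kind of merge described above can be sketched as a SQL inner join. This is not SAS proc SQL. It uses Python's built-in sqlite3 module as a stand-in, and the table names, column names (employees, firms, empid, firmid, wage, industry) and data are hypothetical. With a file-backed database (for example sqlite3.connect("matched.db")) instead of ":memory:", the tables stay on disk rather than having to fit in RAM.

```python
import sqlite3

# Hypothetical tables standing in for the employee and firm datasets (not SAS).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employees (empid INTEGER, firmid INTEGER, wage REAL)")
con.execute("CREATE TABLE firms (firmid INTEGER PRIMARY KEY, industry TEXT)")
con.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                [(1, 10, 20.5), (2, 10, 18.0), (3, 11, 25.0), (4, 99, 30.0)])
con.executemany("INSERT INTO firms VALUES (?, ?)",
                [(10, "retail"), (11, "finance")])

# An index on the join key lets the engine look up matching employees
# without scanning the whole employee table.
con.execute("CREATE INDEX idx_emp_firm ON employees (firmid)")

# Inner join: employees whose firm is missing from the firm table are dropped.
rows = con.execute(
    """
    SELECT e.empid, e.firmid, e.wage, f.industry
    FROM employees AS e
    INNER JOIN firms AS f ON e.firmid = f.firmid
    ORDER BY e.empid
    """
).fetchall()

for row in rows:
    print(row)
```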

Well, there it was: likely the last time I will recommend SAS over Stata.

Cheers,
Steve

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Sascha O. Becker
Sent: Thursday, August 19, 2004 6:59 PM
To: [email protected]
Subject: Re: st: selecting obs while reading in huge data set


Dear Daniel,

thanks for your reply!

You suggested:

****
Perhaps you can read in only the employee and firm IDs?

    .insheet empid firmid using mydata

This is only one fifth of the variables, so it might fit in your
computer's memory. Then merge the result with the firm dataset, keeping
only matched records, and then merge again with the employee dataset,
again keeping only matched records.
****
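The suggestion quoted above (keep only the two ID variables, then keep only records whose firm appears in the firm data) can be sketched in Python with the standard csv module. This is an illustration under assumptions, not Stata's insheet: the file layout, the extra column names, the firm IDs and the helper read_id_columns are all hypothetical. Every field is still parsed, but only the two ID values per row are retained in memory.

```python
import csv
import io

# Hypothetical comma-separated file with a header row; a real file would be
# opened with open("mydata.csv", newline="") instead of io.StringIO.
RAW = """empid,firmid,wage,age,tenure
1,10,20.5,34,3
2,10,18.0,29,1
3,11,25.0,45,12
4,99,30.0,51,20
"""

def read_id_columns(f, keep=("empid", "firmid")):
    """Stream the file row by row, retaining only the ID columns."""
    reader = csv.DictReader(f)
    return [tuple(row[name] for name in keep) for row in reader]

firm_ids = {"10", "11"}  # hypothetical IDs present in the firm dataset

ids = read_id_columns(io.StringIO(RAW))
# Keep only matched records, like a merge that drops unmatched observations.
matched = [(emp, firm) for emp, firm in ids if firm in firm_ids]
print(matched)
```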

This last step is actually identical to the original problem: "the
employee dataset" is the full dataset with all variables. To merge it
with anything, it needs to be in memory at least once, and that is
exactly the problem.

There seems to be no way around some kind of looping, either over
observations or over subsets of variables, merging each piece against
the firm dataset and then appending/merging the resulting sub-datasets.
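The loop over observations mentioned above can be sketched as a chunked filter: read a block of rows, keep those whose firm ID appears in the firm data, append them to an output file, and repeat until the input is exhausted. This is a hypothetical Python illustration (the function filter_in_chunks, the column names, the data and the chunk size are assumptions), and it assumes the set of firm IDs is small enough to hold in memory. Because all variables of matched rows are written straight to the output, the full employee file is never in memory at once; peak memory is roughly one chunk plus the firm-ID set.

```python
import csv
import io
from itertools import islice

def filter_in_chunks(src, dst, firm_ids, chunk_size=2):
    """Copy rows whose firmid is in firm_ids from src to dst, holding at
    most chunk_size rows in memory at a time. Returns the number of chunks."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    writer.writerow(header)
    firm_col = header.index("firmid")
    n_chunks = 0
    while True:
        chunk = list(islice(reader, chunk_size))  # next block of observations
        if not chunk:
            break
        n_chunks += 1
        # Append only matched observations, keeping all their variables.
        writer.writerows(row for row in chunk if row[firm_col] in firm_ids)
    return n_chunks

# Hypothetical input and output; in practice these would be real files.
src = io.StringIO(
    "empid,firmid,wage\n"
    "1,10,20.5\n"
    "2,10,18.0\n"
    "3,11,25.0\n"
    "4,99,30.0\n"
    "5,11,22.0\n"
)
dst = io.StringIO()
chunks = filter_in_chunks(src, dst, firm_ids={"10", "11"}, chunk_size=2)
print(chunks)
print(dst.getvalue())
```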

Cheers, Sascha
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



