The Stata listserver

RE: st: selecting obs while reading in huge data set

From   "Steve Stillman" <[email protected]>
To   <[email protected]>
Subject   RE: st: selecting obs while reading in huge data set
Date   Thu, 19 Aug 2004 20:15:53 +1200

I have a recommendation that I wouldn't usually make.  I have been
recently doing work with matched employer-employee data with over 30
million obs, so we have been running into the same problem as you.  SAS
is much better for large dataset merges than Stata.  In particular, proc
SQL is remarkably fast at doing these types of merges (likely because
SQL is written with this type of operation in mind).

Well, there it was: likely the last time I will recommend SAS over Stata.


-----Original Message-----
From: [email protected]
[mailto:[email protected]]On Behalf Of Sascha O.
Sent: Thursday, August 19, 2004 6:59 PM
To: [email protected]
Subject: Re: st: selecting obs while reading in huge data set

Dear Daniel,

thanks for your reply!

You suggested:

Perhaps you can read the employee and firm ID only?

    . insheet empid firmid using mydata

This is only 1/5th of the variables, so it might fit in your computer.
Then merge the result with the firm dataset, keeping only matched
records, then merge again with employee dataset, keeping only matched
records.

This last step is actually identical to the original problem. "The
employee dataset" is the full dataset with all variables. In order to
merge this to anything, it needs to be in memory at least once, and this
is exactly the problem.

There seems to be no way round some kind of looping, either over
observations, or over subsets of variables that I would merge against
the firm data set and then append/merge those sub-datasets.
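
A minimal sketch of the loop over observations might look like the
following. It rests on several assumptions that are not stated in this
thread: that the full employee file already exists in Stata format (here
the hypothetical employees.dta), that the firm file (hypothetical
firms.dta) has one row per firmid, and that a Stata version with the
merge m:1 syntax (Stata 11 or later) is available. The chunk size is
arbitrary. Each pass loads only a range of observations, keeps the
employees whose firm is matched, and stacks them onto the matches found
so far.

```stata
* Sketch only: file names, the key variable firmid, and the chunk size
* are hypothetical placeholders, not taken from the original question.
local chunk 1000000

* describe using reads the file header only, so the observation count
* is available in r(N) without loading the data
describe using employees, short
local N = r(N)

tempfile matched
local first 1

forvalues start = 1(`chunk')`N' {
    local stop = min(`start' + `chunk' - 1, `N')

    * load only observations start..stop of the large file
    use in `start'/`stop' using employees, clear

    * keep employees whose firm appears in the firm file
    merge m:1 firmid using firms, keep(match) nogenerate

    * stack the matched rows collected from earlier chunks
    if !`first' {
        append using `matched'
    }
    save `matched', replace
    local first 0
}

* the accumulated matches are now the working dataset
use `matched', clear
```

Memory use is bounded by one chunk plus the matches accumulated so far,
so this only helps if the matched subset is much smaller than the full
file.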

Cheers, Sascha
*   For searches and help try:

