The Stata listserver

Re: st: Selecting part of a LARGE file


From   "Jesper B. Sorensen" <sorensen@mit.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Selecting part of a LARGE file
Date   Fri, 06 Jun 2003 13:10:41 -0400

Glenn,

I don't have an elegant solution to this, but in my experience much of the slowdown in cases like this comes from memory constraints. To hold all the data in memory, Stata likely has to use swap space, which brings things to a crawl.

So perhaps a way to speed things up is to split the big dataset into smaller files and then do sequential merges. Each file would have to be small enough that the merge fits in RAM. There is a fixed cost to splitting up the big file, but if you might ever need to go back to it, that cost is probably worth incurring.
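
Roughly, the idea could look like the do-file below. Treat it as a sketch under assumptions rather than tested code: FileA and FileB stand for your two files, the chunk size and the output name FileB_subset are made up for illustration, and the merge m:1 syntax assumes Stata 11 or later. Instead of writing the pieces to disk first, it reads File B one slice at a time with "use in ... using", which serves the same purpose of keeping each merge small enough for RAM.

```stata
* Sketch only. FileA.dta and FileB.dta are the files from the question;
* the chunk size and the name FileB_subset are illustrative assumptions.

local chunk = 1000000              // rows per slice; pick what fits in RAM

* Distinct patent values from File A, saved as a small key file
use patent using FileA, clear
duplicates drop
tempfile keys
save `keys'

* Number of observations in File B, read without loading the data
describe using FileB
local N = r(N)

tempfile subset
local first = 1
forvalues start = 1(`chunk')`N' {
    local stop = min(`start' + `chunk' - 1, `N')

    * Load one slice of File B
    use in `start'/`stop' using FileB, clear

    * Keep only rows whose patent also appears in File A
    merge m:1 patent using `keys', keep(match) nogenerate

    * Add the matches collected from earlier slices, then save the total
    if !`first' {
        append using `subset'
    }
    save `subset', replace
    local first = 0
}

* Final result: the File B observations whose patent is in File A
save FileB_subset, replace
```

If you do split File B into separate files once, the loop body stays the same except that each pass loads one of the pieces instead of a row range.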

//Jesper


At 10:47 AM 6/6/2003 -0500, you wrote:

Hi all.

I have two files.  File A has about 5000 unique values of the variable
PATENT, which is 7 characters long.  File B has 16 million observations
and several million unique values for PATENT.  I want to do some
manipulation involving File B, but only for the observations that
correspond to the patent values found in File A.   I am currently using
merge on the two files to do this (actually mmerge as a wrapper for
ease), but wonder if there is an easier/faster way.

I attempted using vallist.ado in File A to generate a long local macro
(say, _useme) and then doing

        use FileB if index(patent, "'useme'")

I get 0 observations in this case (even though I know there are some
matches).  From the manual, it appears that index is limited to strings
of 80 characters, anyway.

Any better suggestions?  Thank you in advance.

Glenn

Glenn Hoetker
Assistant Professor of Strategy
College of Business Administration
University of Illinois at Urbana-Champaign
217-265-4081
ghoetker@uiuc.edu
"Success is going from failure to failure without a loss of enthusiasm."
Sir Winston Churchill

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jesper B. Sørensen
Richard S. Leghorn (1939) Associate Professor of Strategic Management
Sloan School of Management
Massachusetts Institute of Technology
E52-581
Cambridge, MA 02142
http://web.mit.edu/sorensen/www/
(617) 253 7945  -- voice
(617) 253 2660  -- fax




