Statalist


Re: st: Re: memory size 7 data merging


From   Joseph Coveney <jcoveney@bigplanet.com>
To   Statalist <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Re: memory size 7 data merging
Date   Mon, 24 Sep 2007 21:44:46 -0700

Michael Blasnik wrote:

The real question is -- what do you want as a result of the merge?  Do you
want 1,300,000 observations, or do you just want 100,000 observations with
matched info from the larger file?  If the latter, then use the -nokeep-
option of -merge- and you should be OK.  But if you want the former, then it
seems like the resulting dataset won't fit in your allocatable Stata memory
and you will need to figure out how to make it smaller by encoding strings,
dropping variables, etc.

--------------------------------------------------------------------------------

Good suggestions.

With -nokeep-, would you need to sacrifice checks such as -assert _merge == 3-
or -assert inlist(_merge, 1, 3)-?  I would dread not being able to take
advantage of -assert- after a -merge- with the datasets that I get handed.
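
For what it's worth, here is a rough, untested sketch of how those checks
would sit after a -merge- with -nokeep-, using the syntax current at the
time of this post.  The file names (small.dta, big.dta) and the key
variable (id) are made up for illustration, and big.dta is assumed to be
already sorted by id.  As I read the documentation, -nokeep- discards
using-only observations before you ever see them, so the second -assert-
can no longer fail, while the first remains a real check.

    use small, clear               // hypothetical 100,000-observation master file
    sort id
    merge id using big, nokeep     // assumes big.dta is sorted by id
    assert _merge == 3             // fails if any master observation went unmatched
    assert inlist(_merge, 1, 3)    // cannot fail here: -nokeep- already dropped _merge == 2

If the second check matters, one possibility is to run the -merge- without
-nokeep- on just the key variable first and -assert- on that result.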

Also -nokeep- might still not be enough if there are multiple observations
in the 1.3-million-observation (40-megabyte) dataset that match each
observation in the 100,000-observation (53-megabyte) dataset.  I've no idea
what the original poster's two datasets are like, but if there are, say, an
average of 13 observations in the first file that match each observation in
the second (if the second is a look-up table, for instance), then the
resulting dataset will be in the neighborhood of three-quarters of a
gigabyte even with the -nokeep- option.
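
The back-of-the-envelope arithmetic behind that figure can be reproduced
with -display-.  It is only a rough sketch: it assumes decimal megabytes,
that every observation in the larger file finds a match so the result has
1.3 million observations, and it counts the key variable twice.

    display 53e6 / 100000      // about 530 bytes per observation in the smaller file
    display 40e6 / 1300000     // about 31 bytes per observation in the larger file
    display 1300000 * (53e6/100000 + 40e6/1300000) / 1e6    // about 729 (megabytes)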

Joseph Coveney




