st: joinby command and memory issues

From   "Weichle, Thomas" <[email protected]>
To   <[email protected]>
Subject   st: joinby command and memory issues
Date   Fri, 8 Oct 2010 10:57:42 -0500

Hi Statalisters,
I'm trying to combine two datasets using the joinby command. The
hgb.dta dataset contains hemoglobin test results for individuals and
can contain multiple tests on the same day. I'm keeping only
study_id, test date, and test result. This dataset is very large. The
epo.dta dataset contains individuals who received the EPO drug, along
with the receipt date, and can contain multiple receipts per individual.
My goal is to create all pairwise combinations of the two dates in
order to determine whether any drug receipt was within 7 days of the
hemoglobin test(s).
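A minimal sketch of the pairwise approach on made-up toy data (not the real files). It assumes the EPO receipt date is stored in a variable called rec_date, a hypothetical name since the post does not give it. joinby forms every test-by-receipt pair within study_id, a flag marks pairs no more than 7 days apart, and collapse returns one row per hemoglobin test.

```stata
* Toy data only; rec_date is a HYPOTHETICAL name for the EPO receipt date.
clear
input double study_id long ord_date double result
1 18000 11.2
1 18000 11.5
1 18030 10.9
2 18100 12.0
end
format ord_date %d
tempfile hgb
save `hgb'

clear
input double study_id long rec_date
1 17995
1 18040
2 18200
end
format rec_date %d
tempfile epo
save `epo'

* Form every (test, receipt) pair within each study_id
use `hgb', clear
gen long test_id = _n              // identifies each hemoglobin test row
joinby study_id using `epo'
local npairs = _N                  // number of (test, receipt) pairs

* Flag pairs where the receipt falls within 7 days of the test
gen byte within7 = abs(rec_date - ord_date) <= 7

* Collapse back to one row per hemoglobin test
collapse (max) any_epo7=within7, by(test_id study_id ord_date result)
list, noobs
```

By default joinby keeps only study_ids found in both files; its unmatched(master) option would also keep tests from people with no EPO receipts.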

Doing so will create a very large dataset, and I don't believe I have the
memory capacity for it. I ran "set memory 1000m", which appears to be
the maximum on my computer, but I still receive an error. Are there any
suggestions for carrying out such a large task? If there were a
way to include only the study_id and receipt date variables from the
epo.dta dataset, that might free some space, but I don't think
joinby allows this option. There are over 36,000 individuals in the
epo.dta dataset and over 406,000 total EPO receipts.
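joinby has no option for choosing which using-file variables to bring in, but the same effect can be had by saving a slimmed, compressed copy of the using file first and joining against that. A sketch on toy data, where rec_date, dose, and site are hypothetical variable names:

```stata
* Toy data only; rec_date, dose, and site are HYPOTHETICAL variable names.
clear
input double study_id long rec_date double dose str8 site
1 17995 40000 "A"
1 18040 40000 "A"
2 18200 60000 "B"
end
tempfile epo_full
save `epo_full'

* Load only the key and the date, shrink storage types, save a slim copy
use study_id rec_date using `epo_full', clear
compress
tempfile epo_slim
save `epo_slim'

* Master (hemoglobin) data, also compressed before the join
clear
input double study_id long ord_date double result
1 18000 11.2
2 18100 12.0
end
compress
joinby study_id using `epo_slim'   // brings in rec_date only
describe, short
```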

. set memory 1000m

Current memory allocation

                    current                                 memory usage
    settable          value     description                 (1M = 1024k)
    set maxvar         5000     max. variables allowed           1.909M
    set memory         1000M    max. data space              1,000.000M
    set matsize         400     max. RHS vars in models          1.254M

. use study_id ord_date result using "G:\ESA_Cancer\ESA_DATA\ESA_USE\hgb0209.dta", clear

. unique study_id
Number of unique values of study_id is  255317
Number of records is  7438632

. sort study_id ord_date

. describe, fullnames

Contains data from G:\ESA_Cancer\ESA_DATA\ESA_USE\hgb0209.dta
  obs:     7,438,632                          
 vars:             3                          
 size:   208,281,696 (83.0% of memory free)
              storage  display     value
variable name   type   format      label      variable label
study_id        double %12.0g                 Study ID
ord_date        long   %d                     order date
result          double %12.0g                 
Sorted by:  study_id  ord_date

. * Pairwise combinations
. joinby study_id using "G:\ESA_Cancer\ESA_DATA\ESA_USE\epo0209.dta",
no room to add more observations
    An attempt was made to increase the number of observations beyond
    what is currently possible.
    You have the following alternatives:

     1.  Store your variables more efficiently; see help compress.
         (Think of Stata's data area as the area of a rectangle; Stata
         can trade off width and length.)

     2.  Drop some variables or observations; see help drop.

     3.  Increase the amount of memory allocated to the data area using
         the set memory command; see help memory.
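One way to gauge the problem before running joinby is to count the rows it would create: for each study_id present in both files it produces (number of tests) times (number of receipts) rows. A sketch on toy data (rec_date is again a hypothetical name):

```stata
* Toy data only; rec_date is a HYPOTHETICAL name for the EPO receipt date.
clear
input double study_id long ord_date
1 18000
1 18000
1 18030
2 18100
3 18150
end
tempfile hgb
save `hgb'

clear
input double study_id long rec_date
1 17995
1 18040
2 18200
2 18210
end
tempfile epo
save `epo'

* Number of EPO receipts per person
use `epo', clear
contract study_id, freq(n_epo)
tempfile epo_n
save `epo_n'

* Number of hemoglobin tests per person, matched to the EPO counts
use `hgb', clear
contract study_id, freq(n_hgb)
merge 1:1 study_id using `epo_n', keep(match) nogenerate

* Rows joinby would produce (people without EPO receipts contribute none)
gen double n_pairs = n_hgb * n_epo
summarize n_pairs, meanonly
display "expected joinby rows: " %15.0fc r(sum)
```

Multiplying that count by the per-observation width implied by describe gives a rough idea of the memory the joined dataset would need.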

Tom Weichle
Math Statistician
Center for Management of Complex Chronic Care (CMC3)
Hines VA Hospital, Bldg 1, C202
708-202-8387 ext. 24261
[email protected] 
