Statalist The Stata Listserver



Re: st: label / macro problem


From   Jeph Herrin <junk@spandrel.net>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: label / macro problem
Date   Sat, 12 Aug 2006 10:17:34 -0400

Thanks to Austin Nichols and Evan Roberts for pointing out an obvious
workaround: collapse each file, join the results, generate the encoding
from this small combined file, and then apply the coding to each of the
original files.

Jeph


Austin Nichols wrote:
Jeph--
Trying to conserve memory and maximize speed,
I suggest something like:

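* Step 1: reduce each file to its distinct values of mystring
foreach F of numlist 1/84 {
    use mystring using file`F', clear
    bys mystring: drop if _n>1
    save temp`F', replace
}
* Step 2: stack the reduced files, erasing each temp file once appended
* (-append- takes no -clear- option; -erase- needs the .dta extension)
use temp1, clear
foreach F of numlist 2/84 {
    append using temp`F'
    erase temp`F'.dta
}
erase temp1.dta
* Step 3: build one encoding from the distinct values and save its label
bys mystring: drop if _n>1
encode mystring, gen(myint)
lab save myint using myint.do
* Step 4: apply the saved label to each original file
* (add ID or any other needed variables to the -use- varlist)
foreach F of numlist 1/84 {
    use mystring using file`F', clear
    do myint
    encode mystring, gen(myint) label(myint)
    save newfile`F', replace
}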

If you want to see what is going on with your local macros, just
-display- them at various points in your program, but the above skips
the use of long-winded macros.
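
For instance, here is a minimal sketch of such checks, assuming the lines
are placed right after the -encode- call inside the loop from the original
post (so `F' and myint exist at that point), and assuming your version of
-label list- returns r(k) and r(max):

```stata
* Hypothetical debugging lines; not part of either program in this thread
local myintlab : value label myint
display as text "file `F': value label attached to myint = [`myintlab']"
* If the macro is empty, -label list- lists every label in memory instead
quietly label list `myintlab'
local nlab   = r(k)
local maxlab = r(max)
quietly summarize myint
display as text "  labels defined: `nlab'"
display as text "  highest labeled code: `maxlab'; highest code in data: " r(max)
```

If the highest code in the data exceeds the highest labeled code, that
would suggest the label stopped being updated at that file.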

If you acquire a new file, you can always:
use mystring using file85, clear
do myint
encode mystring, gen(myint) label(myint)
lab save myint using myint2.do

On 8/11/06, Jeph Herrin <junk@spandrel.net> wrote:
I'm using 9.2, latest update.

My programming problem is to combine a large number of large files:
approximately 84 files of 500k obs each. I only need three variables
from these files, but one of them, -mystring-, is str64, which means that
as is, I can't combine these files by appending because my RAM (4GB)
runs out.

However, -mystring- only takes about 5500k different values. So
the solution I am using is to open each file, encode(mystring), save
the label, and then append all prior opened files. The values of
-mystring- are not constant over all the files - new values are added
over time, so I have to update the value labels each time I add a file.
My code looks like this:

u file1, clear
encode mystring, gen(myint)
local myintlab : value label myint
save temp, replace
foreach F of numlist 2/84 {
        u file`F', clear
        keep ID mystring
        encode mystring, gen(myint) label("`myintlab'")
        local myintlab : value label myint
        append using temp
        save temp, replace
}

This seems to work fine up to a point. But after about 30 files,
*something* runs out of space, and the value label ceases to be
updated with new values; -myint- simply holds integers with no
corresponding labels. Now, I understand that 64k value label
values should be allowed, so I don't see a problem there. And
-myintlab- is just a macro holding the name of the set of value
labels. So what else could be going wrong? Or, is there another
way to do this?

NB: The close reader will note that I mention 3 variables in the preamble
but only have two in my code fragment. In fact, I *also* encode a
second string variable; it takes many fewer values, however, and
turns out fine in the end.

In particular, I would appreciate any tips on how to debug what
is happening.

cheers,
Jeph
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

