Statalist


Re: st: RE: -replace- should not be use with temporary files (was: Comparing datasets)


From   "Sergiy Radyakin" <[email protected]>
To   [email protected]
Subject   Re: st: RE: -replace- should not be use with temporary files (was: Comparing datasets)
Date   Thu, 18 Sep 2008 16:52:10 -0400

Sometimes it is necessary to replace a tempfile, for example in a
situation like the following:

//-------------------------------------------------------------------------------------------------------
log using mypaper.txt, text replace
tempfile all_years
use data1990, clear
do prepare_year
save "`all_years'" /* we don't write replace here. Stata must ensure
that the file does not exist before I create it, otherwise it is not a
tempname */
foreach year in 1991 1992 1993 1994 1995 1996 1997 1998 1999 {
    use data`year', clear
    do prepare_year
    append using "`all_years'"
    save "`all_years'", replace
}
do my_paper
/* tempfile is deleted - no data is left after the do-file terminates,
except for the log */
log close
//-------------------------------------------------------------------------------------------------------

Note that this approach is not optimal: it is probably better to create
a tempfile for each year, then append them all one by one. But that
requires creating many temporary names and looping through them in the
program:

//-------------------------------------------------------------------------------------------------------
log using mypaper.txt, text replace
foreach year in 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 {
    use data`year', clear
    do prepare_year
    tempfile data`year'
    save "`data`year''" /* we do not replace any tempfile in this version */
}
/* we now have 10 files on the disk. note that the last one (1999) is
currently open */
foreach year in 1990 1991 1992 1993 1994 1995 1996 1997 1998 {
    append using "`data`year''"
}
do my_paper
log close
//-------------------------------------------------------------------------------------------------------
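
Since the first version does use -replace-, one cheap safeguard against a
mistyped or undefined macro is to check that the macro is non-empty before
saving. This is only a minimal sketch, not something from the original
examples: the error message text and the choice of return code 198 are
assumptions made for illustration.

```stata
tempfile all_years
use data1990, clear

* Hypothetical guard: stop if the macro is empty (e.g. a mistyped name),
* so that a bare -save, replace- is never executed by accident.
if `"`all_years'"' == "" {
    display as error "local macro all_years is not defined"
    exit 198
}
save `"`all_years'"', replace
```

The check costs one line per save and turns a silent overwrite into an
immediate error.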

Regarding "number of tempfiles can be anything you want": strictly
speaking you may hit a limit, but in practice this is not relevant.
The limit will depend on your OS and file system (FAT16/FAT32/NTFS).
E.g. if the temp folder is located on a disk with FAT16 you will hit
the limit at "just 512 files" :-)

source: http://ask-leo.com/is_there_a_limit_to_what_a_single_folder_or_directory_can_hold.html

and also here:
http://en.wikipedia.org/wiki/File_Allocation_Table

This is not a very probable situation, because FAT16 has been almost
universally replaced since about Win98 times, but there may be similar
limits in other OSes. Also, temp folders notoriously accumulate junk, so
you can hit the 65,534-file limit of FAT32 even if you create only a
dozen tempfiles in Stata, given that your computer runs for years without
a cleanup and crashes often, leaving tons of tempfiles behind (e.g. mine
contains more than 12,000 today).

AFAIK nobody has ever reported a problem creating a tempfile in Stata
because of the number of files in a folder limit. But I think this
year somebody hit the limit on the number of simultaneously opened
files - 2048.

If you want to make sure you don't destroy your data, it is more
reliable to set the "read-only" attribute for those files, because there
are plenty of other opportunities to destroy your data. Stata (correct
me if I am wrong) _never_ modifies read-only files in any way (whether
that is a data file, program file or a log file). AFAIK any OS allows
you to mark a file as "read-only".
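
One way to set that attribute from within Stata is to call the operating
system through -shell-. The following is a rough sketch under stated
assumptions: the files data1990.dta to data1999.dta from the examples above
are in the current directory, and the OS provides -attrib- (Windows) or
-chmod- (Unix/Mac). Stata itself has no dedicated command for this as far
as I know.

```stata
* Sketch only: mark the yearly source files read-only via the OS.
* Assumes data1990.dta ... data1999.dta sit in the current directory.
forvalues year = 1990/1999 {
    if "`c(os)'" == "Windows" {
        shell attrib +R "data`year'.dta"
    }
    else {
        shell chmod a-w "data`year'.dta"
    }
}
```

After this, an accidental -save, replace- aimed at one of these files
should fail rather than overwrite it.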

Finally, you may want to read Bill Gould's detailed explanation of
how Stata manages tempfiles here:
http://www.stata.com/statalist/archive/2007-08/msg01124.html and
work out for yourself the maximum number of tempfiles that can be
created within one instance of Stata.

Best regards,
   Sergiy Radyakin

/* it's nice to see that Stata people are back after Ike. How bad was it? */

On Thu, Sep 18, 2008 at 1:22 PM, Rajesh Tharyan <[email protected]> wrote:
> Hi,
>
> When would one want to save replace a temp file. Given that, it will get
> erased at the end of the run? One can just create as many as needed is it
> not? Or would that be inefficient vis-à-vis usage of system resources?
>
> Rajesh
>
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Steven Samuels
> Sent: 18 September 2008 18:08
> To: [email protected]
> Subject: Re: st: RE: -replace- should not be use with temporary files (was:
> Comparing datasets)
>
> Joseph Coveny wrote:
>> This is news to me.  I use -replace- all of the time with temporary
>> files. What did StataCorp technical support say was the matter with
>> using
>> -save . . . , replace- with temporary files?
>>
>>> Steven Samuels wrote (excerpted):
>>>
>>> . . . Technical support told me that "replace" should not be used
>>> when
>>> saving temporary files.
>>>
>
> Joseph
>
> What happened was--I tried to save a temporary file `t2',  without
> first defining it. Stata did not issue an error message, and I had no
> clue as to where I'd gone wrong. (My only excuse-I was tired.)  Kerry
> Kammire of StataCorp pointed out my error and went on to say that
> there were actually two syntax errors.
>
> "The second syntax error -save, replace- prevented Stata from issuing
> an error
> when `t2' is undefined. The -replace- option shouldn't be needed when
> using
> temporary files because they are freshly created each time the
> procedure is
> run."
>
> Thus -replace- was unnecessary and, in this case, harmful.
>
> -Steve
>
> On Sep 18, 2008, at 12:22 PM, Nick Cox wrote:
>
>> This came up on the list a while back.
>>
>> Suppose you mistype the local macro reference. Say you mean to type
>>
>> save `myfile', replace
>>
>> but you have a minute brainstorm and you type `myfil'. Further suppose
>> that local macro `myfil' is not defined. Then Stata sees
>>
>> save, replace
>>
>> which to Stata is perfectly legal and intelligible. Stata will
>> overwrite
>> the original data file, which is not what you intended at all. Of
>> course, typos here and there can have all sorts of consequences,
>> all of
>> which are strictly your fault, but this one could be catastrophic if
>> what you had in memory was only a small part of the data or nothing to
>> do with the dataset you last read in.
>>
>> There may be other reasons for not doing this, but that's one.
>>
>> Nick
>> [email protected]
>>
>> Joseph Coveney
>>
>>
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
>
