Statalist



Re: st: RE: -replace- should not be use with temporary files (was: Comparing datasets)


From   "Sergiy Radyakin" <[email protected]>
To   [email protected]
Subject   Re: st: RE: -replace- should not be use with temporary files (was: Comparing datasets)
Date   Thu, 18 Sep 2008 19:05:59 -0400

On Thu, Sep 18, 2008 at 6:39 PM, Steven Samuels
<[email protected]> wrote:
> -Sergiy---
>
> What happens if you omit the "replace" in your first example?
>
> -Steve

Well, this happens:

. sysuse auto
(1978 Automobile Data)

. tempfile file

. save "`file'"
file R:\TEMP\ST_1x000001.tmp saved

. save "`file'"
file R:\TEMP\ST_1x000001.tmp already exists
r(602);


A tempfile is just a file. Stata remembers that it must be deleted (I
imagine it has a small red post-it note saying "reminder: delete this
and that" :) ), but in all other respects it is a regular file, and all
Stata commands treat it as one. Technically, all writes to the file are
committed and the file is closed after saving, which is very
convenient, since sometimes it is necessary to let other programs
modify those files. For comparison: Word often creates zero-length
temporary files that remain open throughout the session and get
deleted later (provided Word does not crash).
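
A minimal sketch of this behavior, added for illustration (it is not
output from the session above). It assumes only the auto dataset that
ships with Stata and reuses the macro name `file' from the log. As far
as I know, -tempfile- merely reserves a name, so the file should not
exist until the first -save-; after that it can be inspected and
overwritten like any other file.

```stata
sysuse auto, clear
tempfile file

* -tempfile- only reserves a name; nothing should be on disk yet,
* so -confirm file- is expected to fail with a nonzero return code
capture confirm file "`file'"
display "before save: _rc = " _rc

* first save: no -replace-, so Stata refuses if the file already exists
save "`file'"

* after saving, the tempfile behaves like any other dataset on disk
confirm file "`file'"
describe using "`file'", short

* overwriting it needs -replace-, exactly as for a regular file
save "`file'", replace
```

Run as a do-file (or inside a program) so that the local macro stays
defined from one line to the next.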

The code in my example is not real code; I just cooked it up for
illustration purposes. But it is one of my rare examples that may
happen not only in theory but also in practice (as opposed to having
1.5 billion temporary files). There is actually nothing unrealistic in
constructing a dataset as in version (1) as opposed to version (2).

Best regards,
   Sergiy Radyakin

If you like puzzles: how many instances of Stata do I have running now,
if the temporary file name is ST_1x000001.tmp? Can you be sure that the
result you get is correct, or is it just an estimate? If it is an
estimate, is it an upper bound, a lower bound, or a guess? Refer to
Mr. Gould's email for hints.



> On Sep 18, 2008, at 4:52 PM, Sergiy Radyakin wrote:
>
>> Sometimes it is necessary to replace a tempfile. In a situation like
>> the following:
>>
>>
>> //-------------------------------------------------------------------------------------------------------
>> log using mypaper.txt, text replace
>> tempfile all_years
>> use data1990, clear
>> do prepare_year
>> save "`all_years'" /* we don't write replace here. Stata must ensure
>> that the file does not exist before I create it, otherwise it is not a
>> tempname */
>> foreach year in 1991 1992 1993 1994 1995 1996 1997 1998 1999 {
>>    use data`year', clear
>>    do prepare_year
>>    append using "`all_years'"
>>    save "`all_years'", replace
>> }
>> do my_paper
>> /* tempfile is deleted - no data is left after the do-file terminates,
>> except for the log */
>> log close
>>
>> //-------------------------------------------------------------------------------------------------------
>>
>> Note that this is not optimal: it is probably better to create a
>> tempfile for each year, then append them all one by one. But that
>> requires you to create many temporary names and loop through them in
>> the program:
>>
>>
>> //-------------------------------------------------------------------------------------------------------
>> log using mypaper.txt, text replace
>> foreach year in 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 {
>>    use data`year', clear
>>    do prepare_year
>>    tempfile data`year'
>>    save "`data`year''" /* we do not replace any tempfile in this version
>> */
>> }
>> /* we now have 10 files on the disk. note that the last one (1999) is
>> currently in memory */
>> foreach year in 1990 1991 1992 1993 1994 1995 1996 1997 1998 {
>>    append using "`data`year''"
>> }
>> do my_paper
>> log close
>>
>> //-------------------------------------------------------------------------------------------------------
>>
>> Regarding "number of tempfiles can be anything you want": strictly
>> speaking you may hit a limit, but in practice this is not relevant.
>> The limit will depend on your OS and file system (FAT16/FAT32/NTFS).
>> E.g. if the temp folder is located on a disk with FAT16 you will hit
>> the limit at "just 512 files" :)
>>
>>
>> source: http://ask-leo.com/is_there_a_limit_to_what_a_single_folder_or_directory_can_hold.html
>>
>> and also here:
>> http://en.wikipedia.org/wiki/File_Allocation_Table
>>
>> This is not a very probable situation, because FAT16 has been almost
>> universally replaced since about Win98 times, but there may be similar
>> limits in other OSes. Also, temp folders notoriously accumulate junk,
>> so you can hit the 65,534 limit of FAT32 even if you create only a
>> dozen tempfiles in Stata, given that your computer runs for years
>> without a cleanup and crashes often, leaving tons of tempfiles behind
>> (e.g. mine contains more than 12,000 today).
>>
>> AFAIK nobody has ever reported a problem creating a tempfile in Stata
>> because of a limit on the number of files in a folder. But I think
>> this year somebody hit the limit on the number of simultaneously open
>> files - 2048.
>>
>> If you want to make sure you don't destroy your data, it is more
>> reliable to set the "read-only" attribute on those files, because
>> there are plenty of other opportunities to destroy your data. Stata
>> (correct me if I am wrong) _never_ modifies read-only files in any way
>> (whether that is a data file, a program file or a log file). AFAIK any
>> OS allows you to mark a file as "read-only".
>>
>> Finally, you may want to read Bill Gould's detailed explanation of
>> how Stata manages tempfiles here:
>> http://www.stata.com/statalist/archive/2007-08/msg01124.html and
>> compute for yourself the maximum number of tempfiles that can be
>> created within one instance of Stata.
>>
>> Best regards,
>>   Sergiy Radyakin
>>
>> /* it's nice to see that Stata people are back after Ike. How bad was it?
>> */
>>
>> On Thu, Sep 18, 2008 at 1:22 PM, Rajesh Tharyan <[email protected]>
>> wrote:
>>>
>>> Hi,
>>>
>>> When would one want to -save, replace- a temp file, given that it
>>> will get erased at the end of the run? One can just create as many as
>>> needed, can one not? Or would that be inefficient vis-à-vis usage of
>>> system resources?
>>>
>>> Rajesh
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Steven Samuels
>>> Sent: 18 September 2008 18:08
>>> To: [email protected]
>>> Subject: Re: st: RE: -replace- should not be use with temporary files
>>> (was: Comparing datasets)
>>>
>>> Joseph Coveney wrote:
>>>>
>>>> This is news to me.  I use -replace- all of the time with temporary
>>>> files. What did StataCorp technical support say was the matter with
>>>> using -save . . . , replace- with temporary files?
>>>>
>>>>> Steven Samuels wrote (excerpted):
>>>>>
>>>>> . . . Technical support told me that "replace" should not be used
>>>>> when saving temporary files.
>>>>>
>>>
>>> Joseph
>>>
>>> What happened was: I tried to save a temporary file `t2' without
>>> first defining it. Stata did not issue an error message, and I had no
>>> clue as to where I'd gone wrong. (My only excuse: I was tired.) Kerry
>>> Kammire of StataCorp pointed out my error and went on to say that
>>> there were actually two syntax errors.
>>>
>>> "The second syntax error -save, replace- prevented Stata from issuing
>>> an error when `t2' is undefined. The -replace- option shouldn't be
>>> needed when using temporary files because they are freshly created
>>> each time the procedure is run."
>>>
>>> Thus -replace- was unnecessary and, in this case, harmful.
>>>
>>> -Steve
>>>
>>> On Sep 18, 2008, at 12:22 PM, Nick Cox wrote:
>>>
>>>> This came up on the list a while back.
>>>>
>>>> Suppose you mistype the local macro reference. Say you mean to type
>>>>
>>>> save `myfile', replace
>>>>
>>>> but you have a minute brainstorm and you type `myfil'. Further suppose
>>>> that local macro `myfil' is not defined. Then Stata sees
>>>>
>>>> save, replace
>>>>
>>>> which to Stata is perfectly legal and intelligible. Stata will
>>>> overwrite the original data file, which is not what you intended at
>>>> all. Of course, typos here and there can have all sorts of
>>>> consequences, all of which are strictly your fault, but this one
>>>> could be catastrophic if what you had in memory was only a small
>>>> part of the data or nothing to do with the dataset you last read in.
>>>>
>>>> There may be other reasons for not doing this, but that's one.
>>>>
>>>> Nick
>>>> [email protected]
>>>>
>>>> Joseph Coveney
>>>>
>>>>
>>>>
>>>> *
>>>> *   For searches and help try:
>>>> *   http://www.stata.com/help.cgi?search
>>>> *   http://www.stata.com/support/statalist/faq
>>>> *   http://www.ats.ucla.edu/stat/stata/