Statalist



Re: st: tempfile already exists


From   wgould@stata.com (William Gould, StataCorp LP)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: tempfile already exists
Date   Thu, 30 Aug 2007 09:23:38 -0500

After my posting about how tempfiles are named, 
Jeph Herrin <junk@spandrel.net> wrote, 

> One oddity: base-34?

and then followed up, 

> I know well what base-34 means, but I have never
> seen used before. It seems a strange choice - even
> base-36 makes more sense.

I have no good explanation for our choice of base 34 over, say, base 32
or even base 36.

First, some of you may be wondering why we use any base other than base 10.
In base 34, the digits are 0-9 followed by A-X, so the top digit is "X".  We
use six base-34 digits in a tempfile name, so the largest number, XXXXXX,
equals 1,544,804,415 in base 10.  Thus we are able to record in six characters
what would take ten characters in base 10.  That was important in the old days
of the 8.3 filenaming convention.
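
To check the arithmetic, here is a minimal Python sketch of six-digit base-34
encoding.  It is an illustration only, not Stata's actual routine; the function
name to_base34 and the zero-padding are assumptions made for the example.

```python
import string

# Digit alphabet assumed for illustration: 0-9 followed by A-X (34 symbols).
DIGITS = (string.digits + string.ascii_uppercase)[:34]

def to_base34(n, width=6):
    """Return n as a base-34 string, zero-padded to at least `width` digits."""
    if n < 0:
        raise ValueError("n must be nonnegative")
    out = []
    while n > 0:
        n, r = divmod(n, 34)   # peel off the lowest base-34 digit
        out.append(DIGITS[r])
    return "".join(reversed(out)).rjust(width, "0")

largest = 34**6 - 1
print(largest)             # 1544804415
print(to_base34(largest))  # XXXXXX
print(len(str(largest)))   # 10 characters in base 10
```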

Now, had the programmer (that would be me) been thinking clearly, he would
have chosen base 32 because one can convert from base 2 to base 32 more
quickly than one can convert from base 2 to base 34.  Base 2 is important
because that is how numbers are actually stored inside the computer.  For
base 2 to base 32 conversion, there's a trick:  take the base-2 number 
in 5-digit groups, starting from the right, and translate each group into a
single base-32 digit.  Each digit can be translated separately!  For instance,
the base-10 number 42 in base 2 is 101010.  To translate to base 32, start by
noting that 101010 in 5-bit groups is 1-01010, where "-" is just a dash, not a
minus.  1 base 2 translates to 1 base 32.  01010 base 2 is 10 in base 10,
which translates to "A" base 32.  Thus, the base-32 equivalent is 1A.
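
The grouping trick can be sketched in a few lines of Python.  This is only an
illustration under assumed names (binary_to_base32 is hypothetical), using the
digit alphabet 0-9 followed by A-V for base 32.

```python
import string

# Base-32 digits assumed for illustration: 0-9 followed by A-V.
DIGITS32 = (string.digits + string.ascii_uppercase)[:32]

def binary_to_base32(bits):
    """Translate a string of 0s and 1s to base 32, five bits at a time."""
    # Pad on the left so the length is a multiple of 5 (groups run from the right).
    padded = bits.zfill(-(-len(bits) // 5) * 5)
    groups = [padded[i:i + 5] for i in range(0, len(padded), 5)]
    # Each 5-bit group maps to exactly one base-32 digit, independently of the rest.
    digits = "".join(DIGITS32[int(g, 2)] for g in groups)
    return digits.lstrip("0") or "0"

print(binary_to_base32("101010"))  # 1A
print(int("1A", 32))               # 42
```

No division is needed:  each group is a table lookup, which is why the
conversion is quick.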

It's rather fun to prove that, given two bases where one is a power of the
other (as 32 is 2 to the 5th power), one can translate between them a group
of digits at a time.

Jeph Herrin <junk@spandrel.net> wrote "even base 36 makes more sense".  
Jeph is right.  Even though, binarywise, there is nothing special about 
36, base 36 would have used all the letters, and then the largest number 
would have been ZZZZZZ, equal to 2,176,782,335 in base 10.
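
For comparison, the following Python sketch tabulates the largest six-digit
number in each of the three bases discussed.  The helper name largest_six_digit
is made up for the example; int() with a base argument is standard Python and
accepts bases up to 36.

```python
def largest_six_digit(base):
    """Largest value representable in six digits of the given base."""
    return base**6 - 1

for base in (32, 34, 36):
    print(base, largest_six_digit(base))
# 32 1073741823
# 34 1544804415
# 36 2176782335

# Cross-check by parsing the all-top-digit strings directly.
print(int("XXXXXX", 34))  # 1544804415
print(int("ZZZZZZ", 36))  # 2176782335
```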

Interestingly, someone at Stata noted the conceptual mistake in the use of
base 34 for making tempfile names, because the routine that makes temporary
variable names does use base 36:  temporary variable names in Stata are of the
form two underscores followed by a six-digit base-36 number.

-- Bill
wgould@stata.com


