# RE: st: Re: infix problem?

From: "Nick Cox"
Subject: RE: st: Re: infix problem?
Date: Wed, 14 May 2008 17:02:36 +0100

```
There are two distinct but related issues here.

One is the maximum (or minimum) size of number that a given storage type can hold.

The other is whether large integers can be held exactly, i.e. every digit is correct.

You can hold numbers up to almost 9 * 10^307, but that does not mean that you can hold 307 digit integers exactly.

All numbers are held in binary. Usually that means a binary approximation, even for numbers like 1.1. For moderate integers, exact binary representations are possible (for a double, every integer up to 2^53, about 9.007 * 10^15), but that does not extend to all integers.

Nick
n.j.cox@durham.ac.uk

Hau Chyi

Thanks a lot for clarifying this. I will use a string variable to deal
with this issue from now on.

If double can't deal with a 16-digit number, what does the help mean by
saying it can store up to 8.9884656743*10^307?

Also, most data surveys generate do files for users now. To be honest,
before this incident I had never thought about checking the ascii and
do files provided by the websites (I now realize it's a good and
necessary practice).

I am a recent subscriber to the list. In a relatively short period of
time, I've already learned a lot (to the point that I am going to
offer a class using your book as a reference textbook, hehe..).

On Wed, May 14, 2008 at 9:10 AM, Kit Baum <baum@bc.edu> wrote:
> But even with a double declaration you cannot read 16 digits and retain them
> all. This is beyond the capability of a double data type. You should use a
> string variable type to deal with a 16-character ID code properly, as is
> often discussed on this list. Even if an ID is numeric, there is no downside
> to treating it as string, and doing so will ensure that this kind of problem
> does not bite you.

> On May 14, 2008, at 02:33, Nick wrote:
>
>>
>> This is arguable. The help for -infix- does indicate that if you want a
>> -double- you need to specify that, so Stata is putting the onus on you to
>> think about variable types.
>> Otherwise put, your punishment is that you got what you asked for.
>>
>> Despite that, the idea that Stata should be smart on your behalf is
>> naturally attractive. Quite what that would mean with -infix- is not clear
>> except to Stata developers who know the exact algorithm. In particular, a
>> decision on optimal variable types presumably implies two passes through the
>> data, i.e. the field width is not enough to decide.
>
>> Hau Chyi
>>
>> I've downloaded several variables from the SIPP (Survey of Income and
>> Program Participation), and realized there seems to be a problem with
>> the -infix- command, which I hope can be illustrated by the following
>> example.
>>
>> Here is a single observation of one variable, which looks like this
>> in the asc file.
>>
>> 1234567890123456
>>
>> This is the SSUID, the survey unit id of each individual.
>>
>> If you save this into an asc file as, say, "d:\documents\test\test.asc",
>> and run the following lines:
>>
>>    infix SSUID 1-16 using "d:\documents\test\test.asc";
>>    format SSUID  %16.0f;
>>
>> and then:
>>
>> -list SSUID-
>>
>> The variable Stata reads is:
>>     +------------------+
>>     |            SSUID |
>>     |------------------|
>>  1. | 1234567948140544 |
>>     +------------------+
>>
>>
>> It's completely wrong!  I realized this after discovering that some families
>> I generated from SSUID (and other family identifiers) have more than
>> 100 kids!!
>> The problem disappears when I do
>>
>> -infix double SSUID 1-16 using ...-
>>
>> In other words, the precision -infix- chooses automatically is wrong.
>>
>> Is this a bug in -infix- or some memory allocation error on my computer?
>> No matter what, I recommend that if you are infixing variables with more
>> than 10 digits, you check the result against the ascii file to see that
>> it was read correctly.

```
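The rounding reported in this thread can be reproduced outside Stata. What follows is a minimal sketch in Python rather than Stata, on the assumption that Stata's -float- and -double- are the usual 4-byte and 8-byte IEEE 754 formats. The helper `to_float32`, the variable names and the second ID value `big_id` are illustrative and do not come from the thread.

```python
import struct


def to_float32(x):
    """Round x to the nearest 4-byte IEEE 754 value and return it as a Python float."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


# The 16-character field from the thread, kept as text and as an integer.
ssuid_text = "1234567890123456"
ssuid = int(ssuid_text)

# 4-byte storage (assumed to match Stata's float): only about 7 digits survive.
as_float = int(to_float32(ssuid))
print(as_float)        # 1234567948140544, the value listed in the thread

# 8-byte storage (assumed to match Stata's double): this particular ID survives.
as_double = int(float(ssuid))
print(as_double)       # 1234567890123456

# A double holds every integer exactly only up to 2**53, itself a 16-digit number.
limit = 2**53          # 9007199254740992
print(float(limit + 1) == float(limit))   # True: limit + 1 cannot be distinguished

# So a 16-digit ID above that limit is not safe in a double either.
big_id = 9999999999999999                 # hypothetical ID, for illustration
print(int(float(big_id)))                 # 10000000000000000

# Kept as a string, the ID is stored character by character, so nothing is rounded.
print(ssuid_text)      # 1234567890123456
```

The first result matches the value shown by -list- in the original post, which suggests that 4-byte storage explains what was seen. The last two checks illustrate why a string is the safer choice for 16-character IDs, even though -double- happened to work for this one.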