st: RE: precision and ID numbers


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: precision and ID numbers
Date   Thu, 16 Jan 2003 13:04:18 -0000

Stephen Mennemeyer
>
> The Stata 7 manual, [U] 16.10, briefly mentions that some problems
> can arise because Stata stores numbers in single precision and
> calculates with them in double precision.
>
> I have found another situation where double precision is required
> in Stata for numbers that would seem to be safely manipulated in
> single precision.
>
> Consider a typical -- hypothetical -- Social Security number stored
> as a string variable:
> 123-45-6789
>
> One might wish to convert this to a numeric variable for easier
> manipulation, such as doing sorts in a more appealing manner.
>
> If one parses the SSN into its numeric components, multiplies them
> up to the appropriate scale, and then adds them back together, the
> result is a bit surprising. This process is better done in double
> precision to get the expected result.
>
> list ssnchar /* from an external file of hypothetical SSNs */
>
>          ssnchar
>   1. 123-45-6789
>   2. 987-65-4321
>   3. 078-94-5612
>   4. 321-65-7894
>   5. 978-54-6231
>
> /* parse the string variable ssnchar into its component parts,
> multiply them up to the appropriate position in the future number,
> and then add the parts */
> . gen double p1=real(substr(ssnchar,1,3))
> . gen double p2=real(substr(ssnchar,5,2))
> . gen double p3=real(substr(ssnchar,8,4))
> . gen double ssndbl=p1*1000000+p2*10000+p3
> . format ssndbl %9.0f
>
> /* with double precision the results are as expected  */
> . list
>          ssnchar          p1          p2          p3     ssndbl
>   1. 123-45-6789         123          45        6789  123456789
>   2. 987-65-4321         987          65        4321  987654321
>   3. 078-94-5612          78          94        5612   78945612
>   4. 321-65-7894         321          65        7894  321657894
>   5. 978-54-6231         978          54        6231  978546231
>
> /* if we use the float form of the number, the resulting variable
> ssnf is not what might be anticipated */
> . gen  p1f=real(substr(ssnchar,1,3))
> . gen  p2f=real(substr(ssnchar,5,2))
> . gen  p3f=real(substr(ssnchar,8,4))
> . gen ssnf=p1f*1000000+p2f*10000+p3f
>
> . format ssnf %9.0f
> . list
> . list ssn*
>          ssnchar     ssndbl       ssnf
>   1. 123-45-6789  123456789  123456792
>   2. 987-65-4321  987654321  987654336
>   3. 078-94-5612   78945612   78945616
>   4. 321-65-7894  321657894  321657888
>   5. 978-54-6231  978546231  978546240
>

The question of precision when handling large integers
has been raised many times on this list over the years.

This example is another useful warning of how things
can go wrong. I add a few comments and a question.

1. In addition to the manual, there is a fairly detailed
discussion of holding numbers as compared with holding
strings in

On numbers and strings. Stata Journal 2(3):314--329
(2002)

which on the whole sings the praises of holding
identifiers like (United States) Social Security
Numbers as strings.

2. What drives this is the need to hold every
digit exactly. For large integers with 9 digits,
the -long- data type should be fine.
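
As a rough sketch of why this is so: a -float- holds integers
exactly only up to 2^24 = 16,777,216, whereas a -long- holds
integers exactly up to 2,147,483,620, which covers any 9-digit
identifier. The lines below re-enter the five hypothetical SSNs
from the example so that they can be run on their own; the
variable name ssnlong is just an illustration.

```stata
clear
input str11 ssnchar
"123-45-6789"
"987-65-4321"
"078-94-5612"
"321-65-7894"
"978-54-6231"
end

* 16777217 = 2^24 + 1 is not representable as a float: shows 16777216
display %12.0f float(16777217)
* a 9-digit value is rounded to the nearest float: shows 123456792
display %12.0f float(123456789)

* the same split-and-recombine arithmetic, stored as -long-, is exact
gen long ssnlong = real(substr(ssnchar,1,3))*1000000 + real(substr(ssnchar,5,2))*10000 + real(substr(ssnchar,8,4))
format ssnlong %9.0f
list ssnchar ssnlong
```

The second -display- reproduces the value of ssnf shown for the
first observation in the listing above.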

3. Splitting and recombining is not the easiest
solution here, even if you want the nearest
numeric equivalent of strings like "123-45-6789".

One better way to do it is

. destring ssnchar, gen(ssn) ignore("-")

and another is

. gen long SSN = real(subinstr(ssnchar,"-","",.))

The second has the advantage that the format is
automatically sensible. Both take one line
and avoid the creation of separate variables
for the parts.
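
A minimal sketch of how one might check either conversion,
using the same five hypothetical SSNs re-entered so that the
lines run on their own. It assumes that -isid- is available in
your Stata; the %09.0f format is only one choice for showing
the leading zero of an identifier like 078-94-5612.

```stata
clear
input str11 ssnchar
"123-45-6789"
"987-65-4321"
"078-94-5612"
"321-65-7894"
"978-54-6231"
end

destring ssnchar, gen(ssn) ignore("-")
gen long SSN = real(subinstr(ssnchar,"-","",.))

* the two routes should agree on every observation
assert ssn == SSN
* each identifier should occur only once in this example
isid SSN
* pad with leading zeros so that all nine digits are displayed
format SSN %09.0f
list ssnchar ssn SSN
```

If the -assert- or the -isid- fails on real data, that points
to stray characters or duplicates in the string variable
rather than to a precision problem.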

4. In this particular case, I don't understand what
the problem is with sort order when the variable
is held as string. Please elaborate.
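
To show what I mean, here is a small sketch, on the assumption
that every string has the same fixed-width form with leading
zeros, as in the example. Sorting on the string then gives the
same order as sorting on the numeric equivalent. The variable
name ssnnum is just an illustration.

```stata
clear
input str11 ssnchar
"123-45-6789"
"987-65-4321"
"078-94-5612"
"321-65-7894"
"978-54-6231"
end

* sort on the string variable directly
sort ssnchar
gen long ssnnum = real(subinstr(ssnchar,"-","",.))

* the numeric values are then already in ascending order
assert ssnnum >= ssnnum[_n-1] if _n > 1
list ssnchar ssnnum
```

The orders would differ only if the strings varied in length
or lacked leading zeros.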

Nick
[email protected]



