Phil gave an excellent answer, and I just want to add one more
detail.
If all you want to do is to extract the last three digits of
a numeric ID, you can use
mod(ID, 1000)
You may have been taught about the modulus function under the
name of remainder (or the equivalent in your first
language).
mod(21557127, 1000)
is the remainder (what is left over) after dividing
21557127 by 1000, namely 127.
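
As a minimal sketch of the same idea applied to a variable rather than a single number (assuming ID is stored as a long, and using a hypothetical new variable named area), something like this should work in a do-file:

```stata
* Sketch only: the variable name area is hypothetical.
clear
input long ID
21557127
end

* mod() returns the remainder after division by 1000,
* that is, the last three digits of a non-negative integer ID.
generate int area = mod(ID, 1000)

* The stored value is an exact integer, so this comparison succeeds.
list if area == 127
assert area == 127
```

Because no fractional part is ever created, there is nothing for floating-point rounding to spoil.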
Nick
n.j.cox@durham.ac.uk
Phil Schumm replied to Jian Zhang
> > I have a problem with number precision. I can't figure out what
> > happened. I hope that you can help me out. Thanks.
> >
> > Here is the data:
> > ID
> > 21557127
> >
> > Then I ran the following do-file, trying to extract the last three
> > digits from the ID:
> >
> > gen double temxxx=(ID/1000)
> > gen temyyy=int(temxxx)
> > gen temzzz=temxxx-temyyy
> > gen areaxxx=(temzzz*1000)
> > drop temxxx temyyy temzzz
> >
> > the generated data looks like the following:
> > ID areaxxx
> > 21557127 127
> >
> > However, when I typed: list if areaxxx==127, Stata in fact listed
> > nothing!
> >
> > At first I thought it might be because areaxxx is a floating-point
> > variable, so I typed: list if areaxxx==float(127). However, Stata
> > listed nothing again.
> First, let me say that if all you want to do is to extract the last
> three digits of the ID, here is the way to do it:
>
> . di real(substr(string(ID,"%12.0g"),-3,.))
> 127
>
> Note that if you just use string(ID) this will not work, as string()
> uses a default format which is not wide enough for your ID (%12.0g is
> the default format for the long storage type, which I presume is how
> your ID variable is stored).
>
> Second, this is exactly the reason why you should not store IDs as
> numbers -- you should store them as strings instead. For example, if
> ID were a string variable, then extracting the last three digits
> would be even simpler:
>
> . di substr(ID,-3,.)
> 127
>
> and would be guaranteed to work no matter how long your IDs are
> (provided they are no longer than 244 characters).
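
For an ID that is already numeric, one possible way to follow this advice is sketched below. It assumes ID is a long holding whole numbers; IDstr and area are hypothetical variable names, and the %12.0f format is my choice for illustration.

```stata
* Sketch only: IDstr and area are hypothetical variable names.
clear
input long ID
21557127
end

* Convert the numeric ID to a string, using a format wide enough
* to show every digit.
generate IDstr = string(ID, "%12.0f")

* Take the last three characters of the string ID.
generate area = substr(IDstr, -3, .)

list if area == "127"
assert area == "127"
```

Note that area is itself a string here, so comparisons need quotation marks.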
>
> Finally, what happened above? The problem was indeed due to the
> error inherent in floating-point arithmetic. For example, here is
> the calculation you performed:
>
> . di %24.18f float( 1000 * float( (21557127/1000) - float( int(21557127/1000) ) ) )
> 127.000007629394531250
>
> which, as you can see, is not equal to 127. Let's take a closer look:
>
> float( 1000 * float( (21557127/1000) - float( int(21557127/1000) ) ) )
>                      ---- temxxx ---             ---- temxxx ---
>                                        --------- temyyy ----------
>               ----------------------- temzzz -----------------------
> -------------------------------- areaxxx -----------------------------
>
> Notice how I am using the float() function to mimic the fact that,
> although you created temxxx as a double, you did not do so for the
> other intermediate variables. Now in this case, had you also
> created temzzz as a double, you would have gotten what you wanted:
>
> . assert float( 1000 * ( (21557127/1000) - float( int(21557127/1000) ) ) ) == 127
>
> However, as I said above, it is nearly always better to store IDs
> such as these as string variables.
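
To see Phil's point about the storage type of temzzz in the original do-file, here is a rough sketch that computes the fractional part both ways. The names zf, zd, af and ad are hypothetical, introduced only for the comparison; the expected results are those implied by Phil's display and assert lines above.

```stata
* Sketch only: zf, zd, af and ad are hypothetical variable names.
clear
input long ID
21557127
end

generate double temxxx = ID/1000
generate temyyy = int(temxxx)

* Fractional part stored as a float (the default, as in the original do-file)
generate float zf = temxxx - temyyy
* Fractional part stored as a double
generate double zd = temxxx - temyyy

* Both results are stored as floats, as areaxxx was.
generate af = zf*1000
generate ad = zd*1000

* The float route misses 127 by a tiny amount; the double route does not.
assert af != 127
assert ad == 127
list ID af ad if ad == 127
```

Even so, mod() or a string ID avoids the fractional arithmetic altogether.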