Statalist



RE: st: Problem with adding 3 numbers


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Problem with adding 3 numbers
Date   Sun, 18 Oct 2009 16:14:50 +0100

This difficulty is often flagged on this list. For example, see the thread that started only a few weeks ago with 

<http://www.stata.com/statalist/archive/2009-09/msg00927.html> 

This thread is especially relevant as the underlying problem is the same, handling composite identifiers formed by concatenation. 

A more general point is that one can miss valuable stuff by treating Statalist as write-only. 

I don't fully agree with Stas about the default. Setting -double- as the default avoids certain problems only to create others, notably inefficient use of memory and storage. Stas is of course perfectly at liberty to change the default for his purposes, but that doesn't necessarily make -double- a good default for all users. 

Nick 
n.j.cox@durham.ac.uk 

Stas Kolenikov wrote:

Read -help datatypes- to find out the relative accuracy of the
stored numbers. The default -float- type (which is a terrible default
if you ask me) stores numbers with about 4e-8 relative accuracy. After
your multiplication by 1e7, an ID around 2e7 is good only to about
+/-1, so the last digit of your locality code falls within the
round-off error for this type. I forgot about these troubles ages ago
after putting the line

set type double

into the profile.do file in my Stata directory.
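
For readers who want to see the rounding outside Stata, here is a minimal sketch in Python (not Stata), assuming only that -float- is a 4-byte IEEE 754 single-precision number. The helper names to_float32 and compound_id are made up for this illustration. It rounds the three IDs from the original post to single precision and compares them with the exact integers.

```python
import struct

def to_float32(x):
    """Round x to the nearest IEEE 754 single-precision value (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]

def compound_id(state, mun, loc):
    # Same arithmetic as in the original post, done exactly with integers.
    return state * 10_000_000 + mun * 10_000 + loc

# (state, mun, loc) triples taken from the rows shown in the original post.
for state, mun, loc in [(1, 1, 1), (2, 1, 1), (2, 1, 139)]:
    exact = compound_id(state, mun, loc)
    stored = int(to_float32(exact))
    status = "ok" if stored == exact else "rounded"
    print(exact, stored, status)
```

Integers are exact in single precision only up to 2^24 = 16,777,216. Above that the representable values are 2 apart, which is why the state 1 ID survives while the odd-numbered state 2 IDs move to a neighbouring even number.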

On the other hand, a -double- type variable still makes a rather poor
identifier, so you might want to -generate- your compound ID variables
as -long-:

gen long claveloc =((state*10000000)+ (mun2*10000))+loc2

Again, make sure you are still able to store all the numbers
accurately, and the largest ID you could ever need does not exceed
~2bln:

. di %12.0g c(maxlong)
  2147483620

If you have fewer than 214 states, you should be good to go with -long- :))
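
The range check can be sketched the same way. This is an illustration in Python, not Stata; it assumes the digit widths given in the original post (municipality up to 999, locality up to 9999) and takes the -long- limit from the c(maxlong) value displayed above. The function names are hypothetical.

```python
MAXLONG = 2_147_483_620   # c(maxlong) as displayed above
FLOAT_EXACT = 2 ** 24     # every integer up to this is exact in a 4-byte float
DOUBLE_EXACT = 2 ** 53    # every integer up to this is exact in an 8-byte double

def compound_id(state, mun, loc):
    # Same layout as claveloc: state, then 3 digits of mun, then 4 digits of loc.
    return state * 10_000_000 + mun * 10_000 + loc

def largest_id(max_state, max_mun=999, max_loc=9999):
    """Largest ID possible for a given highest state code (assumed digit widths)."""
    return compound_id(max_state, max_mun, max_loc)

# Illustrative highest state codes; 213 and 214 bracket the -long- limit.
for max_state in (1, 32, 213, 214):
    top = largest_id(max_state)
    print(max_state, top,
          "float-exact" if top <= FLOAT_EXACT else "float-unsafe",
          "fits long" if top <= MAXLONG else "exceeds long",
          "double-exact" if top <= DOUBLE_EXACT else "double-unsafe")
```

Under these assumptions the largest ID for state code 213 still fits in -long-, while state code 214 can overflow it, which matches the "fewer than 214 states" remark.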

On Fri, Oct 16, 2009 at 9:27 PM, Kanter, Rebecca <rkanter@jhsph.edu> wrote:

> I am trying to combine three numbers (a 1-2 digit code for state, a 3 digit code for municipality, and a 4 digit code for locality) into one identifier that is unique within a country. I have tried this numerous ways and each time, after the 1st state, Stata starts to round (I think) some of the numbers. No state, municipality, or locality is missing. State is byte. Municipality and locality are strings (that I convert to numeric, see below).
>
> gen munloc=mun+loc
> destring munloc, generate(test)
> generate ent2=ent*10000000
> generate claveloc=ent2+test
>
> or whereby:
> gen mun2=real(mun)
> gen loc2=real(loc)
> gen claveloc =((state*10000000)+ (mun2*10000))+loc2
>
> ****Any way I try this, I get problems like this:
>
> state  mun  loc   munloc   test   state2    claveloc
>     2  001  0001  0010001  10001  20000000  20010000  (should be 20010001)
>     2  001  0139  0010139  10139  20000000  20010140  (should be 20010139)
>
> *Should be like this:
>
> state  mun  loc   munloc   test   state2    claveloc
>     1  001  0001  0010001  10001  10000000  10010001
>
