Statalist



Re: st: RE: RE: RE: egen and spontaneously changing numbers


From   Austin Nichols <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: RE: RE: egen and spontaneously changing numbers
Date   Wed, 20 May 2009 13:49:44 -0400

Another short answer is Yes: you could store long numbers as strings,
generate numeric versions of them, generate another string variable
from each numeric one, and check that the two string variables are
equal. That is a crazy solution, though. It is far easier just to
understand the precision of the various storage types
( http://www.stata.com/help.cgi?datatypes indicates you may have to
worry about these problems for any integer over six digits ) and make
sure you are not shooting yourself in the foot.  FWIW, there are
warnings all over the help files and manuals, e.g.
http://www.stata.com/help.cgi?dates_and_times#sttypes
[U] 13.10 Precision and problems therein
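
For what it is worth, here is a minimal sketch of that round-trip
check, next to a more direct comparison using -float()-. The variable
names (id_str, id_num, id_back, id_dbl) are made up for illustration,
and 40002015 is simply the value from Nick's example quoted below.

```stata
clear
set obs 1
gen str8 id_str = "40002015"

* the round trip: string -> float -> string, then compare the strings
gen float id_num = real(id_str)
gen str8 id_back = trim(string(id_num, "%12.0f"))
list id_str id_back if id_str != id_back

* the direct check: does the value survive being rounded to float?
gen double id_dbl = real(id_str)
count if id_dbl != float(id_dbl)
```

Any observation that is listed, or counted by the last command, holds
a value that a -float- cannot store exactly, so it needs a -long-, a
-double-, or a string.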

On Wed, May 20, 2009 at 1:26 PM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> The short answer I believe to be No.
>
> Whenever you are asking for a -float- result, Stata will (attempt to)
> give you one. That usually -- not just occasionally -- gives you less
> precision than is possible. What differs is that almost no-one cares
> about the last few decimal places if there is a fractional part, whereas
> we may well care about preserving integers exactly.
>
> If you want a homunculus inside the machine smart enough to work out
> that you are asking for something you don't really want, you may have to
> program it yourself, as only you know what you don't really want.
>
> More positively, these issues most commonly arise with long numeric
> identifiers, in which case there are two simple pieces of advice.
>
> 1. Keep all long numeric identifiers in string variables.
>
> 2. If you have a good reason not to do that, make sure that results are
> always specified as -long- or -double-.
>
> By the way, you can specify the default new variable type as e.g.
> -double-. (Specifying it as -long- would create more problems than it
> solved.)
>
> Nick
> n.j.cox@durham.ac.uk
>
> sdm1/Steve
>
> Is there any way I can get Stata to warn me that it is doing this?
>
> Nick Cox
>
> You have a precision problem. By default -egen- will generate -float-
> variables with the functions you are using. To keep every digit in the
> integers you are playing with you need to spell out that you want a
> -long- or -double-. There aren't enough bits in the variable type you
> are using.
>
> I can't follow your code which seems to go back and forth between
> string and numeric results, nor do I know what MEPS means. I guess
> there's a much simpler way to do what you want without using -egen- at
> all, but the issue that is biting you is illustrated thus:
>
> . set obs 1
> obs was 0, now 1
>
> . gen long myin  = 40002015
>
> . egen myout = max(myin)
>
> . egen long myout2 = max(myin)
>
> . format myout* %12.0f
>
> . l
>
>     +--------------------------------+
>     |     myin      myout     myout2 |
>     |--------------------------------|
>  1. | 40002015   40002016   40002015 |
>     +--------------------------------+
>
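
The two remedies Nick mentions could be sketched as follows, reusing
his variable names. myout3 is a hypothetical name added here, and the
sketch assumes -egen- picks up the default type from -set type- when
no type is given, so check the result on your own installation.

```stata
clear
set obs 1
gen long myin = 40002015

* remedy 1: name the storage type on the command itself
egen long myout2 = max(myin)

* remedy 2: change the default type for new variables
set type double
egen myout3 = max(myin)

* restore the usual default afterwards
set type float

* both copies should keep every digit
assert myout2 == myin
assert myout3 == myin
```

Remedy 2 changes behaviour for the rest of the session unless the
default is reset, and -double- variables take twice the memory of
-float- variables.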
