
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org is already up and running.



Re: st: Rounding Errors Stata 12


From   Stas Kolenikov <skolenik@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Rounding Errors Stata 12
Date   Wed, 13 Feb 2013 09:28:49 -0600

Well, sometimes when I generate a set of 500 bootstrap weights, I
store them as floats for memory reasons, as they double the
size of the data set. But still... we are not exchanging floppies
anymore. And THE BIG DATA that everybody talks about usually comes in
text form, anyway.
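For a rough sense of the memory trade-off mentioned above: a -float- takes 4 bytes per value and a -double- takes 8. The sketch below only does the arithmetic; the observation count is a made-up figure for illustration, and only the 500 weights come from the message.

```stata
* Hypothetical sizes: 100,000 observations (assumed), 500 replicate weights
local n = 100000
local k = 500

* Storage for the weights alone, at 4 bytes (float) and 8 bytes (double)
display "as float:  " %14.0fc (`n' * `k' * 4) " bytes"
display "as double: " %14.0fc (`n' * `k' * 8) " bytes"
```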

On Wed, Feb 13, 2013 at 8:56 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> Stas' view is not universal.
>
> I don't know how big your numbers go, but using -double- can only
> lessen the apparent problem, not fix it.
>
> I'd predict from what you have told us that holding such data as
> -float- will be entirely unproblematic
> except for the one strange detail that you started with, that
> occasionally such numbers will be displayed with
> what look like spurious decimal places.
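If the stray decimal places are only a display nuisance, a display format is one way to hide them without changing the stored values. This is a minimal sketch, assuming the variable is called item_revenue as in the original question and that one decimal place is what the data are meant to carry.

```stata
* Change only how the values are shown; the stored floats are untouched
format item_revenue %9.1f

* When selecting on a literal, round it to float precision first
list item_revenue if item_revenue == float(60237.8)
```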
>
> Nick
>
> On Wed, Feb 13, 2013 at 2:47 PM, Gray, Charles <gray.c@east.ei.com> wrote:
>> Yeah I've read about data types and storage. Here is what the Stata help
>> window says:
>>
>> "Stata stores numbers in binary, and this has a second effect on numbers
>> less than 1.  1/10 has no perfect binary representation just as 1/11 has
>> no perfect decimal representation.  In float, .1 is stored as
>> .10000000149011612.  Note that there are 7 digits of accuracy, just as
>> with numbers larger than 1.  Stata, however, performs all calculations
>> in double precision.  If you were to store 0.1 in a float called x and
>> then ask, say, "list if x==.1", there would be nothing in the list.  The
>> .1 that you just typed was converted to double, with 16 digits of
>> accuracy (.100000000000000014...), and that number is never equal to 0.1
>> stored with float accuracy.
>>
>> "One solution is to type "list if x==float(.1)".  The float() function
>> rounds its argument to float accuracy; see [D] functions.  The other
>> alternative would be to store your data as double, but this is probably
>> a waste of memory.  Few people have data that is accurate to 1 part in
>> 10 to the 7th.  Among the exceptions are banks, who keep records
>> accurate to the penny on amounts of billions of dollars.  If you are
>> dealing with such financial data, store your dollar amounts as
>> doubles.  See float()."
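The comparison described in the help text can be reproduced in a scratch dataset. This is a small sketch; the variable name x follows the help text, and the one-observation dataset is made up for the demonstration.

```stata
clear
set obs 1
generate float x = 0.1

* The literal 0.1 is a double here, so it does not match the stored float
count if x == 0.1

* Rounding the literal to float precision first makes the comparison succeed
count if x == float(0.1)
```

The first -count- should report 0 and the second 1.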
>>
>> I guess I just don't follow why .8 is stored as .801. My concern with
>> storing the variable as a double is memory usage. I've got several other
>> datasets that I'll be appending onto this one and I'm just not sure I'll
>> be able to store the variable as a double once I've appended all of the
>> datasets onto each other.
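On why .8 turns into .801: a -float- has a 24-bit significand, so between 32768 and 65536 the representable values are 1/256 apart. The nearest such value to 60237.8 is 15420877/256 = 60237.80078125, which is what shows up as 60237.801 when enough digits are displayed. A short sketch to check this from the command line:

```stata
* 60237.8 rounded to float precision, shown with 8 decimal places
display %16.8f float(60237.8)

* The same literal held as a double
display %16.8f 60237.8

* The two are not equal (displays 0)
display float(60237.8) == 60237.8
```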
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu
>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Stas Kolenikov
>> Sent: Wednesday, February 13, 2013 9:38 AM
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: Rounding Errors Stata 12
>>
>> Read -help data types- and -insheet- your data as -double-s. The
>> default -float- does not have enough accuracy. (Add it to the wish list
>> for Stata 13: make -double- the default type, or get rid of -float-s
>> altogether; I have -set type double- in my profile.do, and only generate
>> -float- variables when I need to reproduce somebody's round-off errors.)
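The advice above could look like the following. This is a sketch, assuming the file name data.csv from the original question; whether the extra storage is acceptable depends on the size of the appended datasets.

```stata
* Read floating-point columns as double instead of the default float
insheet using "data.csv", comma clear double

* Make double the default type for variables created later in this session,
* for example by -generate- (the profile.do setting mentioned above)
set type double
```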
>>
>> --
>> -- Stas Kolenikov, PhD, PStat (SSC)  ::  http://stas.kolenikov.name
>> -- Senior Survey Statistician, Abt SRBI  ::  work email kolenikovs at
>> srbi dot com
>> -- Opinions stated in this email are mine only, and do not reflect the
>> position of my employer
>>
>>
>> On Wed, Feb 13, 2013 at 8:25 AM, Gray, Charles <gray.c@east.ei.com> wrote:
>>> I am having an issue with Stata 12 adding decimal places to data that I
>>> insheet. I simply have a dataset in .csv format. The dataset contains a
>>> variable 'item_revenue.' When I open the dataset in Excel, several
>>> observations have a value of 60237.8 for the 'item_revenue' variable.
>>> However, when I insheet the dataset into Stata, these values change to
>>> 60237.801. My insheet command is simply,
>>>
>>> insheet using "data.csv", comma clear
>>>
>>> My understanding is that the .csv format saves only the text and values
>>> as they are displayed in cells of the active worksheet. So does anyone
>>> know why Stata would add decimal places to a variable?
>>>
>>> Thanks,
>>>
>>> Charlie
>>>
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>> *   http://www.ats.ucla.edu/stat/stata/



-- 
-- Stas Kolenikov, PhD, PStat (SSC)  ::  http://stas.kolenikov.name
-- Senior Survey Statistician, Abt SRBI  ::  work email kolenikovs at
srbi dot com
-- Opinions stated in this email are mine only, and do not reflect the
position of my employer

