Re: st: Identifying unique values with codebook


From   Michael Mitchell <Michael.Norman.Mitchell@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Identifying unique values with codebook
Date   Thu, 17 Jun 2010 13:57:21 -0700

I agree that, for the "typical" variable, storing the value as a
-float- is not a problem. Unfortunately, I have found that people
tend to discover that they have an "atypical" variable only after the
fact, once precision has already been lost by using a -float-.
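
As a rough illustration of that kind of after-the-fact loss, here is
a small sketch. The 9-digit identifier is made up for the example; it
is not from any real dataset.

```stata
* Sketch only: 123456789 is a hypothetical ID value.
* A float cannot hold every integer above 2^24 = 16,777,216.
display %12.0f float(123456789)
* shows 123456792, not 123456789

display float(16777217) == 16777216
* shows 1, because 16777217 is rounded to 16777216 when stored as a float
```

Once such a value has been stored as a -float-, the original digits
cannot be recovered from the data.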

But this problem also arises for typical variables when making
comparisons with fractional values. For example, using the -auto-
dataset, I want to see the cars that have a gear ratio of 2.19. As
shown below, it would appear that there are not any....

. sysuse auto
(1978 Automobile Data)
. describe gear_ratio

              storage  display     value
variable name   type   format      label      variable label
--------------------------------------------------------------------------------------------
gear_ratio      float  %6.2f                  Gear Ratio

. list  make gear_ratio if gear_ratio == 2.19, abb(30)

<No observations shown>

But, as -help data_types- tells us, we need to use the following
technique because -gear_ratio- is a float.

. list  make gear_ratio if gear_ratio == float(2.19), abb(30)

     +----------------------------+
     | make            gear_ratio |
     |----------------------------|
 12. | Cad. Eldorado         2.19 |
     +----------------------------+
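
To see why the plain comparison fails, it may help to display the
two values side by side. This is a minimal sketch. The display
formats are my own choice, and it assumes only that Stata evaluates
the literal 2.19 in double precision.

```stata
* 2.19 has no exact binary representation, so the float and double
* approximations of it differ slightly.
display %20.15f 2.19
display %20.15f float(2.19)

* The float version is roughly 2.19000006, so the test below shows 0.
display 2.19 == float(2.19)
```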

Alternatively, I have a dataset called -auto_double- that stores
-gear_ratio- as a -double-, because I had issued -set type double-
before creating the dataset.

. use auto_double
. describe gear_ratio

              storage  display     value
variable name   type   format      label      variable label
--------------------------------------------------------------------------------------------------
gear_ratio      double %10.0g

  Now, when I look for cars with a gear ratio of 2.19, I find them
without extra effort.

. list  make gear_ratio if gear_ratio == 2.19, abb(30)

     +----------------------------+
     |          make   gear_ratio |
     |----------------------------|
 12. | Cad. Eldorado         2.19 |
     +----------------------------+
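
The effect of -set type double- can be sketched on a toy dataset.
The variable names x and y are hypothetical, and this is not how
-auto_double- itself was built.

```stata
* Sketch only: x and y are made-up variables in an empty dataset.
clear
set obs 1

set type double
generate x = 2.19
count if x == 2.19
* reports 1, because x is stored as a double

set type float
generate y = 2.19
count if y == 2.19
* reports 0, because y is stored as a float

describe x y
```

The final -set type float- also restores the default for the rest of
the session.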

  And, if I feel that I have variables that are wastefully stored as
type -double-, I can use the -compress- command to convert variables
to a more frugal storage type, such as byte, int, or long.
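
A small sketch of what -compress- does with doubles, again using
hypothetical variables (id and ratio) in a toy dataset:

```stata
* Sketch only: id and ratio are made-up variables.
clear
set obs 3
generate double id = _n
generate double ratio = 2.19

compress
describe
* id is demoted to byte, since its values are small integers;
* ratio stays double, since it holds a fractional value
```

Because -compress- only demotes a variable when no information is
lost, it is safe to run on a dataset created with doubles.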

  I agree that, for most variables, doubles are wasteful of space.
But I prefer to start with a double and have the option of going
down to a smaller storage type, rather than start with a float and
be unable to recover the precision that was lost.
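
The one-way nature of the choice can be sketched with -recast-.
Converting the float to a double afterwards widens the storage type
but keeps the already-rounded value:

```stata
sysuse auto, clear
recast double gear_ratio

count if gear_ratio == 2.19
* reports 0: the stored value is still the float approximation

count if gear_ratio == float(2.19)
* reports 1: the Cad. Eldorado shown in the listings above
```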

Best regards,

Michael Mitchell



On Thu, Jun 17, 2010 at 9:51 AM, Maarten buis <maartenbuis@yahoo.co.uk> wrote:
> --- On Thu, 17/6/10, Michael N. Mitchell wrote:
>> It seems to me that many of the "gotchas" arise
>> from the fact that the default data type is "float"
>> instead of "double".
>
> The typical variable in a dataset contains some sort
> of measurement, and most measurements are nowhere
> near precise enough to warrant anything more than 2 or
> 3 digits of precision, so "float" is a perfectly
> sensible default. This leaves variables that are
> supposed to represent a unique identification
> number. Here double or long may help, but these
> too can easily become too short for those cases,
> which would then require you to switch to strings.
> So, I am not convinced about the usefulness of a
> switch of the default to double.
>
> -- Maarten
>
> --------------------------
> Maarten L. Buis
> Institut fuer Soziologie
> Universitaet Tuebingen
> Wilhelmstrasse 36
> 72074 Tuebingen
> Germany
>
> http://www.maartenbuis.nl
> --------------------------
>
>
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
