Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Identifying unique values with codebook
From: Michael Mitchell <[email protected]>
To: [email protected]
Subject: Re: st: Identifying unique values with codebook
Date: Thu, 17 Jun 2010 13:57:21 -0700
I agree that, for the "typical" variable, storing the value as a
-float- is not a problem. Unfortunately, I have found that people
tend to discover that they have an "atypical" variable only after the
fact, once precision has already been lost by storing it as a -float-.
But the problem also arises for typical variables when making
comparisons against fractional values. For example, using the -auto-
dataset, I want to see the cars that have a gear ratio of 2.19. As
shown below, it would appear that there are not any....
. sysuse auto
(1978 Automobile Data)
. describe gear_ratio
              storage  display     value
variable name   type   format      label      variable label
--------------------------------------------------------------------------------------------
gear_ratio      float  %6.2f                  Gear Ratio
. list  make gear_ratio if gear_ratio == 2.19, abb(30)
<No observations shown>
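But, as -help data_types- tells us, we need to use the following
technique because -gear_ratio- is a float. The value 2.19 has no exact
binary representation, and the literal 2.19 is evaluated in double
precision, so it does not equal the float-rounded value stored in the
variable. Wrapping the literal in float() rounds it the same way.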
. list  make gear_ratio if gear_ratio == float(2.19), abb(30)
     +----------------------------+
     | make            gear_ratio |
     |----------------------------|
 12. | Cad. Eldorado         2.19 |
     +----------------------------+
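The mismatch can be checked directly. The following is a minimal sketch, assuming only the -auto- dataset that ships with Stata; the counts it asserts are the ones implied by the two -list- results above.

```stata
* Load the example dataset used above.
sysuse auto, clear

* The literal 2.19 is evaluated in double precision, while gear_ratio
* holds the nearest float, so no observation matches.
count if gear_ratio == 2.19
assert r(N) == 0

* float() rounds the literal to float precision before comparing,
* so the Cad. Eldorado is found.
count if gear_ratio == float(2.19)
assert r(N) == 1

* Display both values to 16 decimal places to see where they diverge.
display %20.16f 2.19
display %20.16f float(2.19)
```

The two -display- lines print slightly different numbers, which is why the plain equality test fails.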
Instead, I have a dataset called -auto_double- that stores
-gear_ratio- as a -double-, because when I created the dataset I had
previously -set type double-.
. use auto_double
. describe gear_ratio
              storage  display     value
variable name   type   format      label      variable label
--------------------------------------------------------------------------------------------------
gear_ratio      double %10.0g
Now, when I look for cars with a gear ratio of 2.19, I see them
without extra effort.
. list  make gear_ratio if gear_ratio == 2.19, abb(30)
     +----------------------------+
     |          make   gear_ratio |
     |----------------------------|
 12. | Cad. Eldorado         2.19 |
     +----------------------------+
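The same contrast can be reproduced without the -auto_double- file. This is only a sketch: the variable names x_float, x_double, and y are made up for illustration, and it does not show how -auto_double- itself was built.

```stata
* One-observation toy dataset (hypothetical variable names).
clear
set obs 1

generate float  x_float  = 2.19
generate double x_double = 2.19

* The float copy differs from the double-precision literal ...
assert x_float != 2.19
assert x_float == float(2.19)

* ... while the double copy compares equal without float().
assert x_double == 2.19

* -set type double- makes double the default type for new variables
* for the rest of the session.
set type double
generate y = 2.19
assert y == 2.19
describe y

* Restore Stata's usual default.
set type float
```

Note that the default type only affects variables created afterwards; it does not restore precision to a variable that was already stored as a float.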
And, if I feel that I have variables that are wastefully stored as
type -double-, I can use the -compress- command to convert them to a
more frugal storage type, such as byte, int, or long, whenever that
can be done without losing information.
I agree that, for most variables, doubles are wasteful of space.
But I prefer to start with a double and have the option to go down to
a smaller storage type, rather than start with a float and be unable
to upgrade to a more precise storage type.
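As a rough illustration of this "start with double, then compress" workflow, the sketch below uses two made-up variables, id and ratio. It assumes -compress- only demotes a variable when no information would be lost, which is its documented behavior.

```stata
* Toy dataset with two doubles (hypothetical variable names).
clear
set obs 3
generate double id    = _n      // small integers: 1, 2, 3
generate double ratio = 2.19    // fractional value that a float cannot hold exactly

* Demote storage types wherever that is lossless.
compress

* id fits in a byte; ratio has to stay a double.
local id_type    : type id
local ratio_type : type ratio
assert "`id_type'" == "byte"
assert "`ratio_type'" == "double"

describe
```

Going the other way does not work: once a value has been stored as a float, recasting it to double keeps the float-rounded value.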
Best regards,
Michael Mitchell
On Thu, Jun 17, 2010 at 9:51 AM, Maarten buis <[email protected]> wrote:
> --- On Thu, 17/6/10, Michael N. Mitchell wrote:
>> It seems to me that many of the "gotchas" arise
>> from the fact that the default data type is "float"
>> instead of "double".
>
> The typical variable in a dataset contains some sort
> of measurement, and most measurements are nowhere
> near precise enough to warrant anything more than 2 or
> 3 digits of precision, so "float" is a perfectly
> sensible default. This leaves variables that are
> supposed to represent a unique identification
> number. Here double or long may help, but these
> too can easily become too short for those cases,
> which would then require you to switch to strings.
> So, I am not convinced about the usefulness of a
> switch of the default to double.
>
> -- Maarten
>
> --------------------------
> Maarten L. Buis
> Institut fuer Soziologie
> Universitaet Tuebingen
> Wilhelmstrasse 36
> 72074 Tuebingen
> Germany
>
> http://www.maartenbuis.nl
> --------------------------
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>