
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: wrong datetime results with clock() -- sometimes


From   Sergiy Radyakin <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: wrong datetime results with clock() -- sometimes
Date   Fri, 5 Jul 2013 20:25:13 -0400

On Fri, Jul 5, 2013 at 3:50 PM, Nick Cox <[email protected]> wrote:
> There is a very interesting meta-issue. I guess wildly that people at
> StataCorp may well have discussed this.
>
> It was not so very long ago that if you did something like
>
> gen foo = "Stata strings sweetly sing"
>
> that would be an error unless you spelled out something like
>
> gen str26 foo = ...
>
> but now Stata is smart enough to work out that the variable being
> generated must be string. So, why does not Stata see e.g. -clock()- on
> the right-hand side and automatically produce a -double-?
>
> My guess is two-fold.
>
> 1. Stata still follows a "you're an adult and take responsibility for
> what you say" attitude to -generate- and variable types.
>
> 2. The presence of -clock()- on the right-hand side does not itself
> guarantee that a -double- is needed, as in principle -clock()- might
> be part of a larger calculation for which -double-s are _not_
> required.

Checking for the presence of clock() on the RHS is a shortcut, but
not necessarily the best or correct strategy (as you note yourself),
just as the presence of quotes on the RHS does not mean the result is
going to be a string. But checking the type of the RHS and comparing
it with the type of the LHS is perfectly proper programming practice.
In some languages you can't even compile a program unless the two
types match or you explicitly permit the loss of precision:

http://progzoo.net/wiki/C%23:Possible_Loss_of_Precision
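
To make the loss of precision concrete for the example in this thread:
%tc values count milliseconds since 01jan1960, so a 2011 timestamp is
about 1.6e12, which lies between 2^40 and 2^41. A float has a 24-bit
significand, so neighbouring floats in that range are 2^17 = 131,072 ms
apart, and a clock() result stored as float can be off by up to about
65.5 seconds. The lines below are a minimal sketch, not a definitive
diagnosis; they use Stata's float() function to mimic what -generate-
does when no storage type is given.

```stata
* Exact value, held in double precision.
* The thread reports this displays as 08feb2011 21:50:51.
display %tc clock("2/8/2011 9:50:51 PM", "MDYhms")

* Same value rounded to float precision, as a default -generate- stores it.
* The listing in the original question shows 08feb2011 21:50:37 for this row.
display %tc float(clock("2/8/2011 9:50:51 PM", "MDYhms"))

* Rounding error in milliseconds; its magnitude cannot exceed 65,536 (2^16).
display %15.0f clock("2/8/2011 9:50:51 PM", "MDYhms") - float(clock("2/8/2011 9:50:51 PM", "MDYhms"))
```

This also accounts for the single-observation check in the original
question looking correct: -display- evaluates the expression in double
precision and never stores it in a float variable.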

Defaulting to double in -generate- makes much more sense to me than
defaulting to float. No memory savings can now compensate for the
inconvenience of having to take care of it, and if Stata ever wants to
win over Excel-level users, the explicit notation of numeric storage
types should be abandoned completely. I can hardly imagine a task where
the user would want to specify the type of the LHS directly once you
remove the storage considerations. And we know that Stata's -compress-
works, so efficiency can be regained at save time. IMHO, that's the way
to go (and that's where SPSS is, and was, with its double-only numeric
variable type). Perhaps even before you were writing "gen str26=" you
were writing "gen double x=", taking care that the machine executing
the program was equipped with a math co-processor that could handle
operations on such beasts as IEEE 754 numbers, but do we have a reason
to worry now? Interestingly, however, too much precision can also bite
back, so you never know where progress will take you:

http://docwiki.embarcadero.com/RADStudio/XE4/en/W1066_Lost_Extended_floating_point_precision._Reduced_to_Double_%28Delphi%29
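
For anyone who wants double as the default today, Stata already has a
setting for it. A minimal sketch, assuming the string variable timestr
from the example quoted below; the name dt is only a placeholder:

```stata
* Make -generate- create doubles when no storage type is given.
* This lasts for the current session; add ", permanently" to keep it.
set type double

gen dt = clock(timestr, "MDYhms")
format dt %tc

* -compress- demotes a variable to a smaller type only when that loses
* no information, so millisecond timestamps should remain double.
compress
```

Spelling out "gen double" in do-files remains the more portable habit,
since the default type is a per-user setting that a collaborator's
installation may not share.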

Best, Sergiy

>
> Nick
> [email protected]
>
>
> On 5 July 2013 20:36, Rebecca Pope <[email protected]> wrote:
>> Austin,
>> Thanks. That was precisely the problem. I made an assumption about the
>> variable being automatically generated in double format. It turns out
>> that StataCorp has not yet implemented mind-reading.
>>
>> Feeling very foolish now,
>> Rebecca
>>
>> On Fri, Jul 5, 2013 at 2:28 PM, Austin Nichols <[email protected]> wrote:
>>> Rebecca Pope <[email protected]>:
>>> Try a double.
>>>
>>> clear all
>>> input str21 timestr
>>> " 2/8/2011 9:50:51 PM"
>>> "2/12/2011 4:15:40 PM"
>>> "2/11/2011 3:26:12 PM"
>>> "5/15/2011 9:46:41 AM"
>>> "5/20/2011 8:32:28 PM"
>>> " 2/7/2011 2:15:40 PM"
>>> "5/25/2011 7:07:57 PM"
>>> " 5/9/2011 3:00:42 PM"
>>> "5/22/2011 3:24:57 PM"
>>> " 5/9/2011 7:09:46 PM"
>>> end
>>> gen dt=clock(timestr,"MDYhms")
>>> gen double dt2=clock(timestr,"MDYhms")
>>> format dt dt2 %tc
>>> li
>>>
>>> On Fri, Jul 5, 2013 at 3:05 PM, Rebecca Pope <[email protected]> wrote:
>>>> Hello,
>>>> I am trying to convert a variable with datetime observations currently
>>>> stored as string to a numeric format. Here is a sample of my data
>>>> after issuing these commands:
>>>>
>>>> gen datetime = clock(timestr,"MDYhms")
>>>> format datetime %tc
>>>>
>>>> . list timestr datetime in 1/10, noobs clean
>>>>                  timestr             datetime
>>>>      2/8/2011 9:50:51 PM   08feb2011 21:50:37
>>>>     2/12/2011 4:15:40 PM   12feb2011 16:14:48
>>>>     2/11/2011 3:26:12 PM   11feb2011 15:27:08
>>>>     5/15/2011 9:46:41 AM   15may2011 09:46:59
>>>>     5/20/2011 8:32:28 PM   20may2011 20:31:39
>>>>      2/7/2011 2:15:40 PM   07feb2011 14:16:37
>>>>     5/25/2011 7:07:57 PM   25may2011 19:08:51
>>>>      5/9/2011 3:00:42 PM   09may2011 15:01:44
>>>>     5/22/2011 3:24:57 PM   22may2011 15:25:01
>>>>      5/9/2011 7:09:46 PM   09may2011 19:10:46
>>>>
>>>> As you can see, the converted values are a few seconds off from the
>>>> time stored in the string variable.
>>>>
>>>> I don't think that this is a Stata problem per se because if I convert
>>>> a single observation, the correct time is displayed:
>>>> . di %tc clock("2/8/2011 9:50:51 PM", "MDYhms")
>>>> 08feb2011 21:50:51
>>>>
>>>> That said, I'm not the first person to encounter this
>>>> (http://www.stata.com/statalist/archive/2011-10/msg00687.html).
>>>> However, I don't see that a solution/reason was ever provided.
>>>>
>>>> My best guess in the face of this was a hidden character in the
>>>> variable. I checked for this with the following:
>>>> replace timestr = subinstr(" "+timestr+" "," ","-",.) in 1/10
>>>>                    timestr
>>>>      -2/8/2011-9:50:51-PM-
>>>>     -2/12/2011-4:15:40-PM-
>>>>     -2/11/2011-3:26:12-PM-
>>>>     -5/15/2011-9:46:41-AM-
>>>>     -5/20/2011-8:32:28-PM-
>>>>      -2/7/2011-2:15:40-PM-
>>>>     -5/25/2011-7:07:57-PM-
>>>>      -5/9/2011-3:00:42-PM-
>>>>     -5/22/2011-3:24:57-PM-
>>>>      -5/9/2011-7:09:46-PM-
>>>>
>>>> As you can see, the hyphens are right next to the text, so I don't
>>>> think there is anything lurking at the beginning or end of the text
>>>> that isn't displaying.
>>>>
>>>> Does anyone have other suggestions?
>>>>
>>>> Thanks,
>>>> Rebecca
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>> *   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC