Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: best practice for dates and times
Nick Cox <email@example.com>
Mon, 21 Feb 2011 18:29:35 +0000
As I understand it, -replace- looks at values, not what produced them. If it sees a particular number that should be stored as -double-, it is not because it comes tagged with a history "produced by -clock()-; act accordingly".
This follows at least from the fact that -replace- will work with an extraordinarily large number of different expressions as its argument. Do you think that there is code in there that monitors what expression produced the values and acts differently according to the expression used? Now I am not looking at Stata's source code, you are not looking at Stata's source code, but I still bet on my story -- unless and until someone from StataCorp tells me that it's wrong.
Please note that in the previous code I had intentionally set the type to float. -replace- still chose double for the results of -clock()-. If I do another calculation, -replace- will choose float. Therefore, -replace- must know about -clock()-.
From: firstname.lastname@example.org [mailto:email@example.com] On Behalf Of Nick Cox
Sent: Monday, February 21, 2011 12:09 PM
Subject: RE: st: best practice for dates and times
True. But it remains the case that -replace- knows nothing about -clock()-. It looks at the values and works out what the type should be, within limits.
The inhibition about promoting from -float- to -double- I think follows from the rest of what Stata does here, as otherwise many if not most -float-s would morph into -double-s sooner or later, which, modulo views expressed earlier, would not match most intentions.
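One way to check the claim that -replace- looks at values rather than at the function that produced them is to feed it the same number as a plain literal. The following is only a sketch: the literal 1304339640000 is my own calculation of the millisecond value for 1may2001 12:34 and is an assumption, not something taken from the earlier posts.

```stata
* Sketch: does -replace- promote on the value alone, with no call to clock()?
clear
set type float
set obs 1
gen byte x = 1
* A plain numeric literal; no date-time function is involved.
replace x = 1304339640000
describe x
* For comparison, the same starting point with the function call.
gen byte w = 1
replace w = clock("1may2001 12:34", "DMY hm")
describe w
```

If both variables end up with the same storage type, that is consistent with -replace- inspecting values only; it would not prove anything about the internals.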
If I understand Nick correctly, the functions will always do what they do, in double precision. It is the -gen- command that ends up recording the numbers inaccurately. I expanded on Joseph's findings:
set type float
set obs 1
gen byte x=1
replace x=clock("1may2001 12:34","DMY hm")
Interestingly, -replace- does convert the variable from byte (you can try any numeric type other than float) to double, not float. However, if the variable is float to begin with, it will not be promoted to double.
The -replace- command is much smarter than -gen-. If you work with dates and the variable starts as byte, -replace- will figure out the most efficient way to store the data: when int is sufficient, it will not choose long, and if the date is large, it will choose long, not float or double.
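The promotion behaviour described above can be tried with a short sketch along these lines. The variable name d and the two values are my own choices for illustration: mdy(5, 1, 2001) is a daily date that fits within int, and 83085733 is an integer too large for int.

```stata
* Sketch: watch the storage type of a byte variable as larger integers arrive.
clear
set obs 1
gen byte d = 1
* A daily date (days since 01jan1960); small enough for int.
replace d = mdy(5, 1, 2001)
describe d
* A larger whole number; too big for int, but still an integer.
replace d = 83085733
describe d
```

On the account given here, -describe- should report int after the first -replace- and long after the second, but it is worth running to confirm on your own version.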
I note these evident inconsistencies as questions for StataCorp.
Otherwise, we need to be clear here about the difference between functions such as -clock()- and commands such as -generate-.
I don't think this request matches the way functions work. They just take inputs (in most cases) and produce outputs. I don't think that there is any sense in which they know about any wider context, as for example being part of a -generate- or -replace- command. Part of the magic of computing follows from very strong division of labour like this.
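A small sketch of that division of labour, using -display- so that no variable and no storage type are involved. The use of float() here is my own addition to mimic what storing the result in a float variable would do.

```stata
* Sketch: the function result itself, with no -generate- or -replace-.
display %15.0f clock("1may2001 12:34", "DMY hm")
display %tc clock("1may2001 12:34", "DMY hm")
* The same result rounded to float precision, as a float variable would hold it.
display %15.0f float(clock("1may2001 12:34", "DMY hm"))
display %tc float(clock("1may2001 12:34", "DMY hm"))
```

Date-times in this era are about 1.3e12 milliseconds, which lies between 2^40 and 2^41. With a 24-bit significand, adjacent float values in that range are 2^17 = 131,072 ms apart, so a value held as float can be off by up to about a minute. The loss happens at storage, not inside -clock()-.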
I think what you and Junlin are asking is that -generate- (and -replace-) be smarter when the expression they are fed contains a call to e.g. -clock()-. That's a different request. My only bet is that this is much trickier than it seems, in principle as well as in practice. For example, even if -clock()- is part of the expression, it doesn't always follow that the user wants a -double- variable as a result. The spirit of the request is understandable, but I think there is a slippery slope ahead.
I'm inclined to agree with Junlin on this. The user has to supply a mask for the date-time data as the second argument. It seems to me that the function should be able to determine from that information whether double-precision is needed, whether single-precision will be adequate, and even when integer will suffice.
Stata is not entirely consistent in automatically detecting and setting the proper storage type. For example, back on the earlier thread, typing -generate y = 83085733- won't set the data type to the necessary precision, but opening the data editor, pasting 83085733 into the first cell and closing it *does* set the data type to long.
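The -generate- case can be reproduced with a sketch like the one below. It assumes the default storage type is float (set explicitly here); the variable z is my own addition, to show the effect of naming the type.

```stata
* Sketch: what a float can and cannot hold for the value 83085733.
display %12.0f float(83085733)
clear
set type float
set obs 1
* No type given, so the variable takes the default type (float).
generate y = 83085733
describe y
display %12.0f y[1]
* Naming the type up front avoids the rounding.
generate long z = 83085733
display %12.0f z[1]
```

Because 83085733 lies between 2^26 and 2^27, neighbouring float values there are 8 apart, and the nearest one is 83085736. That is what y should hold, while z keeps 83085733 exactly.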
And, typing -generate float y = 1- and then -replace y = 83085733- won't automatically promote the data type to accommodate the larger value. On the other hand, first entering -generate byte y = 1- and then -replace y = 83085733-
Junlin Liao wrote:
. . . I thought -clock- or -Clock- would be smart enough to figure it out by
itself. . . . I think there again is a chance for Stata to get smart.