help dates and times
-------------------------------------------------------------------------------
Title
[D] dates and times -- Date and time (%t) values and variables
Syntax
Syntax is presented under the following headings:
How Stata records dates and times
Inputting date and time data
Recommended storage types for %t variables
Typing dates and times
Constructing date and time values from numerical components
Converting date and time values
Extracting date and time components
Obtaining and working with durations
Formatting date and time values
How Stata records dates and times
Dates and times are called %t values. %t values are numerical and
integral. The integral value records the number of time units that
have passed from an agreed-upon base, which for Stata is 1960.
Coding and interpretation of date and time (%t) values are as
follows:
+---------------------------------------------------------------------+
| | | ----- Numerical value & interpretation ------ |
| Format | Meaning | Value = -1 | Value = 0 | Value = 1 |
|--------+------------+---------------+---------------+---------------|
| %tc | clock | 31dec1959 | 01jan1960 | 01jan1960 |
| | | 23:59:59.999 | 00:00:00.000 | 00:00:00.001 |
| | | | | |
| %td | days | 31dec1959 | 01jan1960 | 02jan1960 |
| | | | | |
| %tw | weeks | 1959w52 | 1960w1 | 1960w2 |
| | | | | |
| %tm | months | 1959m12 | 1960m1 | 1960m2 |
| | | | | |
| %tq | quarters | 1959q4 | 1960q1 | 1960q2 |
| | | | | |
| %th | half-years | 1959h2 | 1960h1 | 1960h2 |
| | | | | |
| %tg | generic | -1 | 0 | 1 |
+---------------------------------------------------------------------+
Explanation: The Value = 0 column shows the base value. For a
%td value, 0 means 01jan1960. The table also shows that -1 means
31dec1959 and 1 means 02jan1960. A %td value records the number of
days from 01jan1960; a %tc value records the number of milliseconds
from the start of 01jan1960; a %tw value records the number of weeks
from the first week of 1960; and so on.
That is,
o For a %tc value, a 1-unit change represents 1 ms.
Integer 394,839,482,000 represents 05jul1972 21:38:02.000
because that date occurred 394,839,482,000 ms after 01jan1960
00:00:00.000.
Integer -394,839,482,000 represents 28jun1947 02:21:58.000
because that date occurred 394,839,482,000 ms before
01jan1960 00:00:00.000.
o For a %td value, a 1-unit change represents 1 day.
Integer 4,569 represents 05jul1972 because that date occurred
4,569 days after 01jan1960.
Integer -4,569 represents 29jun1947 because that date
occurred 4,569 days before 01jan1960.
o For a %tw value, a 1-unit change represents 1 week.
Integer 650 represents 1972w27 because that date occurred 650
weeks after 1960w1.
Integer -650 represents 1947w27 because that date occurred
650 weeks before 1960w1.
o For a %tm value, a 1-unit change represents 1 month.
Integer 150 represents 1972m7 because that date occurred 150
calendar months after 1960m1.
Integer -150 represents 1947m7 because that date occurred 150
calendar months before 1960m1.
o For a %tq value, a 1-unit change represents one quarter (3
calendar months).
Integer 50 represents 1972q3 because that date occurred 50
quarters after 1960q1.
Integer -50 represents 1947q3 because that date occurred 50
quarters before 1960q1.
o For a %th value, a 1-unit change represents one half-year, or 6
months.
Integer 25 represents 1972h2 because that date occurred 25
half-years after 1960h1.
Integer -25 represents 1947h2 because that date occurred 25
half-years before 1960h1.
o For a %tg value, a 1-unit change represents whatever you wish.
Integer 100 might represent 100 workdays, or 100 lunar
months, or anything else, after some agreed-upon event, such
as 01jan1960, or the date you were born, or anything else.
Negative values would represent times before the event.
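The integer codings listed above can be checked with display. This is a small sketch rather than part of the official syntax summary; it uses only the td(), tc(), tm(), and tq() pseudofunctions described under Typing dates and times below.

```stata
* %td: days since 01jan1960
display td(05jul1972)
* the same integer shown through a %td format
display %td 4569
* %tc: milliseconds since 01jan1960 00:00:00.000
display %20.0gc tc(05jul1972 21:38:02)
* %tm and %tq: months since 1960m1 and quarters since 1960q1
display tm(1972m7)
display tq(1972q3)
```

The five commands should report 4569, 05jul1972, 394,839,482,000, 150, and 50, matching the values worked out in the list above.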
In addition to the above, there is %ty:
+---------------------------------------------------------------------+
| | | ------ Numerical value & interpretation ------ |
| Format | Meaning | 1959 | 1960 | 1961 |
|--------+-----------+---------------+---------------+----------------|
| %ty | year | 1959 | 1960 | 1961 |
+---------------------------------------------------------------------+
A %ty value is like the other %t values except that, rather than the
base being 1960, the base is 0 AD. (Years 0100 through 9999 are
valid.)
In addition to the above, there is %tC:
+---------------------------------------------------------------------+
| | | ------ Numerical value & interpretation ------ |
| Format | Meaning | -1 | 0 | 1 |
|--------+-----------+---------------+---------------+----------------|
| %tC | clock | 31dec1959 | 01jan1960 | 01jan1960 |
| | | 23:59:59.999 | 00:00:00.000 | 00:00:00.001 |
| | | | | |
+---------------------------------------------------------------------+
%tC is similar to %tc, except that %tC accounts for leap seconds:
Remember that %tc integer 394,839,482,000 represents 05jul1972
21:38:02.000.
That integer in %tC represents 05jul1972 21:38:01.000. For those
who wish their clock based on astronomical observation, 1 leap
second was inserted. (The first leap second was on 30jun1972,
the second on 31dec1972, and others have been inserted since
then.) See Advice on using %tc and %tC under Remarks below.
Jargon: A %td value is sometimes called an elapsed date.
Historical note: A %td value is sometimes referred to as a %d value.
The t is omitted because, in Stata's history, %d values predated the
other %t values. Dropping the t is still allowed but is now
considered an anachronism.
Inputting date and time data
Date and time variables are best read as strings. Use one of the
string-to-numeric conversion functions to convert the string
representation to the appropriate %t value:
Format | String-to-numeric conversion function
-------+-----------------------------------------
%tc | clock(string, mask)
%tC | Clock(string, mask)
|
%td | date(string, mask)
|
%tw | weekly(string, mask)
%tm | monthly(string, mask)
%tq | quarterly(string, mask)
%th | halfyearly(string, mask)
%ty | yearly(string, mask)
|
%tg | no function necessary; read as numeric
-------------------------------------------------
In the above functions, string is the variable or value containing
the string representation to be converted and mask specifies the
order in which the components occur:
o For %td function date(), string might be "August 21, 2005" or
"8-21-2005" and mask might be "MDY", meaning that the elements
occur in the order month, day, and year.
o For %tc function clock(), string might be "21aug2005 15:21:22"
and mask might be "DMYhms", meaning that the elements occur in
the order day, month, year, hours, minutes, and seconds.
Thus one might code
. generate datehired = date(datehiredstr, "MDY")
. generate double timeadmitted = clock(timeadmitstr, "DMYhms")
See String-to-numeric translation functions under Remarks for
details.
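To try the two commands above without a dataset of your own, you can type in one observation by hand. The sketch below is illustrative only: the string values are made up, and clear discards whatever data are currently in memory.

```stata
* WARNING: clear drops the data in memory
clear
input str20 datehiredstr str20 timeadmitstr
"August 21, 2005" "21aug2005 15:21:22"
end

* translate the strings; note the double for the %tc variable
generate datehired = date(datehiredstr, "MDY")
generate double timeadmitted = clock(timeadmitstr, "DMYhms")

* attach display formats so that list shows dates rather than integers
format datehired %td
format timeadmitted %tc
list datehired timeadmitted
```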
Recommended storage types for %t variables
In the example above, we stored %tc variable timeadmitted as a
double. Doing so is important if precision is to be maintained.
The recommended storage types for %t variables are
Format | Recommended storage type
-------+--------------------------
%tc | double
%tC | double
|
%td | float or long
|
%tw | float or int
%tm | float or int
%tq | float or int
%th | float or int
%ty | float or int
%tg | float or int
----------------------------------
Storing a %tc (%tC) variable as a double is important if precision is
to be maintained. %tc variables are integers, but being the number
of milliseconds from the start of 1960, they are large integers.
o What happens if you store a %tc value as a float?
The largest integer that can be stored precisely in a float is
16,777,216, corresponding to 01jan1960 04:39:37.216. Times after
that will be subject to rounding; the rounding as of recent times
can be as much as 2 minutes, 11 seconds.
o What happens if you store a %tc value as a long?
The largest integer that can be stored in a long is
2,147,483,620, corresponding to 25jan1960 20:31:23.620. Times
after that cannot be stored in a long.
o What happens if you store a %tc value as a double?
The largest integer that can be stored precisely in a double is
9,007,199,254,740,992, corresponding to a date roughly 285,000
years after 1960. Stata cuts off dates at year 9999, but for other
reasons.
(In the above, we use an idiosyncratic definition of "precisely":
positive value x is stored precisely if x MINUS 1 is not equal to x,
where MINUS is the computer's operation of subtraction. For float and
double, there are larger values that are stored exactly, but not
precisely. For example, both float and double can exactly store the
integer 2^100, a value approximately equal to 1.3e+30, but 2^100 MINUS 1
is still 2^100 because of loss of precision.)
+-----------------------------------------------+
| DO NOT FORGET |
| |
| %tc and %tC values MUST BE stored as doubles. |
| Doing so is your responsibility, not Stata's. |
+-----------------------------------------------+
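The storage limits described above can be seen directly. The following is a sketch using the float() function and the c(maxlong) system value; the particular date is only an example.

```stata
* a float cannot hold 16,777,217; it rounds to 16,777,216
display %12.0f float(16777217)

* the largest value that fits in a long
display c(maxlong)

* a recent %tc value, first as a double, then rounded to float precision
display %tc tc(15jun2004 12:00:00)
display %tc float(tc(15jun2004 12:00:00))
```

The last two lines should display different times, because the value 1,402,920,000,000 is not exactly representable as a float.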
Typing dates and times
Remember, date and time values are just integers, so in an
expression, you could type the appropriate integer:
. gen before = cond(hiredon < 16237, 1, 0) if hiredon < .
. drop if admittedon < 1402920000000
Easier to type is
. gen before = cond(hiredon < td(15jun2004), 1, 0) if hiredon < .
. drop if admittedon < tc(15jun2004 12:00:00)
td() and tc() are called pseudofunctions because they translate what
you type into their integer equivalents. Pseudofunctions require
only that you specify the date/time components in the expected order,
so rather than 15jun2004 above, we could have specified 15 June 2004,
15-6-2004, or 15/6/2004.
The date and time pseudofunctions and their expected component order
are
Format | Pseudofunction
-------+--------------------------------------------------
%tc | tc([day-month-year] hh:mm[:ss[.sss]])
%tC | tC([day-month-year] hh:mm[:ss[.sss]])
|
%td | td(day-month-year)
|
%tw | tw(year-week)
%tm | tm(year-month)
%tq | tq(year-quarter)
%th | th(year-half)
%ty | none necessary; just type year
%tg | none necessary
----------------------------------------------------------
The day-month-year in tc() and tC() are optional. If you omit them,
01jan1960 is assumed. Doing so produces time as an offset, which can
be useful in, for example,
. gen double six_hrs_later = eventtime + tc(6:00)
Also see Extracting date and time components below.
Historical note: Pseudofunctions td(), tw(), tm(), tq(), and th()
used to be called d(), w(), m(), q(), and h(). Those names still
work but are considered anachronisms.
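As a quick check, the pseudofunctions can be compared with the integers typed in the first example of this section. This sketch assumes nothing beyond td() and tc() as described above.

```stata
* td() returns the number of days since 01jan1960
display td(15jun2004)

* tc() returns milliseconds since 01jan1960 00:00:00.000
display %15.0f tc(15jun2004 12:00:00)

* with the date omitted, tc() is an offset from 01jan1960: 6 hours in ms
display tc(6:00)
```

These should show 16237, 1402920000000, and 21600000.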
Constructing date and time values from numerical components
If you had numeric variables M, D, and Y containing month number, day
of month, and year (in the first observation, the variables might
contain 12, 15, and 2006), you could code
. generate mydate = mdy(M, D, Y)
to obtain a new %td variable containing the date (which would be
15dec2006 in the first observation).
The date-from-numerical-components functions are
Format | Function
-------+------------------------------------------
%tc | mdyhms(M, D, Y, h, m, s)
%tc | dhms(td, h, m, s)
%tc | hms(h, m, s)
|
%tC | Cmdyhms(M, D, Y, h, m, s)
%tC | Cdhms(td, h, m, s)
%tC | Chms(h, m, s)
|
%td | mdy(M, D, Y)
|
%tw | yw(Y, W)
%tm | ym(Y, M)
%tq | yq(Y, Q)
%th | yh(Y, H)
%ty | Y
--------------------------------------------------
where
td is a %td value,
M, D, and Y are month, day, and year values,
1 <= M <= 12
1 <= D <= 31
0100 <= Y <= 9999
h, m, and s are hour, minute, and second values,
0 <= h <= 23
0 <= m <= 59
0.000 <= s <= 59.999 (see note below)
W is a week number, 1 <= W <= 52
Q is a quarter number, 1 <= Q <= 4
H is a half number, 1 <= H <= 2
Note concerning s: The Cmdyhms() and Cdhms() functions allow 0.000
<= s <= 60.999 when the 60th second is a leap second. For instance,
according to the authorities, 31dec1972 23:59:60 is an official leap
second but 31dec1971 23:59:60 is not. Cmdyhms(12,31,1971,23,59,60)
therefore evaluates to missing (.) whereas
Cmdyhms(12,31,1972,23,59,60) evaluates to 410,313,601,000, a
nonmissing value. (The expanded range of s does not apply to Chms()
because it is a pure time based on 01jan1960 and there were no leap
seconds on that date. The hms() and Chms() functions are, in fact,
identical.)
The mdyhms() and dhms() functions are related by
mdyhms(M, D, Y, h, m, s) = dhms(mdy(M,D,Y), h, m, s)
and, similarly,
Cmdyhms(M, D, Y, h, m, s) = Cdhms(mdy(M,D,Y), h, m, s)
With mdyhms(), you have six variables, such as M=7, D=5, Y=1972,
h=21, m=38, and s=2, and mdyhms() returns 05jul1972 21:38:02. With
dhms() you have four variables, the first specifying the %td value of
05jul1972, and h, m, and s being the same, and dhms() returns the
date + time, 05jul1972 21:38:02.
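The relationships above can be verified at the command line. This is a sketch using literal numbers in place of the variables M, D, Y, h, m, and s.

```stata
* both forms build the same %tc value
display %tc mdyhms(7, 5, 1972, 21, 38, 2)
display %tc dhms(mdy(7, 5, 1972), 21, 38, 2)

* mdy() builds a %td value
display %td mdy(12, 15, 2006)

* the lower-frequency constructors return the integers shown earlier
display ym(1972, 7)
display yq(1972, 3)
display yh(1972, 2)
```

The first two lines should both show 05jul1972 21:38:02, the third 15dec2006, and the last three 150, 50, and 25.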
Converting date and time values
One type of %t value can be converted into another. The functions
are
                          To...
From | %tc      %tC      %td      %tw      %tm      %tq
-----+----------------------------------------------------
%tc  |          Cofc()   dofc()
%tC  | cofC()            dofC()
     |
%td  | cofd()   Cofd()            wofd()   mofd()   qofd()
     |
%tw  |                   dofw()
%tm  |                   dofm()
%tq  |                   dofq()
%th  |                   dofh()
%ty  |                   dofy()
----------------------------------------------------------
To...
From | %th %ty
-----+-----------------
%tc |
%tC |
|
%td | hofd() yofd()
|
%tw |
%tm |
%tq |
%th |
%ty |
-----------------------
For instance, to convert a %td to a %tc value,
. generate double datetimevalue = cofd(datevalue)
%td is the mother of all date and time values, and to convert a %tq
value to a %tc value, you must first convert to a %td value:
. generate double datetimevalue = cofd(dofq(quartervalue))
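The conversions can be tried on single values. In the sketch below, the comments give the results we would expect from the tables above; note that dofq() returns the first day of the quarter.

```stata
* %td to %tc: midnight at the start of the day
display %tc cofd(td(05jul1972))

* %td to %tq: the quarter containing the date
display %tq qofd(td(05jul1972))

* %tq to %td: the first day of the quarter
display %td dofq(tq(1972q3))

* %tq to %tc, going through %td
display %tc cofd(dofq(tq(1972q3)))
```

These should show 05jul1972 00:00:00, 1972q3, 01jul1972, and 01jul1972 00:00:00.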
Extracting date and time components
Let d be a %td variable or value. The following functions will
extract components of d:
-------------------------------------------------------------
Result if d = td(05jul1972)
Function Returns (i.e., d = 4,569)
-------------------------------------------------------------
year(d) calendar year 1972
month(d) calendar month 7
day(d) day within month 5
doy(d) day of year 187
halfyear(d) half of year 2
quarter(d) quarter 3
week(d) week within year 27
dow(d) day of week (0 = Sunday) 3 (means Wednesday)
-------------------------------------------------------------
Remember, any %t value can be converted to a %td value by using the
appropriate conversion function; see Converting date and time values
above. If the date_time_admitted variable is %tc and you want to
obtain the day of week,
. gen day = dow(dofc(date_time_admitted))
Let t be a %tc variable. The following functions will extract
components of t:
----------------------------------------------------------------
Result if t = tc(05jul1972-21:38:02)
Function Returns (i.e., t = 394,839,482,000)
----------------------------------------------------------------
hh(t) time of day, hours 21
mm(t) time of day, minutes 38
ss(t) time of day, seconds 2.000
----------------------------------------------------------------
Other components can be extracted by calculating dofc(t) and then
extracting components from the %td value.
Let T be a %tC variable. The following functions will extract
components of T:
----------------------------------------------------------------
Result if T = tC(05jul1972-21:38:01)
Function Returns (i.e., T = 394,839,482,000)
----------------------------------------------------------------
hhC(T) time of day, hours 21
mmC(T) time of day, minutes 38
ssC(T) time of day, seconds 1.000
----------------------------------------------------------------
By convention, leap seconds came after 23:59:59 and are labeled
23:59:60. Thus ssC(T) can return 60.
Other components can be extracted by calculating dofC(T) and then
extracting components from the %td value.
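The extraction functions can be tried on a single %tc value. This sketch stores the value in a scalar, here given the hypothetical name admit, because scalars hold doubles.

```stata
scalar admit = tc(05jul1972 21:38:02)

* time-of-day components come straight from the %tc value
display hh(admit)
display mm(admit)
display ss(admit)

* date components require converting to %td first
display year(dofc(admit))
display doy(dofc(admit))
display dow(dofc(admit))
```

The results should be 21, 38, 2, 1972, 187, and 3 (Wednesday), in agreement with the tables above.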
Obtaining and working with durations
Remember that %t variables are simply durations from 1960:
Format | Units
-------+-------------
%tC | milliseconds
%tc | milliseconds
|
%td | days
|
%tw | weeks
%tm | months
%tq | quarters
%th | half-years
---------------------
Thus, to obtain the duration between %t variables, subtract them:
. gen days_employed = curdate - hiredate
. gen qtrs_to_15jan = curqtr - qofd(td(15jan2005))
To add a duration to a date, add the two values:
. gen lastdate = hiredate + days_employed
. format lastdate %td
. gen qtr_of_merger = curqtr + quarters_to_merger
. format qtr_of_merger %tq
When creating new date and time variables, remember to format them so
that they will be readable should you print them.
The above applies equally to %tc and %tC variables:
. gen double millisecs_employed = lasttime - hiretime
and
. gen double lasttime = hiretime + millisecs_employed
. format lasttime %tc
Note our use of double. Times are recorded in milliseconds and must
be stored as doubles if precision is to be maintained.
There are 1,000 ms in a second, 60*1,000 in a minute, and 60*60*1,000
in an hour. It is easy to mistype these constants when converting to
more readable units, and therefore the following functions are
provided:
Function | Purpose
---------------+----------------------------------
hours(ms) | convert milliseconds to hours
| returns ms/(60*60*1000)
|
minutes(ms) | convert milliseconds to minutes
| returns ms/(60*1000)
|
seconds(ms) | convert milliseconds to seconds
| returns ms/1000
|
msofhours(h) | convert hours to milliseconds
| returns h*60*60*1000
|
msofminutes(m) | convert minutes to milliseconds
| returns m*60*1000
|
msofseconds(s) | convert seconds to milliseconds
| returns s*1000
--------------------------------------------------
Thus you can code
. gen double days_employed = hours(lasttime-hiretime)/24
and
. gen double lasttime = hiretime + msofhours(24*days_employed)
If precision is to be preserved, the use of these functions does not
alleviate the necessity of using doubles.
days_employed in the above will include a fraction of a day. If a
rounded integer result is desired, then round explicitly:
. gen approx_days_employed = round(hours(lasttime-hiretime)/24)
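A worked example with two made-up times shows the arithmetic. The scalars hiretime and lasttime below are hypothetical stand-ins for the variables of the same names; they are 2 days and 6 hours apart.

```stata
scalar hiretime = tc(01jan2005 09:00:00)
scalar lasttime = tc(03jan2005 15:00:00)

* elapsed time in hours
display hours(lasttime - hiretime)

* elapsed time in days: hours divided by 24
display hours(lasttime - hiretime)/24

* rounded to whole days
display round(hours(lasttime - hiretime)/24)

* going back: 2.25 days expressed in milliseconds and added to hiretime
display %tc hiretime + msofhours(24*2.25)
```

The four results should be 54, 2.25, 2, and 03jan2005 15:00:00.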
Formatting date and time values
A variable's values are formatted to indicate 1) the units used and
2) how the variable is to be displayed:
. generate mydate = date(datestr, "DMY")
. list mydate in 1
+--------+
| mydate |
|--------|
1. | 17096 |
+--------+
. format mydate %td
. list mydate in 1
+-----------+
| mydate |
|-----------|
1. | 22oct2006 |
+-----------+
. generate double mytime = clock(timestr, "DMY hm")
. list mytime in 1
+-----------+
| mytime |
|-----------|
1. | 1.477e+12 |
+-----------+
. format mytime %tc
. list mytime in 1
+--------------------+
| mytime |
|--------------------|
1. | 22oct2006 13:02:00 |
+--------------------+
The %t formats result in the following output:
Format | Example of output
-------+----------------------------
%tC | 05jul1972 21:38:01
%tc | 05jul1972 21:38:02
|
%td | 05jul1972
|
%tw | 1972w27
%tm | 1972m7
%tq | 1972q3
%th | 1972h2
%ty | 1972
%tg | (actual integer shown)
------------------------------------
Formats %tC and %tc do not
show the milliseconds by default.
You can specify how dates and times are to be formatted. Rather than
05jul1972, you could have July 5, 1972, or rather than 05jul1972
21:38:02, you could have 7-5-72 9:38 p.m. This reformatting is done
by adding codes to the end of %tC, %tc, %td, etc. In fact, the
default %tC, %tc, %td, ..., formats actually mean
Format | Implied (fully specified) format
-------+---------------------------------
%tC | %tCDDmonCCYY_HH:MM:SS
%tc | %tcDDmonCCYY_HH:MM:SS
|
%td | %tdDDmonCCYY
|
%tw | %twCCYY!www
%tm | %tmCCYY!mnn
%tq | %tqCCYY!qq
%th | %thCCYY!hh
%ty | %tyCCYY
-----------------------------------------
Typing
. format mytimevar %tc
has the same effect as typing
. format mytimevar %tcDDmonCCYY_HH:MM:SS
Format %tcDDmonCCYY_HH:MM:SS is interpreted as
+------------------------------------------------------------------+
| % t c DDmonCCYY_HH:MM:SS |
| | | | | |
| all formats it's a variable formatting codes |
| start with % time format coded in specify how to |
| milliseconds display value |
+------------------------------------------------------------------+
The formatting codes are
Code Meaning Output
-----------------------------------------------------------------
CC century-1 01 - 99
cc century-1 1 - 99
YY 2-digit year 00 - 99
yy 2-digit year 0 - 99
JJJ day within year 001 - 366
jjj day within year 1 - 366
Mon month Jan, Feb, ..., Dec
Month month January, February, ..., December
mon month jan, feb, ..., dec
month month january, february, ..., december
NN month 01 - 12
nn month 1 - 12
DD day within month 01 - 31
dd day within month 1 - 31
DAYNAME day of week Sunday, Monday, ... (aligned)
Dayname day of week Sunday, Monday, ... (unaligned)
Day day of week Sun, Mon, ...
Da day of week Su, Mo, ...
day day of week sun, mon, ...
da day of week su, mo, ...
h half 1 - 2
q quarter 1 - 4
WW week 01 - 52
ww week 1 - 52
HH hour 00 - 23
Hh hour 00 - 12
hH hour 0 - 23
hh hour 0 - 12
MM minute 00 - 59
mm minute 0 - 59
SS second 00 - 60 (sic, due to leap seconds)
ss second 0 - 60 (sic, due to leap seconds)
.s tenths .0 - .9
.ss hundredths .00 - .99
.sss thousandths .000 - .999
am show am or pm am or pm
a.m. show a.m. or p.m. a.m. or p.m.
AM show AM or PM AM or PM
A.M. show A.M. or P.M. A.M. or P.M.
. display period .
, display comma ,
: display colon :
- display hyphen -
_ display space
/ display slash /
\ display backslash \
!c display character c
+ separator (see note)
-----------------------------------------------------------------
Note: + displays nothing; it may be used to separate one code
from the next to make the format more readable. + is never
necessary. For instance, %tchh:MM+am and %tchh:MMam have the
same meaning, as does %tc+hh+:+MM+am.
Thus, if you had a %td variable and wanted to display the dates as,
for example, January 9, 2002, you could specify the format
%tdMonth_dd,_CCYY.
If you had a %tc variable and wanted to display the time as
Fri Aug 18 12:01:35 CDT 2006
you could specify %tcDay_Mon_DD_HH:MM:SS_!C!D!T_CCYY.
The maximum length of a format specifier is 48 characters; the
example shown above is 34 characters.
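One way to preview a format without changing any variable is the two-argument string() function, which applies a format to a value and returns the result as a string. The sketch below uses the two formats just discussed plus one that shows milliseconds.

```stata
* January 9, 2002
display string(td(09jan2002), "%tdMonth_dd,_CCYY")

* Fri Aug 18 12:01:35 CDT 2006 (the letters CDT are literal text)
display string(tc(18aug2006 12:01:35), "%tcDay_Mon_DD_HH:MM:SS_!C!D!T_CCYY")

* default %tc layout with thousandths of a second appended
display string(tc(05jul1972 21:38:02), "%tcDDmonCCYY_HH:MM:SS.sss")
```

Because CDT is produced by !C!D!T, it is displayed for every value regardless of the actual time zone.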
Description
Complete documentation of Stata's treatment of date and time values is
provided. Every feature and function is documented here, either in
Syntax above or in Remarks below.
Remarks
Remarks are presented under the following headings:
Experimenting with the date and time functions
String-to-numeric translation functions
The clock() function
How clock() interprets the mask
Working with two-digit years
Working with incomplete dates and times
The Clock() function
The date() function
Translating run-together dates, such as 20060125
The other translation functions
Valid times
When leap seconds occurred
Truncated times
Advice on using %tc and %tC
Summary
Explanation
Experimenting with the date and time functions
The best way to become familiar with Stata's date and time functions is
to experiment with the display command.
. display date("5-12-1998", "MDY")
14011
. display %td date("5-12-1998", "MDY")
12may1998
. display clock("5-12-1998 11:15", "MDY hm")
1.211e+12
. display %20.0gc clock("5-12-1998 11:15", "MDY hm")
1,210,590,900,000
. display %tc clock("5-12-1998 11:15", "MDY hm")
12may1998 11:15:00
Remember, when you work with display, you can specify a format in front
of the expression to specify how the result is to be formatted.
String-to-numeric translation functions
The string-to-numeric date and time translation functions are
Format | String-to-numeric conversion function
-------+-----------------------------------------
%tc | clock(string, mask [, topyear])
%tC | Clock(string, mask [, topyear])
|
%td | date(string, mask [, topyear])
|
%tw | weekly(string, mask [, topyear])
%tm | monthly(string, mask [, topyear])
%tq | quarterly(string, mask [, topyear])
%th | halfyearly(string, mask [, topyear])
%ty | yearly(string, mask [, topyear])
-------------------------------------------------
string is the value to be translated.
mask specifies the order of the components.
topyear is described in Working with two-digit years below.
These functions are typically used after reading date, time, or date and
time data. The data contain values such as "08/12/06", "12-8-2006", "12
Aug 06", "12aug2006 14:23", and "12 aug06 2:23 pm". You read the data
into a string variable and then use one of the translation functions to
translate the string into a %t variable.
The translation functions are used in expressions, such as
. generate double timeadmitted = clock(timeadmitstr, "DMYhms")
. format timeadmitted %tc
. generate datehired = date(datehiredstr, "MDY")
. format datehired %td
All functions require two arguments, the string to be translated and a
second string specifying the order in which the date and time components
occur.
The most useful of these functions are clock(), Clock(), and date(). The
other functions are rarely used.
The clock() function
clock() returns a %tc value. The syntax of clock() is
clock(string, mask [, topyear])
Ignore optional argument topyear; we will discuss that below. Second
argument mask is a string specifying the order of the components in
string and consists of the following codes:
Code | Meaning
------+---------------------------------------
M | month
D | day within month
Y | 4-digit year
19Y | 2-digit year to be interpreted as 19xx
20Y | 2-digit year to be interpreted as 20xx
|
h | hour of day
m | minutes within hour
s | seconds within minute
|
# | ignore one element
----------------------------------------------
Examples of date strings and the mask required to translate them include
String to translate Corresponding mask
----------------------------------------------------
01dec2006 14:22 "DMYhm"
01-12-2006 14.22 "DMYhm"
1dec2006 14:22 "DMYhm"
1-12-2006 14:22 "DMYhm"
01dec06 14:22 "DM20Yhm"
01-12-06 14.22 "DM20Yhm"
December 1, 2006 14:22 "MDYhm"
2006 Dec 01 14:22 "YMDhm"
2006-01-12 14:22 "YMDhm"
2006-01-12 14:22:43 "YMDhms"
2006-01-12 14:22:43.2 "YMDhms"
2006-01-12 14:22:43.21 "YMDhms"
2006-01-12 14:22:43.213 "YMDhms"
2006-01-12 2:22:43.213 pm "YMDhms"
2006-01-12 2:22:43.213 pm. "YMDhms"
2006-01-12 2:22:43.213 p.m. "YMDhms"
2006-01-12 2:22:43.213 P.M. "YMDhms"
20060112 1422 "YMDhm"
14:22 "hm" (see note)
2006-12-01 "YMD" (see note)
Wed Dec 01 14:22:43 CST 2006 "#MDhms#Y"
----------------------------------------------------
Note: A subset of components may be specified.
clock("14:22", "hm") produces 01jan1960 14:22:00.
clock("2006-12-01", "YMD") produces 01dec2006 00:00:00.
Also there is nothing special included in mask
to process a.m. and p.m. markers; when you include
code h, clock() automatically watches for the
meridian markers.
mask may include spaces so that it is more readable; they have no
meaning. Thus we can code
. generate double admit = clock(admitstr, "#MDhms#Y")
or code
. generate double admit = clock(admitstr, "# MD hms # Y")
and which we code makes no difference.
How clock() interprets the mask
To specify the appropriate mask, it helps to understand the rules that
clock() applies. They are
1. For each string to be translated, remove all punctuation except
for the period separating seconds from tenths, hundredths, and
thousandths of seconds. Replace the punctuation with a space.
2. Insert a space in the string everywhere that a letter is next to
a number or vice versa.
3. Interpret the resulting elements according to mask.
For instance, consider the string
01dec2006 14:22
Under rule 1, the string becomes
01dec2006 14 22
Under rule 2, the string becomes
01 dec 2006 14 22
Now clock() applies rule 3. If the mask is "DMYhm", then clock()
interprets "01" as the day, "dec" as the month, and so on.
Or consider the string
Wed Dec 01 14:22:43 CST 2006
Under rule 1, the string becomes
Wed Dec 01 14 22 43 CST 2006
Applying rule 2 does not change the string. Now clock() applies rule 3.
If the mask is "#MDhms#Y", clock() skips "Wed", interprets "Dec" as the
month, and so on.
The # code serves a second purpose. If it appears at the end of the
mask, it specifies that the rest of string is to be ignored. Consider
translating
Wed Dec 01 14 22 43 CST 2006 patient 42
The mask code that previously worked when "patient 42" was not part of
the string, "#MDhms#Y", will result in a missing value. clock() is
careful in the translation and, if the whole string is not used, returns
missing. If you end the mask in #, however, clock() ignores the rest of
the string. Changing the mask from "#MDhms#Y" to "#MDhms#Y#" will
produce the desired result.
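The effect of a trailing # can be seen with display. This sketch simply replays the strings discussed above; the results in the comments are what the rules imply.

```stata
* the whole string is consumed by the mask
display %tc clock("Wed Dec 01 14:22:43 CST 2006", "#MDhms#Y")

* "patient 42" is left over, so the result is missing
display clock("Wed Dec 01 14:22:43 CST 2006 patient 42", "#MDhms#Y")

* a trailing # tells clock() to ignore the rest of the string
display %tc clock("Wed Dec 01 14:22:43 CST 2006 patient 42", "#MDhms#Y#")
```

The first and third commands should show 01dec2006 14:22:43; the second should show a missing value (.).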
Working with two-digit years
Consider translating the string 01-12-06 14:22, which is to be
interpreted as 01dec2006 14:22:00. clock() provides two ways of doing
this.
The first is to specify the assumed prefix in the mask. 01-12-06 14:22
can be read by specifying mask "DM20Yhm". If we instead wanted to
interpret the year as 1906, we would specify mask "DM19Yhm". We could
even interpret the year as 1806 by specifying "DM18Yhm".
But what if our data include 01-12-06 14:22 and include 15-06-98 11:01?
We want to interpret the first as being in 2006 and the second as being
in 1998. That is the purpose of optional argument topyear:
clock(string, mask [, topyear])
When you specify topyear, you are stating that when years in string are
two digits, the full year is to be obtained by finding the largest year
not exceeding topyear. Thus you could code,
. generate double timestamp = clock(timestr, "DMYhm", 2020)
Two-digit year 06 would be interpreted as 2006 because 2006 does not
exceed 2020. Two-digit 98 would be interpreted as 1998 because 2098 does
exceed 2020; 1998 does not.
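Both approaches can be compared on literal strings. In this sketch, the strings are day-month-year to match mask "DMYhm", and 2020 is just an example value of topyear.

```stata
* century fixed in the mask
display %tc clock("01-12-06 14:22", "DM20Yhm")
display %tc clock("01-12-06 14:22", "DM19Yhm")

* century chosen by topyear: the largest year not exceeding 2020
display %tc clock("01-12-06 14:22", "DMYhm", 2020)
display %tc clock("15-06-98 11:01", "DMYhm", 2020)
```

The results should be 01dec2006 14:22:00, 01dec1906 14:22:00, 01dec2006 14:22:00, and 15jun1998 11:01:00.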
Working with incomplete dates and times
The clock() function does not require that every component of the date
and time be specified.
Translating 2006-12-01 with mask "YMD" results in 01dec2006 00:00:00.
Translating 14:22 with mask "hm" results in 01jan1960 14:22:00.
Translating 11-2006 with mask "MY" results in 01nov2006 00:00:00.
The default for a component, if not specified in the mask, is
Code | Default if not specified
------+-------------------------
M | 01
D | 01
Y | 1960
|
h | 00
m | 00
s | 00
--------------------------------
This feature is useful. You may have data recording "14:22", meaning a
duration of 14 hours and 22 minutes, or the time 14:22 each day. See
Obtaining and working with durations under Syntax above.
The Clock() function
The syntax of the Clock() function is
Clock(string, mask [, topyear])
The Clock() function is identical to clock() except that, rather than
returning a %tc value, it returns %tC.
Note: Clock() is almost identical to Cofc(clock()). The difference is
that Clock() understands leap seconds, such as 30jun1997 23:59:60.
The date() function
The syntax of the date() function is
date(string, mask [, topyear])
The date() function is identical to clock() except that it returns a %td
value rather than a %tc value. The date() function is the same as
dofc(clock()).
Historical note: Stata 10's date() function is much improved over that
of previous versions, and the mask is specified a little differently.
In previous versions, the codes for year, month, and date were y, m,
and d rather than Y, M, and D. Under version control, the old codes
are allowed and, in fact, the original date() function is used.
The big advantage of Stata 10's date() is that it will translate
run-together dates such as 20061201 (no special action by you
required) and translate more complicated date strings such as Wed Dec
01 14:22:43 CST 2006 (special action required in how mask is
specified, something that the old date() would not have understood).
Translating run-together dates, such as 20060125
The clock(), Clock(), and date() functions will translate dates and times
that are run together, such as 20060125, 060125, and 20060125110215
(which is 25jan2006 11:02:15). There is nothing special that you have to
do:
. display %td date("20060125", "YMD")
25jan2006
. display %td date("060125", "20YMD")
25jan2006
. display %tc clock("20060125110215", "YMDhms")
25jan2006 11:02:15
In a data context, you could type
. gen startdate = date(startdatestr, "YMD")
. gen double starttime = clock(starttimestr, "YMDhms")
Remember to read the original data into a string. If you read the data
as numeric, the best advice is to read the data again. Numbers such as
20060125 and 20060125110215 will be rounded unless they are stored as
doubles.
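To see the rounding for yourself, here is a small sketch that uses the
float() function to mimic storing the number in a float variable. The
first line need not display 20060125, whereas the second, evaluated in
double precision, should.
. display %16.0f float(20060125)
. display %16.0f 20060125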
If you did read them into a double, or you have verified that rounding
did not occur, you can convert the variable from numeric to string. The
numeric-to-string conversion function is string(), which comes in one-
and two-argument forms. You will need the two-argument form:
. gen str startdatestr = string(startdatedouble, "%10.0g")
. gen str starttimestr = string(starttimedouble, "%16.0g")
If you omitted the format, string() would produce 2.01e+07 for 20060125
and 2.01e+13 for 20060125110215. The format we used had a width 2 larger
than the length of the integer number, although using a too-wide format
would not hurt.
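The round trip can be checked interactively with literal values before
you apply it to variables. This is only an illustration of the two
formats used above; the last two lines should display 25jan2006 and
25jan2006 11:02:15.
. display string(20060125, "%10.0g")
. display %td date(string(20060125, "%10.0g"), "YMD")
. display %tc clock(string(20060125110215, "%16.0g"), "YMDhms")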
The other translation functions
The other translation functions are
Format | String-to-numeric conversion function
-------+-----------------------------------------
%tw | weekly(string, mask [, topyear])
%tm | monthly(string, mask [, topyear])
%tq | quarterly(string, mask [, topyear])
%th | halfyearly(string, mask [, topyear])
-------------------------------------------------
string is the value to be translated.
mask specifies the order of the components.
topyear is described in Working with two-digit years above.
These functions are rarely used because data seldom arrive in these
formats.
All the functions translate a pair of numbers: weekly() translates a year
and a week number (1-52), monthly() translates a year and a month number
(1-12), quarterly() translates a year and a quarter number (1-4), and
halfyearly() translates a year and a half number (1-2).
The masks allowed are far more limited than for clock(), Clock(), and
date():
Code | Meaning
------+---------------------------------------
Y | 4-digit year
19Y | 2-digit year to be interpreted as 19xx
20Y | 2-digit year to be interpreted as 20xx
|
W | week number (weekly() only)
M | month number (monthly() only)
Q | quarter number (quarterly() only)
H | half number (halfyearly() only)
----------------------------------------------
The pair of numbers to be translated must be
separated by space or punctuation. No extra
characters are allowed.
Historical note: Before Stata 10, the mask codes were lowercase letters.
Under version control, lowercase letters are still allowed.
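For illustration, here is one call to each function with made-up input
strings; the components are separated by a space, as required above. The
results should display as 2006w12, 2006m1, 2006q3, and 2006h2.
. display %tw weekly("2006 12", "YW")
. display %tm monthly("2006 1", "YM")
. display %tq quarterly("06 3", "20YQ")
. display %th halfyearly("2006 2", "YH")
In a data context, assuming a hypothetical string variable qstr that
holds values such as "2006 3", you could type
. gen qdate = quarterly(qstr, "YQ")
. format qdate %tq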
Valid times
27:62:90 is an invalid time. If you try to convert 27:62:90 to a %tc or
%tC value, you will obtain a missing value.
24:00:00 is also invalid. The correct time is 00:00:00 of the next day.
In hh:mm:ss, the requirements are 0 <= hh < 24, 0 <= mm < 60, and 0 <= ss
< 60, although sometimes 60 is allowed.
31dec2005 23:59:60 is an invalid %tc time but a valid %tC one. 31dec2005
23:59:60 was an inserted leap second.
30dec2005 23:59:60 is an invalid time in both %tc and %tC formats.
30dec2005 23:59:60 was not an inserted leap second. The correct time is
31dec2005 00:00:00.
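The rules above can be checked with a few display commands. This is a
sketch based on the cases just described: lines 1, 2, and 4 are expected
to display a missing value (.), and line 3 should display the leap
second itself.
. display clock("25jan2006 27:62:90", "DMYhms")
. display clock("31dec2005 23:59:60", "DMYhms")
. display %tC Clock("31dec2005 23:59:60", "DMYhms")
. display Clock("30dec2005 23:59:60", "DMYhms")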
When leap seconds occurred
Stata system file leapseconds.maint lists the dates on which leap seconds
occurred. The file is updated periodically (see [R] update; the file is
updated when you update ado-files) and Stata's %tC functions access the
file to know when leap seconds occurred.
You can access it, too. To view the file, type
. viewsource leapseconds.maint
Truncated times
Consider the time 11:32:59.999. Other, less precise, ways of writing
that time are
11:32:59.99
11:32:59.9
11:32:59
11:32
That is, when you suppress the display of more detailed components of the
time, the parts that are displayed are not rounded. Stata displays time
like a digital watch; the time is 11:32 right up until the instant that
it is 11:33.
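A small sketch of this behavior: the value below is built with mdyhms()
plus 999 ms so that it is exactly 25jan2006 11:32:59.999, and it is then
displayed with progressively less detailed %tc formats. The date chosen
is arbitrary. The displayed times should be truncated, not rounded.
. display %tcHH:MM:SS.sss (mdyhms(1, 25, 2006, 11, 32, 59) + 999)
. display %tcHH:MM:SS (mdyhms(1, 25, 2006, 11, 32, 59) + 999)
. display %tcHH:MM (mdyhms(1, 25, 2006, 11, 32, 59) + 999)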
Advice on using %tc and %tC
Summary
Stata provides two time formats:
1. %tC, also known as UTC, which accounts for leap seconds, and
2. %tc, which ignores them (it assumes 86,400 seconds/day).
Systems vary in how they treat time variables. SAS ignores leap seconds.
Oracle includes them. Stata handles either. Our advice:
o If you obtain data from a system that accounts for leap seconds,
import using Stata's %tC.
a. If you later need to export data to a system that does not
account for leap seconds, use Stata's cofC() function to
translate time values before exporting.
b. If you intend to tsset the time variable and the analysis
will be at the second level or finer, just tsset the %tC
variable, specifying the appropriate delta() if necessary,
e.g., delta(1000) for seconds.
c. If you intend to tsset the time variable and the analysis
will be at coarser than the second level (minute, hour,
etc.), create a %tc variable from the %tC variable (generate
double tctime = cofC(tCtime)) and tsset that, specifying the
appropriate delta() if necessary. You must do that because,
in a %tC variable, there are not necessarily 60 seconds in a
minute; some minutes have 61 seconds.
o If you obtain data from a system that ignores leap seconds, use
Stata's %tc.
a. If you later need to export data to a system that does
account for leap seconds, use Stata's Cofc() function to
translate time values.
b. If you intend to tsset the time variable, just tsset it,
specifying the appropriate delta().
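The tsset advice above might be sketched as follows, assuming a %tC
variable named tCtime. The two blocks are alternatives, not steps to run
in sequence, and delta(60000) is simply 60*1,000 ms, that is, 1 minute.
. * analysis at the second level: tsset the %tC variable directly
. tsset tCtime, delta(1000)
. * analysis at the minute level: convert to %tc first
. generate double tctime = cofC(tCtime)
. format tctime %tc
. tsset tctime, delta(60000)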
Some users prefer to always use Stata's %tc because %tc values are a
little easier to work with. You can do that if
o you do not mind having up to 1 second of error and
o you do not import or export numerical values (clock ticks) from
other systems that are using leap seconds, because then there
could be nearly 30 seconds of error.
There are two things to remember if you use %tC variables:
1. The number of seconds between two dates is a function of when the
dates occurred. Five days from one date is not simply a matter
of adding 5*24*60*60*1,000 ms. You might need to add another
1,000 ms. Three hundred and sixty-five days from now might
require adding 1,000 or 2,000 ms. The longer the span, the more
you might have to add. (The best way to add durations to %tC
variables is to extract the components, add to them, and then
reconstruct from the numerical components.)
2. You cannot accurately predict date/times into the future. We do
not know what the %tC value will be of 25dec2026 00:00:00
because, along the way, the authorities may (and probably will)
announce leap seconds.
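As one possible sketch of the component approach mentioned in item 1,
the lines below add 5 days to a %tC variable, here called t (a
hypothetical name), by moving the date part forward and rebuilding the
value with Cmdyhms(). If t happens to fall on a leap second (seconds
equal to 60), the rebuilt value will probably be missing, because
23:59:60 is unlikely to be valid on the new date.
. generate long d5 = dofC(t) + 5
. generate double t5 = Cmdyhms(month(d5), day(d5), year(d5), hhC(t), mmC(t), ssC(t))
. format t5 %tC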
Explanation
Stata's %tc encoding assumes that there are 24*60*60*1,000 ms per day,
just as an atomic clock, counting oscillations between the nucleus of an
atom and its electrons, would define it.
Since 1972, leap seconds have been added from time to time, never more
than twice in one year, to keep time measured in synchronization with
the earth's rotation. Unlike leap years, however, there is no formula to
predict when leap seconds will occur. The earth is on average slowing
down, but there is a relatively large random component, and so leap
seconds are determined by fiat and announced 6 months before they are
inserted. Leap seconds are added, if necessary, at the end of the day on
June 30 and December 31 and are designated as 23:59:60.
You may have heard various terms such as GMT and UTC.
GMT is the old Greenwich Mean Time and is based on astronomical
observation.
UTC stands for coordinated universal time and is measured by atomic
clocks, occasionally corrected for leap seconds.
UT1 is the mean solar time, with which UTC is kept in sync by the
occasional addition of a leap second.
TAI is the atomic time on which UTC is based. TAI was set to GMT plus 10
seconds in 1958 and has been running since then.
UNK is our term for the time standard most people use. UNK stands for
unknown, or unknowing. UNK is based on a recent time observation,
probably UTC, and then most people just assume that there are 86,400
seconds per day after that.
The UNK standard is usually adequate, and when it is, you will want to
use %tc rather than the leap second-adjusted %tC encoding. If you are
using computer-timestamped data, however, you may need to find out
whether the timestamping system used leap-second adjustment. Problems
can arise even if you do not care about losing or gaining a second here
and there.
For instance, you may import timestamp values from other systems as
integers, recorded in the number of milliseconds, or export them. You
may do this, but as of 18aug2006, if you choose the wrong encoding scheme
(choose %tc when you should choose %tC, or vice versa), your recent times
will be off by 23 seconds.
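The size of that discrepancy can be checked by encoding the same
printable time both ways and comparing the results. This is an
illustrative sketch; the scalar names are arbitrary. Given the 23 leap
seconds cited above, the last line should display 23.
. scalar s_tC = Clock("18aug2006 14:05:36", "DMYhms")
. scalar s_tc = clock("18aug2006 14:05:36", "DMYhms")
. display (s_tC - s_tc)/1000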
To avoid such problems, you may decide to import and export data by using
printable forms, such as "Fri Aug 18 14:05:36 CDT 2006". This method has
advantages, but for %tC encoding, times such as 23:59:60 are possible.
Some systems will refuse to decode such times.
Stata refuses to decode 23:59:60 in the %tc encoding (function clock())
and accepts it with %tC (function Clock()). (When the %tC function
Clock() sees a time with a 60th second, the function verifies that the
time is one of the official leap seconds.) Thus, when translating from
printable forms, you can assume %tc and check for missing values. If
there are none, then you can use %tc. You will never be off by more than
1 second. If there are leap seconds in your data, use Clock() to
translate them and then, if you still want to work in %tc units, use
function cofC() to translate %tC values into %tc. Again you will have no
more than 1 second of inaccuracy.
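The procedure just described might be sketched as follows. The variable
names are hypothetical: tstr is assumed to be a string variable holding
times such as "18aug2006 14:05:36".
. * first try the %tc encoding and look for untranslated values
. generate double t = clock(tstr, "DMYhms")
. count if missing(t) & !missing(tstr)
. * if the count is nonzero, translate with Clock() and convert back
. generate double tC = Clock(tstr, "DMYhms")
. generate double t2 = cofC(tC)
. format t t2 %tc
. format tC %tC
A nonzero count can also reflect malformed strings, not only leap
seconds, so it is worth listing the offending observations before
deciding.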
If precision matters, the best way to process %tC data is simply to treat
them that way. The inconvenience is that you cannot assume that there
are 86,400 seconds per day. To obtain the duration between dates, you
must subtract the two time values involved. The other difficulty has to
do with dealing with dates in the future. Under the %tC encoding, there
is no set value for any date more than 6 months in the future.
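As a concrete sketch of why a day cannot be assumed to last 86,400
seconds, the lines below subtract two time values that span the leap
second inserted at the end of 31dec2005. The %tC difference should be
86401 seconds and the %tc difference 86400.
. display (Clock("01jan2006 00:00", "DMYhm") - Clock("31dec2005 00:00", "DMYhm"))/1000
. display (clock("01jan2006 00:00", "DMYhm") - clock("31dec2005 00:00", "DMYhm"))/1000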
Also see
Manual: [D] dates and times
Help: [D] format