Stata 11 help for mdy

help dates and times -------------------------------------------------------------------------------

Title

[D] dates and times -- Date and time (%t) values and variables

Syntax

Syntax is presented under the following headings:

    How Stata records dates and times
    Inputting date and time data
    Recommended storage types for %t variables
    Typing dates and times
    Constructing date and time values from numerical components
    Converting date and time values
    Extracting date and time components
    Obtaining and working with durations
    Formatting date and time values

How Stata records dates and times

Dates and times are called %t values. %t values are numerical and integral. The integral value records the number of time units that have passed from an agreed-upon base, which for Stata is 1960.

Coding and interpretation of date and time (%t) values are as follows:

    +---------------------------------------------------------------------+
    |        |            | ----- Numerical value & interpretation ------ |
    | Format | Meaning    | Value = -1    | Value = 0     | Value = 1     |
    |--------+------------+---------------+---------------+---------------|
    | %tc    | clock      | 31dec1959     | 01jan1960     | 01jan1960     |
    |        |            | 23:59:59.999  | 00:00:00.000  | 00:00:00.001  |
    |        |            |               |               |               |
    | %td    | days       | 31dec1959     | 01jan1960     | 02jan1960     |
    |        |            |               |               |               |
    | %tw    | weeks      | 1959w52       | 1960w1        | 1960w2        |
    |        |            |               |               |               |
    | %tm    | months     | 1959m12       | 1960m1        | 1960m2        |
    |        |            |               |               |               |
    | %tq    | quarters   | 1959q4        | 1960q1        | 1960q2        |
    |        |            |               |               |               |
    | %th    | half-years | 1959h2        | 1960h1        | 1960h2        |
    |        |            |               |               |               |
    | %tg    | generic    | -1            | 0             | 1             |
    +---------------------------------------------------------------------+

Explanation: The middle column (Value = 0) shows the base value. For a %td value, 0 means 01jan1960. The table also shows that -1 means 31dec1959 and 1 means 02jan1960. A %td value records the number of days from 01jan1960; a %tc value records the number of milliseconds from the start of 01jan1960; a %tw value records the number of weeks from the first week of 1960; and so on.

That is,

o For a %tc value, a 1-unit change represents 1 ms.

Integer 394,839,482,000 represents 05jul1972 21:38:02.000 because that date occurred 394,839,482,000 ms after 01jan1960 00:00:00.000.

Integer -394,839,482,000 represents 28jun1947 02:21:58.000 because that date occurred 394,839,482,000 ms before 01jan1960 00:00:00.000.

o For a %td value, a 1-unit change represents 1 day.

Integer 4,569 represents 05jul1972 because that date occurred 4,569 days after 01jan1960.

Integer -4,569 represents 29jun1947 because that date occurred 4,569 days before 01jan1960.

o For a %tw value, a 1-unit change represents 1 week.

Integer 650 represents 1972w27 because that date occurred 650 weeks after 1960w1.

Integer -650 represents 1947w27 because that date occurred 650 weeks before 1960w1.

o For a %tm value, a 1-unit change represents 1 month.

Integer 150 represents 1972m7 because that date occurred 150 calendar months after 1960m1.

Integer -150 represents 1947m7 because that date occurred 150 calendar months before 1960m1.

o For a %tq value, a 1-unit change represents one quarter (3 calendar months).

Integer 50 represents 1972q3 because that date occurred 50 quarters after 1960q1.

Integer -50 represents 1947q3 because that date occurred 50 quarters before 1960q1.

o For a %th value, a 1-unit change represents one half-year, or 6 months.

Integer 25 represents 1972h2 because that date occurred 25 half-years after 1960h1.

Integer -25 represents 1947h2 because that date occurred 25 half-years before 1960h1.

o For a %tg value, a 1-unit change represents whatever you wish.

Integer 100 might represent 100 workdays, or 100 lunar months, or anything else, after some agreed-upon event, such as 01jan1960, or the date you were born, or anything else.

Negative values would represent times before the event.
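
The coding rules above can be checked by displaying raw integers under each format. The following is a minimal sketch meant to be run from a do-file (the // comments are not accepted at the interactive prompt); the expected results in the comments are the ones stated in the text above.

```stata
* Display raw integers under each %t format
display %tc 394839482000     // 05jul1972 21:38:02
display %td 4569             // 05jul1972
display %td -4569            // 29jun1947
display %tw 650              // 1972w27
display %tm 150              // 1972m7
display %tq 50               // 1972q3
display %th 25               // 1972h2
```

The same integer means something different under each format, which is why a variable's display format should always match the units it was built in.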

In addition to the above, there is %ty:

    +---------------------------------------------------------------------+
    |        |           | ------ Numerical value & interpretation ------ |
    | Format | Meaning   | 1959          | 1960          | 1961           |
    |--------+-----------+---------------+---------------+----------------|
    | %ty    | year      | 1959          | 1960          | 1961           |
    +---------------------------------------------------------------------+

A %ty value is like the other %t values except that, rather than the base being 1960, the base is 0 AD. (Years 0100 through 9999 are valid.)

In addition to the above, there is %tC:

    +---------------------------------------------------------------------+
    |        |           | ------ Numerical value & interpretation ------ |
    | Format | Meaning   | -1            | 0             | 1              |
    |--------+-----------+---------------+---------------+----------------|
    | %tC    | clock     | 31dec1959     | 01jan1960     | 01jan1960      |
    |        |           | 23:59:59.999  | 00:00:00.000  | 00:00:00.001   |
    +---------------------------------------------------------------------+

%tC is similar to %tc, except that %tC accounts for leap seconds:

Remember that %tc integer 394,839,482,000 represents 05jul1972 21:38:02.000.

That integer in %tC represents 05jul1972 21:38:01.000. For those who wish their clock based on astronomical observation, 1 leap second was inserted. (The first leap second was on 30jun1972, the second on 31dec1972, and others have been inserted since then.) See Advice on using %tc and %tC under Remarks below.

Jargon: A %td value is sometimes called an elapsed date.

Historical note: A %td value is sometimes referred to as a %d value. The t is omitted because, in Stata's history, %d values predated the other %t values. Dropping the t is still allowed but is now considered an anachronism.

Inputting date and time data

Date and time variables are best read as strings. Use one of the string-to-numeric conversion functions to convert the string representation to the appropriate %t value:

    Format | String-to-numeric conversion function
    -------+-----------------------------------------
    %tc    | clock(string, mask)
    %tC    | Clock(string, mask)
           |
    %td    | date(string, mask)
           |
    %tw    | weekly(string, mask)
    %tm    | monthly(string, mask)
    %tq    | quarterly(string, mask)
    %th    | halfyearly(string, mask)
    %ty    | yearly(string, mask)
           |
    %tg    | no function necessary; read as numeric
    -------------------------------------------------

In the above functions, string is the variable or value containing the string representation to be converted and mask specifies the order in which the components occur:

o For %td function date(), string might be "August 21, 2005" or "8-21-2005" and mask might be "MDY", meaning that the elements occur in the order month, day, and year.

o For %tc function clock(), string might be "21aug2005 15:21:22" and mask might be "DMYhms", meaning that the elements occur in the order day, month, year, hours, minutes, and seconds.

Thus one might code

. generate datehired = date(datehiredstr, "MDY")

. generate double timeadmitted = clock(timeadmitstr, "DMYhms")

See String-to-numeric translation functions under Remarks for details.
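
As a sketch of the full workflow, the do-file fragment below enters two observations by hand and converts them. The variable names are the ones used above; the data values are the example strings quoted in this section, and the small dataset itself is made up for illustration.

```stata
clear
input str20 datehiredstr str20 timeadmitstr
"August 21, 2005" "21aug2005 15:21:22"
"8-21-2005"       "21aug2005 15:21:22"
end

* Convert the strings, storing the %tc result as a double
generate datehired = date(datehiredstr, "MDY")
generate double timeadmitted = clock(timeadmitstr, "DMYhms")

* Attach display formats so the values are readable
format datehired %td
format timeadmitted %tc
list datehired timeadmitted
```

Both observations should produce the same date, because the mask describes only the order of the components, not the punctuation between them.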

Recommended storage types for %t variables

In the example above, we stored %tc variable timeadmitted as a double. Doing so is important if precision is to be maintained.

The recommended storage types for %t variables are

    Format | Recommended storage type
    -------+--------------------------
    %tc    | double
    %tC    | double
           |
    %td    | float or long
           |
    %tw    | float or int
    %tm    | float or int
    %tq    | float or int
    %th    | float or int
    %ty    | float or int
    %tg    | float or int
    ----------------------------------

Storing a %tc (%tC) variable as a double is important if precision is to be maintained. %tc variables are integers, but being the number of milliseconds from the start of 1960, they are large integers.

o What happens if you store a %tc value as a float? The largest integer that can be stored precisely in a float is 16,777,216, corresponding to 01jan1960 04:39:37.216. Times after that will be subject to rounding; the rounding as of recent times can be as much as 2 minutes, 11 seconds.

o What happens if you store a %tc value as a long? The largest integer that can be stored in a long is 2,147,483,620, corresponding to 25jan1960 20:31:23.620. Times after that cannot be stored in a long.

o What happens if you store a %tc value as a double? The largest integer that can be stored precisely in a double is 9,007,199,254,740,992, corresponding to a date roughly 285,000 years after 1960. Stata cuts off dates at year 9999, but for other reasons.

(In the above, we use an idiosyncratic definition of "precisely": positive value x is stored precisely if x MINUS 1 is not equal to x, where MINUS is the computer's operation of subtraction. For float and double, there are larger values that are stored exactly, but not precisely. For example, both float and double can exactly store the integer 2^100, a value approximately equal to 1.3e+30, but 2^100 MINUS 1 is still 2^100 because of loss of precision.)

    +-----------------------------------------------+
    |                 DO NOT FORGET                 |
    |                                               |
    | %tc and %tC values MUST BE stored as doubles. |
    | Doing so is your responsibility, not Stata's. |
    +-----------------------------------------------+
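
The loss of precision is easy to see with the float() function, which rounds its argument to float precision. This is a small sketch for a do-file; the scalar name t is arbitrary, and only the results stated in the comments are claimed.

```stata
scalar t = tc(05jul1972 21:38:02)   // scalars are held as doubles
display %20.0gc t                   // 394,839,482,000
display %20.0gc float(t)            // a nearby value, rounded to float precision
display %tc float(t)                // no longer 21:38:02
display float(t) == t               // 0: the float copy differs from the original
```

A variable created with generate and no storage type is a float by default, so the same rounding happens silently unless double is specified.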

Typing dates and times

Remember, date and time values are just integers, so in an expression, you could type the appropriate integer:

. gen before = cond(hiredon < 16237, 1, 0) if hiredon < .

. drop if admittedon < 1402920000000

Easier to type is

. gen before = cond(hiredon < td(15jun2004), 1, 0) if hiredon < .

. drop if admittedon < tc(15jun2004 12:00:00)

td() and tc() are called pseudofunctions because they translate what you type into their integer equivalents. Pseudofunctions require only that you specify the date/time components in the expected order, so rather than 15jun2004 above, we could have specified 15 June 2004, 15-6-2004, or 15/6/2004.

The date and time pseudofunctions and their expected component order are

    Format | Pseudofunction
    -------+--------------------------------------------------
    %tc    | tc([day-month-year] hh:mm[:ss[.sss]])
    %tC    | tC([day-month-year] hh:mm[:ss[.sss]])
           |
    %td    | td(day-month-year)
           |
    %tw    | tw(year-week)
    %tm    | tm(year-month)
    %tq    | tq(year-quarter)
    %th    | th(year-half)
    %ty    | none necessary; just type year
    %tg    | none necessary
    ----------------------------------------------------------

The day-month-year in tc() and tC() are optional. If you omit them, 01jan1960 is assumed. Doing so produces time as an offset, which can be useful in, for example,

. gen six_hrs_later = eventtime + tc(6:00)

Also see Extracting date and time components below.
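
To confirm what a pseudofunction evaluates to, display it. The sketch below (for a do-file) uses the values from the examples above; tc(6:00) is 6 hours expressed in milliseconds.

```stata
display td(15jun2004)                               // 16237
display %20.0gc tc(15jun2004 12:00:00)              // 1,402,920,000,000
display %20.0gc tc(6:00)                            // 21,600,000
display %tc (tc(15jun2004 12:00:00) + tc(6:00))     // 15jun2004 18:00:00
```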

Historical note: Pseudofunctions td(), tw(), tm(), tq(), and th() used to be called d(), w(), m(), q(), and h(). Those names still work but are considered anachronisms.

Constructing date and time values from numerical components

If you had numeric variables M, D, and Y containing month number, day of month, and year (in the first observation, the variables might contain 12, 15, and 2006), you could code

. generate mydate = mdy(M, D, Y)

to obtain a new %td variable containing the date (which would be 15dec2006 in the first observation).

The date-from-numerical-components functions are

    Format | Function
    -------+------------------------------------------
    %tc    | mdyhms(M, D, Y, h, m, s)
    %tc    | dhms(td, h, m, s)
    %tc    | hms(h, m, s)
           |
    %tC    | Cmdyhms(M, D, Y, h, m, s)
    %tC    | Cdhms(td, h, m, s)
    %tC    | Chms(h, m, s)
           |
    %td    | mdy(M, D, Y)
           |
    %tw    | yw(Y, W)
    %tm    | ym(Y, M)
    %tq    | yq(Y, Q)
    %th    | yh(Y, H)
    %ty    | Y
    --------------------------------------------------

where

td is a %td value,

M, D, and Y are month, day, and year values,
           1 <= M <= 12
           1 <= D <= 31
        0100 <= Y <= 9999

h, m, and s are hour, minute, and second values,
           0 <= h <= 23
           0 <= m <= 59
       0.000 <= s <= 59.999 (see note below)

W is a week number, 1 <= W <= 52

Q is a quarter number, 1 <= Q <= 4

H is a half number, 1 <= H <= 2

Note concerning s: The Cmdyhms() and Cdhms() functions allow 0.000 <= s <= 60.999 when the 60th second is a leap second. For instance, according to the authorities, 31dec1972 23:59:60 is an official leap second but 31dec1971 23:59:60 is not. Cmdyhms(12,31,1971,23,59,60) therefore evaluates to missing (.) whereas Cmdyhms(12,31,1972,23,59,60) evaluates to 410,313,601,000, a nonmissing value. (The expanded range of s does not apply to Chms() because it is a pure time based on 01jan1960 and there were no leap seconds on that date. The hms() and Chms() functions are, in fact, identical.)

The mdyhms() and dhms() functions are related by

mdyhms(M, D, Y, h, m, s) = dhms(mdy(M,D,Y), h, m, s)

and, similarly,

Cmdyhms(M, D, Y, h, m, s) = Cdhms(mdy(M,D,Y), h, m, s)

With mdyhms(), you have six variables, such as M=7, D=5, Y=1972, h=21, m=38, and s=2, and mdyhms() returns 05jul1972 21:38:02. With dhms() you have four variables, the first specifying the %td value of 05jul1972, and h, m, and s being the same, and dhms() returns the date + time, 05jul1972 21:38:02.
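
The relationship between these functions can be checked directly. A brief do-file sketch, using the component values from the paragraph above:

```stata
display %td mdy(12, 15, 2006)                    // 15dec2006
display %tc mdyhms(7, 5, 1972, 21, 38, 2)        // 05jul1972 21:38:02
display %tc dhms(mdy(7, 5, 1972), 21, 38, 2)     // 05jul1972 21:38:02

* The two constructions return the same %tc value
display mdyhms(7, 5, 1972, 21, 38, 2) == dhms(mdy(7, 5, 1972), 21, 38, 2)   // 1

display %tm ym(1972, 7)                          // 1972m7
```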

Converting date and time values

One type of %t value can be converted into another. The functions are

         |                           To...
    From | %tc       %tC       %td       %tw       %tm       %tq
    -----+----------------------------------------------------------
    %tc  |           Cofc()    dofc()
    %tC  | cofC()              dofC()
         |
    %td  | cofd()    Cofd()              wofd()    mofd()    qofd()
         |
    %tw  |                     dofw()
    %tm  |                     dofm()
    %tq  |                     dofq()
    %th  |                     dofh()
    %ty  |                     dofy()
    ----------------------------------------------------------------

         |       To...
    From | %th       %ty
    -----+-------------------
    %tc  |
    %tC  |
         |
    %td  | hofd()    yofd()
         |
    %tw  |
    %tm  |
    %tq  |
    %th  |
    %ty  |
    -------------------------

For instance, to convert a %td to a %tc value,

. generate double datetimevalue = cofd(datevalue)

%td is the mother of all date and time values, and to convert a %tq value to a %tc value, you must first convert to a %td value:

. generate double datetimevalue = cofd(dofq(quartervalue))
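
A few conversions on literal values, as a do-file sketch. Converting from a coarser unit to %td returns the first day of the period, so the round trip through %tq does not recover the original day.

```stata
display %tc cofd(td(05jul1972))          // 05jul1972 00:00:00
display %tq qofd(td(05jul1972))          // 1972q3
display %td dofq(tq(1972q3))             // 01jul1972
display %tc cofd(dofq(tq(1972q3)))       // 01jul1972 00:00:00
```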

Extracting date and time components

Let d be a %td variable or value. The following functions will extract components of d:

    -------------------------------------------------------------
                                      Result if d = td(05jul1972)
    Function      Returns             (i.e., d = 4,569)
    -------------------------------------------------------------
    year(d)       calendar year       1972
    month(d)      calendar month      7
    day(d)        day within month    5

    doy(d)        day of year         187

    halfyear(d)   half of year        2
    quarter(d)    quarter             3
    week(d)       week within year    27

    dow(d)        day of week         3 (means Wednesday)
                  (0 = Sunday)
    -------------------------------------------------------------

Remember, any %t value can be converted to a %td value by using the appropriate conversion function; see Converting date and time values above. If the date_time_admitted variable is %tc and you want to obtain the day of week,

. gen day = dow(dofc(date_time_admitted))

Let t be a %tc variable. The following functions will extract components of t:

    ----------------------------------------------------------------
                                       Result if
                                       t = tc(05jul1972-21:38:02)
    Function   Returns                 (i.e., t = 394,839,482,000)
    ----------------------------------------------------------------
    hh(t)      time of day, hours      21
    mm(t)      time of day, minutes    38
    ss(t)      time of day, seconds    2.000
    ----------------------------------------------------------------

Other components can be extracted by calculating dofc(t) and then extracting components from the %td value.

Let T be a %tC variable. The following functions will extract components of T:

    ----------------------------------------------------------------
                                       Result if
                                       T = tC(05jul1972-21:38:01)
    Function   Returns                 (i.e., T = 394,839,482,000)
    ----------------------------------------------------------------
    hhC(T)     time of day, hours      21
    mmC(T)     time of day, minutes    38
    ssC(T)     time of day, seconds    1.000
    ----------------------------------------------------------------

By convention, leap seconds came after 23:59:59 and are labeled 23:59:60. Thus ssC(T) can return 60. Other components can be extracted by calculating dofC(T) and then extracting components from the %td value.
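
The extraction functions can be tried on a single value. In this do-file sketch the scalar name t is arbitrary; the date components are obtained by first converting the %tc value to %td with dofc().

```stata
scalar t = tc(05jul1972 21:38:02)

* Time-of-day components come straight from the %tc value
display hh(t)              // 21
display mm(t)              // 38
display ss(t)              // 2

* Date components require conversion to %td first
display year(dofc(t))      // 1972
display doy(dofc(t))       // 187
display dow(dofc(t))       // 3 (Wednesday)
```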

Obtaining and working with durations

Remember that %t variables are simply durations from 1960:

    Format | Units
    -------+-------------
    %tC    | milliseconds
    %tc    | milliseconds
           |
    %td    | days
           |
    %tw    | weeks
    %tm    | months
    %tq    | quarters
    %th    | half-years
    ---------------------

Thus, to obtain the duration between %t variables, subtract them:

. gen days_employed = curdate - hiredate

. gen qtrs_to_15jan = curqtr - qofd(td(15jan2005))

To add a duration to a date, add the two values:

. gen lastdate = hiredate + days_employed
. format lastdate %td

. gen qtr_of_merger = curqtr + quarters_to_merger
. format qtr_of_merger %tq

When creating new date and time variables, remember to format them so that they will be readable should you print them.

The above applies equally to %tc and %tC variables:

. gen double millisecs_employed = lasttime - hiretime

and

. gen double lasttime = hiretime + millisecs_employed
. format lasttime %tc

Note our use of double. Times are recorded in milliseconds and must be stored as doubles if precision is to be maintained.

There are 1,000 ms in a second, 60*1,000 in a minute, and 60*60*1,000 in an hour. It is easy to mistype these constants when converting to more readable units, and therefore the following functions are provided:

    Function        | Purpose
    ----------------+----------------------------------
    hours(ms)       | convert milliseconds to hours
                    |   returns ms/(60*60*1000)
                    |
    minutes(ms)     | convert milliseconds to minutes
                    |   returns ms/(60*1000)
                    |
    seconds(ms)     | convert milliseconds to seconds
                    |   returns ms/1000
                    |
    msofhours(h)    | convert hours to milliseconds
                    |   returns h*60*60*1000
                    |
    msofminutes(m)  | convert minutes to milliseconds
                    |   returns m*60*1000
                    |
    msofseconds(s)  | convert seconds to milliseconds
                    |   returns s*1000
    ---------------------------------------------------

Thus you can code

. gen double days_employed = hours(lasttime-hiretime)/24

and

. gen double lasttime = hiretime + msofhours(24*days_employed)

If precision is to be preserved, the use of these functions does not alleviate the necessity of using doubles.

days_employed in the above will include a fraction of a day. If a rounded integer result is desired, then round explicitly:

. gen approx_days_employed = round(hours(lasttime-hiretime)/24)
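
A worked example with made-up times shows the unit conversions: hours() turns milliseconds into hours, dividing by 24 gives days, and msofhours() goes the other way. The scalar names mirror the variable names used above; the two timestamps are assumptions chosen for illustration.

```stata
scalar hiretime = tc(01jan2006 09:00)
scalar lasttime = tc(04jan2006 03:00)

display hours(lasttime - hiretime)               // 66
display hours(lasttime - hiretime)/24            // 2.75
display round(hours(lasttime - hiretime)/24)     // 3

* Going back: 2.75 days is 66 hours
display %tc (hiretime + msofhours(24*2.75))      // 04jan2006 03:00:00
```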

Formatting date and time values

A variable's values are formatted to indicate 1) the units used and 2) how the variable is to be displayed:

. generate mydate = date(datestr, "DMY")

. list mydate in 1

         +--------+
         | mydate |
         |--------|
      1. |  17096 |
         +--------+

. format mydate %td

. list mydate in 1

         +-----------+
         |    mydate |
         |-----------|
      1. | 22oct2006 |
         +-----------+

. generate double mytime = clock(timestr, "DMY hm")

. list mytime in 1

         +-----------+
         |    mytime |
         |-----------|
      1. | 1.477e+12 |
         +-----------+

. format mytime %tc

. list mytime in 1

         +--------------------+
         |             mytime |
         |--------------------|
      1. | 22oct2006 13:02:00 |
         +--------------------+

The %t formats result in the following output:

    Format | Example of output
    -------+----------------------------
    %tC    | 05jul1972 21:38:01
    %tc    | 05jul1972 21:38:02
           |
    %td    | 05jul1972
           |
    %tw    | 1972w27
    %tm    | 1972m7
    %tq    | 1972q3
    %th    | 1972h2
    %ty    | 1972
    %tg    | (actual integer shown)
    ------------------------------------
    Formats %tC and %tc do not show the milliseconds by default.

You can specify how dates and times are to be formatted. Rather than 05jul1972, you could have July 5, 1972, or rather than 05jul1972 21:38:02, you could have 7-5-72 9:38 p.m. This reformatting is done by adding codes to the end of %tC, %tc, %td, etc. In fact, the default %tC, %tc, %td, ..., formats actually mean

    Format | Implied (fully specified) format
    -------+---------------------------------
    %tC    | %tCDDmonCCYY_HH:MM:SS
    %tc    | %tcDDmonCCYY_HH:MM:SS
           |
    %td    | %tdDDmonCCYY
           |
    %tw    | %twCCYY!www
    %tm    | %tmCCYY!mnn
    %tq    | %tqCCYY!qq
    %th    | %thCCYY!hh
    %ty    | %tyCCYY
    -----------------------------------------

Typing

. format mytimevar %tc

has the same effect as typing

. format mytimevar %tcDDmonCCYY_HH:MM:SS

Format %tcDDmonCCYY_HH:MM:SS is interpreted as

    +------------------------------------------------------------------+
    |      %             t             c         DDmonCCYY_HH:MM:SS    |
    |      |             |             |                  |            |
    | all formats     it's a        variable      formatting codes     |
    | start with %    time format   coded in      specify how to       |
    |                               milliseconds  display value        |
    +------------------------------------------------------------------+

The formatting codes are

    Code      Meaning            Output
    -----------------------------------------------------------------
    CC        century-1          01 - 99
    cc        century-1          1 - 99
    YY        2-digit year       00 - 99
    yy        2-digit year       0 - 99

    JJJ       day within year    001 - 366
    jjj       day within year    1 - 366

    Mon       month              Jan, Feb, ..., Dec
    Month     month              January, February, ..., December
    mon       month              jan, feb, ..., dec
    month     month              january, february, ..., december
    NN        month              01 - 12
    nn        month              1 - 12

    DD        day within month   01 - 31
    dd        day within month   1 - 31

    DAYNAME   day of week        Sunday, Monday, ... (aligned)
    Dayname   day of week        Sunday, Monday, ... (unaligned)
    Day       day of week        Sun, Mon, ...
    Da        day of week        Su, Mo, ...
    day       day of week        sun, mon, ...
    da        day of week        su, mo, ...

    h         half               1 - 2
    q         quarter            1 - 4
    WW        week               01 - 52
    ww        week               1 - 52

    HH        hour               00 - 23
    Hh        hour               00 - 12
    hH        hour               0 - 23
    hh        hour               0 - 12

    MM        minute             00 - 59
    mm        minute             0 - 59

    SS        second             00 - 60 (sic, due to leap seconds)
    ss        second             0 - 60 (sic, due to leap seconds)
    .s        tenths             .0 - .9
    .ss       hundredths         .00 - .99
    .sss      thousandths        .000 - .999

    am        show am or pm      am or pm
    a.m.      show a.m. or p.m.  a.m. or p.m.
    AM        show AM or PM      AM or PM
    A.M.      show A.M. or P.M.  A.M. or P.M.

    .         display period     .
    ,         display comma      ,
    :         display colon      :
    -         display hyphen     -
    _         display space
    /         display slash      /
    \         display backslash  \
    !c        display character  c

    +         separator (see note)
    -----------------------------------------------------------------
    Note: + displays nothing; it may be used to separate one code from the next to make the format more readable. + is never necessary. For instance, %tchh:MM+am and %tchh:MMam have the same meaning, as does %tc+hh+:+MM+am.

Thus, if you had a %td variable and wanted to display the dates as, for example, January 9, 2002, you could specify the format %tdMonth_dd,_CCYY.

If you had a %tc variable and wanted to display the time as

Fri Aug 18 12:01:35 CDT 2006

you could specify %tcDay_Mon_DD_HH:MM:SS_!C!D!T_CCYY.

The maximum length of a format specifier is 48 characters; the example shown above is 34 characters.
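
Custom formats can be tried without creating a variable by passing them to string(), which returns the formatted text. This do-file sketch assumes the two-argument form string(n, format) accepts %t formats; the dates are the ones used in the examples above.

```stata
display string(td(09jan2002), "%tdMonth_dd,_CCYY")
// January 9, 2002

display string(tc(18aug2006 12:01:35), "%tcDay_Mon_DD_HH:MM:SS_!C!D!T_CCYY")
// Fri Aug 18 12:01:35 CDT 2006
```

Note that !C!D!T simply prints the literal characters CDT; the format does not perform any time-zone conversion.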

Description

Complete documentation of Stata's treatment of date and time values is provided. Every feature and function is documented here, either in Syntax above or in Remarks below.

Remarks

Remarks are presented under the following headings:

    Experimenting with the date and time functions
    String-to-numeric translation functions
    The clock() function
    How clock() interprets the mask
    Working with two-digit years
    Working with incomplete dates and times
    The Clock() function
    The date() function
    Translating run-together dates, such as 20060125
    The other translation functions
    Valid times
    When leap seconds occurred
    Truncated times
    Advice on using %tc and %tC
    Summary
    Explanation

Experimenting with the date and time functions

The best way to become familiar with Stata's date and time functions is to experiment with the display command.

. display date("5-12-1998", "MDY")
14011

. display %td date("5-12-1998", "MDY")
12may1998

. display clock("5-12-1998 11:15", "MDY hm")
1.211e+12

. display %20.0gc clock("5-12-1998 11:15", "MDY hm")
   1,210,590,900,000

. display %tc clock("5-12-1998 11:15", "MDY hm")
12may1998 11:15:00

Remember, when you work with display, you can specify a format in front of the expression to specify how the result is to be formatted.

String-to-numeric translation functions

The string-to-numeric date and time translation functions are

    Format | String-to-numeric conversion function
    -------+-----------------------------------------
    %tc    | clock(string, mask [, topyear])
    %tC    | Clock(string, mask [, topyear])
           |
    %td    | date(string, mask [, topyear])
           |
    %tw    | weekly(string, mask [, topyear])
    %tm    | monthly(string, mask [, topyear])
    %tq    | quarterly(string, mask [, topyear])
    %th    | halfyearly(string, mask [, topyear])
    %ty    | yearly(string, mask [, topyear])
    -------------------------------------------------
    string is the value to be translated.
    mask specifies the order of the components.
    topyear is described in Working with two-digit years below.

These functions are typically used after reading date, time, or date and time data. The data contain values such as "08/12/06", "12-8-2006", "12 Aug 06", "12aug2006 14:23", and "12 aug06 2:23 pm". You read the data into a string variable and then use one of the translation functions to translate the string into a %t variable.

The translation functions are used in expressions, such as

. generate double timeadmitted = clock(timeadmitstr, "DMYhms")
. format timeadmitted %tc

. generate datehired = date(datehiredstr, "MDY")
. format datehired %td

All functions require two arguments, the string to be translated and a second string specifying the order in which the date and time components occur.

The most useful of these functions are clock(), Clock(), and date(). The other functions are rarely used.

The clock() function

clock() returns a %tc value. The syntax of clock() is

clock(string, mask [, topyear])

Ignore optional argument topyear; we will discuss that below. Second argument mask is a string specifying the order of the components in string and consists of the following codes:

    Code  | Meaning
    ------+---------------------------------------
    M     | month
    D     | day within month
    Y     | 4-digit year
    19Y   | 2-digit year to be interpreted as 19xx
    20Y   | 2-digit year to be interpreted as 20xx
          |
    h     | hour of day
    m     | minutes within hour
    s     | seconds within minute
          |
    #     | ignore one element
    ----------------------------------------------

Examples of date strings and the mask required to translate them include

    String to translate              Corresponding mask
    ----------------------------------------------------
    01dec2006 14:22                  "DMYhm"
    01-12-2006 14.22                 "DMYhm"

    1dec2006 14:22                   "DMYhm"
    1-12-2006 14:22                  "DMYhm"

    01dec06 14:22                    "DM20Yhm"
    01-12-06 14.22                   "DM20Yhm"

    December 1, 2006 14:22           "MDYhm"

    2006 Dec 01 14:22                "YMDhm"
    2006-01-12 14:22                 "YMDhm"

    2006-01-12 14:22:43              "YMDhms"
    2006-01-12 14:22:43.2            "YMDhms"
    2006-01-12 14:22:43.21           "YMDhms"
    2006-01-12 14:22:43.213          "YMDhms"

    2006-01-12 2:22:43.213 pm        "YMDhms"
    2006-01-12 2:22:43.213 pm.       "YMDhms"
    2006-01-12 2:22:43.213 p.m.      "YMDhms"
    2006-01-12 2:22:43.213 P.M.      "YMDhms"

    20060112 1422                    "YMDhm"

    14:22                            "hm"    (see note)
    2006-12-01                       "YMD"   (see note)

    Wed Dec 01 14:22:43 CST 2006     "#MDhms#Y"
    ----------------------------------------------------
    Note: A subset of components may be specified. clock("14:22", "hm") produces 01jan1960 14:22:00. clock("2006-12-01", "YMD") produces 01dec2006 00:00:00.

Also there is nothing special included in mask to process a.m. and p.m. markers; when you include code h, clock() automatically watches for the meridian markers.

mask may include spaces so that it is more readable; they have no meaning. Thus we can code

. generate double admit = clock(admitstr, "#MDhms#Y")

or code

. generate double admit = clock(admitstr, "# MD hms # Y")

and which we code makes no difference.

How clock() interprets the mask

To specify the appropriate mask, it helps to understand the rules that clock() applies. They are

1. For each string to be translated, remove all punctuation except for the period separating seconds from tenths, hundredths, and thousandths of seconds. Replace the punctuation with a space.

2. Insert a space in the string everywhere that a letter is next to a number or vice versa.

3. Interpret the resulting elements according to mask.

For instance, consider the string

01dec2006 14:22

Under rule 1, the string becomes

01dec2006 14 22

Under rule 2, the string becomes

01 dec 2006 14 22

Now clock() applies rule 3. If the mask is "DMYhm", then clock() interprets "01" as the day, "dec" as the month, and so on.

Or consider the string

Wed Dec 01 14:22:43 CST 2006

Under rule 1, the string becomes

Wed Dec 01 14 22 43 CST 2006

Applying rule 2 does not change the string. Now clock() applies rule 3. If the mask is "#MDhms#Y", clock() skips "Wed", interprets "Dec" as the month, and so on.

The # code serves a second purpose. If it appears at the end of the mask, it specifies that the rest of string is to be ignored. Consider translating

Wed Dec 01 14 22 43 CST 2006 patient 42

The mask code that previously worked when "patient 42" was not part of the string, "#MDhms#Y", will result in a missing value. clock() is careful in the translation and, if the whole string is not used, returns missing. If you end the mask in #, however, clock() ignores the rest of the string. Changing the mask from "#MDhms#Y" to "#MDhms#Y#" will produce the desired result.
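
The effect of a trailing # can be seen with display. In this do-file sketch the local macro s is just a convenience for holding the example string from above.

```stata
local s "Wed Dec 01 14:22:43 CST 2006"

display %tc clock("`s'", "#MDhms#Y")                // 01dec2006 14:22:43

* Extra text with no trailing # in the mask: missing value
display clock("`s' patient 42", "#MDhms#Y")         // .

* A trailing # tells clock() to ignore the rest of the string
display %tc clock("`s' patient 42", "#MDhms#Y#")    // 01dec2006 14:22:43
```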

Working with two-digit years

Consider translating the string 01-12-06 14:22, which is to be interpreted as 01dec2006 14:22:00. clock() provides two ways of doing this.

The first is to specify the assumed prefix in the mask. 01-12-06 14:22 can be read by specifying mask "DM20Yhm". If we instead wanted to interpret the year as 1906, we would specify mask "DM19Yhm". We could even interpret the year as 1806 by specifying "DM18Yhm".

But what if our data include 01-12-06 14:22 and include 15-06-98 11:01? We want to interpret the first as being in 2006 and the second as being in 1998. That is the purpose of optional argument topyear:

clock(string, mask [, topyear])

When you specify topyear, you are stating that when years in string are two digits, the full year is to be obtained by finding the largest year not exceeding topyear. Thus you could code,

. generate double timestamp = clock(timestr, "DMYhm", 2020)

Two-digit year 06 would be interpreted as 2006 because 2006 does not exceed 2020. Two-digit 98 would be interpreted as 1998 because 2098 does exceed 2020; 1998 does not.
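
Both approaches to two-digit years, shown on literal strings in day-month-year order. This is a do-file sketch; the second string is an illustrative value chosen to have a two-digit year of 98.

```stata
* topyear = 2020: choose the largest matching year not exceeding 2020
display %tc clock("01-12-06 14:22", "DMYhm", 2020)   // 01dec2006 14:22:00
display %tc clock("15-06-98 11:01", "DMYhm", 2020)   // 15jun1998 11:01:00

* A prefix in the mask forces the century instead
display %tc clock("01-12-06 14:22", "DM19Yhm")       // 01dec1906 14:22:00
```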

Working with incomplete dates and times

The clock() function does not require that every component of the date and time be specified.

Translating 2006-12-01 with mask "YMD" results in 01dec2006 00:00:00.

Translating 14:22 with mask "hm" results in 01jan1960 14:22:00.

Translating 11-2006 with mask "MY" results in 01nov2006 00:00:00.

The default for a component, if not specified in the mask, is

    Code  | Default if not specified
    ------+-------------------------
    M     | 01
    D     | 01
    Y     | 1960
          |
    h     | 00
    m     | 00
    s     | 00
    --------------------------------

This feature is useful. You may have data recording "14:22", meaning a duration of 14 hours and 22 minutes, or the time 14:22 each day. See Obtaining and working with durations under Syntax above.
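
The defaults can be verified on the three strings above, and a time-only value can then be added to a date as an offset. A do-file sketch; the date 15jun2004 is an arbitrary choice for illustration.

```stata
display %tc clock("2006-12-01", "YMD")    // 01dec2006 00:00:00
display %tc clock("14:22", "hm")          // 01jan1960 14:22:00
display %tc clock("11-2006", "MY")        // 01nov2006 00:00:00

* A time-only result is an offset of 14 hours 22 minutes in ms
display %tc (cofd(td(15jun2004)) + clock("14:22", "hm"))   // 15jun2004 14:22:00
```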

The Clock() function

The syntax of the Clock() function is

Clock(string, mask [, topyear])

The Clock() function is identical to clock() except that, rather than returning a %tc value, it returns %tC.

Note: Clock() is almost identical to Cofc(clock()). The difference is that Clock() understands leap seconds, such as 30jun1997 23:59:60.

The date() function

The syntax of the date() function is

date(string, mask [, topyear])

The date() function is identical to clock() except that it returns a %td value rather than a %tc value. The date() function is the same as dofc(clock()).

Historical note: Stata 10's date() function is much improved over that of previous versions, and the mask is specified a little differently. In previous versions, the codes for year, month, and date were y, m, and d rather than Y, M, and D. Under version control, the old codes are allowed and, in fact, the original date() function is used.

The big advantage of Stata 10's date() is that it will translate run-together dates such as 20061201 (no special action by you required) and translate more complicated date strings such as Wed Dec 01 14:22:43 CST 2006 (special action required in how mask is specified, something that the old date() would not have understood).

Translating run-together dates, such as 20060125

The clock(), Clock(), and date() functions will translate dates and times that are run together, such as 20060125, 060125, and 20060125110215 (which is 25jan2006 11:02:15). There is nothing special that you have to do:

. display %td date("20060125", "YMD")
25jan2006

. display %td date("060125", "20YMD")
25jan2006

. display %tc clock("20060125110215", "YMDhms")
25jan2006 11:02:15

In a data context, you could type

. gen startdate = date(startdatestr, "YMD")

. gen double starttime = clock(starttimestr, "YMDhms")

Remember to read the original data into a string. If you read the data as numeric, the best advice is to read the data again. Numbers such as 20060125 and 20060125110215 will be rounded unless they are stored as doubles.
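To see why storage type matters, the sketch below pushes 20060125 through float(), which rounds its argument to float precision. A float carries about 7 digits of accuracy, so the first line does not display 20060125, whereas the second, evaluated in double precision, does.

```stata
* Rounded to float precision: the 8-digit value is not preserved
display %12.0f float(20060125)

* Evaluated as a double: the value is exact
display %12.0f 20060125
```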

If you did read them into a double, or you have verified that rounding did not occur, you can convert the variable from numeric to string. The numeric-to-string conversion function is string(), which comes in one- and two-argument forms. You will need the two-argument form:

. gen str startdatestr = string(startdatedouble, "%10.0g")

. gen str starttimestr = string(starttimedouble, "%16.0g")

If you omitted the format, string() would produce 2.01e+07 for 20060125 and 2.01e+13 for 20060125110215. The format we used had a width 2 larger than the length of the integer number, although using a too-wide format would not hurt.
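The numeric-to-string-to-date round trip can be tried on literal values before applying it to a dataset. This is a sketch using the same formats as above; the literals stand in for the variables startdatedouble and starttimedouble.

```stata
* Numeric to string with an explicit format
display string(20060125, "%10.0g")

* String to %td and %tc values
display %td date(string(20060125, "%10.0g"), "YMD")
display %tc clock(string(20060125110215, "%16.0g"), "YMDhms")
```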

The other translation functions

The other translation functions are

    Format | String-to-numeric conversion function
    -------+-----------------------------------------
     %tw   | weekly(string, mask [, topyear])
     %tm   | monthly(string, mask [, topyear])
     %tq   | quarterly(string, mask [, topyear])
     %th   | halfyearly(string, mask [, topyear])
    -------------------------------------------------
    string is the value to be translated.
    mask specifies the order of the components.
    topyear is described in Working with two-digit years above.

These functions are rarely used because data seldom arrive in these formats.

All the functions translate a pair of numbers: weekly() translates a year and a week number (1-52), monthly() translates a year and a month number (1-12), quarterly() translates a year and a quarter number (1-4), and halfyearly() translates a year and a half number (1-2).

The masks allowed are far more limited than for clock(), Clock(), and date():

    Code | Meaning
    -----+---------------------------------------
     Y   | 4-digit year
     19Y | 2-digit year to be interpreted as 19xx
     20Y | 2-digit year to be interpreted as 20xx
         |
     W   | week number (weekly() only)
     M   | month number (monthly() only)
     Q   | quarter number (quarterly() only)
     H   | half number (halfyearly() only)
    ----------------------------------------------
    The pair of numbers to be translated must be separated by space or punctuation.
    No extra characters are allowed.
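A short sketch of the four functions, one call each, using the masks in the table above. The input strings are made up for illustration. Each holds only a pair of numbers separated by a space or punctuation.

```stata
* Year and month number
display %tm monthly("2006-11", "YM")

* Year and quarter number
display %tq quarterly("2006 3", "YQ")

* Year and half number
display %th halfyearly("2006/1", "YH")

* 2-digit year, read as 20xx, and week number
display %tw weekly("06 48", "20YW")
```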

Historical note: Before Stata 10, the mask codes were lowercase letters. Under version control, lowercase letters are still allowed.

Valid times

27:62:90 is an invalid time. If you try to convert 27:62:90 to a %tc or %tC value, you will get a missing value or an error message.

24:00:00 is also invalid. Correct is 00:00:00 of the next day.

In hh:mm:ss, the requirements are 0 <= hh < 24, 0 <= mm < 60, and 0 <= ss < 60, although ss = 60 is allowed in a %tC value when the time is an inserted leap second.

31dec2005 23:59:60 is an invalid %tc time but a valid %tC one. 31dec2005 23:59:60 was an inserted leap second.

30dec2005 23:59:60 is an invalid time in both %tc and %tC formats. 30dec2005 23:59:60 was not an inserted leap second. Correct is 31dec2005 00:00:00.
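The rules above can be probed with missing(), which returns 1 when its argument is missing. This is a sketch; it assumes the translation functions return missing for these strings rather than stopping with an error.

```stata
* Out-of-range components
display missing(clock("27:62:90", "hms"))

* 24:00:00 is invalid; write it as 00:00:00 of the next day
display missing(clock("25jan2006 24:00:00", "DMYhms"))
display %tc clock("26jan2006 00:00:00", "DMYhms")

* A 60th second on a date that had no leap second is invalid even for Clock()
display missing(Clock("30dec2005 23:59:60", "DMYhms"))
```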

When leap seconds occurred

Stata system file leapseconds.maint lists the dates on which leap seconds occurred. The file is updated periodically (see [R] update; the file is updated when you update ado-files) and Stata's %tC functions access the file to know when leap seconds occurred.

You can access it, too. To view the file, type

. viewsource leapseconds.maint

Truncated times

Consider the time 11:32:59.999. Other, less precise, ways of writing that time are

    11:32:59.99
    11:32:59.9
    11:32:59
    11:32

That is, when you suppress the display of more detailed components of the time, the parts that are displayed are not rounded. Stata displays time like a digital watch; the time is 11:32 right up until the instant that it is 11:33.
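The truncation can be seen by displaying one value under progressively shorter %tc formats. The sketch below stores the time in a scalar named t (a name chosen for illustration) and uses the HH, MM, SS, and .s format codes.

```stata
* 11:32:59.999 on the default date
scalar t = clock("11:32:59.999", "hms")

* Each shorter format drops detail without rounding up
display %tcHH:MM:SS.sss scalar(t)
display %tcHH:MM:SS.ss  scalar(t)
display %tcHH:MM:SS.s   scalar(t)
display %tcHH:MM:SS     scalar(t)
display %tcHH:MM        scalar(t)
```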

Advice on using %tc and %tC

Summary

Stata provides two time formats:

1. %tC, also known as UTC, which accounts for leap seconds, and

2. %tc, which ignores them (it assumes 86,400 seconds/day).

Systems vary in how they treat time variables. SAS ignores leap seconds. Oracle includes them. Stata handles either. Our advice:

o If you obtain data from a system that accounts for leap seconds, import using Stata's %tC.

a. If you later need to export data to a system that does not account for leap seconds, use Stata's cofC() function to translate time values before exporting.

b. If you intend to tsset the time variable and the analysis will be at the second level or finer, just tsset the %tC variable, specifying the appropriate delta() if necessary, e.g., delta(1000) for seconds.

c. If you intend to tsset the time variable and the analysis will be at coarser than the second level (minute, hour, etc.), create a %tc variable from the %tC variable (generate double tctime = cofC(tCtime)) and tsset that, specifying the appropriate delta() if necessary. You must do that because, in a %tC variable, there are not necessarily 60 seconds in a minute; some minutes have 61 seconds.

o If you obtain data from a system that ignores leap seconds, use Stata's %tc.

a. If you later need to export data to a system that does account for leap seconds, use Stata's Cofc() function to translate time values.

b. If you intend to tsset the time variable, just tsset it, specifying the appropriate delta().
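As a sketch of the first case, item c, suppose a dataset holds a %tC variable and the analysis is by minute. The variable name tCtime follows the text above. The delta of 60,000 ms is an assumption corresponding to 1-minute spacing.

```stata
* tCtime is assumed to be an existing %tC variable
generate double tctime = cofC(tCtime)
format tctime %tc

* One observation per minute: 60 seconds * 1,000 ms
tsset tctime, delta(60000)
```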

Some users prefer to always use Stata's %tc because %tc values are a little easier to work with. You can do that if

o you do not mind having up to 1 second of error and

o you do not import or export numerical values (clock ticks) from other systems that are using leap seconds, because then there could be nearly 30 seconds of error.

There are two things to remember if you use %tC variables:

1. The number of seconds between two dates is a function of when the dates occurred. Five days from one date is not simply a matter of adding 5*24*60*60*1,000 ms. You might need to add another 1,000 ms. Three hundred and sixty-five days from now might require adding 1,000 or 2,000 ms. The longer the span, the more you might have to add. (The best way to add durations to %tC variables is to extract the components, add to them, and then reconstruct from the numerical components.)

2. You cannot accurately predict date/times into the future. We do not know what the %tC value will be of 25dec2026 00:00:00 because, along the way, the authorities may (and probably will) announce leap seconds.
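The component approach recommended in item 1 might look like the following sketch, which moves a %tC variable forward 5 calendar days. The variable names t and t5 are hypothetical.

```stata
* t is assumed to be an existing %tC variable.
* Split it into a date and clock components, add 5 to the date,
* and rebuild the %tC value so that intervening leap seconds are counted.
generate double t5 = Cdhms(dofC(t) + 5, hhC(t), mmC(t), ssC(t))
format t5 %tC
```

If t itself falls on a leap second (23:59:60), the rebuilt time may not exist 5 days later, in which case a missing value should be expected.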

Explanation

Stata's %tc encoding assumes that there are 24*60*60*1,000 ms per day, just as an atomic clock, counting oscillations between the nucleus of an atom and its electrons, would define it.

Since 1972, leap seconds have been added from time to time, at most twice a year so far, to keep measured time in synchronization with the earth's rotation. Unlike leap years, however, there is no formula to predict when leap seconds will occur. The earth is on average slowing down, but there is a relatively large random component, and so leap seconds are determined by fiat and announced 6 months before they are inserted. Leap seconds are added, if necessary, at the end of the day on June 30 and December 31 and are designated as 23:59:60.

You may have heard various terms such as GMT and UTC.

GMT is the old Greenwich Mean Time and is based on astronomical observation.

UTC stands for coordinated universal time and is measured by atomic clocks, occasionally corrected for leap seconds.

UT1 is the mean solar time, with which UTC is kept in sync by the occasional addition of a leap second.

TAI is the atomic time on which UTC is based. TAI was set to GMT plus 10 seconds in 1958 and has been running since then.

UNK is our term for the time standard most people use. UNK stands for unknown, or unknowing. UNK is based on a recent time observation, probably UTC, and then most people just assume that there are 86,400 seconds per day after that.

The UNK standard is usually adequate, and you will want to use %tc rather than the leap second-adjusted %tC encoding. If you are using computer-timestamped data, however, you may need to find out whether the timestamping system used leap-second adjustment. Problems can arise even if you do not care about losing or gaining a second here and there.

For instance, you may import timestamp values from other systems as integers, recorded in the number of milliseconds, or export them. You may do this, but as of 18aug2006, if you choose the wrong encoding scheme (choose %tc when you should choose %tC, or vice versa), your recent times will be off by 23 seconds.
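The size of that discrepancy can be measured by encoding one timestamp both ways and taking the difference. This is a sketch; the scalar name tc is chosen for illustration, and the result depends on the leap seconds listed in your installation's leapseconds.maint. Going by the figure quoted above, it should be 23 seconds.

```stata
* The same moment in the %tc encoding...
scalar tc = clock("18aug2006 14:05:36", "DMYhms")

* ...and the gap, in seconds, between the %tC and %tc encodings
display (Cofc(scalar(tc)) - scalar(tc))/1000
```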

To avoid such problems, you may decide to import and export data by using printable forms, such as "Fri Aug 18 14:05:36 CDT 2006". This method has advantages, but for %tC encoding, times such as 23:59:60 are possible. Some systems will refuse to decode such times.

Stata refuses to decode 23:59:60 in the %tc encoding (function clock()) and accepts it with %tC (function Clock()). (When the %tC function Clock() sees a time with a 60th second, the function verifies that the time is one of the official leap seconds.) Thus, when translating from printable forms, you can assume %tc and check for missing values. If there are none, then you can use %tc. You will never be off by more than 1 second. If there are leap seconds in your data, use Clock() to translate them and then, if you still want to work in %tc units, use function cofC() to translate %tC values into %tc. Again you will have no more than 1 second of inaccuracy.
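The translate-and-check procedure just described could be sketched as follows. The string variable timestr, the new variable t, and the mask "DMYhms" are assumptions for illustration.

```stata
* timestr is assumed to hold strings such as "18aug2006 14:05:36"
generate double t = clock(timestr, "DMYhms")
format t %tc

* Nonmissing strings that failed to translate may contain leap seconds
count if missing(t) & !missing(timestr)

* Translate those with Clock() and convert the result back to %tc
replace t = cofC(Clock(timestr, "DMYhms")) if missing(t)
```

Any values still missing after the last step are invalid under both encodings and need to be inspected by hand.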

If precision matters, the best way to process %tC data is simply to treat them that way. The inconvenience is that you cannot assume that there are 86,400 seconds per day. To obtain the duration between dates, you must subtract the two time values involved. The other difficulty has to do with dealing with dates in the future. Under the %tC encoding, there is no set value for any date more than 6 months in the future.
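As an illustration of why subtraction is required, the sketch below measures the length of 31dec2005 in both encodings. Given that 31dec2005 23:59:60 was an inserted leap second, the %tC difference should be 86,401 seconds and the %tc difference 86,400.

```stata
* Length of 31dec2005 in seconds under %tC, which includes the leap second
display (Clock("01jan2006 00:00:00", "DMYhms") - Clock("31dec2005 00:00:00", "DMYhms"))/1000

* The same span under %tc, which ignores it
display (clock("01jan2006 00:00:00", "DMYhms") - clock("31dec2005 00:00:00", "DMYhms"))/1000
```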

Also see

Manual: [D] dates and times

Help: [D] format


© Copyright 1996–2009 StataCorp LP