
# st: showing "nice dates" on time axis when using date-time variables; -mylabels- updated on SSC

From: Nick Cox
To: statalist@hsphsun2.harvard.edu
Subject: st: showing "nice dates" on time axis when using date-time variables; -mylabels- updated on SSC
Date: Mon, 20 Aug 2012 17:58:33 +0100

A recent question on Gabriel Rossman's blog shows a new variant on a
little old problem.

The little problem leads to a fairly long discussion, so let me give a
bottom line at the start so that you can judge whether you care, or
prefer just to bail out now.

The issue is having date-time data, something like 17 August 2012
15:28:00, but wanting to see on a time axis something more like 1 Aug
2012 or 1 Sep 2012.

An outcome is that -mylabels- (SSC) has been extended so that it can
help with this problem; thanks as always to Kit Baum for putting up
revised files.

Nick

See
<http://codeandculture.wordpress.com/2010/02/08/memetracker-into-stata/>
for a graph which nicely illustrates the problem.

Here the x axis variable is a date-time. The x axis labels shown are
in fact nice numbers in terms of clock time in milliseconds, but by
any other standard they are arbitrary and awkward.

That is, the labels shown make sense if you make a couple of
calculations and realise that they are 5 billion ms apart:

. di %18.0f clock("25jun2008 08:00:00", "DMY hms")
1530000000000

. di %18.0f clock("22aug2008 4:53:20", "DMY hms")
1535000000000
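
As a check on that arithmetic, here is a small sketch that takes the difference between the two clock times directly and then expresses it in days. It uses only -clock()- and -msofhours()-; the display formats are my own choice.

```stata
* difference between the two labelled times, in milliseconds
display %18.0f (clock("22aug2008 4:53:20", "DMY hms") - clock("25jun2008 08:00:00", "DMY hms"))

* the same gap in days (msofhours(24) is the number of ms in one day)
display %9.2f (5e9 / msofhours(24))
```

The first result should be 5000000000 and the second about 57.87, which is why successive labels fall nearly 58 days apart and at shifting times of day.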

How would you improve on that default?

First off, note that the x axis labels show times of day as well as
dates. I guess times of day are not interesting or important in the
original problem. It would be easy enough to trim the times with a
declared format such as

... xla(, format(%tcD_m_CY))

but the dates of 25jun2008, 22aug2008, 19oct2008, 15dec2008, 11feb2009
are still awkward.

The dates span a period in late 2008 and early 2009, so monthly labels
would seem more natural for human consumption, even though not
regularly spaced in terms of clock time.

Choices are free, but the labels
"1 Aug 2008" "1 Sep 2008" "1 Oct 2008" "1 Nov 2008" "1 Dec 2008" "1 Jan 2009" "1 Feb 2009"
would perhaps strike many readers as more attractive.

How do we get such labels? The key idea is that we can calculate the
clock time in milliseconds from the information in those labels (in
principle we are picking midnight at the start of each day), so there
is scope to automate that. Conversely, note that something like
"Christmas 2008" would make no sense to Stata; you could get such a
label on a time axis, but you would need to work out otherwise where
it should go.

Let's do that automation from first principles, and then point to a
convenience command that makes it all, well, more convenient.

First put the labels you want in a local macro:

local dates `" "1 Aug 2008" "1 Sep 2008" "1 Oct 2008" "1 Nov 2008" "1 Dec 2008" "1 Jan 2009" "1 Feb 2009" "'

This line of code itself benefits from a digression. I have chosen a
challenging example, partly to show that we can set a reasonably high
standard and achieve it. Stata users as people would have no
difficulty in parsing

1 Aug 2008 1 Sep 2008 1 Oct 2008 1 Nov 2008 1 Dec 2008 1 Jan 2009 1 Feb 2009

into a series of dates, but Stata unaided cannot cope. We must wrap
each date using " " as delimiters. However, suppose you try the
definition

local dates "1 Aug 2008" "1 Sep 2008" "1 Oct 2008" "1 Nov 2008" "1 Dec 2008" "1 Jan 2009" "1 Feb 2009"

This won't work as intended, as the very first and very last quotation
marks get stripped as delimiters, so we need the extra device of
so-called compound double quotation marks `" "' to bind the entire
macro contents together. End of digression!
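
If you want to reassure yourself that the compound double quotes have kept the seven labels separate, a quick sketch like the following may help. The macro name n is just a scratch name of mine.

```stata
local dates `" "1 Aug 2008" "1 Sep 2008" "1 Oct 2008" "1 Nov 2008" "1 Dec 2008" "1 Jan 2009" "1 Feb 2009" "'

* each "..." element counts as one word, so this should report 7
local n : word count `dates'
display `n'

* show each label on a line of its own
foreach d of local dates {
    display `"`d'"'
}
```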

Now we can loop over those dates and calculate the clock time for each:

foreach d of local dates {
    local xdates `xdates' `=clock("`d'", "DMY")' `"`d'"'
}

Each time round the loop, Stata takes a date like "1 Aug 2008", feeds
it to the -clock()- function with the extra argument "DMY" (it's a
date-time in the form "day month year"). Time of day 00:00:00 is
assumed by -clock()- as default. So the first calculation is of

clock("1 Aug 2008", "DMY")

If you try this for yourself, you will get

. di %15.0f clock("1 Aug 2008", "DMY")
1533168000000

The larger point here is that the -display- command is very useful to
see the results of small calculations. A smaller point is that you may
need to specify your own format to see exact details for very big (or
indeed very small) numbers. I will add also a note that nothing stops
you using midday or any other time of day if that seems more
appropriate.

. di %15.0f clock("1 Aug 2008 12:00:00", "DMY hms")
1533211200000
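
Going the other way is a useful check too. This is a small sketch, using nothing more than -display- with a %tc format, that turns a number of milliseconds back into a readable date-time.

```stata
* show a clock value as a date-time rather than as a raw count of ms
display %tc clock("1 Aug 2008", "DMY")

* the same check applied to the midday value typed in directly
display %tc 1533211200000
```

These should show 01aug2008 00:00:00 and 01aug2008 12:00:00 respectively.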

Back to the line of code

local xdates `xdates' `=clock("`d'", "DMY")' `"`d'"'

This cumulates the results. The extra syntax `=  ' around the call to
-clock()- says "do this calculation on the fly and put the result
here". In effect we can compress two lines like those below into one:

local thisone = clock("`d'", "DMY")
local xdates `xdates' `thisone' `"`d'"'

What you have at the end of the -foreach- loop is a list of positions
and text labels that you can insert in the -xlabel()- option as

... xlabel(`xdates')

This is what `xdates' looks like inside in this example:

1533168000000 `"1 Aug 2008"' 1535846400000 `"1 Sep 2008"'
1538438400000 `"1 Oct 2008"' 1541116800000 `"1 Nov 2008"'
1543708800000 `"1 Dec 2008"' 1546387200000 `"1 Jan 2009"'
1549065600000 `"1 Feb 2009"'

Congratulations if you are able and willing to do those calculations
in your head and then type out the results. Clearly, automation is
what computers are for.
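
Putting the pieces so far together, here is a minimal sketch from start to finish. The data are simulated purely for illustration, and the variable names timestamp and y are hypothetical; substitute your own date-time variable (held as a double) and response.

```stata
* simulated data for illustration only: one reading a day from 25 Jun 2008
clear
set obs 240
generate double timestamp = clock("25jun2008 08:00:00", "DMY hms") + (_n - 1) * msofhours(24)
format timestamp %tc
generate y = runiform()

* the labels wanted, bound together with compound double quotes
local dates `" "1 Aug 2008" "1 Sep 2008" "1 Oct 2008" "1 Nov 2008" "1 Dec 2008" "1 Jan 2009" "1 Feb 2009" "'

* start from an empty macro, then add position and text for each label
local xdates
foreach d of local dates {
    local xdates `xdates' `=clock("`d'", "DMY")' `"`d'"'
}

line y timestamp, xlabel(`xdates')
```

Clearing xdates before the loop is my addition: it guards against leftovers if the code is run twice in the same session.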

Although that's all fun in a certain limited sense, it seems that it
should in turn be automatable in some way. I have extended my
-mylabels- command (SSC) so that it can do this. The syntax would be

. mylabels "1 Aug 2008" "1 Sep 2008" "1 Oct 2008" "1 Nov 2008" "1 Dec 2008" "1 Jan 2009" "1 Feb 2009", myscale(clock("@", "DMY")) local(labels)

So, the sequence would be

1. You look at an initial graph, or the data, and decide what axis
labels you want.

2. -mylabels- has one and only one role, to do the fiddly little
calculations of exactly where to put the text that will be the labels
and bundle positions and labels in a local macro. As emphasised, there
has to be a way to calculate the positions from the labels.

3. The local macro is what you then use in the graph call.
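
That sequence might be sketched as follows. As before, the data are simulated only so that the example runs on its own, and timestamp and y are hypothetical variable names; the -mylabels- call is the one given above.

```stata
* -mylabels- is from SSC; install it once with:  ssc install mylabels

* simulated data for illustration only
clear
set obs 240
generate double timestamp = clock("25jun2008 08:00:00", "DMY hms") + (_n - 1) * msofhours(24)
format timestamp %tc
generate y = runiform()

* step 2: let -mylabels- work out the positions and store them in local macro labels
mylabels "1 Aug 2008" "1 Sep 2008" "1 Oct 2008" "1 Nov 2008" "1 Dec 2008" "1 Jan 2009" "1 Feb 2009", myscale(clock("@", "DMY")) local(labels)

* step 3: use the local macro in the graph call
line y timestamp, xlabel(`labels')
```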

I don’t think that there is a way to do this easily with -tlabel()-,
although I would be happy to be corrected. The issue arises with
timestamped data spanning even a few weeks, let alone a few months or
years: the timestamp scale is what the data come in, but is unfriendly
for graphics.

By the way, the principles here are very similar to those in

http://www.stata.com/support/faqs/graphics/date-labels/

Despite what it says, that FAQ is not completely superseded.

For a related discussion, see

Cox, N.J. 2007. Stata tip 55: Better axis labeling for time points and
time intervals. Stata Journal 7(4): 590--592.
<http://www.stata-journal.com/sjpdf.html?articlenum=gr0030>
