Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: st: showing "nice dates" on time axis when using date-time variables; -mylabels- updated on SSC
Date: Mon, 20 Aug 2012 17:58:33 +0100

A recent question on Gabriel Rossman's blog shows a new variant on a little old problem. The little problem leads to a fairly long discussion, so let me give a bottom line at the start so that you can judge whether you care, or prefer just to bail out now.

The issue is having date-time data, something like 17 August 2012 15:28:00, but wanting to see on a time axis something more like 1 Aug 2012 or 1 Sep 2012. An outcome is that -mylabels- (SSC) has been extended so that it can help with this problem; thanks as always to Kit Baum for putting up revised files.

Details follow beneath my signature.

Nick

See

<http://codeandculture.wordpress.com/2010/02/08/memetracker-into-stata/>

for a graph which nicely illustrates the problem. Here the x axis variable is a date-time. The x axis labels shown are in fact nice numbers in terms of clock time in milliseconds, but by any other standard they are arbitrary and awkward. That is, the labels shown make sense if you make a couple of calculations and realise that they are 5 billion ms apart:

. di %18.0f clock("25jun2008 08:00:00", "DMY hms")
     1530000000000

. di %18.0f clock("22aug2008 4:53:20", "DMY hms")
     1535000000000

How would you improve on that default?

First off, note that the x axis labels show times of day as well as dates. I guess times of day are not interesting or important in the original problem. It would be easy enough to trim the times with a declared format such as

... xla(, format(%tcD_m_CY))

but the dates of 25jun2008, 22aug2008, 19oct2008, 15dec2008, 11feb2009 are still awkward.

The dates span a period in late 2008 and early 2009, so monthly labels would seem more natural for human consumption, even though not regularly spaced in terms of clock time. Choices are free, but the labels

"1 Aug 2008" "1 Sep 2008" "1 Oct 2008" "1 Nov 2008" "1 Dec 2008" "1 Jan 2009" "1 Feb 2009"

would perhaps strike many readers as more attractive. How do we get such labels?
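Before answering, the arithmetic behind the default labels can be checked directly. The lines below are only a sketch. The start value and the step of 5 billion ms are taken from the two -clock()- results above, and the date-only format is the one suggested for -xla()-.

```stata
* Regenerate the five default tick positions, 5 billion ms apart,
* and show each one as a date without the time of day
forvalues i = 0/4 {
    display %tcD_m_CY (1530000000000 + `i' * 5000000000)
}
```

The five dates shown should be those listed above, from 25 June 2008 to 11 February 2009, which confirms that the default ticks are regular in milliseconds but fall on no natural calendar boundary.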
The key idea is that we can calculate the clock time in milliseconds from the information in those labels (in principle we are picking midnight at the start of each day), so there is scope to automate that. Conversely, note that something like "Christmas 2008" would make no sense to Stata; you could get such a label on a time axis, but you would need to work out otherwise where it should go.

Let's do that automation from first principles, and then point to a convenience command that makes it all, well, more convenient.

First put the labels you want in a local macro:

local dates `" "1 Aug 2008" "1 Sep 2008" "1 Oct 2008" "1 Nov 2008" "1 Dec 2008" "1 Jan 2009" "1 Feb 2009" "'

This line of code itself benefits from a digression. I have chosen a challenging example, partly to show that we can set a reasonably high standard and achieve it. Stata users as people would have no difficulty in parsing

1 Aug 2008 1 Sep 2008 1 Oct 2008 1 Nov 2008 1 Dec 2008 1 Jan 2009 1 Feb 2009

into a series of dates, but Stata unaided cannot cope. We must wrap each date using " " as delimiters. However, suppose you try the definition

local dates "1 Aug 2008" "1 Sep 2008" "1 Oct 2008" "1 Nov 2008" "1 Dec 2008" "1 Jan 2009" "1 Feb 2009"

This won't work as intended, as the very first and very last quotation marks get stripped as delimiters, so we need the extra device of so-called compound double quotation marks `" "' to bind the entire macro contents together.

End of digression! Now we can loop over those dates and calculate the clock time for each:

foreach d of local dates {
    local xdates `xdates' `=clock("`d'", "DMY")' `"`d'"'
}

Each time round the loop, Stata takes a date like "1 Aug 2008" and feeds it to the -clock()- function with the extra argument "DMY" (it's a date-time in the form "day month year"). Time of day 00:00:00 is assumed by -clock()- as default. So the first calculation is of

clock("1 Aug 2008", "DMY")

If you try this for yourself, you will get

. di %15.0f clock("1 Aug 2008", "DMY")
  1533168000000

The larger point here is that the -display- command is very useful to see the results of small calculations. A smaller point is that you may need to specify your own format to see exact details for very big (or indeed very small) numbers.

I will add also a note that nothing stops you using midday or any other time of day if that seems more appropriate.

. di %15.0f clock("1 Aug 2008 12:00:00", "DMY hms")
  1533211200000

Back to the line of code

local xdates `xdates' `=clock("`d'", "DMY")' `"`d'"'

This cumulates the results. The extra syntax `= ' around the call to -clock()- says "do this calculation on the fly and put the result here". In effect we can compress two lines like those below into one:

local thisone = clock("`d'", "DMY")
local xdates `xdates' `thisone' `"`d'"'

What you have at the end of the -foreach- loop is a list of positions and text labels that you can insert in the -xlabel()- option as

... xlabel(`xdates')

This is what `xdates' looks like inside in this example:

1533168000000 `"1 Aug 2008"' 1535846400000 `"1 Sep 2008"' 1538438400000 `"1 Oct 2008"' 1541116800000 `"1 Nov 2008"' 1543708800000 `"1 Dec 2008"' 1546387200000 `"1 Jan 2009"' 1549065600000 `"1 Feb 2009"'

Congratulations if you are able and willing to do those calculations in your head and then type out the results. Clearly, automation is what computers are for.

Although that's all fun in a certain limited sense, the loop itself looks like something that should be automatable in turn. I have taken my program -mylabels-, which can be downloaded from SSC, and revised it so that it can do this. The syntax would be

. mylabels "1 Aug 2008" "1 Sep 2008" "1 Oct 2008" "1 Nov 2008" "1 Dec 2008" "1 Jan 2009" "1 Feb 2009", myscale(clock("@", "DMY")) local(labels)

So, the sequence would be

1. You look at an initial graph, or the data, and decide what axis labels you want.

2. -mylabels- has one and only one role, to do the fiddly little calculations of exactly where to put the text that will be the labels and bundle positions and labels in a local macro. As emphasised, there has to be a way to calculate the positions from the labels.

3. The local macro is what you then use in the graph call.

I don't think that there is a way to do this easily with -tlabel()-, although I would be happy to be corrected.

The issue arises with timestamped data spanning even a few weeks, let alone a few months or years: the timestamp scale is what the data come in, but is unfriendly for graphics.

By the way, the principles here are very similar to those in

http://www.stata.com/support/faqs/graphics/date-labels/

Despite what it says, that FAQ is not completely superseded.

For a related discussion, see

Cox, N.J. 2007. Stata tip 55: Better axis labeling for time points and time intervals. Stata Journal 7(4): 590--592.
<http://www.stata-journal.com/sjpdf.html?articlenum=gr0030>
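Putting the first-principles steps together, a minimal end-to-end sketch might look like the code below. The dataset is a hypothetical toy one, invented here purely so that the example runs on its own: the variable names t and y, the start time and the daily spacing are all assumptions, not part of the original problem. The macro definition and the loop are the ones discussed above.

```stata
* Hypothetical toy data: 200 daily timestamps starting 25 Jul 2008 15:28:00
clear
set obs 200
generate double t = clock("25 Jul 2008 15:28:00", "DMY hms") + (_n - 1) * 24 * 60 * 60 * 1000
format t %tc
generate y = runiform()

* The labels wanted, bound together with compound double quotes
local dates `" "1 Aug 2008" "1 Sep 2008" "1 Oct 2008" "1 Nov 2008" "1 Dec 2008" "1 Jan 2009" "1 Feb 2009" "'

* For each label, store its position in ms (midnight by default) and then its text
local xdates
foreach d of local dates {
    local xdates `xdates' `=clock("`d'", "DMY")' `"`d'"'
}

* Use the bundled positions and labels on the time axis
line y t, xlabel(`xdates') xtitle("")
```

Note that the date-time variable is generated as a double, as clock values in milliseconds are too large to be held exactly in a float. With your own data, only the definition of `dates' and the variable names in the graph call should need changing.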
