Graphics for seasonal data
Speaker: Nicholas J. Cox, Durham University, UK
Ado-files flagged in green have not been published in the Stata Technical
Bulletin (STB). Some have previously been posted on Statalist.
Seasonality
Seasonality refers to more or less systematic variation with time of year.
It typically accompanies other kinds of variation in time, including trend,
periodic variation on longer or shorter time scales, and irregular or
stochastic variations. It may be of major direct interest, or a nuisance to
be set on one side. The underlying mechanisms may be well understood, or
highly mysterious. On average, seasonal effects may have a relatively simple
structure, easily approximated by a smooth curve, or such effects may be
much more complicated: there may, for example, be spikes associated with
special days or festivals. Examples come from climatology, economics,
medicine, and many other sciences. Many seasonal patterns are driven at
least partly by weather or climate (ice cream sales, for example, depend on
temperature), but many are not: consider the effects of Christmas, Easter,
Independence Day(s), Ramadan, Thanksgiving, and other cultural, political
and religious events, some fixed in date and some variable in timing.
This presentation focuses on graphical methods for showing seasonal data, or
the seasonal components of such data. In several cases, the ideas could be
adapted, either directly or with minimal change, to variation with (e.g.)
time of day, but most concrete details and examples will refer to time of year.
More generally, some of the ados have wider application.
Circular or linear format
An essential feature of seasonal data is that time of year is a circular
scale. Clearly, January follows December just as February follows January.
Cutting the scale according to any conventional (e.g. Western) calendar may
make it difficult to appreciate the complete seasonal cycle. With other
kinds of circular data, such as geographical features that have a compass
bearing (East, SW), a circular graph format is often used for this reason.
However, this does not seem so useful for seasonal data. This may be because
(a) seasonality is usually just one aspect of behavior over time, and
workers on time series use graphs with time on a linear (horizontal) scale,
and (b) people are not familiar with circular calendars (or 24-hour clocks),
in contrast to (say) map or compass representations, which are highly
familiar to many scientists. In any case, there are advantages in a linear
format, with response on the y axis and time of year on the x
axis, in which response is easily decoded and any horizontal line is easily
interpreted as a reference constant.
Basic tips and tasks
Most basically,
. graph response timeofyr , c(L) sy( )
plots values such that each year is represented by a single connected line,
provided the data are sorted by year and then by time of year.
c(L) specifies that values are to be connected only if
timeofyr is increasing (strictly, not decreasing).
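A minimal sketch, assuming monthly data held as a Stata monthly date in a hypothetical variable mdate, of how the year and time-of-year variables used above might be derived with built-in date functions and put into the order that c(L) relies on. The data are made up; only the data preparation is shown.

```stata
* Hypothetical example: 24 months of data starting January 2000
clear
set obs 24
generate mdate = ym(2000, 1) + _n - 1
format mdate %tm
* Split the monthly date into year and time of year (here, month 1-12)
generate year = year(dofm(mdate))
generate timeofyr = month(dofm(mdate))
* c(L) works on consecutive observations, so order by year, then month;
* the drop from 12 back to 1 then breaks the line between years
sort year timeofyr
```

For daily data, doy() applied to a daily date would play the same role as month() here.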
It is often useful to extend the time axis beyond a single year, copying the
early part of each year after its end and the later part of each year before
its beginning. pextend is a utility that creates the extra
observations, after which graph can be used. In practice,
1.5 cycles seem to work well.
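The idea behind such padding can be sketched with built-in commands. This is not pextend's syntax; it assumes a single 12-month profile (for example, long-term monthly means) in hypothetical variables month and response, and copies October to December before January and January to March after December, giving 18 months, that is, 1.5 cycles.

```stata
* Hypothetical 12-month profile with made-up values
clear
set obs 12
generate month = _n
generate response = month^2
* Duplicate Jan-Mar and Oct-Dec; copied = 1 marks the new observations
expand 2 if month <= 3 | month >= 10, generate(copied)
* Shift copies in one step: Jan-Mar become 13-15, Oct-Dec become -2 to 0
replace month = cond(month <= 3, month + 12, month - 12) if copied
sort month
list month response, sep(0)
```

Doing the shift in a single replace with cond() matters: two separate replace statements would let the second one catch copies already moved by the first.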
lamonmon and
ladaymons are utilities that
produce conventional month labels (such as J or Jan for
January) for months or days of the year.
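A minimal sketch of the same kind of labeling with built-in commands, not the syntax of lamonmon or ladaymons. It assumes a hypothetical month variable coded 1 to 12 and, for day-of-year axes, uses 2001 as an arbitrary non-leap year to find the day of year on which each month starts.

```stata
* One-letter month labels attached to a hypothetical month variable
clear
set obs 12
generate month = _n
label define monthlbl 1 "J" 2 "F" 3 "M" 4 "A" 5 "M" 6 "J"
label define monthlbl 7 "J" 8 "A" 9 "S" 10 "O" 11 "N" 12 "D", add
label values month monthlbl
* Day of year of the 1st of each month (non-leap year assumed)
generate firstday = doy(mdy(month, 1, 2001))
list month firstday, sep(0)
```

The firstday values (1, 32, 60, ..., 335) could then serve as tick positions on a day-of-year axis, with the month letters as their labels.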
January is not always the best starting point for the time axis. For
example, when looking at rainfall in a Northern hemisphere climate with dry
summer and wet winter, starting in July avoids cutting the wet season
awkwardly. grotate is an ado-file for egen that provides a rotate()
function. Type
. egen newvar = rotate(oldvar), start(#)
to rotate a month variable so that month # comes first.
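The arithmetic behind such a rotation can be sketched with the built-in mod() function. This is not grotate's code; the variable names are hypothetical, and the start month of 7 follows the July example above.

```stata
* Rotate months 1-12 so that month `start' becomes 1
clear
set obs 12
generate month = _n
local start = 7
* Adding 12 keeps the argument of mod() non-negative
generate month_rot = mod(month - `start' + 12, 12) + 1
list month month_rot, sep(0)
```

With start 7, July maps to 1, December to 6, January to 7 and June to 12.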
Moving summaries
For time series generally, movsumm
generates summaries for overlapping windows. These summaries can be
anything produced by summarize, such as the mean, major quantiles,
variance, skewness or kurtosis. The calculation may be weighted within the
window and, of special importance for seasonal problems, the
calculation may be wrapped around from end to beginning, treating the series
as circular.
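A minimal sketch of a weighted, wrapped moving mean using built-in explicit subscripting rather than movsumm's syntax. It assumes exactly one observation per month, sorted by month, in hypothetical variables month and y, and uses weights 1, 2, 1 over a three-month window.

```stata
* Made-up monthly values
clear
set obs 12
generate month = _n
generate y = month
sort month
* Neighbors on a circular scale: January's predecessor is December,
* December's successor is January
generate prev = y[mod(_n + 10, 12) + 1]
generate next = y[mod(_n, 12) + 1]
* Weighted window: (1 * previous + 2 * current + 1 * next) / 4
generate smooth = (prev + 2 * y + next) / 4
list month y smooth, sep(0)
```

Without the wrap-around, the first and last months would have incomplete windows; here January's mean is (12 + 2 + 2) / 4 = 4.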
Seasonal subseries plot
Other terminology for this plot: cycle plot, cycle-subseries plot, month
plot, seasonal-by-month plot.
sssplot plots all values for each
'month' together. 'Month' is here suggestive, not
mandatory: the ado applies whenever certain periods (e.g. years) are
divided into a fixed number of shorter periods (e.g. months, quarters).
Using the start( ) option, the plot can be started at any
'month': that is, the 'year' can be rotated.
Using the sf( ) option, summaries such as the mean or median can be plotted
for each 'month'.
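The layout of a seasonal subseries plot can be sketched as data preparation with built-in commands. This is not sssplot's code: it assumes three years of made-up monthly data, spreads each month's years across a slot 0.6 units wide centered on the month, and computes a per-month mean of the kind a summary line might show.

```stata
* Made-up data: 3 years x 12 months
clear
set obs 36
generate year = 2000 + floor((_n - 1) / 12)
generate month = mod(_n - 1, 12) + 1
generate y = month + year - 2000
* Horizontal position: month slot plus an offset that grows with year
summarize year, meanonly
generate double xpos = month - 0.3 + 0.6 * (year - r(min)) / (r(max) - r(min))
* Per-month summary (here the mean) for a horizontal reference line
egen ymean = mean(y), by(month)
sort month year
```

Plotting y against xpos separately for each month then shows each month's values in time order, side by side with the other months.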
Folding and looping
The cyclical character of seasonal behavior can be investigated by folding
the time axis around some midpoint and plotting the response as usual. This
is implemented in foldplot.
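A minimal sketch of the folding idea with built-in commands, not foldplot's code. It assumes a hypothetical 12-month axis folded about its midpoint 6.5 (between June and July), so that each month is placed by its distance from the midpoint and flagged as belonging to the first or second half.

```stata
* Made-up monthly response
clear
set obs 12
generate month = _n
generate y = month
local mid = 6.5
* Distance from the fold: January and December both lie 5.5 away
generate fold = abs(month - `mid')
generate byte second_half = month > `mid'
list month fold second_half, sep(0)
```

Plotting y against fold with separate lines for the two halves would show whether the rise and the fall are symmetric.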
Alternatively, if there are two variables varying with time of year, then
their joint trajectory can be shown on an ordinary scatter plot.
loopplot adds a stylistic
flourish, namely, each loop is closed; that is, the end and beginning values
are joined.
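Closing a loop amounts to repeating the first observation after the last. A minimal sketch with built-in commands, not loopplot's code, assuming two hypothetical variables temp and rain with made-up values for months 1 to 12:

```stata
* Made-up seasonal variables
clear
set obs 12
generate month = _n
generate temp = cos(2 * _pi * (month - 7) / 12)
generate rain = sin(2 * _pi * month / 12)
* Append a copy of January as month 13 so the path returns to its start
expand 2 in 1, generate(closer)
replace month = 13 if closer
sort month
* A line plot of rain against temp in this data order would now
* include the December-to-January segment
```

Because a line plot joins points in data order, sorting by month before plotting is what makes the final segment close the loop.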
Touching bars
For variables that are strictly totals for shorter periods, some purists
prefer touching bars, not point symbols. This can be done with
barplot.