From   n j cox <>
Date   Thu, 24 May 2007 15:27:47 +0100

Thanks to Kit Baum, a new package -group1d- is now
available from SSC. Stata 9 is required. Use -ssc-
to install.

Please note that the "1" is the numeral "1", not
the letter "l". One earlier version was called -grouponed-
but I got fed up with typing that, so it fell down
the Darwinian drain of unnatural deselection.

An earlier version of this program was posted on
13 May in a thread on -cluster kmeans- started by Herve

The version there is written for Stata 8. Among other things, I
said "A Mata-based version should follow" -- which wasn't
quite a promise, but since no one else jumped in with
a Mata translation, I ended up doing it myself. Initial experiments
indicated that the Mata version was about 30 times faster,
so I abandoned the Stata 8 version and will not maintain or
even support the version posted on 13 May. I will signal
that the syntax there differs from that now used.

So much for that. -group1d- offers grouping or clustering
in one dimension. Natural examples are the values of a variable
sorted from smallest to largest, the values of a time series, or
the values of a spatial series along or down a profile, transect,
core, borehole, etc. n values are clustered into one or more
contiguous groups, the (k - 1) boundaries between k groups being
chosen to minimise the sum of the within-cluster sums of squared
deviations from cluster means over the comb(n - 1, k - 1) possible
clusterings. The clustering produced is guaranteed optimal, but it
may not be unique.
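
For readers who want to see the criterion spelled out, here is a minimal
sketch in Python. It is not the code in -group1d- (which is Mata), and the
function names are made up for illustration. It assumes the values are
already in the order in which they are to be grouped, and it uses a
straightforward dynamic programme, checked against brute-force enumeration
of all comb(n - 1, k - 1) contiguous clusterings.

```python
from itertools import combinations
from math import inf


def within_ss(x, i, j):
    """Sum of squared deviations from the mean for the slice x[i:j]."""
    seg = x[i:j]
    mean = sum(seg) / len(seg)
    return sum((v - mean) ** 2 for v in seg)


def best_contiguous_groups(x, k):
    """Best split of x into k contiguous groups (hypothetical helper).

    Returns (total within-group SS, 0-based start positions of groups 2..k).
    Ties are broken in favour of the earliest boundary, so the answer
    returned need not be the only optimal one.
    """
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # best[g][j]: smallest total SS when the first j values form g groups
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    # cut[g][j]: start position of the last group in that best solution
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):
                cand = best[g - 1][i] + within_ss(x, i, j)
                if cand < best[g][j]:
                    best[g][j] = cand
                    cut[g][j] = i
    # walk back through the stored cuts to recover the boundaries
    starts = []
    j = n
    for g in range(k, 1, -1):
        j = cut[g][j]
        starts.append(j)
    starts.reverse()
    return best[k][n], starts


def brute_force(x, k):
    """Try every contiguous clustering; return (smallest SS, number tried)."""
    n = len(x)
    smallest, tried = inf, 0
    for cuts in combinations(range(1, n), k - 1):
        edges = (0, *cuts, n)
        total = sum(within_ss(x, a, b) for a, b in zip(edges, edges[1:]))
        smallest = min(smallest, total)
        tried += 1
    return smallest, tried
```

With x = [1, 2, 3, 10, 11, 12, 20, 21] and k = 3, the sketch puts the
boundaries before positions 3 and 6 (counting from 0), for a total
within-group sum of squares of 4.5, and the brute-force check agrees after
trying all comb(7, 2) = 21 clusterings.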

The help file is quite detailed, so I will just add a few broader remarks:

0. It's been said before, but I'll say it again. Mata really
is a nice vehicle for taking someone else's Fortran and then doing
things in your favourite statistical software. (Historically, that's
not quite what happened, as over the years someone else's Fortran
became my Fortran became my Basic became my Stata 5 became my
Stata 8 became my Stata 9, but I wish it had, and in any case I went
back to the original in rewriting this version of the code.)

1. -group1d- has absolutely no connection to official Stata's
cluster of cluster analysis commands, -cluster-, nor am I likely to
add stuff that echoes, mimics or otherwise resembles that truly
excellent suite.

2. -group1d- grows out of work that was part of my Ph.D. thesis,
which was awarded sometime in the Late Cretaceous. I haven't tried
to keep up with the literature since about 1980 [sic] so would
appreciate pertinent references. A quick internet search indicates
that much of the work I was aware of is being rediscovered or
reinvented, for example by economists interested in ("nonparametric")
ways of identifying structural breaks in time series, especially when
you do not know in advance how many there were.

3. -group1d- could be used for doing some rather weak science, but
that's up to anyone who uses it. In particular, in various literatures
people entertain punctuated equilibrium views of their systems
(the unsympathetic description "evolution by jerks" springs to mind
from some past polemics in biology) whereby
economies, or whatever, supposedly make sudden jumps from time
to time and it is the job of the technique to work out when they
happened. Well, your mileage may vary, but if they really happened
it should be evident in graphs of the data without any need for
elaborate formal analysis, and if conversely you have to strain to
find them they can't be that major. Still, apart from its being fun
to write, I have found uses for this program, and in one real case
study with a colleague on environmental data it underlined his
suspicion of a change in behaviour (in this case, in what
laboratories were doing, not nature).


