st: RE: rnd discussion
Sat, 29 Sep 2007 00:05:53 EDT
Nick et al:
You provided a nice discussion of rndbin. A few comments might help to
understand why and how these random number generators (RNGs) came to be as they
are.
First, going back to 1993, Stata had only a couple of random number
generators. Larry Hamilton had written several others (T, F, and the like), as I
recall, in his "Statistics with Stata" for version 3. Since I needed other
random number generators, I decided to write a complement of them for my own
work, and then decided that others might find them useful as well. I asked
Walter Linde-Zwirble, a physicist turned health outcomes analyst and a friend of
mine, to participate. He wrote the beta binomial RNG and helped test the others.
When these random number generators were written in 1993, Stata was in
version 3 as I recall. The programming language of Stata was quite different from
now. The idea was to use the generators when no other data was in memory. I
believe that version 3 required this. Anyhow, the programs were re-written in
1995, but the change involved only the manner in which temporary variables were
identified. The logic of the programs was retained. It would have taken a lot of
work to redo them entirely. It is also important to realize that I fully
expected that Stata would re-write them and include them in the next release.
Stata was the only major statistical package without a complement of random
number generators. I was mistaken. Over 10 years later, they still do not have
them as part of the package.
I created two types of RNGs. The first type simply creates a single
variable with the distributional properties defined by the user on the command
line. Assuming no data in memory, the number of observations and the mean,
and scale if appropriate, are specified by the user after the command name. I
rarely use these.
The second type has an x attached to the end of the RNG name, e.g., rndpoix.
These commands allow one to create artificial data sets. I have used them
continually. After specifying the number of observations, one creates one or
more normal random variables, assigns parameter values to them, plus a value
for the constant, and runs the RNG. A data set emerges with the same parameters
as defined. How to do this is detailed in -help rnd-.
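
For readers who want to see the idea outside Stata, the steps just described
can be sketched in a few lines of Python. This is only an illustration of the
general recipe for a Poisson response, not the code of rndpoix or of any Stata
command. The function names, the seed, and the coefficient values are all
hypothetical, and the Poisson draw uses Knuth's multiplication method rather
than whatever algorithm the rnd programs use.

```python
import math
import random


def poisson_draw(mu, rng):
    """Draw one Poisson(mu) variate with Knuth's multiplication method.

    Suitable for modest values of mu only; exp(-mu) underflows for large mu.
    """
    limit = math.exp(-mu)
    k = 0
    prod = rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def make_poisson_data(n, intercept, betas, seed=12345):
    """Build an artificial data set with a Poisson response.

    Each row is (y, xs, mu): xs are standard normal predictors,
    mu = exp(intercept + sum(beta * x)), and y is drawn from Poisson(mu).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        xs = [rng.gauss(0.0, 1.0) for _ in betas]
        xb = intercept + sum(b * x for b, x in zip(betas, xs))
        mu = math.exp(xb)
        rows.append((poisson_draw(mu, rng), xs, mu))
    return rows


# Hypothetical parameter values, chosen only for illustration.
data = make_poisson_data(n=5000, intercept=1.0, betas=[0.5, -0.25])
mean_y = sum(row[0] for row in data) / len(data)
mean_mu = sum(row[2] for row in data) / len(data)
```

Fitting a Poisson regression of y on the two predictors should then recover
values close to the intercept and coefficients that were assigned, which is
the sense in which the data set has the same parameters as defined.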
I definitely would have paid more attention to enhancing speed, and perhaps
re-writing the algorithms (which use the covering method), had I known that
Stata was not going to write ones for the official package. As it was, they
served a good purpose. At times a Stata user suggested a change, which we made,
substituting the new version for the older one in my directory. Most were put
on the SSC site in 1997.
If you are interested in creating artificial data sets for GLM families
(Gaussian, binomial, Poisson, negative binomial, gamma, and inverse Gaussian),
Roberto Gutierrez (Stata Corp) wrote a suite of programs for this purpose. The
logic of the commands is somewhat close to that of my rnd programs for the same
purpose, but I actually like his better. For the binomial RNG, type -net search
genbinomial-. These were the RNGs used for the chapter on overdispersion in
Hardin & Hilbe, Generalized Linear Models and Extensions, 2nd edition (2007,
Stata Press), and in my recently released book, Negative Binomial Regression
(2007, Cambridge Univ. Press).
Using my rnd programs or Roberto's will give the same results. I like Roberto's
because you can define the generated variable rather than have it
predetermined by the program. This point was mentioned by Nick. However, it was not
originally a problem since I assumed no other data was in memory.
Stata should seriously consider implementing RNGs in the next release. Mine
work fine given the caveats mentioned by Nick. Roberto's are fine as well.
But they are limited to GLM families for the purpose of constructing artificial
data sets in the spirit of my rndx commands. The other RNGs could well be
written by the very capable Stata programmers. Constructing them so that users
can create artificial data sets would seem to me the ideal way to go. Roberto
has already done much of the work.