Statalist



st: RE: rnd discussion


From   Jhilbe@aol.com
To   statalist@hsphsun2.harvard.edu
Subject   st: RE: rnd discussion
Date   Sat, 29 Sep 2007 00:05:53 EDT

Nick et al:

You provided a nice discussion of rndbin. A few comments might help explain
why and how these random number generators (RNGs) came to be as they are.

First, going back to 1993, Stata had only a couple of random number
generators. Larry Hamilton had written several others (t, F, and the like), as I
recall, in his "Statistics with Stata" for version 3. Since I needed other
random number generators, I decided to write a complement of them for my own
work, and then decided that others might find them useful as well. I asked
Walter Linde-Zwirble, a physicist turned health outcomes analyst and a friend of
mine, to participate. He wrote the beta binomial RNG and helped test the others.

When these random number generators were written in 1993, Stata was, as I
recall, in version 3. Its programming language was quite different from what it
is now. The idea was to use the generators when no other data were in memory; I
believe version 3 required this. The programs were rewritten in 1995, but the
change involved only the manner in which temporary variables were identified;
the logic of the programs was retained. Redoing them entirely would have taken a
lot of work. It is also important to realize that I fully expected Stata to
rewrite them and include them in the next release. Stata was the only major
statistical package without a complement of random number generators. I was
mistaken: over 10 years later, they are still not part of the package.

I created two types of RNGs. The first type consists of generators that simply
create a single variable with the distributional properties defined by the user
on the command line. Assuming no data are in memory, the user specifies the
number of observations and the mean, and the scale if appropriate, after the
command name. I rarely use these.

The second type has an x appended to the name of the RNG, e.g., rndpoix. These
commands allow one to create artificial data sets, and I have used them
continually. After specifying the number of observations, one creates one or
more normal random variables, assigns parameter values (coefficients) to them
plus a value for the constant, and runs the RNG. A data set emerges with the
same parameters as those defined. Details are given in -help rnd-.
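The simulation recipe in the previous paragraph can be illustrated outside Stata. What follows is a minimal Python sketch, not the code of rndpoix or any rnd command. It assumes a Poisson model with a log link, two standard normal predictors, and illustrative coefficient values (constant 1.0, slopes 0.5 and -0.25) chosen only for this example. The helper names make_poisson_data, fit_poisson, poisson_draw, and solve are hypothetical. The Poisson draws use Knuth's multiplication method, a swapped-in technique that is not necessarily the one the rnd programs use, and a small Newton-Raphson fit checks that the generated data reproduce the parameters supplied.

```python
# Hypothetical illustration of building an artificial Poisson data set;
# this is not the rnd/rndpoix code, only a sketch of the same idea.
import math
import random


def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def make_poisson_data(n, b0, b1, b2, seed=2007):
    """Hypothetical helper: normal predictors, coefficients plus a constant,
    then a Poisson response whose mean is exp(linear predictor)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rng.gauss(0.0, 1.0)
        xb = b0 + b1 * x1 + b2 * x2          # linear predictor
        y = poisson_draw(math.exp(xb), rng)  # log link: mean = exp(xb)
        rows.append((x1, x2, y))
    return rows


def solve(a, b):
    """Solve a small linear system a @ x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_poisson(rows, iters=25):
    """Newton-Raphson fit of the log-link Poisson model y ~ 1 + x1 + x2."""
    ybar = sum(r[2] for r in rows) / len(rows)
    beta = [math.log(ybar), 0.0, 0.0]
    for _ in range(iters):
        grad = [0.0] * 3
        hess = [[0.0] * 3 for _ in range(3)]
        for x1, x2, y in rows:
            xs = (1.0, x1, x2)
            mu = math.exp(sum(b * v for b, v in zip(beta, xs)))
            for i in range(3):
                grad[i] += (y - mu) * xs[i]          # score: X'(y - mu)
                for j in range(3):
                    hess[i][j] += mu * xs[i] * xs[j]  # information: X'WX
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return beta


if __name__ == "__main__":
    data = make_poisson_data(10000, b0=1.0, b1=0.5, b2=-0.25)
    print([round(b, 2) for b in fit_poisson(data)])
```

With 10,000 observations the printed estimates should land close to the values supplied; changing the seed moves them only slightly, which is the sense in which "a data set emerges with the same parameters as those defined."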

I definitely would have paid more attention to enhancing speed, and perhaps
to rewriting the algorithms (which use the covering method), had I known that
Stata was not going to write its own for the official package. As it was, they
served a good purpose. At times a Stata user suggested a change, which we made,
replacing the older version in my directory. Most were put on the SSC site in
1997.

If you are interested in creating artificial data sets for GLM families
(Gaussian, binomial, Poisson, negative binomial, gamma, and inverse Gaussian),
Roberto Gutierrez (StataCorp) wrote a suite of programs for this purpose. The
logic of his commands is fairly close to that of my rnd programs for the same
purpose, but I actually like his better. For the binomial RNG, type -net search
genbinomial-. These were the RNGs used for the chapter on overdispersion in
Hardin & Hilbe, Generalized Linear Models and Extensions, 2nd edition (2007,
Stata Press), and in my recently released book, Negative Binomial Regression
(2007, Cambridge Univ. Press). Using my rnd programs or Roberto's will give the
same results. I prefer Roberto's because you can name the generated variable
rather than have it predetermined by the program. Nick mentioned this point.
However, it was not originally a problem, since I assumed no other data were in
memory.

Stata should seriously consider implementing RNGs in the next release. Mine
work fine given the caveats Nick mentioned, and Roberto's are fine as well, but
his are limited to GLM families for the purpose of constructing artificial data
sets in the spirit of my rndx commands. The other RNGs could well be written by
the very capable Stata programmers. Constructing them so that users can create
artificial data sets seems to me the ideal way to go, and Roberto has already
done much of the work.

Joseph Hilbe  







© Copyright 1996–2014 StataCorp LP