StataListers:
Kit has posted a program that I submitted, called censornb, to the SSC
site. The program is a maximum likelihood censored negative binomial
regression procedure, parameterized as a survival model. In this respect it is
similar to the cpoisson program (censored Poisson) that is already on the site.
There are two parameterizations of censored count models. The traditional
econometric parameterization requires the user to specify cut points beyond
which observations are considered censored. For instance, given counts ranging
from 0 to 50, one can specify a cut point at 5 to indicate left censoring of
observations less than 5. Those outlying observations are then re-valued to
the value of the cut point. The same holds for upper, or right, censoring.
Observations within the cut points cannot be censored.
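A minimal Python sketch of the cut-point (econometric) scheme just described may help. The function name and the -1/0/+1 flag coding are hypothetical choices for illustration, not anything taken from LIMDEP or censornb. Following the wording above, only values strictly below the lower cut point or strictly above the upper one are re-valued and flagged; some software also treats values lying exactly at a cut point as censored.

```python
from typing import List, Optional, Tuple


def censor_at_cutpoints(
    counts: List[int], lower: Optional[int] = None, upper: Optional[int] = None
) -> Tuple[List[int], List[int]]:
    """Apply cut-point (econometric) censoring to a list of counts.

    Returns (values, flags), where flags use a hypothetical coding:
    -1 = left censored, 0 = uncensored, 1 = right censored.
    """
    values: List[int] = []
    flags: List[int] = []
    for y in counts:
        if lower is not None and y < lower:
            # Below the lower cut point: re-value to the cut point, flag left censored.
            values.append(lower)
            flags.append(-1)
        elif upper is not None and y > upper:
            # Above the upper cut point: re-value to the cut point, flag right censored.
            values.append(upper)
            flags.append(1)
        else:
            # Inside the cut points: value kept, and it can never be censored.
            values.append(y)
            flags.append(0)
    return values, flags


values, flags = censor_at_cutpoints([0, 3, 5, 12, 50], lower=5, upper=40)
print(values, flags)  # [5, 5, 5, 12, 40] [-1, -1, 0, 0, 1]
```

With a lower cut point of 5 and an upper one of 40, the counts 0, 3, 5, 12, 50 become 5, 5, 5, 12, 40; the first two are flagged left censored and the last is flagged right censored.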
I have called the above an econometric parameterization. It differs from the
traditional survival model parameterization, in which any observation in the
data may be identified as right or left censored. Moreover, the values of
censored survival observations are not changed. This is the parameterization used
for Cox proportional hazards models, as well as for the standard parametric
survival models, e.g. exponential, Weibull, lognormal, gamma, and so forth.
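To make the contrast concrete, here is a small hypothetical Python check. It is a sketch assuming a -1/0/+1 flag coding (left censored, uncensored, right censored), which is not necessarily the coding censornb uses. Under the survival parameterization each observation simply carries its own flag. The function asks whether a given flag pattern could also have been produced by cut points, i.e. whether some L and U exist such that an observation is left censored exactly when y < L and right censored exactly when y > U.

```python
from typing import Sequence


def cutpoint_compatible(counts: Sequence[int], flags: Sequence[int]) -> bool:
    """Could this survival-style censoring pattern come from cut points?

    flags: -1 = left censored, 0 = uncensored, 1 = right censored (hypothetical).
    True iff some L, U exist with flag -1 exactly when y < L
    and flag 1 exactly when y > U.
    """
    left = [y for y, f in zip(counts, flags) if f == -1]
    non_left = [y for y, f in zip(counts, flags) if f != -1]
    right = [y for y, f in zip(counts, flags) if f == 1]
    non_right = [y for y, f in zip(counts, flags) if f != 1]
    # Every left-censored value must lie strictly below every other value ...
    ok_left = not left or not non_left or max(left) < min(non_left)
    # ... and every right-censored value strictly above every other value.
    ok_right = not right or not non_right or min(right) > max(non_right)
    return ok_left and ok_right


# The pattern a cut-point scheme would give: compatible.
print(cutpoint_compatible([0, 3, 5, 12, 50], [-1, -1, 0, 0, 1]))  # True
# A right-censored 3 next to uncensored 7 and 10: only the survival form allows it.
print(cutpoint_compatible([2, 7, 3, 10], [0, 0, 1, 0]))  # False
```

A right-censored count of 3 alongside uncensored counts of 7 and 10 is perfectly acceptable in the survival parameterization, but no pair of cut points can reproduce it, so the check returns False.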
It took me a while to figure out the log-likelihood functions for right and
left censored negative binomial observations, which employ an incomplete beta
function. There is no other software with which to directly compare results,
nor is there any literature on this parameterization. I therefore used LIMDEP's
censored Poisson and censored NB programs, with a specified cut point, to compare
with the survival-parameterized censored Poisson and censored NB programs. I
defined the censored observations to be the same as those above the cut point I
selected in LIMDEP, so the same block of observations was censored. The results,
as would be expected, were nearly identical. The advantage of the survival
version, however, is that there is no restriction on which observations can be
left or right censored in the same model.
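For readers who want to see how the incomplete beta function enters, here is a minimal Python sketch of a censored NB2 log-likelihood. It assumes the usual NB2 form with mean mu and variance mu + alpha*mu^2. It also assumes that a left-censored count y contributes ln P(Y <= y) and a right-censored count contributes ln P(Y >= y); censornb's exact conventions may differ. The CDF P(Y <= k) equals the regularized incomplete beta I_p(r, k+1) with r = 1/alpha and p = 1/(1 + alpha*mu). To stay within the Python standard library, the sketch computes it by summing the pmf instead, which is adequate for small counts. Function names and the flag coding are hypothetical.

```python
import math
from typing import Sequence


def nb2_logpmf(y: int, mu: float, alpha: float) -> float:
    """Log pmf of NB2: mean mu, variance mu + alpha * mu**2 (mu > 0, alpha > 0)."""
    r = 1.0 / alpha
    p = 1.0 / (1.0 + alpha * mu)  # equivalently r / (r + mu)
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(p) + y * math.log1p(-p))


def nb2_cdf(k: int, mu: float, alpha: float) -> float:
    """P(Y <= k). In closed form this is the regularized incomplete beta
    I_p(r, k + 1); here the pmf is summed directly (adequate for small k)."""
    if k < 0:
        return 0.0
    return sum(math.exp(nb2_logpmf(j, mu, alpha)) for j in range(k + 1))


def censored_nb2_loglik(y: Sequence[int], flags: Sequence[int],
                        mu: Sequence[float], alpha: float) -> float:
    """Sum of per-observation contributions; hypothetical flag coding
    -1 = left censored, 0 = uncensored, 1 = right censored."""
    ll = 0.0
    for yi, fi, mi in zip(y, flags, mu):
        if fi == 0:
            ll += nb2_logpmf(yi, mi, alpha)                    # ln P(Y = y)
        elif fi == -1:
            ll += math.log(nb2_cdf(yi, mi, alpha))             # ln P(Y <= y)
        elif fi == 1:
            ll += math.log(1.0 - nb2_cdf(yi - 1, mi, alpha))   # ln P(Y >= y)
        else:
            raise ValueError(f"unknown censoring flag {fi}")
    return ll


print(censored_nb2_loglik([0, 1, 2], [0, -1, 1], [3.0, 3.0, 3.0], alpha=0.5))
```

In a regression, each mu would be exp(xb) for its observation, and this function would be maximized over the coefficients and alpha. Note that the censored observations keep their original values, as in the survival parameterization.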
I have also added the AIC and BIC goodness-of-fit (GOF) statistics to facilitate
model comparison. The program is written for Stata version 9.1, and hence allows
a variety of ML and survey options.
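The textbook definitions of the two statistics, for a model with log-likelihood lnL, k estimated parameters and n observations, are AIC = -2 lnL + 2k and BIC = -2 lnL + k ln(n). A minimal Python sketch follows. Some programs report rescaled variants (for example AIC divided by n), so the values censornb displays may be on a different scale. The function name is hypothetical.

```python
import math
from typing import Tuple


def aic_bic(loglik: float, k: int, n: int) -> Tuple[float, float]:
    """Unscaled AIC and BIC; k = number of estimated parameters, n = observations."""
    aic = -2.0 * loglik + 2.0 * k
    bic = -2.0 * loglik + k * math.log(n)
    return aic, bic


aic, bic = aic_bic(loglik=-100.0, k=3, n=50)
print(aic, bic)  # AIC is 206.0; BIC is 200 + 3*ln(50)
```

Lower values indicate the preferred model. When comparing a censored Poisson fit with a censored NB fit, k for the NB model includes the extra dispersion parameter alpha.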
I wish to recommend the recently published 2nd edition of Long and Freese's
Stata Press book, "Regression Models for Categorical Dependent Variables Using
Stata". Many of
the discussions are the same as those in their 2003 revised 1st edition. But
there is also a substantial amount of added material, all referencing Stata 9
programs and code. It's a whopping 527 pages in length compared with the
previous 368 pages. I believe the book to be indispensable for any Stata user
who deals with categorical response data, whether through logistic regression,
Poisson and negative binomial regression, or ordinal and multinomial models. There
is a great discussion of ZIP/ZINB and an added section on hurdle models,
which are not in official Stata. I might add here that, unbeknownst to the
authors, there are 9 hurdle models posted on the SSC site. These may be used in
conjunction with the book for modeling count data in which the zero counts are
thought to come from a process separate from the one generating the positive
counts. Anyhow, I believe the book to be one of the most useful books on
discrete response data currently on the market. A worthwhile purchase!
Joe Hilbe
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/