How do you specify the variance function in nbreg to
coincide with Cameron and Trivedi’s (Regression
Analysis of Count Data, page 62) NB1 and NB2 variance functions?
What is the difference between the models fit using nbreg,
dispersion(mean) and nbreg, dispersion(constant)?
Title: The variance function in nbreg
Author: Roberto Gutierrez, StataCorp
Date: October 2001
Stata’s nbreg
model comes in two flavors: the default mean dispersion (equivalently,
nbreg, dispersion(mean)) and the constant dispersion, nbreg,
dispersion(constant). In short,
. nbreg, dispersion(mean) (or just plain nbreg)
corresponds to Cameron and Trivedi’s NB2 variance function, while
. nbreg, dispersion(constant)
corresponds to NB1.
To see why, let’s do the variance calculations ourselves. The
negative binomial model is the hierarchical model y_i | g_i, where g_i is
gamma distributed and
y_i | g_i ~ Poisson(g_i)
That is, the conditional mean and variance of y_i given g_i are both g_i.
For nbreg, dispersion(mean), with mu_i = exp(xb_i) (xb is the linear
predictor),
g_i ~ Gamma(1/alpha, alpha*mu_i)
where I define the Gamma(a,b) distribution as that having mean ab and
variance ab^2, and alpha is an ancillary parameter to be estimated from the
data.
Naturally,
E(y_i) = E{E(y_i | g_i)}
       = E(g_i)
       = (1/alpha)*(alpha*mu_i)
       = mu_i
The variance of y_i is
Var(y_i) = E{Var(y_i | g_i)} + Var{E(y_i | g_i)}
         = E(g_i) + Var(g_i)
         = (1/alpha)*(alpha*mu_i) + (1/alpha)*(alpha*mu_i)^2
         = mu_i + alpha*mu_i^2
         = mu_i * (1 + alpha*mu_i)
This corresponds to Cameron and Trivedi’s equation (3.13) and thus
corresponds to the NB2 model in their terminology. The dispersion for this
model is (1 + alpha*mu_i), which depends on mu_i, hence the moniker
“mean dispersion”.
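The calculation above can be checked numerically outside Stata. The following is a minimal sketch in Python, not part of nbreg; the function names and the value alpha = 0.5 are hypothetical choices made for illustration. It applies the law of total variance to Gamma(1/alpha, alpha*mu), using the same Gamma(a,b) convention as the text (mean ab, variance ab^2).

```python
import math


def gamma_moments(a, b):
    """Mean and variance of Gamma(a, b) in the text's convention:
    mean a*b and variance a*b**2."""
    return a * b, a * b ** 2


def nb2_moments(mu, alpha):
    """Marginal mean and variance of y when
    g ~ Gamma(1/alpha, alpha*mu) and y | g ~ Poisson(g)."""
    mean_g, var_g = gamma_moments(1.0 / alpha, alpha * mu)
    # Law of total variance: Var(y) = E(g) + Var(g)
    return mean_g, mean_g + var_g


alpha = 0.5  # hypothetical value, chosen only for illustration
for mu in (1.0, 2.0, 4.0):
    mean_y, var_y = nb2_moments(mu, alpha)
    # The mixture variance should equal mu * (1 + alpha*mu)
    assert math.isclose(var_y, mu * (1 + alpha * mu))
    print(f"mu={mu}  mean={mean_y}  variance={var_y}  dispersion={var_y / mean_y}")
```

With these values the dispersion var/mean comes out as 1.5, 2.0, and 3.0, so it grows with mu, as the term "mean dispersion" suggests.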
By comparison, nbreg, dispersion(constant) has the distribution of
g_i as
g_i ~ Gamma(mu_i/delta, delta)
where delta is the ancillary parameter. I could have easily called this
alpha and not delta, but nbreg uses delta to make the distinction
between the two models clearer.
Here E(y_i) = mu_i as well, and the variance of y_i is
Var(y_i) = E{Var(y_i | g_i)} + Var{E(y_i | g_i)}
         = E(g_i) + Var(g_i)
         = (mu_i/delta)*delta + (mu_i/delta)*delta^2
         = mu_i + mu_i*delta
         = mu_i * (1 + delta)
which (except for calling it delta instead of alpha) corresponds to Cameron
and Trivedi’s equation (3.11), and hence the NB1 model. For this
model, the dispersion is (1 + delta) and thus is constant over all
observations.
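The same kind of numerical check works for the constant-dispersion case. This is again a minimal Python sketch rather than anything nbreg does internally; the function names and delta = 0.5 are hypothetical. It plugs Gamma(mu/delta, delta) into the law of total variance and confirms that var/mean does not change with mu.

```python
import math


def gamma_moments(a, b):
    """Mean and variance of Gamma(a, b) in the text's convention:
    mean a*b and variance a*b**2."""
    return a * b, a * b ** 2


def nb1_moments(mu, delta):
    """Marginal mean and variance of y when
    g ~ Gamma(mu/delta, delta) and y | g ~ Poisson(g)."""
    mean_g, var_g = gamma_moments(mu / delta, delta)
    # Law of total variance: Var(y) = E(g) + Var(g)
    return mean_g, mean_g + var_g


delta = 0.5  # hypothetical value, chosen only for illustration
for mu in (1.0, 2.0, 4.0):
    mean_y, var_y = nb1_moments(mu, delta)
    # The mixture variance should equal mu * (1 + delta)
    assert math.isclose(var_y, mu * (1 + delta))
    print(f"mu={mu}  mean={mean_y}  variance={var_y}  dispersion={var_y / mean_y}")
```

Here the dispersion is 1.5 for every value of mu, which is 1 + delta.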
For both models, the dispersion is greater than one whenever the ancillary
parameter (alpha or delta) is positive, as the gamma distribution requires.
This is why nbreg serves its purpose of modeling data that exhibit
dispersion beyond that which can be handled using Poisson regression, which
has dispersion set to 1.
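To see the overdispersion in simulated counts rather than in formulas, one can draw from the gamma-Poisson hierarchy directly. The sketch below is a rough Monte Carlo illustration in Python, using only the standard library; the helper names, the seed, the sample size, and the values mu = 3, alpha = 0.5, and delta = 0.5 are all assumptions made for the example. Python's random.gammavariate takes a shape and a scale, which matches the Gamma(a,b) convention used in the text, and the Poisson draw uses Knuth's multiplication method, which is adequate for the small means here.

```python
import math
import random


def draw_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method
    (suitable for small lam only)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def sample_dispersion(shape, scale, n, rng):
    """Simulate y | g ~ Poisson(g) with g ~ Gamma(shape, scale) and
    return the sample variance divided by the sample mean."""
    ys = [draw_poisson(rng.gammavariate(shape, scale), rng) for _ in range(n)]
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / n
    return var / mean


rng = random.Random(12345)          # fixed seed so reruns agree
mu, alpha, delta = 3.0, 0.5, 0.5    # hypothetical values
n = 50_000

# Mean dispersion (NB2): g ~ Gamma(1/alpha, alpha*mu), target 1 + alpha*mu
disp_nb2 = sample_dispersion(1.0 / alpha, alpha * mu, n, rng)
# Constant dispersion (NB1): g ~ Gamma(mu/delta, delta), target 1 + delta
disp_nb1 = sample_dispersion(mu / delta, delta, n, rng)

print(f"NB2 sample dispersion: {disp_nb2:.3f} (theory {1 + alpha * mu})")
print(f"NB1 sample dispersion: {disp_nb1:.3f} (theory {1 + delta})")
```

Both sample ratios should land near their theoretical values and well above 1, the value a plain Poisson sample would give.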