# st: negative binomial: model selection and ML issue

From: Natasha Sarkisian
To: statalist@hsphsun2.harvard.edu
Subject: st: negative binomial: model selection and ML issue
Date: Mon, 02 Jun 2003 11:03:52 -0400

Hello,

I need help deciding on the best model for my count dependent variable.
My dependent variable shows clear overdispersion and excess zeros (42%),
and I think some of the zeros might be generated by a separate process, so
I came to believe that a zero-inflated negative binomial model (ZINB)
would be most appropriate. Indeed, the Vuong test favors ZINB over regular
NB (but the test does not allow weights, so I ran it without them
even though I use them throughout the analyses).
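
For reference, the "separate process" idea can be written out as the usual
zero-inflated NB2 mixture. This is a sketch of the textbook form, not
output from the analysis; the symbols pi_i (probability of an "always zero"
observation), mu_i (NB mean), alpha (dispersion), and the covariate vectors
x_i and z_i are notation introduced here for illustration.

```latex
\begin{align*}
\pi_i &= \frac{\exp(z_i'\gamma)}{1+\exp(z_i'\gamma)}, \qquad
\mu_i = \exp(x_i'\beta), \\
\Pr(Y_i = 0) &= \pi_i + (1-\pi_i)\,(1+\alpha\mu_i)^{-1/\alpha}, \\
\Pr(Y_i = y) &= (1-\pi_i)\,
  \frac{\Gamma(y+1/\alpha)}{\Gamma(y+1)\,\Gamma(1/\alpha)}
  \left(\frac{1}{1+\alpha\mu_i}\right)^{1/\alpha}
  \left(\frac{\alpha\mu_i}{1+\alpha\mu_i}\right)^{y},
  \qquad y = 1, 2, \dots
\end{align*}
```

Zeros thus come from two sources: the logit ("inflate") part and the NB
count part itself.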

Things are complicated by the fact that my data are complex survey
data with stratification and clustering; since there is no svy routine for
ZINB, I have to run regular zinb with weights (I use population weights)
and the cluster option.

When estimating ZINB models this way, I have run into a convergence
problem -- it occurs depending on which independent variables I include
and which subsample of the data I look at. I use the same set of variables
(approx. 15-20) both in my main model and in the logit-based "inflate" model;
the variables are selected largely on theoretical grounds. I also look
only at certain subsets of respondents, not the whole sample.

For a number of models I estimate, I encounter the same problem. The
algorithm gets stuck doing "not concave" iterations; if I estimate using
the "difficult" option of ML estimation, it always converges, but the last
step is "backed up" (which could be a sign of trouble) and the
coefficients in my inflate model are clearly wrong (most of them are
huge). This does not happen if I do not use the weights -- then the same
models converge just fine, though they have relatively large standard
errors. The cluster option doesn't make a difference. Omitting a few variables
makes it converge better, but there are still a few large coefficients
there... An example of these models is below.
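
As a hedged aside on how the weights enter: with probability weights the
quantity being maximized is a weighted log pseudo-likelihood. The notation
below (w_i for the weight, f for the ZINB probability, theta for all
parameters, c for an arbitrary positive constant) is introduced here for
illustration only.

```latex
\begin{align*}
\ell_w(\theta) &= \sum_{i=1}^{n} w_i \,\log f(y_i \mid x_i, z_i; \theta), \\
\sum_{i=1}^{n} (c\,w_i)\,\log f(y_i \mid x_i, z_i; \theta)
  &= c\,\ell_w(\theta), \qquad c > 0 .
\end{align*}
```

Multiplying every weight by the same constant rescales the objective but
leaves its maximizer unchanged. A weighted value near -2.6e+08 against an
unweighted value near -11201 is therefore what population-scale weights
would be expected to produce, and the magnitude alone does not indicate a
problem. Whether the scale of the weights plays any part in the convergence
behavior is not something this identity can settle.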

I have also been looking into the generalized negative binomial model -- it
has a svy command (svygnbreg) and models predictors of overdispersion (which
may be creating zeros). I found no guidelines in the literature on
when it is a good idea to use gnbreg, so I can't decide whether it is a good
substitute for ZINB.
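
To make the link between overdispersion and zeros concrete, here is a
sketch of the generalized NB2 form, in which the log of the dispersion
parameter is a linear function of covariates. The symbols mu_i, alpha_i,
delta, and the covariate vector z_i are notation assumed here for
illustration.

```latex
\begin{align*}
\ln \alpha_i &= z_i'\delta, \\
\operatorname{Var}(Y_i) &= \mu_i + \alpha_i\,\mu_i^{2}, \\
\Pr(Y_i = 0) &= (1+\alpha_i\mu_i)^{-1/\alpha_i} .
\end{align*}
```

For a fixed mean, the zero probability rises from exp(-mu_i) as alpha_i
approaches 0 toward 1 as alpha_i grows, so covariates that raise the
dispersion also raise the expected share of zeros. Unlike ZINB, there is no
separate "always zero" group in this formulation.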

Is there some way around my ZINB convergence problem? I encounter
convergence problems in many of the models I estimate (the problem is not
introduced by just one specific variable -- it looks more like an issue with
variable combinations), and I am not sure it would be appropriate to
guide my variable selection by what converges and what does not. I also
don't feel it would be appropriate to drop the weights... Would
GNBREG be a good alternative? Please advise me on the best course of
action in this situation -- I would appreciate any help you can offer.

Sincerely,
Natasha Sarkisian
_______________
. *model I would like to estimate

. zinb g2hrsp d2female d2agen d2black d2other2 d2pari h2imcctt d2educi
hlrw2b px2parid p2female p2physneed p2ninci p2mar sibrsp11 wagelg e2hrsi
e2rotati e2irhrsi e2wkendi e2selfi e2satisi if subparem==1 [pw=pweight2],
cluster(psu) inflate(d2female d2agen d2black d2other2 d2pari h2imcctt
d2educi hlrw2b px2parid p2female p2physneed p2ninci p2mar sibrsp11 wagelg
e2hrsi e2rotati e2irhrsi e2wkendi e2selfi e2satisi) difficult

...
Iteration 60: log pseudo-likelihood = -2.618e+08 (backed up)

Zero-inflated negative binomial regression Number of obs = 5470
Nonzero obs = 3169
Zero obs = 2301

Inflation model = logit Wald chi2(20) = 490.81
Log pseudo-likelihood = -2.62e+08 Prob > chi2 = 0.0000

(standard errors adjusted for clustering on psu)
------------------------------------------------------------------------------
| Robust
g2hrsp | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
g2hrsp |
d2female | .1406557 .0700564 2.01 0.045 .0033477 .2779637
d2agen | -.0117666 .0040569 -2.90 0.004 -.0197178 -.0038153
d2nonwhite | .3013627 .0887974 3.39 0.001 .127323 .4754024
d2pari | -.539824 .0930102 -5.80 0.000 -.7221207 -.3575274
h2imcctt | -.1039613 .0318792 -3.26 0.001 -.1664433 -.0414792
d2educi | -.0317899 .0176836 -1.80 0.072 -.0664492 .0028693
hlrw2b | -.1707023 .0872492 -1.96 0.050 -.3417077 .0003031
px2parid | -.5262278 .0818795 -6.43 0.000 -.6867086 -.365747
p2female | -.3299022 .1812394 -1.82 0.069 -.685125 .0253205
p2physneed | .4111594 .0723525 5.68 0.000 .2693511 .5529677
p2ninci | .3075308 .0905585 3.40 0.001 .1300394 .4850222
p2mar | .4085085 .0696108 5.87 0.000 .2720738 .5449431
sibrsp11 | -.0301964 .0131613 -2.29 0.022 -.0559921 -.0044007
wagelg | -.1649294 .0458744 -3.60 0.000 -.2548415 -.0750172
e2hrsi | .0033314 .0025242 1.32 0.187 -.0016158 .0082787
e2rotati | .1603383 .114466 1.40 0.161 -.064011 .3846875
e2irhrsi | .1578396 .1117567 1.41 0.158 -.0611995 .3768786
e2wkendi | -.0327867 .0714544 -0.46 0.646 -.1728347 .1072613
e2selfi | -.191938 .1026659 -1.87 0.062 -.3931595 .0092834
e2satisi | -.0037153 .0234557 -0.16 0.874 -.0496876 .0422571
_cons | 2.727581 .3881979 7.03 0.000 1.966727 3.488435
-------------+----------------------------------------------------------------
inflate |
d2female | -28.81793 3.150122 -9.15 0.000 -34.99206 -22.6438
d2agen | 2.928775 .3040263 9.63 0.000 2.332895 3.524656
d2nonwhite | -87.96681 9.585596 -9.18 0.000 -106.7542 -69.17939
d2pari | 90.93917 9.388431 9.69 0.000 72.53819 109.3402
h2imcctt | -50.60235 5.816926 -8.70 0.000 -62.00331 -39.20138
d2educi | -7.579687 .8116225 -9.34 0.000 -9.170438 -5.988936
hlrw2b | -55.86979 6.024495 -9.27 0.000 -67.67759 -44.062
px2parid | 1.086996 1.010348 1.08 0.282 -.893249 3.067241
p2female | -86.46336 9.174333 -9.42 0.000 -104.4447 -68.482
p2physneed | 6.469905 1.524146 4.24 0.000 3.482634 9.457177
p2ninci | -118.8005 13.24296 -8.97 0.000 -144.7563 -92.84481
p2mar | -66.72213 6.681292 -9.99 0.000 -79.81722 -53.62704
sibrsp11 | 12.73949 1.392078 9.15 0.000 10.01107 15.46791
wagelg | -7.31957 .9356806 -7.82 0.000 -9.15347 -5.48567
e2hrsi | 2.523035 .2748633 9.18 0.000 1.984313 3.061758
e2rotati | 8.648898 2.339595 3.70 0.000 4.063377 13.23442
e2irhrsi | -103.7084 11.32604 -9.16 0.000 -125.9071 -81.5098
e2wkendi | -33.80096 3.447128 -9.81 0.000 -40.5572 -27.04471
e2selfi | -11.07891 1.977784 -5.60 0.000 -14.95529 -7.202523
e2satisi | -35.10592 3.706576 -9.47 0.000 -42.37068 -27.84116
_cons | -61.21454 8.493254 -7.21 0.000 -77.86101 -44.56807
-------------+----------------------------------------------------------------
/lnalpha | .7523365 .0194534 38.67 0.000 .7142085 .7904644
-------------+----------------------------------------------------------------
alpha | 2.121952 .0412792 2.042569 2.20442
------------------------------------------------------------------------------

. *same model without weights -- converges, but standard errors in the
inflate model are quite high

...
Iteration 7: log pseudo-likelihood = -11201.427
Zero-inflated negative binomial regression Number of obs = 5470
Nonzero obs = 3169
Zero obs = 2301

Inflation model = logit Wald chi2(20) = 510.37
Log pseudo-likelihood = -11201.43 Prob > chi2 = 0.0000

(standard errors adjusted for clustering on psu)
------------------------------------------------------------------------------
| Robust
g2hrsp | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
g2hrsp |
d2female | .1837291 .0920913 2.00 0.046 .0032334 .3642248
d2agen | -.0086689 .0050976 -1.70 0.089 -.0186599 .0013221
d2nonwhite | .295073 .0786946 3.75 0.000 .1408344 .4493115
d2pari | -.5198972 .0900068 -5.78 0.000 -.6963073 -.343487
h2imcctt | -.1057168 .0500675 -2.11 0.035 -.2038474 -.0075862
d2educi | -.0131438 .0209237 -0.63 0.530 -.0541534 .0278658
hlrw2b | -.1972004 .1524797 -1.29 0.196 -.4960552 .1016544
px2parid | -.5109413 .1024399 -4.99 0.000 -.7117197 -.3101628
p2female | -.3011825 .1946093 -1.55 0.122 -.6826098 .0802447
p2physneed | .3639397 .0776863 4.68 0.000 .2116773 .5162021
p2ninci | .2621527 .0865235 3.03 0.002 .0925697 .4317357
p2mar | .3762602 .0798553 4.71 0.000 .2197467 .5327736
sibrsp11 | -.0295206 .0166526 -1.77 0.076 -.062159 .0031179
wagelg | -.1800483 .0621216 -2.90 0.004 -.3018044 -.0582923
e2hrsi | .0021786 .0023238 0.94 0.348 -.002376 .0067332
e2rotati | .173522 .1035014 1.68 0.094 -.029337 .3763809
e2irhrsi | .1084089 .1060774 1.02 0.307 -.099499 .3163168
e2wkendi | .0036214 .0755695 0.05 0.962 -.1444922 .151735
e2selfi | -.2287223 .1054145 -2.17 0.030 -.4353309 -.0221137
e2satisi | .0015919 .0236225 0.07 0.946 -.0447074 .0478911
_cons | 2.398283 .4202168 5.71 0.000 1.574673 3.221893
-------------+----------------------------------------------------------------
inflate |
d2female | -.3931383 1.681306 -0.23 0.815 -3.688438 2.902161
d2agen | .0750997 .1044836 0.72 0.472 -.1296844 .2798837
d2nonwhite | .4324033 1.207905 0.36 0.720 -1.935048 2.799855
d2pari | .6332534 1.036551 0.61 0.541 -1.398349 2.664856
h2imcctt | -.052675 1.068784 -0.05 0.961 -2.147454 2.042104
d2educi | -.0686759 .2721136 -0.25 0.801 -.6020088 .4646571
hlrw2b | .5954707 2.132381 0.28 0.780 -3.583919 4.77486
px2parid | .2601993 .1652275 1.57 0.115 -.0636407 .5840392
p2female | -2.616072 1.50648 -1.74 0.082 -5.568719 .3365751
p2physneed | -1.163383 .8575723 -1.36 0.175 -2.844193 .5174281
p2ninci | -1.812958 2.432564 -0.75 0.456 -6.580696 2.954779
p2mar | -1.345326 1.612477 -0.83 0.404 -4.505724 1.815071
sibrsp11 | .1152167 .1836884 0.63 0.531 -.2448058 .4752393
wagelg | -.4764495 1.141107 -0.42 0.676 -2.712978 1.760078
e2hrsi | .0446437 .0442841 1.01 0.313 -.0421515 .131439
e2rotati | -.6939602 1.010482 -0.69 0.492 -2.674469 1.286548
e2irhrsi | -.4755367 1.191764 -0.40 0.690 -2.811352 1.860279
e2wkendi | -.6671307 1.019377 -0.65 0.513 -2.665074 1.330812
e2selfi | -3.757805 3.297836 -1.14 0.255 -10.22144 2.705834
e2satisi | -.2447212 .2197594 -1.11 0.265 -.6754417 .1859994
_cons | -2.211992 5.602639 -0.39 0.693 -13.19296 8.768978
-------------+----------------------------------------------------------------
/lnalpha | .7760897 .043908 17.68 0.000 .6900315 .8621478
-------------+----------------------------------------------------------------
alpha | 2.172959 .0954103 1.993778 2.368242
------------------------------------------------------------------------------

*omitting a few variables to have better convergence (with weights).

zinb g2hrsp d2female d2agen d2nonwhite d2pari h2imcctt d2educi hlrw2b
px2parid p2female p2physneed p2ninci p2mar sibrsp11 wagelg if subparem==1
[pw=pweight2], cluster(psu) inflate(d2female d2agen d2nonwhite d2pari
h2imcctt d2educi hlrw2b px2parid p2female p2physneed p2ninci p2mar
sibrsp11 wagelg)

...
Iteration 24: log pseudo-likelihood = -2.626e+08

Zero-inflated negative binomial regression Number of obs = 5470
Nonzero obs = 3169
Zero obs = 2301

Inflation model = logit Wald chi2(14) = 405.95
Log pseudo-likelihood = -2.63e+08 Prob > chi2 = 0.0000

(standard errors adjusted for clustering on psu)
------------------------------------------------------------------------------
| Robust
g2hrsp | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
g2hrsp |
d2female | .129644 .0715788 1.81 0.070 -.0106479 .2699359
d2agen | -.0140406 .0043397 -3.24 0.001 -.0225462 -.0055349
d2nonwhite | .3160163 .0898803 3.52 0.000 .1398542 .4921783
d2pari | -.546273 .0929089 -5.88 0.000 -.728371 -.3641749
h2imcctt | -.1032408 .032373 -3.19 0.001 -.1666907 -.0397909
d2educi | -.0309902 .0176043 -1.76 0.078 -.065494 .0035136
hlrw2b | -.1417171 .0894382 -1.58 0.113 -.3170127 .0335785
px2parid | -.5240047 .0829619 -6.32 0.000 -.6866071 -.3614023
p2female | -.4096451 .2120346 -1.93 0.053 -.8252253 .0059351
p2physneed | .406716 .073386 5.54 0.000 .2628821 .55055
p2ninci | .3203423 .0900738 3.56 0.000 .1438009 .4968837
p2mar | .418398 .0719226 5.82 0.000 .2774323 .5593637
sibrsp11 | -.0301778 .0136236 -2.22 0.027 -.0568795 -.003476
wagelg | -.1524723 .0467387 -3.26 0.001 -.2440784 -.0608662
_cons | 2.962023 .3713221 7.98 0.000 2.234245 3.689801
-------------+----------------------------------------------------------------
inflate |
d2female | -.4696629 1.549934 -0.30 0.762 -3.507477 2.568151
d2agen | .2917171 .1990177 1.47 0.143 -.0983505 .6817847
d2nonwhite | .1340828 1.589364 0.08 0.933 -2.981014 3.24918
d2pari | 4.220015 2.658066 1.59 0.112 -.9896997 9.429729
h2imcctt | -.7522923 .7869644 -0.96 0.339 -2.294714 .7901295
d2educi | .3497856 .3383789 1.03 0.301 -.3134248 1.012996
hlrw2b | -.5419731 1.003275 -0.54 0.589 -2.508356 1.42441
px2parid | -.3837323 .3968861 -0.97 0.334 -1.161615 .3941503
p2female | -13.43632 7.283461 -1.84 0.065 -27.71165 .8389961
p2physneed | -3.196758 1.657933 -1.93 0.054 -6.446248 .0527312
p2ninci | -31.5601 4.553981 -6.93 0.000 -40.48573 -22.63446
p2mar | -17.53377 7.566896 -2.32 0.020 -32.36461 -2.702928
sibrsp11 | 2.664428 1.204219 2.21 0.027 .3042024 5.024653
wagelg | 5.478619 3.52169 1.56 0.120 -1.423766 12.381
_cons | -52.03103 25.32102 -2.05 0.040 -101.6593 -2.402754
-------------+----------------------------------------------------------------
/lnalpha | .7631012 .0194094 39.32 0.000 .7250596 .8011429
-------------+----------------------------------------------------------------
alpha | 2.144918 .0416315 2.064854 2.228086
------------------------------------------------------------------------------
