Re: st: RE: AW: Sample selection models under zero-truncated negative binomial models

From   John Ataguba <>
Subject   Re: st: RE: AW: Sample selection models under zero-truncated negative binomial models
Date   Fri, 5 Jun 2009 14:03:06 +0000 (GMT)

Hi Austin,

Specifically, I am not looking at the time dimension of the visits.  The data set is such that I have the total number of visits to a GP (General Practitioner) in the past month, collected from a national survey of individuals.  Given that this is a household survey, there are zero visits for some individuals.

One of my objectives is to determine the factors that predict positive utilization of GPs.  This is easily implemented using a standard logit/probit model.  The other part is the factors that affect the number of visits to a GP.  Given that the dependent variable is a count variable, the likely candidates are count regression models.  My fear is with how to deal with unobserved heterogeneity and sample selection issues if I limit my analysis to the non-zero visits.  If I use the standard two-part or hurdle model, I do not know if this will account for sample selection in the fashion of the Heckman procedure.
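
For reference, a sketch of the log-likelihood behind the standard two-part (hurdle) model. The notation is assumed here for illustration and does not come from the thread: \pi_i(\gamma) is the logit/probit probability of at least one visit, and f(y | x; \beta) is the untruncated count density (for example, negative binomial).

```latex
\begin{align}
\ln L(\gamma,\beta)
  &= \sum_{i} \Big[ \mathbf{1}(y_i = 0)\,\ln\{1 - \pi_i(\gamma)\}
                  + \mathbf{1}(y_i > 0)\,\ln \pi_i(\gamma) \Big] \\
  &\quad + \sum_{i:\,y_i > 0} \Big[ \ln f(y_i \mid x_i;\beta)
                  - \ln\{1 - f(0 \mid x_i;\beta)\} \Big]
\end{align}
```

The first sum involves only \gamma and the second only \beta, so the binary part and the zero-truncated count part can be fitted separately. That separation relies on the hurdle model's own assumptions; it does not by itself allow for correlated unobservables across the two parts, which is the situation a Heckman-type correction is designed for.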

I think the class of mixture models (fmm) will be an alternative that I want to explore. I don't know much about them but will be happy to have some brighter ideas.



----- Original Message ----
From: Austin Nichols <>
Sent: Friday, 5 June, 2009 14:27:20
Subject: Re: st: RE: AW: Sample selection models under zero-truncated negative binomial models

Steven--I like this approach in general, but from the original post,
it's not clear that data on the timing of first visit or even time at
risk is in the data--perhaps the poster can clarify?  Also, would you
propose using the predicted hazard in the period of first visit as
some kind of selection correction?  The outcome is visits divided by
time at risk for subsequent visits in your setup, so represents a
fractional outcome (constrained to lie between zero and one) in
theory, though only the zero limit is likely to bind, which makes it
tricky to implement, I would guess--if you are worried about the
nonnormal error distribution and the selection b

Ignoring the possibility of detailed data on times of utilization, why
can't you just run a standard count model on number of visits and use
that to predict probability of at least one visit?  One visit in 10
years is not that different from no visits in 10 years, yeah?  It
makes no sense to me to predict utilization only for those who have
positive utilization and worry about selection etc. instead of just
using the whole sample, including the zeros.  I.e. run a -poisson- to
start with.  If you have a lot of zeros, that can just arise from the
fact that a lot of people have predicted number of visits in the .01
range and number of visits has to be an integer.  Zero inflation or
overdispersion also can arise often from not having the right
specification for the explanatory variables...  but you can also move
to another model in the -glm- or -nbreg- family.
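
A minimal sketch of the whole-sample approach described above, using simulated data in place of the survey. The variable names (visits, age, female) and the coefficients in the data-generating step are hypothetical, chosen only so that most simulated individuals report zero visits.

```stata
* Simulated stand-in for the survey; all names and values are hypothetical
clear
set seed 20090605
set obs 5000
generate age    = 20 + floor(60*runiform())
generate female = runiform() < 0.5
* Low expected number of visits, so zeros dominate without any zero inflation
generate visits = rpoisson(exp(-3 + 0.02*age + 0.3*female))

* Count model on the whole sample, zeros included
poisson visits age female

* Predicted number of visits and implied Pr(at least one visit)
predict mu, n
predict p0, pr(0)
generate p_any = 1 - p0

* Compare the observed share of zeros with the share the model implies
generate byte zero = visits == 0
summarize zero p0
```

If the observed share of zeros is far above the mean of p0, or the data look overdispersed, the same depvar/indepvars syntax carries over to -nbreg-.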

On Tue, Jun 2, 2009 at 1:21 PM, <> wrote:
> A potential problem with Jon's original approach is that the use of
> services is an event with a time dimension--time to first use of
> services.  People might not use services until they need them.
> Instead of a logit model (my preference also),   a survival model for
> the first part might be appropriate.
> With later first-use, the time available for later visits is reduced,
> and  number of visits might be associated with the time from first use
> to the end of observation.  Moreover, people with later first-visits
> (or none) might differ in their degree of  need for subsequent visits.
> To account for unequal follow-up times,  I suggest a supplementary
> analysis in which the outcome for the second part of the hurdle model
> is not the number of visits, but the rate of visits (per unit time at
> risk).
> -Steve.
> On Tue, Jun 2, 2009 at 12:22 PM, Lachenbruch, Peter
> <> wrote:
>> This could also be handled by a two-part or hurdle model.  The 0 vs. non-zero model is given by a probit or logit (my preference) model.  The non-zeros are modeled by the count data or OLS or what have you.  The results can be combined since the likelihood separates (the zero values are identifiable - no visits vs number of visits).
>> -----Original Message-----
>> From: [] On Behalf Of Martin Weiss
>> Sent: Tuesday, June 02, 2009 7:02 AM
>> To:
>> Subject: st: AW: Sample selection models under zero-truncated negative binomial models
>> *************
>> ssc d cmp
>> *************
>> -----Original Message-----
>> From:
>> [] On Behalf Of John Ataguba
>> Sent: Tuesday, 2 June 2009 16:00
>> To: Statalist statalist mailing
>> Subject: st: Sample selection models under zero-truncated negative binomial
>> models
>> Dear colleagues,
>> I want to enquire if it is possible to perform a ztnb (zero-truncated
>> negative binomial) model on a dataset that has the zeros observed in a
>> fashion similar to the heckman sample selection model.
>> Specifically, I have a binary variable on use/non use of outpatient health
>> services and I fitted a standard probit/logit model to observe the factors
>> that predict the probability of use.  Subsequently, I want to explain the
>> factors that influence the number of visits to the health facilities.  Since
>> these are count data, I cannot fit the standard Heckman model using the
>> standard two-part procedure in the Stata command -heckman-.
>> My fear now is that my sample of users will be biased if I fit a ztnb model
>> on only the users given that I have information on the non-users which I
>> used to run the initial probit/logit estimation.
>> Is it possible to generate the inverse Mills ratio from the probit model
>> and include this in the ztnb model? Will this be consistent? etc...
>> Are there any smarter suggestions?  Any reference that has used the similar
>> sample selection form will be appreciated.
>> Regards
>> Jon
