
Re: Re: st: RE: AW: Sample selection models under zero-truncated negative binomial models

From   [email protected]
To   [email protected]
Subject   Re: Re: st: RE: AW: Sample selection models under zero-truncated negative binomial models
Date   Sun, 07 Jun 2009 11:49:19 +0200

Assuming that your initial idea is sound (I am not expert enough to comment on it), don't -espoisson- and -ssm- work à la Heckman? Both are available from SSC.

P.S. I'll NOT receive/read any email but the Digest.

At 02.33 06/06/2009 -0400, John Ataguba wrote:
>Hi Austin,
>Specifically, I am not looking at the time dimension of the visits.  The data set is such that I have the total number of visits to a GP (General Practitioner) in the past month, collected from a national survey of individuals.  Given that this is a household survey, there are zero visits for some individuals.
>One of my objectives is to determine the factors that predict positive utilization of GPs.  This is easily implemented using a standard logit/probit model.  The other part is the factors that affect the number of visits to a GP.  Given that the dependent variable is a count variable, the likely candidates are count regression models.  My fear is how to deal with unobserved heterogeneity and sample selection issues if I limit my analysis to the non-zero visits.  If I use the standard two-part or hurdle model, I do not know whether this will account for sample selection in the manner of the Heckman procedure.
>I think the class of mixture models (fmm) is an alternative that I want to explore. I don't know much about them but will be happy to have some brighter ideas.
>----- Original Message -----
>From: Austin Nichols <[email protected]>
>To: [email protected]
>Sent: Friday, 5 June, 2009 14:27:20
>Subject: Re: st: RE: AW: Sample selection models under zero-truncated negative binomial models
>Steven--I like this approach in general, but from the original post,
>it's not clear that data on the timing of first visit or even time at
>risk is on the data--perhaps the poster can clarify?  Also, would you
>propose using the predicted hazard in the period of first visit as
>some kind of selection correction?  The outcome is visits divided by
>time at risk for subsequent visits in your setup, so represents a
>fractional outcome (constrained to lie between zero and one) in
>theory, though only the zero limit is likely to bind, which makes it
>tricky to implement, I would guess--if you are worried about the
>nonnormal error distribution and the selection b
>Ignoring the possibility of detailed data on times of utilization, why
>can't you just run a standard count model on number of visits and use
>that to predict probability of at least one visit?  One visit in 10
>years is not that different from no visits in 10 years, yeah?  It
>makes no sense to me to predict utilization only for those who have
>positive utilization and worry about selection etc. instead of just
>using the whole sample, including the zeros.  I.e. run a -poisson- to
>start with.  If you have a lot of zeros, that can just arise from the
>fact that a lot of people have predicted number of visits in the .01
>range and number of visits has to be an integer.  Zero inflation or
>overdispersion also can arise often from not having the right
>specification for the explanatory variables...  but you can also move
>to another model in the -glm- or -nbreg- family.
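
A minimal sketch of the whole-sample approach suggested above, run on simulated data because the survey itself is not part of this thread. The variable names (x, visits, anyvisit, mu, p_any) and the data-generating values are assumptions made for illustration only. It fits -poisson- on all observations, zeros included, and derives the implied probability of at least one visit as 1 - exp(-mu).

```stata
* Simulated stand-in for the survey data (all names and values hypothetical)
clear
set seed 12345
set obs 2000
generate x = rnormal()
* Low expected counts, so many zeros arise without any separate zero process
generate visits = rpoisson(exp(-1 + 0.5*x))

* Count model on the whole sample, zeros included
poisson visits x

* Predicted number of visits and the implied P(at least one visit)
predict mu, n
generate p_any = 1 - exp(-mu)

* Compare the observed share of users with the model-implied probability
generate anyvisit = visits > 0
summarize anyvisit p_any
```

For a person with a predicted mean of 0.01 visits, the Poisson probability of a zero is exp(-0.01), roughly 0.99, which is the point made above about zeros arising simply because counts must be integers.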
>On Tue, Jun 2, 2009 at 1:21 PM, <[email protected]> wrote:
>> A potential problem with Jon's original approach is that the use of
>> services is an event with a time dimension--time to first use of
>> services.  People might not use services until they need them.
>> Instead of a logit model (my preference also),   a survival model for
>> the first part might be appropriate.
>> With later first-use, the time available for later visits is reduced,
>> and  number of visits might be associated with the time from first use
>> to the end of observation.  Moreover, people with later first-visits
>> (or none) might differ in their degree of  need for subsequent visits.
>> To account for unequal follow-up times,  I suggest a supplementary
>> analysis in which the outcome for the second part of the hurdle model
>> is not the number of visits, but the rate of visits (per unit time at
>> risk).
>> -Steve.
>> On Tue, Jun 2, 2009 at 12:22 PM, Lachenbruch, Peter
>> <[email protected]> wrote:
>>> This could also be handled by a two-part or hurdle model.  The 0 vs. non-zero model is given by a probit or logit (my preference) model.  The non-zeros are modeled by the count data or OLS or what have you.  The results can be combined since the likelihood separates (the zero values are identifiable - no visits vs number of visits).
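
A minimal sketch of the two-part (hurdle) setup described above, again on simulated data. The variable names and data-generating values are hypothetical, and the choice of -logit- for the first part and -ztnb- for the second is only one of the combinations the description allows. Because the likelihood separates, the two parts can be fitted as two separate commands.

```stata
* Simulated stand-in for the survey data (all names and values hypothetical)
clear
set seed 12345
set obs 2000
generate x = rnormal()
* A gamma-Poisson mixture gives overdispersed (negative binomial) counts
generate nu = rgamma(2, 0.5)
generate visits = rpoisson(nu*exp(0.2 + 0.5*x))
generate anyvisit = visits > 0

* Part 1: zero versus non-zero
logit anyvisit x

* Part 2: zero-truncated negative binomial on the positive counts only
ztnb visits x if visits > 0
```

In later Stata releases the zero-truncated command is -tnbreg- rather than -ztnb-. Note that this sketch does nothing about selection on unobservables, which is the concern raised in the original question.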
>>> -----Original Message-----
>>> From: [email protected] [mailto:[email protected]] On Behalf Of Martin Weiss
>>> Sent: Tuesday, June 02, 2009 7:02 AM
>>> To: [email protected]
>>> Subject: st: AW: Sample selection models under zero-truncated negative binomial models
>>> *************
>>> ssc d cmp
>>> *************
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of John Ataguba
>>> Sent: Tuesday, 2 June 2009 16:00
>>> To: Statalist statalist mailing
>>> Subject: st: Sample selection models under zero-truncated negative binomial
>>> models
>>> Dear colleagues,
>>> I want to enquire whether it is possible to fit a ztnb (zero-truncated
>>> negative binomial) model on a dataset in which the zeros are observed, in a
>>> fashion similar to the Heckman sample selection model.
>>> Specifically, I have a binary variable on use/non-use of outpatient health
>>> services, and I fitted a standard probit/logit model to identify the factors
>>> that predict the probability of use.  Subsequently, I want to explain the
>>> factors that influence the number of visits to the health facilities. Since
>>> these are count data, I cannot fit the standard Heckman model using the
>>> standard two-step procedure in the Stata command -heckman-.
>>> My fear now is that my sample of users will be biased if I fit a ztnb model
>>> on only the users, given that I have information on the non-users, which I
>>> used to run the initial probit/logit estimation.
>>> Is it possible to generate the inverse Mills ratio from the probit model
>>> and include it in the ztnb model? Will this be consistent?
>>> Are there any smarter suggestions?  Any reference that has used a similar
>>> sample selection form will be appreciated.
>>> Regards
>>> Jon 
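
On the mechanical part of the question above, a minimal sketch of how an inverse Mills ratio can be generated after -probit-, using simulated data. All variable names (x, z, anyvisit, xb_sel, imr) and the data-generating values are hypothetical. The sketch only shows the computation phi(xb)/Phi(xb); whether including this term in a ztnb model gives consistent estimates is exactly the open question in this thread, and nothing here should be read as settling it.

```stata
* Simulated stand-in for the survey data (all names and values hypothetical)
clear
set seed 12345
set obs 2000
generate x = rnormal()
generate z = rnormal()
generate anyvisit = (0.3 + 0.5*x + 0.8*z + rnormal()) > 0

* Selection equation: use versus non-use
probit anyvisit x z

* Inverse Mills ratio for the users: phi(xb)/Phi(xb)
predict xb_sel, xb
generate imr = normalden(xb_sel)/normal(xb_sel)
summarize imr if anyvisit
```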
