
From: sjsamuels@gmail.com
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: AW: Sample selection models under zero-truncated negative binomial models
Date: Fri, 5 Jun 2009 13:06:27 -0400

I agree with Austin. With the retrospective design, there is no natural "first" visit or start time. As such, a single visit isn't a privileged event. I don't see a role for a time-to-event approach. I, like the other responders, assumed a prospective design.

Posters: please describe your study designs in detail! You will save all of us a lot of unnecessary time!

-Steve

On Fri, Jun 5, 2009 at 12:50 PM, Austin Nichols <austinnichols@gmail.com> wrote:
> Tony <Peter.Lachenbruch@oregonstate.edu> :
>
> I suppose the model required depends on what question the poster wishes to answer, but there is no clear advantage of a logit or probit over a poisson in this case unless you have no interest in the variation in positive outcomes or you suspect overdispersion is a serious issue even conditional on X, which implies you have other count models to use; note that heteroskedasticity or measurement error in the binary outcome or individual heterogeneity are all a much bigger deal in the logit/probit world.
>
> It may be that having no visits seems different from having one or more, but having one visit also seems different from having two or more. Where does that reasoning stop? If your expected number of visits conditional on X is 0.01 then odds are you have no visits this month; you might have one, but you are very unlikely to have six. If your expected number of visits conditional on X is 1 then odds are still good you have no visits this month; you might have one, and you are not terribly unlikely to have six. The reasoning all gets easier in a poisson model IMHO.
>
> A "preponderance of zeros" just means the mean Xb is low, as is to be expected. All too often, the long right tail is predictable from various X variables in the data, so conditional on X, the poisson variance may be closer to correct; if it isn't, you may need a richer model! Or program up the "Flexible Regression Model for Count Data" (Kimberly F. Sellers and Galit Shmueli) with under- and overdispersion.
>
> Above all, why try to implement some kind of selection correction when you can just avoid the selection in the first place?
>
> On Fri, Jun 5, 2009 at 12:28 PM, Lachenbruch, Peter <Peter.Lachenbruch@oregonstate.edu> wrote:
>> I think the situations may be distinct: having no hospital visits seems different from having one or more. If these are not part of a mixture distribution (i.e., 0 visits is identifiable) one can estimate the probability of a person having 0 visits and then the count of the number of non-zero visits. If not identifiable, one can use zero-inflated Poisson or zero-inflated negative binomial.
>>
>> The problem seems to separate naturally into the two parts. If you want a mean number of visits you can get it, but I'm unsure of the interpretation since there's a fraction that don't have any visits that is greater than that expected under the Poisson model. In one dissertation, a student had 95% zeros and the rest were positive. The idea was to predict costs of hospitalization - this had big implications for insurance companies. In this case, the likelihood of finding hospitalization in a household survey may also have a preponderance of zeros.
>>
>> Tony
>>
>> Peter A. Lachenbruch
>> Department of Public Health
>> Oregon State University
>> Corvallis, OR 97330
>> Phone: 541-737-3832
>> FAX: 541-737-4001
>>
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Austin Nichols
>> Sent: Friday, June 05, 2009 9:15 AM
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: RE: AW: Sample selection models under zero-truncated negative binomial models
>>
>> John Ataguba <johnataguba@yahoo.co.uk> :
>>
>> Again, why split the analysis? If you are interested in the count, use a count model, and then talk about what the results from that model predict about the probability of a nonzero count when you are interested in whether people have any visits. You don't seem to have any theory requiring "standard logit/probit model" assumptions. -poisson- seems the natural starting point.
>>
>> Why would you drop the zeros when trying to assess how many GP visits a person seems likely to make conditional on X? Zero is one possible outcome...
>>
>> On Fri, Jun 5, 2009 at 10:03 AM, John Ataguba <johnataguba@yahoo.co.uk> wrote:
>>> Hi Austin,
>>>
>>> Specifically, I am not looking at the time dimension of the visits. The data set is such that I have the total number of visits to a GP (General Practitioner) in the past one month, collected from a national survey of individuals. Given that this is a household survey, there are zero visits for some individuals.
>>>
>>> One of my objectives is to determine the factors that predict positive utilization of GPs. This is easily implemented using a standard logit/probit model. The other part is the factors that affect the number of visits to a GP. Given that the dependent variable is a count variable, the likely candidates are count regression models. My fear is with how to deal with unobserved heterogeneity and sample selection issues if I limit my analysis to the non-zero visits. If I use the standard two-part or hurdle model, I do not know if this will account for sample selection in the fashion of the Heckman procedure.
>>>
>>> I think the class of mixture models (fmm) will be an alternative that I want to explore. I don't know much about them but will be happy to have some brighter ideas.
>>>
>>> Regards
>>>
>>> Jon
>>>
>>>
>>> ----- Original Message ----
>>> From: Austin Nichols <austinnichols@gmail.com>
>>> To: statalist@hsphsun2.harvard.edu
>>> Sent: Friday, 5 June, 2009 14:27:20
>>> Subject: Re: st: RE: AW: Sample selection models under zero-truncated negative binomial models
>>>
>>> Steven--I like this approach in general, but from the original post, it's not clear that data on the timing of first visit or even time at risk is in the data--perhaps the poster can clarify? Also, would you propose using the predicted hazard in the period of first visit as some kind of selection correction? The outcome is visits divided by time at risk for subsequent visits in your setup, so represents a fractional outcome (constrained to lie between zero and one) in theory, though only the zero limit is likely to bind, which makes it tricky to implement, I would guess--if you are worried about the nonnormal error distribution and the selection b
>>>
>>> Ignoring the possibility of detailed data on times of utilization, why can't you just run a standard count model on number of visits and use that to predict probability of at least one visit? One visit in 10 years is not that different from no visits in 10 years, yeah? It makes no sense to me to predict utilization only for those who have positive utilization and worry about selection etc. instead of just using the whole sample, including the zeros. I.e. run a -poisson- to start with. If you have a lot of zeros, that can just arise from the fact that a lot of people have predicted number of visits in the .01 range and number of visits has to be an integer. Zero inflation or overdispersion can also often arise from not having the right specification for the explanatory variables... but you can also move to another model in the -glm- or -nbreg- family.
>>>
>>> On Tue, Jun 2, 2009 at 1:21 PM, <sjsamuels@gmail.com> wrote:
>>>> A potential problem with Jon's original approach is that the use of services is an event with a time dimension--time to first use of services. People might not use services until they need them. Instead of a logit model (my preference also), a survival model for the first part might be appropriate.
>>>>
>>>> With later first-use, the time available for later visits is reduced, and number of visits might be associated with the time from first use to the end of observation. Moreover, people with later first-visits (or none) might differ in their degree of need for subsequent visits.
>>>>
>>>> To account for unequal follow-up times, I suggest a supplementary analysis in which the outcome for the second part of the hurdle model is not the number of visits, but the rate of visits (per unit time at risk).
>>>>
>>>> -Steve.
>>>>
>>>> On Tue, Jun 2, 2009 at 12:22 PM, Lachenbruch, Peter <Peter.Lachenbruch@oregonstate.edu> wrote:
>>>>> This could also be handled by a two-part or hurdle model. The 0 vs. non-zero model is given by a probit or logit (my preference) model. The non-zeros are modeled by a count-data model, OLS, or what have you. The results can be combined since the likelihood separates (the zero values are identifiable - no visits vs number of visits).
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Martin Weiss
>>>>> Sent: Tuesday, June 02, 2009 7:02 AM
>>>>> To: statalist@hsphsun2.harvard.edu
>>>>> Subject: st: AW: Sample selection models under zero-truncated negative binomial models
>>>>>
>>>>> *************
>>>>> ssc d cmp
>>>>> *************
>>>>> -----Original Message-----
>>>>> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of John Ataguba
>>>>> Sent: Tuesday, 2 June 2009 16:00
>>>>> To: Statalist statalist mailing
>>>>> Subject: st: Sample selection models under zero-truncated negative binomial models
>>>>>
>>>>> Dear colleagues,
>>>>>
>>>>> I want to enquire if it is possible to perform a ztnb (zero-truncated negative binomial) model on a dataset that has the zeros observed, in a fashion similar to the Heckman sample selection model.
>>>>>
>>>>> Specifically, I have a binary variable on use/non-use of outpatient health services and I fitted a standard probit/logit model to observe the factors that predict the probability of use. Subsequently, I want to explain the factors that influence the number of visits to the health facilities. Since these are count data, I cannot fit the standard Heckman model using the standard two-part procedure in the Stata command -heckman-.
>>>>>
>>>>> My fear now is that my sample of users will be biased if I fit a ztnb model on only the users, given that I have information on the non-users which I used to run the initial probit/logit estimation.
>>>>>
>>>>> Is it possible to generate the inverse Mills ratio from the probit model and include this in the ztnb model? Will this be consistent? etc...
>>>>>
>>>>> Are there any smarter suggestions? Any reference that has used a similar sample selection form will be appreciated.
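The arithmetic behind Austin's comparison of an expected count of 0.01 with an expected count of 1 can be sketched in a few lines. This is a minimal illustration in Python (not Stata, and not part of the original messages), assuming a plain Poisson distribution with a known conditional mean; the helper names `poisson_pmf` and `prob_any_visit` are hypothetical and introduced here only for the example.

```python
import math


def poisson_pmf(k, mu):
    """P(Y = k) for a Poisson count with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def prob_any_visit(mu):
    """P(Y >= 1) implied by a Poisson mean mu, i.e. 1 - P(Y = 0)."""
    return 1.0 - math.exp(-mu)


# The two conditional means used in the thread: 0.01 and 1 visit per month.
for mu in (0.01, 1.0):
    print(
        f"mean={mu}: "
        f"P(0)={poisson_pmf(0, mu):.4f}  "
        f"P(1)={poisson_pmf(1, mu):.4f}  "
        f"P(6)={poisson_pmf(6, mu):.2e}  "
        f"P(>=1)={prob_any_visit(mu):.4f}"
    )
```

Under the Poisson assumption, `prob_any_visit` is the quantity a separate logit or probit would otherwise target: a count model fitted on the whole sample, zeros included, already implies a probability of at least one visit for each value of the conditional mean.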


