
From: nicola.baldini2@unibo.it
To: statalist@hsphsun2.harvard.edu
Subject: Re: Re: st: RE: AW: Sample selection models under zero-truncated negative binomial models
Date: Sun, 07 Jun 2009 11:49:19 +0200

Assuming that your initial idea is sound (I am not good enough to comment on it), don't -espoisson- and -ssm- work à la Heckman? Both are available from SSC.

Nicola

P.S. I'll NOT receive/read any email but the Digest.

At 02.33 06/06/2009 -0400, John Ataguba wrote:
>Hi Austin,
>
>Specifically, I am not looking at the time dimension of the visits. The data set is such that I have the total number of visits to a GP (General Practitioner) in the past month, collected from a national survey of individuals. Given that this is a household survey, there are zero visits for some individuals.
>
>One of my objectives is to determine the factors that predict positive utilization of GPs. This is easily implemented using a standard logit/probit model. The other part is the factors that affect the number of visits to a GP. Given that the dependent variable is a count variable, the likely candidates are count regression models. My fear is how to deal with unobserved heterogeneity and sample selection issues if I limit my analysis to the non-zero visits. If I use the standard two-part or hurdle model, I do not know if this will account for sample selection in the fashion of the Heckman procedure.
>
>I think the class of mixture models (fmm) is an alternative that I want to explore. I don't know much about them but will be happy to have some brighter ideas.
>
>Regards
>
>Jon
>
>
>----- Original Message ----
>From: Austin Nichols <austinnichols@gmail.com>
>To: statalist@hsphsun2.harvard.edu
>Sent: Friday, 5 June, 2009 14:27:20
>Subject: Re: st: RE: AW: Sample selection models under zero-truncated negative binomial models
>
>Steven--I like this approach in general, but from the original post,
>it's not clear that data on the timing of first visit or even time at
>risk is in the data--perhaps the poster can clarify? Also, would you
>propose using the predicted hazard in the period of first visit as
>some kind of selection correction? The outcome is visits divided by
>time at risk for subsequent visits in your setup, so represents a
>fractional outcome (constrained to lie between zero and one) in
>theory, though only the zero limit is likely to bind, which makes it
>tricky to implement, I would guess--if you are worried about the
>nonnormal error distribution and the selection b
>
>Ignoring the possibility of detailed data on times of utilization, why
>can't you just run a standard count model on number of visits and use
>that to predict probability of at least one visit? One visit in 10
>years is not that different from no visits in 10 years, yeah? It
>makes no sense to me to predict utilization only for those who have
>positive utilization and worry about selection etc. instead of just
>using the whole sample, including the zeros. I.e. run a -poisson- to
>start with. If you have a lot of zeros, that can just arise from the
>fact that a lot of people have predicted number of visits in the .01
>range and number of visits has to be an integer. Zero inflation or
>overdispersion also can arise often from not having the right
>specification for the explanatory variables... but you can also move
>to another model in the -glm- or -nbreg- family.
>
>On Tue, Jun 2, 2009 at 1:21 PM, <sjsamuels@gmail.com> wrote:
>> A potential problem with Jon's original approach is that the use of
>> services is an event with a time dimension--time to first use of
>> services. People might not use services until they need them.
>> Instead of a logit model (my preference also), a survival model for
>> the first part might be appropriate.
>>
>> With later first-use, the time available for later visits is reduced,
>> and number of visits might be associated with the time from first use
>> to the end of observation. Moreover, people with later first-visits
>> (or none) might differ in their degree of need for subsequent visits.
>>
>> To account for unequal follow-up times, I suggest a supplementary
>> analysis in which the outcome for the second part of the hurdle model
>> is not the number of visits, but the rate of visits (per unit time at
>> risk).
>>
>> -Steve.
>>
>> On Tue, Jun 2, 2009 at 12:22 PM, Lachenbruch, Peter
>> <Peter..Lachenbruch@oregonstate.edu> wrote:
>>> This could also be handled by a two-part or hurdle model. The 0 vs. non-zero model is given by a probit or logit (my preference) model. The non-zeros are modeled by a count data model or OLS or what have you. The results can be combined since the likelihood separates (the zero values are identifiable - no visits vs number of visits).
>>>
>>>
>>> -----Original Message-----
>>> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Martin Weiss
>>> Sent: Tuesday, June 02, 2009 7:02 AM
>>> To: statalist@hsphsun2.harvard.edu
>>> Subject: st: AW: Sample selection models under zero-truncated negative binomial models
>>>
>>> *************
>>> ssc d cmp
>>> *************
>>> -----Original Message-----
>>> From: owner-statalist@hsphsun2.harvard.edu
>>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of John Ataguba
>>> Sent: Tuesday, 2 June 2009 16:00
>>> To: Statalist statalist mailing
>>> Subject: st: Sample selection models under zero-truncated negative binomial
>>> models
>>>
>>> Dear colleagues,
>>>
>>> I want to enquire if it is possible to fit a ztnb (zero-truncated
>>> negative binomial) model on a dataset that has the zeros observed, in a
>>> fashion similar to the Heckman sample selection model.
>>>
>>> Specifically, I have a binary variable on use/non-use of outpatient health
>>> services and I fitted a standard probit/logit model to observe the factors
>>> that predict the probability of use. Subsequently, I want to explain the
>>> factors that influence the number of visits to the health facilities. Since
>>> these are count data, I cannot fit the standard Heckman model using the
>>> standard two-part procedure in the Stata command -heckman-.
>>>
>>> My fear now is that my sample of users will be biased if I fit a ztnb model
>>> on only the users, given that I have information on the non-users which I
>>> used to run the initial probit/logit estimation.
>>>
>>> Is it possible to generate the inverse Mills ratio from the probit model
>>> and include this in the ztnb model? Will this be consistent? etc.
>>>
>>> Are there any smarter suggestions? Any reference that has used a similar
>>> sample selection form will be appreciated.
>>>
>>> Regards
>>>
>>> Jon

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
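
Austin's remark in the thread, that a plain Poisson fit on the whole sample already implies a probability of at least one visit and that many zeros can come from small predicted means alone, can be checked with a little arithmetic. The following is only a sketch in Python rather than Stata, using the standard Poisson formulas; the function names and the example means are hypothetical and are not taken from any poster's data or from any Stata command.

```python
import math

def prob_zero(mu):
    """P(Y = 0) for Y ~ Poisson(mu)."""
    return math.exp(-mu)

def prob_any_visit(mu):
    """P(Y >= 1) for Y ~ Poisson(mu), i.e. 1 - exp(-mu)."""
    return -math.expm1(-mu)  # expm1 keeps precision when mu is tiny

def mean_given_positive(mu):
    """E[Y | Y >= 1]: the mean of the zero-truncated Poisson."""
    return mu / prob_any_visit(mu)

if __name__ == "__main__":
    # Hypothetical predicted means (assumed values, for illustration only)
    for mu in (0.01, 0.5, 2.0):
        print(f"mu={mu:4.2f}  P(0)={prob_zero(mu):.4f}  "
              f"P(>=1)={prob_any_visit(mu):.4f}  "
              f"E[Y|Y>=1]={mean_given_positive(mu):.4f}")
```

With a predicted mean of 0.01, roughly 99% of individuals would report zero visits even with no zero inflation in the model, which is the point about predicted visits "in the .01 range". The sketch says nothing about overdispersion or selection; those still need to be assessed on the actual data.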

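
Peter's statement in the thread that "the likelihood separates" in a two-part or hurdle model can be written out as follows. This is a sketch in notation assumed here, not used by any poster: \pi_i(\gamma) is the probability of any visit from the logit/probit part, and f(\cdot \mid x_i; \beta) is the count density (for example Poisson or negative binomial) before truncation at zero.

```latex
\[
L(\gamma,\beta)
  = \prod_{i:\,y_i=0} \bigl[1-\pi_i(\gamma)\bigr]
    \prod_{i:\,y_i>0} \pi_i(\gamma)\,
      \frac{f(y_i \mid x_i;\beta)}{1-f(0 \mid x_i;\beta)}
\]
\[
\log L(\gamma,\beta)
  = \underbrace{\sum_{i:\,y_i=0}\log\bigl[1-\pi_i(\gamma)\bigr]
      + \sum_{i:\,y_i>0}\log \pi_i(\gamma)}_{\text{binary part, depends only on } \gamma}
  + \underbrace{\sum_{i:\,y_i>0}\Bigl[\log f(y_i \mid x_i;\beta)
      - \log\bigl(1-f(0 \mid x_i;\beta)\bigr)\Bigr]}_{\text{zero-truncated count part, depends only on } \beta}
\]
```

Because the two sums share no parameters, fitting the binary model on everyone and the zero-truncated count model on the positives maximizes the joint likelihood. The factorization does not, by itself, allow for correlation between unobservables in the two parts, which is the Heckman-type concern raised in the original question.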