
From: "Shehzad Ali" <sia500@york.ac.uk>

To: <statalist@hsphsun2.harvard.edu>

Subject: RE: st: stata code for two-part model

Date: Fri, 22 Aug 2008 09:43:13 +0100

Thank you, Austin. These are very helpful comments. Truly appreciate your help.

Shehzad

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Austin Nichols
Sent: 20 August 2008 14:18
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: stata code for two-part model

Shehzad et al.--
I think a plug for -fmm- (findit fmm) also belongs in this thread, with a mention of the mixtureof(density) option, e.g. mixtureof(gamma). Perhaps the package's author will comment on the preferred mixture model for hospital expenditures, or you can consult one of the refs listed in -help fmm-:

Deb, P. and P. K. Trivedi (1997), Demand for Medical Care by the Elderly: A Finite Mixture Approach, Journal of Applied Econometrics, 12, 313-326.

-tobit- for expenditures works if willingness to pay for a good is normally distributed and there is an observed market price, so those with WTP<p spend nothing, but this is clearly not the case for hospital expenditures. In fact, it's not clear whether we should consider the patient or the doctor the "consumer" --I suspect this is true even outside the US health care system, unlike much of the concern about the effects of health insurance and incentives, but perhaps to a lesser extent... in the British NHS for example, is it you or your doctor who makes decisions about tests and treatments? I think a simplified resolution of this problem is (a largely unstated) part of the motivation for a two-part model: you make the decision about whether or not to seek care, and conditional on seeking care, your physician makes decisions about tests and treatments, so you no longer control expenditures, to a first approximation.

On Tue, Aug 19, 2008 at 4:33 PM, Stas Kolenikov <skolenik@gmail.com> wrote:
> -heckman- and -zip- are both trying to deal with too many zeroes (and so does -tobit-, but it puts just too many assumptions in... although originally it was developed for the expenditure models). -zip- says that for some reason, there is a probability of hitting zero before the rest of Poisson kicks in. -heckman- says that there is selection and (unobserved) utility functions at work. The selection models are more of the behavioral flavor, while zip models are more of the descriptive, if not population-averaging, nature, without trying to explain why certain people did or did not participate in <whatever>.
>
> Arguably, you can put a model similar to Heckman's model to hospital expenditure, too: if a person does not have (good enough) insurance, they may not be able to afford hospitalization, and choose not to go. If the (total discounted) budget is less than the predicted hospital bill, then we observe zero hospitalization costs. So there is a similar utility / budget interplay, and arguably Mills' ratio does belong in the linear regression part.
>
> Alternatively one can say that there are healthy people and sick people -- the former are spending zero on hospitals, and others spend some non-zero amounts, with the implicit assumption of perfect markets and absence of budget constraints. This does not seem quite right to me, but I can imagine there are occasions where that's how things might be working.
>
> In reality, both things should be at play: "too low" expenditure for the healthy, and "too high" expenditure for the poor. Ideally both should be modelled (and neither "true" expenditure is observed), but I am not aware of any models that are aimed specifically at that.
>
> On Tue, Aug 19, 2008 at 2:55 PM, Austin Nichols <austinnichols@gmail.com> wrote:
>> Shehzad Ali <sia500@york.ac.uk>:
>> An approach using -heckman- is discussed in the Mullahy ref mentioned earlier (http://www.nber.org/papers/t0228), I believe, along with -tobit-.
>> If the conditional distribution of y seems to fall in two large groups, one at zero and one at higher values, with zero density in between, there may be more justification for one of the two-part types of models where a case is either zero or nonzero, and then the nonzero values are determined by a possibly different process.
>> If you want to model ln(y) as a function of X, so ln(y) for y=0 is missing, then you might prefer -heckman-; if you want to model y as a function of X in one of those models, so y=0 is the lower limit, then you might prefer -tobit-, but both models incorporate a normality assumption that is usually violated in practice... see the Stata reference manuals and cited works for more discussion of the identifying assumptions.
>>
>> Presumably your two sets of expenditure data are for the same individuals, and exhibit correlated errors, so -nlsur- rather than -glm- may be in order.
>>
>> On Tue, Aug 19, 2008 at 1:33 AM, Shehzad Ali <sia500@york.ac.uk> wrote:
>>> Thank you all for your very useful thoughts on this issue.
>>> I am running regressions on two separate sets of expenditure data: one for general health expenditure, which includes all costs including those for self-medication etc., and a second for expenditure related to formal health care, including primary and hospital care but excluding self-medication.
>>>
>>> I agree that the two-part model is not the best option, but is the -heckman- model a reasonable alternative if the selection step is for zero/non-zero expenditure and the outcome is the positive expenditure? Looking at Austin's argument, I understand that -heckman- runs into a similar problem as the two-part model. Is that right?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
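For readers who came to this thread for the Stata code named in the subject line, the models discussed here could be sketched roughly as below. This is an illustration under assumptions that are not taken from the thread: the variable names (totexp, age, female, insured, distance) are hypothetical, the logit first part and the gamma GLM with log link in the second part are just one common pairing, and distance stands in for whatever variable one might argue affects selection only. None of the posters prescribes these particular specifications.

```stata
* Sketch only; all variable names are hypothetical.
* totexp = expenditure (>= 0); age, female, insured = covariates.

* --- Two-part model ---
* Part 1: probability of any expenditure
generate byte anyexp = (totexp > 0) if !missing(totexp)
logit anyexp age female insured
predict double p_any, pr

* Part 2: level of expenditure among those with positive spending
* (gamma family with log link is an assumption, not from the thread)
glm totexp age female insured if totexp > 0, family(gamma) link(log)
predict double mu_pos, mu

* Unconditional mean: Pr(y > 0 | x) * E[y | y > 0, x]
generate double exp_hat = p_any * mu_pos
summarize totexp exp_hat

* --- Alternatives mentioned in the thread ---
* Selection model on ln(y): ln(totexp) is left missing where totexp == 0,
* and -heckman- treats a missing outcome as "not selected"
generate double lnexp = ln(totexp) if totexp > 0
heckman lnexp age female insured, select(age female insured distance) twostep

* Tobit on the level of y, censored from below at zero
tobit totexp age female insured, ll(0)
```

The two parts of the two-part model are fit separately, so exp_hat comes without a standard error of its own; bootstrapping both steps together is one way to get one. Both -heckman- and -tobit- carry the normality assumption that Austin cautions about above.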

**References**:

- **st: stata code for two-part model** *From:* "Shehzad Ali" <sia500@york.ac.uk>
- **Re: st: stata code for two-part model** *From:* "Austin Nichols" <austinnichols@gmail.com>
- **Re: st: stata code for two-part model** *From:* Shehzad Ali <sia500@york.ac.uk>
- **Re: st: stata code for two-part model** *From:* "Austin Nichols" <austinnichols@gmail.com>
- **Re: st: stata code for two-part model** *From:* "Stas Kolenikov" <skolenik@gmail.com>
- **Re: st: stata code for two-part model** *From:* "Austin Nichols" <austinnichols@gmail.com>
