Re: st: stata code for two-part model
"Austin Nichols" <email@example.com>
Wed, 20 Aug 2008 09:17:35 -0400
Shehzad et al.--
I think a plug for -fmm- (findit fmm) also belongs in this thread,
with a mention of the mixtureof(density) option, e.g.
mixtureof(gamma). Perhaps the package's author will comment on the
preferred mixture model for hospital expenditures, or you can consult
one of the refs listed in -help fmm-:
Deb, P. and P. K. Trivedi (1997), Demand for Medical Care by the
Elderly: A Finite Mixture Approach, Journal of Applied Econometrics,
-tobit- for expenditures works if willingness to pay (WTP) for a good
is normally distributed and there is an observed market price p, so
those with WTP<p spend nothing, but this is clearly not the case for
hospital expenditures. In fact, it's not clear whether we should
consider the patient or the doctor the "consumer". I suspect this
ambiguity holds even outside the US health care system (unlike much of
the concern about the effects of health insurance and incentives),
though perhaps to a lesser extent: in the British NHS, for example, is
it you or your doctor who makes decisions about tests and treatments?
I think a simplified resolution of this problem is a (largely
unstated) part of the motivation for a two-part model: you decide
whether or not to seek care and, conditional on seeking care, your
physician makes decisions about tests and treatments, so to a first
approximation you no longer control expenditures.
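
As a rough illustration of the two-part logic just described, the
sketch below fits part one with -logit- and part two with -glm- (gamma
family, log link) on simulated data, then combines the two pieces as
E[y|x] = Pr(y>0|x) * E[y|y>0,x]. The variable names, the simulated
data-generating process, and the choice of logit plus a gamma GLM are
assumptions made for illustration; nothing in this thread prescribes
them.

```stata
* Two-part model sketch on simulated data (all names are hypothetical)
clear
set seed 20080820
set obs 2000
generate age    = 20 + floor(60*runiform())
generate female = runiform() < 0.5

* Simulated truth, part 1: whether any care is sought
generate anycare = runiform() < invlogit(-2 + 0.03*age + 0.3*female)

* Simulated truth, part 2: gamma spending among users, zero otherwise
generate expend = 0
replace  expend = exp(5 + 0.02*age) * rgamma(2, 0.5) if anycare

* Part 1: probability of any expenditure
logit anycare age female
predict p_any, pr

* Part 2: expected expenditure among those with positive spending
glm expend age female if expend > 0, family(gamma) link(log)
predict mu_pos, mu

* Unconditional prediction: Pr(y>0|x) * E[y|y>0,x]
generate yhat = p_any * mu_pos
summarize expend yhat
```

Because -predict- is not restricted to the estimation sample unless
asked, mu_pos is filled in for the zero-expenditure cases as well,
which is what the product in the last step needs.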
On Tue, Aug 19, 2008 at 4:33 PM, Stas Kolenikov <firstname.lastname@example.org> wrote:
> -heckman- and -zip- are both trying to deal with too many zeroes (and
> so does -tobit-, but it puts just too many assumptions in... although
> originally it was developed for the expenditure models). -zip- says
> that for some reason, there is a probability of hitting zero before
> the rest of Poisson kicks in. -heckman- says that there is selection
> and (unobserved) utility functions at work. The selection models are
> more of the behavioral flavor, while zip models are more of the
> descriptive, if not population-averaging, nature, without trying to
> explain why certain people did or did not participate in <whatever>.
> Arguably, you can apply a model similar to Heckman's model to hospital
> expenditure, too: if a person does not have (good enough) insurance,
> they may not be able to afford hospitalization, and choose not to go.
> If the (total discounted) budget is less than the predicted hospital
> bill, then we observe zero hospitalization costs. So there is a
> similar utility / budget interplay, and arguably Mills' ratio does
> belong in the linear regression part.
> Alternatively one can say that there are healthy people and sick
> people -- the former are spending zero on hospitals, and others spend
> some non-zero amounts, with the implicit assumption of perfect markets
> and absence of budget constraints. This does not seem quite right to
> me, but I can imagine there are occasions where that's how things
> might be working.
> In reality, both things should be at play: "too low" expenditure for
> the healthy, and "too high" expenditure for the poor. Ideally both
> should be modelled (and neither "true" expenditure is observed), but I
> am not aware of any models that are aimed specifically at that.
> On Tue, Aug 19, 2008 at 2:55 PM, Austin Nichols <email@example.com> wrote:
>> Shehzad Ali <firstname.lastname@example.org>:
>> An approach using -heckman- is discussed in the Mullahy ref mentioned
>> earlier (http://www.nber.org/papers/t0228), I believe, along with
>> If the conditional distribution of y seems to fall in two large
>> groups, one at zero and one at higher values, with zero density in
>> between, there may be more justification for one of the two-part types
>> of models where a case is either zero or nonzero, and then the nonzero
>> values are determined by a possibly different process.
>> If you want to model ln(y) as a function of X, so ln(y) for y=0 is
>> missing, then you might prefer -heckman-; if you want to model y as a
>> function of X in one of those models, so y=0 is the lower limit, then
>> you might prefer -tobit-, but both models incorporate a normality
>> assumption that is usually violated in practice... see the Stata
>> reference manuals and cited works for more discussion of the
>> identifying assumptions.
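
The -heckman- versus -tobit- distinction in the quoted paragraph above
can be sketched on simulated data as follows. This is only an
illustration: the variable names, the error correlation of 0.5, and
the variable z that enters the selection equation alone (an exclusion
restriction) are all assumptions introduced here, not something taken
from the thread.

```stata
* Sketch on simulated data; all variable names are hypothetical
clear
set seed 12345
set obs 2000
generate x = rnormal()
generate z = rnormal()        // assumed to affect selection only
matrix C = (1, 0.5 \ 0.5, 1)
drawnorm u e, corr(C)         // correlated selection and outcome errors
generate y = 0
replace  y = exp(1 + 0.5*x + e) if 0.3 + 0.5*x + z + u > 0

* ln(y) is missing wherever y == 0; with no depvar in select(),
* -heckman- treats a missing outcome as "not selected"
generate lny = ln(y)
heckman lny x, select(x z)

* y itself, with zero as the lower censoring limit
tobit y x, ll(0)
```

The two commands answer different questions: -heckman- models ln(y)
for the selected cases with a correction for selection, while -tobit-
models y with the zeros treated as censored at the lower limit.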
>> Presumably your two sets of expenditure data are for the same
>> individuals, and exhibit correlated errors, so -nlsur- rather than
>> -glm- may be in order.
>> On Tue, Aug 19, 2008 at 1:33 AM, Shehzad Ali <email@example.com> wrote:
>>> Thank you all for your very useful thoughts on this issue.
>>> I am running regressions on two separate sets of expenditure data: one for
>>> general health expenditure which includes all costs including those for
>>> self-medication etc., and second for expenditure related to formal health
>>> care, including primary and hospital care but excluding self-medication.
>>> I agree that a two-part model is not the best option, but is the -heckman-
>>> model a reasonable alternative if the selection step is for zero/non-zero
>>> expenditure and the outcome step is for the positive expenditure? Looking
>>> at Austin's argument, I understand that -heckman- runs into a similar
>>> problem to the two-part model. Is that right?