Statalist



Re: st: stata code for two-part model


From   "Austin Nichols" <[email protected]>
To   [email protected]
Subject   Re: st: stata code for two-part model
Date   Mon, 18 Aug 2008 13:20:18 -0400

In expectation?  People who have truly zero probability of incurring
hospital costs?

On Mon, Aug 18, 2008 at 1:08 PM, Lachenbruch, Peter
<[email protected]> wrote:
> The problem was about hospitalization costs.  These can be true zeros.
>
> Tony
>
> Peter A. Lachenbruch
> Department of Public Health
> Oregon State University
> Corvallis, OR 97330
> Phone: 541-737-3832
> FAX: 541-737-4001
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Austin
> Nichols
> Sent: Monday, August 18, 2008 9:38 AM
> To: [email protected]
> Subject: Re: st: stata code for two-part model
>
> Peter <[email protected]>:
> I think this claim is a bit of a red herring: "use of a continuous
> model for data in which there is a clump of zeros seems incorrect."
> Note that the -glm- approach assumes the mean of y given observables X
> is nonzero, and E(y|X)=exp(Xb), not that observed y is nonzero!
> Including the observations where y=0 is the whole point of the -glm-
> approach--otherwise we would run ols regression of ln(y) on X.  And if
> you are claiming that the "true" model for (expected) healthcare
> expenditures does have true zeros that are identifiable, then I
> disagree. Some of your obs may spend nothing on health care (though
> annual spending, including myriad items such as aspirin, is unlikely
> to truly be zero for anyone) but that does not mean their conditional
> mean should be zero.  Maybe people who are dead have a conditional
> mean of zero, but they should probably be excluded from the
> analysis...
>
> When spending is measured in discrete dollars, a big clump of people
> who have predicted spending less than 50 cents may have a conditional
> mean of zero measured in the same units as the data.  But that does
> not mean their "true" conditional mean is zero.
>
> That said, a demand/expenditure model will have more and more "true"
> (or rounded off) zeros as the category of demand/expenditure gets
> narrower and narrower and the time window over which it is measured
> gets narrower... think aspirin expenditures by week or day... but it
> is not clear to me that a two-part model is the right approach even in
> those cases.
>
> On Mon, Aug 18, 2008 at 11:33 AM, Lachenbruch, Peter
> <[email protected]> wrote:
>> In some instances, the model for healthcare expenditures does have true
>> zeros that are identifiable.  In one study I consulted on, the data came
>> from a health insurer, and zeros were people who had not gone to
>> hospital.
>>
>> The use of a continuous model for data in which there is a clump of
>> zeros seems incorrect.  There is no transformation that can remove this
>> clump.  The severity of the problem depends a bit on the size of the
>> clump.  In the hospital insurance data (wanting to estimate
>> hospitalization costs in the policy holders) 95% of the population had
>> no costs.  Pretending that these were continuous would lead to some
>> nonsense results.  At the present time, I have a data set that has 32
>> out of 145 people with zeros.  However, these are not necessarily
>> identifiable since they could be slightly greater than zero.  I'm
>> gritting my teeth on this and pretending all is well.  However, a
>> histogram shows enormous skewness.  I'll probably try a square root.
>>
>> Tony
>>
>> Peter A. Lachenbruch
>> Department of Public Health
>> Oregon State University
>> Corvallis, OR 97330
>> Phone: 541-737-3832
>> FAX: 541-737-4001
>>
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Austin
>> Nichols
>> Sent: Saturday, August 16, 2008 8:50 AM
>> To: [email protected]
>> Subject: Re: st: stata code for two-part model
>>
>> Shehzad Ali et al. --
>> See also
>> http://www.nber.org/papers/t0228
>> The two part models of health expenditures have always struck me as a
>> bad idea; think about how you would get predictions for each indiv in
>> your sample.  The "stage 1" probit classifies people as having
>> expenditures or not (some correctly, some not) and then the "stage 2"
>> ols model gives predicted expenditures only for those people who
>> actually have positive expenditures (not those who are classified by
>> the probit as likely to have positive expenditures) unless you predict
>> out of sample.  At least one preferred approach of calculating
>> marginal effects by comparing predictions over the whole sample turns
>> out to be practically and analytically difficult in that setting.
>> However, a -glm- with a log link (or equivalently a -poisson-
>> regression) has no trouble: those people with extremely low predicted
>> expenditures would round to zero predicted expenditures if you thought
>> about a survey with expenditures measured discretely in dollars, say.
>> Everyone has E(y)=exp(Xb) and there is no real issue with calculating
>> marginal effects.  Once you are in the -glm- framework it is also easy
>> to think about model fit and alternative links...
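
A minimal sketch of the -glm- approach described above, assuming the cost outcome and the $xvar covariate list from the original question. The robust standard errors are an assumption added here, and -margins- requires Stata 11 or later (it postdates this thread, when -mfx- was the available tool).

```stata
* Sketch only. cost and $xvar come from the original question;
* vce(robust) is an assumption added here, not something the thread specifies.

* Log-link GLM fitted to all observations, including those with cost == 0
glm cost $xvar, family(poisson) link(log) vce(robust)

* The same point estimates from -poisson-
poisson cost $xvar, vce(robust)

* Predicted mean exp(Xb) for every observation in the data
predict double cost_hat, n

* Average marginal effects on E(cost|X); needs Stata 11 or later
margins, dydx(*)
```

Because every observation has a predicted mean, the marginal effects are averaged over the whole sample rather than only over those with positive costs.
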
>>
>> On Sat, Aug 16, 2008 at 3:41 AM, Eva Poen <[email protected]> wrote:
>>> Shehzad,
>>>
>>> this looks like a hurdle model. Have you searched the SSC archives to
>>> see if someone else has programmed it for you? Have a look at
>>> -hplogit-, for example.
>>>
>>> If you end up doing it yourself, I think you need to do a bit of
>>> programming. In order for -mfx- to work after your estimation, you
>>> need a way of telling it what you want the marginal effects to be
>>> calculated for. In your case, this would be the overall expected cost
>>> of care from your model. The way to feed this to -mfx- is via its
>>> predict(predict_option) option, but for this to work you need to write a
>>> -predict- command and an estimation command for your model.
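
As a sketch of what the "overall expected cost" looks like for the probit-plus-OLS setup in the original question: the notation below (beta, gamma, Phi, phi, and a continuous regressor w) is introduced here and does not appear in the thread. It assumes cost is zero when care is not sought and that the second part is linear in z.

```latex
\begin{aligned}
E[\mathrm{cost} \mid x, z]
  &= \Pr(\mathrm{care}=1 \mid x)\; E[\mathrm{cost} \mid \mathrm{care}=1, z]
   = \Phi(x\beta)\, z\gamma, \\
\frac{\partial E[\mathrm{cost} \mid x, z]}{\partial w}
  &= \phi(x\beta)\,\beta_w\, z\gamma \;+\; \Phi(x\beta)\,\gamma_w .
\end{aligned}
```

Here Phi and phi are the standard normal CDF and density, and beta_w, gamma_w are the coefficients on w in the probit and OLS parts. If w appears in only one part, the other coefficient is zero and that term drops out.
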
>>>
>>> See for example this post:
>>> http://www.stata.com/statalist/archive/2005-10/msg00091.html
>>>
>>> Hope this helps,
>>> Eva
>>>
>>>
>>>
>>> 2008/8/16 Shehzad Ali <[email protected]>:
>>>> Hi,
>>>>
>>>> I was wondering if someone can help with Stata code for calculating marginal
>>>> effects after two-part models for, say, cost of care. Here, the first part is a
>>>> probit model for seeking care or not, and the second part is an OLS model of
>>>> cost of care, conditional on decision to seek care. Here is the simplified
>>>> code:
>>>>
>>>> probit care $xvar
>>>>
>>>> reg cost $zvar if care==1
>>>>
>>>> mfx
>>>>
>>>> I understand that mfx after the second part gives us the marginal effects
>>>> for the OLS part only, and not the conditional marginal effects.
>>>>
>>>> Any help would be appreciated.
>>>>
>>>> Thanks,
>>>>
>>>> Shehzad
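
One way the two parts in the question could be combined by hand is sketched below. It assumes the care, cost, $xvar and $zvar names from the question. The variable age is hypothetical: a continuous regressor assumed to appear in both covariate lists. The sketch produces point estimates only; standard errors would need something further, for example bootstrapping both parts together.

```stata
* Sketch only. care, cost, $xvar and $zvar come from the question above;
* age is a hypothetical continuous regressor assumed to be in both lists.

* Part 1: probability of seeking care
probit care $xvar
predict double p_care, pr
predict double xb_care, xb
scalar b_care_age = _b[age]

* Part 2: cost among those who sought care
regress cost $zvar if care == 1
* No -if- on predict, so fitted values are filled in for every observation,
* including those with care == 0
predict double cost_pos, xb
scalar b_cost_age = _b[age]

* Overall expected cost: Pr(care = 1) times E(cost | care = 1)
generate double cost_hat = p_care * cost_pos

* Product-rule marginal effect of age on overall expected cost
generate double me_age = normalden(xb_care)*b_care_age*cost_pos + p_care*b_cost_age

* Sample averages of the prediction and of the marginal effect
summarize cost_hat me_age
```

The mean of me_age is an average marginal effect over the whole sample, which is the quantity that plain -mfx- after the second regression does not give.
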
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>


