
From: "Austin Nichols" <austinnichols@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: stata code for two-part model
Date: Mon, 18 Aug 2008 13:20:18 -0400

In expectation? People who have truly zero probability of incurring
hospital costs?

On Mon, Aug 18, 2008 at 1:08 PM, Lachenbruch, Peter
<Peter.Lachenbruch@oregonstate.edu> wrote:
> The problem was about hospitalization costs. These can be true zeros.
>
> Tony
>
> Peter A. Lachenbruch
> Department of Public Health
> Oregon State University
> Corvallis, OR 97330
> Phone: 541-737-3832
> FAX: 541-737-4001
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Austin Nichols
> Sent: Monday, August 18, 2008 9:38 AM
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: stata code for two-part model
>
> Peter <Peter.Lachenbruch@oregonstate.edu>:
> I think this claim is a bit of a red herring: "use of a continuous
> model for data in which there is a clump of zeros seems incorrect."
> Note that the -glm- approach assumes the mean of y given observables X
> is nonzero, and E(y|X)=exp(Xb), not that observed y is nonzero!
> Including the observations where y=0 is the whole point of the -glm-
> approach--otherwise we would run ols regression of ln(y) on X. And if
> you are claiming that the "true" model for (expected) healthcare
> expenditures does have true zeros that are identifiable, then I
> disagree. Some of your obs may spend nothing on health care (though
> annual spending, including myriad items such as aspirin, is unlikely
> to truly be zero for anyone) but that does not mean their conditional
> mean should be zero. Maybe people who are dead have a conditional
> mean of zero, but they should probably be excluded from the
> analysis...
>
> When spending is measured in discrete dollars, a big clump of people
> who have predicted spending less than 50 cents may have a conditional
> mean of zero measured in the same units as the data. But that does
> not mean their "true" conditional mean is zero.
>
> That said, a demand/expenditure model will have more and more "true"
> (or rounded off) zeros as the category of demand/expenditure gets
> narrower and narrower and the time window over which it is measured
> gets narrower... think aspirin expenditures by week or day... but it
> is not clear to me that a two-part model is the right approach even in
> those cases.
>
> On Mon, Aug 18, 2008 at 11:33 AM, Lachenbruch, Peter
> <Peter.Lachenbruch@oregonstate.edu> wrote:
>> In some instances, the model for healthcare expenditures does have true
>> zeros that are identifiable. In one study I consulted on, the data came
>> from a health insurer, and zeros were people who had not gone to
>> hospital.
>>
>> The use of a continuous model for data in which there is a clump of
>> zeros seems incorrect. There is no transformation that can remove this
>> clump. The severity of the problem depends a bit on the size of the
>> clump. In the hospital insurance data (wanting to estimate
>> hospitalization costs for the policy holders) 95% of the population had
>> no costs. Pretending that these were continuous would lead to some
>> nonsense results. At the present time, I have a data set that has 32
>> out of 145 people with zeros. However, these are not necessarily
>> identifiable since they could be slightly greater than zero. I'm
>> gritting my teeth on this and pretending all is well. However, a
>> histogram shows enormous skewness. I'll probably try a square root.
>>
>> Tony
>>
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu
>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Austin Nichols
>> Sent: Saturday, August 16, 2008 8:50 AM
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: stata code for two-part model
>>
>> Shehzad Ali et al.--
>> See also
>> http://www.nber.org/papers/t0228
>> The two-part models of health expenditures have always struck me as a
>> bad idea; think about how you would get predictions for each indiv in
>> your sample. The "stage 1" probit classifies people as having
>> expenditures or not (some correctly, some not) and then the "stage 2"
>> ols model gives predicted expenditures only for those people who
>> actually have positive expenditures (not those who are classified by
>> the probit as likely to have positive expenditures) unless you predict
>> out of sample. At least one preferred approach of calculating
>> marginal effects by comparing predictions over the whole sample turns
>> out to be practically and analytically difficult in that setting.
>> However, a -glm- with a log link (or equivalently a -poisson-
>> regression) has no trouble: those people with extremely low predicted
>> expenditures would round to zero predicted expenditures if you thought
>> about a survey with expenditures measured discretely in dollars, say.
>> Everyone has E(y)=exp(Xb) and there is no real issue with calculating
>> marginal effects. Once you are in the -glm- framework it is also easy
>> to think about model fit and alternative links...
>>
>> On Sat, Aug 16, 2008 at 3:41 AM, Eva Poen <eva.poen@gmail.com> wrote:
>>> Shehzad,
>>>
>>> this looks like a hurdle model. Have you searched the ssc archives to
>>> see if someone else has programmed it for you? Have a look at
>>> -hplogit-, for example.
>>>
>>> If you end up doing it yourself, I think you need to do a bit of
>>> programming. In order for -mfx- to work after your estimation, you
>>> need a way of telling it what you want the marginal effects to be
>>> calculated for. In your case, this would be the overall expected cost
>>> of care from your model. The way to feed this to -mfx- is via its
>>> predict(predict_option) option, but for this to work you need to write
>>> a -predict- command and an estimation command for your model.
>>>
>>> See for example this post:
>>> http://www.stata.com/statalist/archive/2005-10/msg00091.html
>>>
>>> Hope this helps,
>>> Eva
>>>
>>> 2008/8/16 Shehzad Ali <sia500@york.ac.uk>:
>>>> Hi,
>>>>
>>>> I was wondering if someone can help with stata code for calculating
>>>> marginal effects after two-part models for, say, cost of care. Here,
>>>> the first part is a probit model for seeking care or not, and the
>>>> second part is an OLS model of cost of care, conditional on the
>>>> decision to seek care. Here is the simplified code:
>>>>
>>>> probit care $xvar
>>>>
>>>> reg cost $zvar if care==1
>>>>
>>>> mfx
>>>>
>>>> I understand that mfx after the second part gives us the marginal
>>>> effects for the OLS part only, and not the conditional marginal
>>>> effects.
>>>>
>>>> Any help would be appreciated.
>>>>
>>>> Thanks,
>>>>
>>>> Shehzad
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
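The combined prediction that Eva and Austin discuss can be written out explicitly. With a probit first part (regressors X, coefficients γ) and an OLS second part fitted on users (regressors Z, coefficients β), the overall expected cost is the product of the two parts, and the marginal effect of a regressor x_k follows from the product rule:

$$E(y \mid X, Z) = \Phi(X\gamma)\, Z\beta, \qquad \frac{\partial E(y \mid X, Z)}{\partial x_k} = \phi(X\gamma)\,\gamma_k\, Z\beta + \Phi(X\gamma)\,\beta_k$$

where γ_k or β_k is zero if x_k is absent from that part. The do-file below is a minimal sketch under stated assumptions, not code from the thread: the data are simulated, and `care`, `cost`, and `x` are hypothetical stand-ins for Shehzad's variables and the `$xvar`/`$zvar` globals. It assumes a recent Stata release, computes point estimates only, and, for comparison, fits the single-index log-link model Austin describes via -poisson-.

```stata
* Minimal sketch on HYPOTHETICAL simulated data: overall marginal effect
* of x on expected cost from a two-part model (probit + OLS),
* compared with a single-index log-link (-poisson-) model.
clear
set seed 12345
set obs 2000
generate double x = rnormal()

* Simulated outcome: whether care is sought, then cost if sought.
* Cost is clipped at 0 so that -poisson- accepts the outcome.
generate byte care = (0.5 + 0.8*x + rnormal() > 0)
generate double cost = cond(care, max(0, 100 + 30*x + rnormal(0, 20)), 0)

* Part 1: probit for Pr(care = 1 | x)
probit care x
predict double xg, xb           // linear index X*gamma
predict double p1, pr           // Phi(X*gamma)
scalar g_x = _b[x]

* Part 2: OLS for E(cost | care = 1, x), fitted on users only
regress cost x if care == 1
predict double m2, xb           // Z*beta, predicted for ALL observations
scalar b_x = _b[x]

* Overall expected cost and per-observation marginal effect (product rule)
generate double ey = p1 * m2
generate double me_tpm = normalden(xg)*g_x*m2 + p1*b_x
summarize me_tpm                // mean = average marginal effect of x

* Single-index alternative: log link via -poisson- (quasi-likelihood)
poisson cost x, vce(robust)
predict double ey_pois, n       // exp(X*b) with no exposure variable
generate double me_pois = ey_pois * _b[x]
summarize me_pois               // mean = average marginal effect of x
```

Here `m2` is predicted for every observation, not only those with `care==1`, which is the out-of-sample prediction Austin mentions. The two-part sketch produces no standard errors for the averaged effect; bootstrapping both parts together (for example, -bootstrap- around a small program that reruns both steps) is one common option.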

**Follow-Ups**:
- **Re: st: stata code for two-part model**, *From:* Shehzad Ali <sia500@york.ac.uk>

**References**:
- **st: stata code for two-part model**, *From:* "Shehzad Ali" <sia500@york.ac.uk>
- **Re: st: stata code for two-part model**, *From:* "Eva Poen" <eva.poen@gmail.com>
- **Re: st: stata code for two-part model**, *From:* "Austin Nichols" <austinnichols@gmail.com>
- **RE: st: stata code for two-part model**, *From:* "Lachenbruch, Peter" <Peter.Lachenbruch@oregonstate.edu>
- **Re: st: stata code for two-part model**, *From:* "Austin Nichols" <austinnichols@gmail.com>
- **RE: st: stata code for two-part model**, *From:* "Lachenbruch, Peter" <Peter.Lachenbruch@oregonstate.edu>

